The fastest way to detect a vowel in a string

31 points by azhenley


0x2ba22e11

This matches old Python performance lore. Specifically: replacing an explicit Python loop with a call to a C function that contains exactly the same loop (as in NumPy-style vectorisation) works for string processing too.
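A minimal sketch of that idea, not the article's actual benchmark code: the same vowel scan written once as an interpreted Python loop and once as a single call into the `re` module, where the scanning happens inside the interpreter's C code. The function names and the ASCII-only vowel set are assumptions made for illustration.

```python
import re

# Assumption: "vowel" means the ten ASCII letters below.
VOWELS = "aeiouAEIOU"
_VOWEL_RE = re.compile(r"[aeiouAEIOU]")


def has_vowel_loop(s: str) -> bool:
    """Explicit loop: every iteration runs in the Python interpreter."""
    for ch in s:
        if ch in VOWELS:
            return True
    return False


def has_vowel_regex(s: str) -> bool:
    """Same scan, but the loop over characters happens inside the regex engine."""
    return _VOWEL_RE.search(s) is not None
```

Both functions return the same answer for any input; only where the loop executes differs. Timing them with `timeit` on your own inputs is the only way to know how large the gap is on a given Python version.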

JordiGH

Haha, the primes example is ridiculous. It reminds me of Gödel numbering, likely the inspiration.

Also, using Wilson’s theorem for primality testing, haha, this is hilarious (because it’s just about the worst possible way to test for primes, but it’s fine for such small primes).
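For anyone who hasn't seen it, Wilson's theorem says an integer n > 1 is prime exactly when (n − 1)! ≡ −1 (mod n). A rough sketch of a test built on it (the function name is mine, and this is not the article's code):

```python
def is_prime_wilson(n: int) -> bool:
    """Primality test via Wilson's theorem: n is prime iff (n-1)! % n == n-1.

    Needs about n multiplications, so it is only sensible for tiny n.
    """
    if n < 2:
        return False
    acc = 1
    for k in range(2, n):
        # Reduce modulo n at every step so the factorial never grows large.
        acc = (acc * k) % n
    return acc == n - 1


small_primes = [n for n in range(30) if is_prime_wilson(n)]
```

The work grows linearly in n itself rather than in its number of digits, which is why it only makes sense for primes this small.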

gmorling

Was hoping to see some cool SIMD trick actually. Is this possible with Python?

sknebel

Matches my experience. Years ago in uni I was trying to performance-tweak the Python microformats2 parser because it was a big chunk of the processing time in brid.gy, and using regex (and sometimes combining multiple regexes and then using simple checks to decide which one had matched) made a difference in many cases. Overall, though, “parsing HTML into Python objects” was the big whammy of course; at some point optimizing Python is not the most efficient use of your time…

olliej

Regex being the fastest is exactly what I’d expect - I don’t know why the author would assume similar performance to the for loop. Fundamentally this is comparing the performance of a for loop in Python to a for loop in C (I can’t recall which regex engine Python uses).

This is similar to those articles that compare “Python” performance to C by comparing a naive C implementation of something (matrix multiplies or similar) to Python code that … calls a library written in C and assembly specifically for the purpose of numeric processing.

There’s a real gap in understanding by some people when comparing Python perf that remains confusing to me. The same gap does not seem to exist in the JS community, where there’s much more understanding of when something is implemented in something other than JS even if it is being called from JS.

frontsideair

I wonder if it can be made faster with magic numbers and bitmasking. A loop is necessary, and since the check is static, it can be micro-optimized. Even moving the “e”s to the start of the vowel list can improve performance for most real-world cases.
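Roughly what I have in mind, as an untested-for-speed sketch: pack the five vowels into a 26-bit mask and test each character with a shift and an AND. The names are made up, and it assumes only ASCII letters count as vowels. In CPython the per-character loop overhead probably dominates, so this trick is more likely to pay off in C than here.

```python
# Assumption: only the ASCII vowels a, e, i, o, u (either case) count.
# Bit i of the mask is set when chr(ord("a") + i) is a vowel.
VOWEL_MASK = 0
for _ch in "aeiou":
    VOWEL_MASK |= 1 << (ord(_ch) - ord("a"))


def is_vowel(ch: str) -> bool:
    """Check one character with a shift and a mask instead of a set lookup."""
    idx = (ord(ch) | 0x20) - ord("a")  # OR with 0x20 folds ASCII upper to lower case
    return 0 <= idx < 26 and bool((VOWEL_MASK >> idx) & 1)


def has_vowel_bitmask(s: str) -> bool:
    """The loop is still needed; only the per-character test changes."""
    return any(is_vowel(ch) for ch in s)
```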

dsr

Falsehoods people believe about English: