The fastest way to detect a vowel in a string
31 points by azhenley
This matches old Python performance lore. Specifically: replacing an explicit loop with a call to a C function that contains essentially the same loop (NumPy-style vectorisation) works for string processing too.
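Roughly this contrast, as a sketch (the function names are mine, not the article’s):

    import re

    VOWEL_RE = re.compile(r"[aeiou]")   # the scanning loop lives inside the C regex engine

    def loop_search(s):
        # Explicit loop: every iteration executes Python bytecode
        for i, ch in enumerate(s):
            if ch in "aeiou":
                return i
        return -1

    def regex_search(s):
        # Same loop, but run once inside the C extension
        m = VOWEL_RE.search(s)
        return m.start() if m else -1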
Haha, the primes example is ridiculous. It reminds me of Gödel numbering, likely the inspiration.
Also, using Wilson’s theorem for primality testing, haha, this is hilarious (because it’s just about the worst possible way to test for primes, but it’s fine for such small primes).
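For anyone who hasn’t seen it, a Wilson’s theorem test looks something like this (my sketch, not the article’s code):

    from math import factorial

    def is_prime_wilson(p: int) -> bool:
        # Wilson's theorem: p > 1 is prime iff (p - 1)! == -1 (mod p).
        # Computing the full factorial makes this absurdly slow for large p,
        # but it is fine for single-digit primes.
        return p > 1 and factorial(p - 1) % p == p - 1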
Was hoping to see some cool SIMD trick actually. Is this possible with Python?
Only in the same way as the regex part: by calling into a native library that does it (I’m guessing something from the scientific ecosystem would have it, NumPy or pandas?). Python itself doesn’t expose anything like that.
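Something along these lines with NumPy, presumably (my sketch; this is vectorised native code rather than hand-written SIMD, and whether it actually uses SIMD depends on the build and CPU):

    import numpy as np

    def has_vowel_np(s: str) -> bool:
        # View the string's bytes as a uint8 array; np.isin runs its
        # comparison loop in compiled C, not in the Python interpreter.
        arr = np.frombuffer(s.encode("ascii", "replace"), dtype=np.uint8)
        vowels = np.frombuffer(b"aeiouAEIOU", dtype=np.uint8)
        return bool(np.isin(arr, vowels).any())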
Matches my experience. Years ago in uni I was trying to performance-tweak the Python microformats2 parser because it was a big chunk of the processing time in brid.gy, and using regex (and sometimes combining multiple regex and then using simple checks to decide which it was) made a difference in many cases. Although overall “parsing HTML into Python objects” was the big whammy of course, at some point optimizing Python is not the most efficient use of your time…
Regex being the fastest is exactly what I’d expect - I don’t know why the author would assume similar performance to the for loop. Fundamentally this is comparing the performance of a for loop in Python to a for loop in C (I can’t recall which regex engine Python uses).
This is similar to those articles that compare “Python” performance to C by comparing a naive C implementation of something (matrix multiplies or similar) to Python code that … calls a library written in C and assembly specifically for the purpose of numeric processing.
There’s a real gap in understanding when some people compare Python performance, and it remains confusing to me. The same gap doesn’t seem to exist in the JS community: there’s much more awareness of when something is implemented in something other than JS, even if it’s being called from JS.
I wonder if it can be made faster with magic numbers and bitmasking. A loop is necessary, and since the check is static, it can be micro-optimized. Even checking “e” first (the most common letter) can improve performance for most real-world cases.
> magic numbers and bitmasking
This is what I was expecting. Maybe I’ve read too many of that kind of article.
After .lower() and subtracting 97 (ord('a')) you get 0, 4, 8, 14, 20, or 000000, 000100, 001000, 001110, 010100 in binary. Kind of interesting. You can rule out any byte that is odd. I guess you still have to compare every byte to something, so the comparison would have to be very fast to make up for the overhead, and the whole thing would be worthwhile only for very long strings. Since you are looking at English-language strings, I doubt any of this would matter because you’d find a vowel very quickly.
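In pure Python the per-character check could look roughly like this (a sketch; the arithmetic is still interpreter-bound, so the idea only really pays off in native code):

    # Bits 0, 4, 8, 14, 20 correspond to a, e, i, o, u after subtracting ord('a')
    VOWEL_MASK = (1 << 0) | (1 << 4) | (1 << 8) | (1 << 14) | (1 << 20)

    def first_vowel_bitmask(s: str) -> int:
        for i, ch in enumerate(s.lower()):
            off = ord(ch) - ord('a')
            if 0 <= off < 26 and (VOWEL_MASK >> off) & 1:
                return i
        return -1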
Falsehoods people believe about English:
Do you believe everything you said here about English or would any other human language also fit here?
I see a lot of folklore along the lines of “haha, English is so nonsensical,” but just about any human language looks that way (and also, they’re not actually nonsensical: linguistics couldn’t exist if human languages didn’t exhibit remarkable regularity). I wonder if you hold the same beliefs about other languages.
Hmm.
English has heuristics, not rules. For every rule of English, there is at least one reasonable exception. The prescriptivist view is simply wrong.
There are languages where spelling is perfectly regularized, so that if you can read a word, you know how to pronounce it. There are languages with authorities trying to control their usage (notably L’Académie française for French) with some degree of success.
Do I believe what I said about English? Ja, you betcha.
By that standard, all languages have “heuristics not rules”. That’s just how languages work. But at that point, you’re just using “heuristic” as a new word for “rule”.
Sure, some languages might have more regular spelling or top-down prescriptivist systems, but (a) spelling is only part of a language, (b) those languages still have exceptions, and (c) authorities that determine how a language gets used exist in every language - how else do you think schools work?
A better list of falsehoods would probably begin: “some languages are uniquely special snowflakes and that factoid you know about them is definitely true”.
How do I think schools work? Well, for English, the subject in question, the answer is “badly, inaccurately, and often without even straightforward presentation of the more accurate heuristics”. It is a testament to students, not to pedagogical techniques, that so many of them end up reasonably literate.
There is no authority which determines how English works.
I can’t tell which code snippets correspond to which timing results: the headings in the article don’t match the labels in the tables and the function names in the snippets are all the same instead of matching the labels.
The image at the top looks like it should scroll to reveal the cropped text, but it doesn’t. The bytecode seems unrelated to the Python code, so I am not sure why they are shown next to each other. Confusing.
Just reiterates how incredibly slow Python truly is. Anything it does fast is actually done in C.
I really want a graph or two for this.
I graphed the data in a Google Doc.
A little analysis:
I’d try making the vowels set a global constant and using `in` with the simple for loop; wonder if that would make much of a difference. Probably wouldn’t beat regex, though, which is mildly annoying, because I like sets XD
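Something like this, presumably (names are mine, not the article’s):

    VOWELS = frozenset("aeiouAEIOU")   # built once at module load

    def first_vowel(s):
        for i, ch in enumerate(s):
            if ch in VOWELS:           # O(1) membership test per character
                return i
        return -1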
Python was never meant as a “write it in Pure This” language, but as a so-called glue language, like the shell itself. If you want low-level, ignoring all internationalization niceties and character classification databases, as per the article, Cython makes this easy. On an i7-1370P on otherwise idle Linux, I get about 27 microseconds with this Cython:
    cdef int first(const char *s, int n, char *set):
        cdef int i                          # Location of first in `set`
        for 0 <= i < n:
            if set[<int>(s[i])] == 1: return i
        return -1                           # ^^^-- to silence [char] warning

    cdef char set[256]                      # Only 4 64B cache lines
    from libc.string cimport memset
    memset(&set[0], 0, 256)                 # Set up a char set
    set[ord('a'[0])] = 1; set[ord('e'[0])] = 1
    set[ord('i'[0])] = 1; set[ord('o'[0])] = 1
    set[ord('u'[0])] = 1  # set[ord('y'[0])] = 1
    from sys import argv, exit; from time import time
    cdef bytes av = bytes(argv[1], encoding="utf8") \
                    if len(argv) > 1 else b''
    cdef const char *s = av
    cdef int n = len(av)
    t0 = time()
    cdef int spot = first(s, n, set)
    print(time() - t0, " for spot ", spot)
and test code for a 100_000-byte non-vowel prefix (L2 CPU cache territory, basically):
cycc first.pyx; test=$(repeat 100000 printf h)i; repeat 5 ./first $test
with output:
2.7179718017578125e-05 for spot 100000
2.7179718017578125e-05 for spot 100000
2.7179718017578125e-05 for spot 100000
3.0994415283203125e-05 for spot 100000
2.6941299438476562e-05 for spot 100000
Also, if you are going to do multiple runs for something like this, it’s best to take the min over the N runs (5 here); more details over at https://github.com/c-blake/bu/blob/main/doc/tim.md
EDIT: This is about 1 CPU clock cycle per input byte. As for the claim at the end of TFA, the fastest way likely involves SIMD, much as with strlen or its generalization memchr, only for 5 specific bytes; you can probably do a little better or worse depending upon whether you do 1 pass or 5 passes, whether strings fit in CPU caches, memory bandwidth, and yadda-yadda. For example, I believe memchr achieves better than 5 B/cycle in L1 cache on most modern CPUs.
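A rough pure-Python analogue of the 5-pass idea, for what it’s worth (my sketch; a single-byte bytes.find ends up in optimized C in CPython):

    def first_vowel_5pass(data: bytes) -> int:
        # One bytes.find() per vowel (each a single-byte scan in C,
        # roughly the "5-pass" approach), then take the earliest hit.
        hits = [i for i in (data.find(v) for v in b"aeiou") if i != -1]
        return min(hits) if hits else -1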