When Vectorized Arrays Aren't Enough
9 points by itamarst
Note the author is using "vectorized" in the Python sense (data structures that allow batch operations) rather than the low-level sense (SIMD).
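To make that distinction concrete, here is a minimal sketch (the arithmetic is illustrative, not from the post): "vectorized" in the Python sense just means one batch call replaces an explicit Python loop, whether or not SIMD instructions run underneath.

```python
import numpy as np

xs = np.arange(100_000, dtype=np.float64)

# Loop version: each element handled by the Python interpreter, slow.
loop_result = np.empty_like(xs)
for i in range(len(xs)):
    loop_result[i] = xs[i] * 2.0 + 1.0

# "Vectorized" (Python sense): the whole batch runs inside NumPy's C code.
batch_result = xs * 2.0 + 1.0

assert np.array_equal(loop_result, batch_result)
```

Whether that C code additionally uses SIMD is a separate, lower-level question.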
… but the Numpy and compiled Rust implementations are vectorized with SIMD. In particular, see the part about np.show_config() and the surprising performance inversion between Numpy with ASIMD and Rust with basic NEON.
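For reference, the build-info check mentioned above looks like this; the output varies by platform and NumPy version, and lists which SIMD extensions the binary was compiled with.

```python
import numpy as np

# Prints NumPy's build configuration, including detected SIMD extensions
# (e.g. ASIMD/NEON on ARM, AVX2/AVX512 on x86).
np.show_config()
```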
I did intend the title to mean 'vectorized' in the Python/R sense, since I was aiming the post at scientific Python users who vaguely know that NumPy does something called 'vectorized operations' but don't know the details. Of course, in this particular toy case the operations are also SIMD vectorized, but we can easily conceive of a case where this does not occur. The term is just ambiguous/overloaded.
Yeah, like the pow() example. I don’t think there’s any unusual ambiguity with Python or R: it’s true in most languages that high-level vector programming interfaces might or might not compile to SIMD instructions depending on the capabilities of the compiler and the target hardware. The important thing is the vectorized programming model that allows the compiler to optimize down to vector hardware; the programming model can be vectorized even if the target is not.
Author here, glad people are finding this interesting! This post and its followup are both being merged into an in-progress notebook that examines this more rigorously and expansively. My original post doesn't give NumExpr and JAX nearly enough attention. Still a WIP, but hopefully of interest.
One note: using Numba with the same NumPy code doesn't actually help, unlike JAX. Numba shines when you do the same thing you're doing in Rust, writing explicit for loops to replace the full array operations.
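A hedged sketch of that pattern (the function and numbers are illustrative; the fallback keeps it runnable if Numba isn't installed): Numba's `@njit` pays off on an explicit element-wise loop, the same style you'd write in Rust, rather than on whole-array NumPy expressions.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the loop body to machine code
except ImportError:
    njit = lambda f: f  # fallback: run the plain Python version

@njit
def scaled_sum(xs):
    # Explicit for loop over elements: the style Numba optimizes well.
    total = 0.0
    for i in range(xs.shape[0]):
        total += xs[i] * 2.0 + 1.0
    return total

xs = np.arange(1_000, dtype=np.float64)
assert np.isclose(scaled_sum(xs), (xs * 2.0 + 1.0).sum())
```

The loop also fuses the multiply, add, and reduction into one pass, avoiding the temporary arrays that the whole-array expression allocates.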