DumPy: NumPy except it’s OK if you’re dum
34 points by WeetHet
So, let’s do this:
- Bring back the syntax of loops and indices.
- But don’t actually execute the loops. Just take the syntax and secretly compile it into vectorized operations.
- Also, let’s get rid of all the insanity that’s been added to NumPy because loops were slow.
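To make the quoted idea concrete, here is a minimal sketch in plain NumPy (not DumPy's actual API, which is not shown in this thread). It writes the same batched linear solve once with explicit loops and indices and once in vectorized form; the array names and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3, 3))  # hypothetical batch of five 3x3 matrices
x = rng.normal(size=(5, 3))     # one right-hand side per matrix

# Loop form: easy to read, one small solve per index i.
z_loop = np.empty((5, 3))
for i in range(5):
    z_loop[i] = np.linalg.solve(A[i], x[i])

# Vectorized form: same result, but the right-hand sides must be
# reshaped to (5, 3, 1) so they are treated as a stack of columns.
z_vec = np.linalg.solve(A, x[:, :, None])[:, :, 0]
```

The loop version states the intent directly; the vectorized version needs the extra axis juggling that the post complains about.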
This is so similar to DuckDB’s philosophy of never actually writing vectorized code: only write code using loops that a compiler can easily vectorize.
DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture.
This approach – which I like! vectorizing stuff often hurts my brain – reminds me a bit of ArangoDB’s AQL, which is their SQL variant (focused on graphs and documents) that similarly makes the looping explicit.
This looks a bit like the halide bindings for python
What about the joy of [None, :, ..., None] puzzles? Just kidding, this looks neat.
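For anyone who has not met these puzzles, a small plain-NumPy sketch of what that index expression does. The array shapes here are arbitrary choices for illustration.

```python
import numpy as np

x = np.zeros((2, 3, 4))

# None inserts a new length-1 axis, ":" keeps the first axis,
# and "..." stands for all remaining axes.
y = x[None, :, ..., None]
y_shape = y.shape

# The usual reason to do this: line up axes so broadcasting works.
a = np.arange(3)
b = np.arange(4)
outer_diff = a[:, None] - b[None, :]  # outer_diff[i, j] == a[i] - b[j]
```

Here `y_shape` is `(1, 2, 3, 4, 1)` and `outer_diff` has shape `(3, 4)`.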
When vectorizing operations makes my brain hurt too much (or is impossible) I found numba to be of great help. It looks like taichi is somewhat similar but I have not tried it. If adding a runtime dependency is out of the question, cython is an option too.
This is great, I’ve always wanted my program to have a DumPy
/j
This is quite beautiful, and this implicit indexing totally opens my mind to some interesting APIs.
I would definitely go to a conference talk that just talks about implementing this. Very very very cool
This looks like it’s mostly Einstein summation notation? Definitely neat for some problems!
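As a point of comparison, a minimal sketch of Einstein summation in plain NumPy using `np.einsum`, next to the explicit loops it replaces. The matrices are arbitrary examples, and this illustrates only the notation, not DumPy itself.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# Explicit loops: C[i, k] is the sum over j of A[i, j] * B[j, k].
C_loop = np.zeros((2, 4))
for i in range(2):
    for k in range(4):
        for j in range(3):
            C_loop[i, k] += A[i, j] * B[j, k]

# Einstein summation: the repeated index j is summed over implicitly.
C_einsum = np.einsum("ij,jk->ik", A, B)
```

Both give the ordinary matrix product `A @ B`. The index string names each axis much as the loop indices do.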
The author says that they think this can be done in Julia and they’re right: