Why Object of Arrays (SoA pattern) beat interleaved arrays: a JavaScript performance rabbit hole

24 points by aapoalas

As you figured out, the cost of for-loop iterations dominates this microbenchmark. You can massively improve performance by using a while-loop instead:

    let sumInterleavedWhile = 0;
    const startInterleavedWhile = performance.now();
    for (let iter = 0; iter < 10; iter++) {
        let i = 0;
        while (i < ARRAY_SIZE) {
            sumInterleavedWhile +=
                interleaved[i] +
                interleaved[i + 1] +
                interleaved[i + 2];
            i += 3;
        }
    }
    const timeInterleavedWhile = performance.now() - startInterleavedWhile;

This trick also significantly speeds up the SoA approach (but not AoS):

❯ node main.js
AoS:         745.95ms
AoS (while): 748.82ms
SoA:         427.49ms
SoA (while): 411.45ms
Interleaved: 662.16ms
Interleaved (while): 185.67ms

masklinn
Your interleaved (while) takes a third the time because it does only a third the iterations. ARRAY_SIZE is the number of items in the array, but you use that as the bounds for i which you increment by 3 each iteration so you're only traversing a third the iterleaved array before stopping. The original code also iterates on ARRAY_SIZE but increments i by just 1 per iteration (and multiplies it by 3 in order to get the array index).

If you iterate until ARRAY_SIZE * 3 to traverse the entire array, you end up with a much more modest improvement, which you pretty much get by changing the original code from multiplying i by 3 on every iteration to iterating on every third element of ARRAY_SIZE * 3 in the for loop.

Also funny story, the sum is sufficiently far outside the integer range of float64 (it ends up around 2^56) that batching the increments breaks it:
```
                for (let i = 0; i < ARRAY_SIZE; i += 4) {
                    sumSoA += soa.x[i] + soa.y[i] + soa.z[i]
                     + soa.x[i+1] + soa.y[i+1] + soa.z[i+1]
                     + soa.x[i+2] + soa.y[i+2] + soa.z[i+2]
                     + soa.x[i+3] + soa.y[i+3] + soa.z[i+3];
                }
```
is 20% faster than the original SoA but yields a result that's quite a bit off (at ARRAY_SIZE = 10_000_000 it's still fine, and a 4x unroll increases throughput by ~25%)

Incidentally just unrolling the loop (without further batching the addition) has a fractionally negative effect on runtime.
- alexjurkiewicz
  
  ha ha... whoops...! Thanks for catching it
- aapoalas
  
  Nice catch, I was wondering why the parent commenter's interleaved while case would somehow be massively better!