Why Object of Arrays (SoA pattern) beat interleaved arrays: a JavaScript performance rabbit hole
24 points by aapoalas
24 points by aapoalas
As you figured out, the cost of for-loop iterations dominates this microbenchmark. You can massively improve performance by using a while-loop instead:
let sumInterleavedWhile = 0;
const startInterleavedWhile = performance.now();
for (let iter = 0; iter < 10; iter++) {
let i = 0;
while (i < ARRAY_SIZE) {
sumInterleavedWhile +=
interleaved[i] +
interleaved[i + 1] +
interleaved[i + 2];
i += 3;
}
}
const timeInterleavedWhile = performance.now() - startInterleavedWhile;
This trick also significantly speeds up the SoA approach (but not AoS):
❯ node main.js
AoS: 745.95ms
AoS (while): 748.82ms
SoA: 427.49ms
SoA (while): 411.45ms
Interleaved: 662.16ms
Interleaved (while): 185.67ms
Your interleaved (while) takes a third the time because it does only a third the iterations. ARRAY_SIZE is the number of items in the array, but you use that as the bounds for i which you increment by 3 each iteration so you're only traversing a third the iterleaved array before stopping. The original code also iterates on ARRAY_SIZE but increments i by just 1 per iteration (and multiplies it by 3 in order to get the array index).
If you iterate until ARRAY_SIZE * 3 to traverse the entire array, you end up with a much more modest improvement, which you pretty much get by changing the original code from multiplying i by 3 on every iteration to iterating on every third element of ARRAY_SIZE * 3 in the for loop.
Also funny story, the sum is sufficiently far outside the integer range of float64 (it ends up around 2^56) that batching the increments breaks it:
for (let i = 0; i < ARRAY_SIZE; i += 4) {
sumSoA += soa.x[i] + soa.y[i] + soa.z[i]
+ soa.x[i+1] + soa.y[i+1] + soa.z[i+1]
+ soa.x[i+2] + soa.y[i+2] + soa.z[i+2]
+ soa.x[i+3] + soa.y[i+3] + soa.z[i+3];
}
is 20% faster than the original SoA but yields a result that's quite a bit off (at ARRAY_SIZE = 10_000_000 it's still fine, and a 4x unroll increases throughput by ~25%)
Incidentally just unrolling the loop (without further batching the addition) has a fractionally negative effect on runtime.
Nice catch, I was wondering why the parent commenter's interleaved while case would somehow be massively better!