When float division beats integer division

9 points by nrposner

bitshift

Very counterintuitive, very fun!

It's interesting they got a speedup on both Intel and AMD. I wonder if there's something fundamental about floating point that makes division easier. Obviously exponents are easier, but the harder part is the mantissa.

My best guess is because floating point allows iterative methods like Newton-Raphson, which Wikipedia calls "Fast division methods". Whereas for integer, you'd have to use SRT division which is "slow"?

You're always doing integer math (pointer arithmetic), so it would seem like that choosing integer math would load the int math part of the cpu, while if you used floats for the actual data you could use more of the silicon to get the job done

I really like this idea: it's a cool fact and I want it to be the explanation—and it probably is the explanation for different benchmarks. But I don't think it explains why an individual IDIVQ instruction has a worse official throughput than DIVSD.

fanf

When I was experimenting with divisionless random numbers a few years ago, I noticed that Apple’s CPUs have much faster integer division than x86 CPUs. This masters thesis says Apple’s performance cores have a throughput of 2/2, and the efficiency cores have 7/21, which is much more like typical x86 performance. (Compare 0.5/0.5 and 1/1 for multiplication.) (It isn’t clear to me what the two numbers mean: I couldn’t find where the thesis explains them.)

So I think it’s just a matter of where CPU designers decide to spend their silicon budget: float division is (I guess) relatively more common than integer division. Compilers are reasonably good at avoiding integer division, and libdivide helps speed up repeated division by the same runtime value, which tends to take the pressure off optimizing hardware division. But clearly Apple think it’s worth the effort to make it fast; I wonder what software made it worthwhile.

olliej

It’s worth noting that when performing floating point division the dominant cost is the integer division of the significand. I wonder how much of the performance win is “fp math gets more division hardware” (e.g fp division is much more likely to be perf sensitive than integer div), and how much is the reduced width of the division?