Even Faster asin() Was Staring Right At Me

3 points by nemin


Previously: https://lobste.rs/s/bunmdv/faster_asin_was_hiding_plain_sight

Corbin

This was a disappointingly incurious followup. I'm glad that they're satisfied, but they're leaving a lot of accuracy on the table. Read the previous discussion for more links.

I can explain at least one of the patterns here. The author's focus on performance overlooks the fact that generating and then fixing up a bad approximation can take more time than generating a good approximation outright. This mirrors a problem throughout industry, where capitalization and horizontal scaling, being effective ways to turn investors' liquid resources into outputs, are chosen as the only way to improve productivity. At scale, to pick germane examples, this turns into cheap pseudo-GPUs for evaluating quantized LLM weights, or Pixar's farm of supersamplers spending roughly a million-fold extra effort per pixel; in both cases, the maths for cheaply computing a good approximation isn't legible to capital. (I do not think the author shares the capitalist tendency to prefer slow methods in order to raise competitors' costs.)

Another pattern is the focus on orthodoxy. Fourier analysis is great, but Chebyshev analysis is also great. Optimizing this sort of polynomial approximation to a transcendental function can be done with either flavor of decomposition, but Chebyshev analysis turns out to be the preferable approach, because Taylor series converge slowly and the Remez exchange algorithm exists. Chebyshev analysis isn't mentioned anywhere in a mainstream four-year degree, and a student wouldn't know to elect numerical methods; worse, the maths likely scares them away. I also think that their specific wording about "an approximation of arcsine, not the actual method" indicates a fundamental misunderstanding of IEEE 754 and of what a transcendental function is; thanks to the table-maker's dilemma, our methods are necessarily approximate. This latter fact is taught in a standard course but easily forgotten or dismissed as nuance.
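To make the convergence point concrete, here's a quick sketch (mine, not the author's; the domain [-0.5, 0.5], degree 9, and node count are arbitrary choices for illustration) comparing a truncated Taylor series of asin against a Chebyshev fit of the same degree:

```python
import numpy as np

# Degree-9 truncation of the standard Maclaurin series for asin(x):
# x + x^3/6 + 3x^5/40 + 15x^7/336 + 35x^9/1152.
def asin_taylor9(x):
    return x + x**3 / 6 + 3 * x**5 / 40 + 15 * x**7 / 336 + 35 * x**9 / 1152

# Sample asin at Chebyshev nodes scaled to [-0.5, 0.5] and fit a
# degree-9 Chebyshev series to those samples.
n = 64
nodes = 0.5 * np.cos(np.pi * (np.arange(n) + 0.5) / n)
cheb = np.polynomial.Chebyshev.fit(nodes, np.arcsin(nodes), deg=9,
                                   domain=[-0.5, 0.5])

x = np.linspace(-0.5, 0.5, 10001)
err_taylor = np.max(np.abs(asin_taylor9(x) - np.arcsin(x)))
err_cheb = np.max(np.abs(cheb(x) - np.arcsin(x)))
print(f"taylor max error: {err_taylor:.2e}  chebyshev max error: {err_cheb:.2e}")
```

Same degree, same work at evaluation time, but the Chebyshev fit spreads its error across the interval instead of piling it up at the endpoints, so its worst-case error is orders of magnitude smaller. Remez exchange then squeezes out a bit more on top of that.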

Finally, treating the standard library as a black box is good practice for correctness but bad practice for performance. LLVM's asin() helpers use Taylor polynomials with too many terms, bracketed by error-handling cases. Fixing up the bad approximation costs extra iterations and conditional branches. The main reason that switching to any five-term approximation is faster is that it simply does less work and puts less pressure on the branch predictor. This is a standard tradeoff when implementing a standard library for users who may care about correctness; the implication is that performance-savvy users will make their own correctness tradeoffs. To be fair, the author did exactly that in the first post, and I like their graphs.
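For the curious, a five-term odd polynomial evaluates branch-free as Horner's scheme in x². This sketch (again mine, not the author's or LLVM's; the coefficients come from a quick least-squares fit at Chebyshev nodes, not a proper Remez run) shows the shape of the inner loop:

```python
import numpy as np

# Illustrative only: fit a degree-9 polynomial to asin at Chebyshev nodes
# on [-0.5, 0.5]; symmetry makes the even coefficients vanish, leaving
# five odd-power terms c1, c3, c5, c7, c9.
n = 64
nodes = 0.5 * np.cos(np.pi * (np.arange(n) + 0.5) / n)
coef = np.polynomial.polynomial.polyfit(nodes, np.arcsin(nodes), 9)
c = coef[1::2]  # odd-power coefficients

def asin_poly5(x):
    # Horner in x^2: x * ((((c9*x2 + c7)*x2 + c5)*x2 + c3)*x2 + c1).
    # Five multiply-adds, no branches, no iteration to fix anything up.
    x2 = x * x
    p = c[4]
    for k in (3, 2, 1, 0):
        p = p * x2 + c[k]
    return x * p

x = np.linspace(-0.5, 0.5, 10001)
max_err = np.max(np.abs(asin_poly5(x) - np.arcsin(x)))
print(f"max error on [-0.5, 0.5]: {max_err:.2e}")
```

The whole thing is a short dependent chain of fused multiply-adds once compiled, which is exactly why it beats a longer polynomial wrapped in error-handling branches.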

Sorry if I sound sour that they didn't take my advice. That alone would be fine, but they didn't take the advice when it appeared on Reddit either, nor when it appeared on HN, where four top-level comments raised it and one also recommended lolremez. Multiple folks doing numerical methods have endorsed these tools, and it's surprising that the author didn't even mention or consider them.