Floating-Point Printing and Parsing Can Be Simple And Fast
50 points by rsc
50 points by rsc
There has been a lot of performance improvement on Żmij in recent weeks. I would be interested to see how these two algorithms compare!
It seems for printing (usually the slower part), this work uses the same 2 bytes at a time "00 01 02.." (spaces for readability) 200 byte kind of array. (I'm not sure who first observed this optimization.) While Russ probably knows this, the article does not dwell on how much of L1/L2/L3 CPU cache will be available. As with so many things, that is really "CPU/Systems-Complete" (after "NP-Complete").
However, it bears note that if you are willing to use 3000 bytes, the loop stride can be 3 at a time. 4 digit bunches (40e3 bytes) can do 4 at a time. 5 digits (500e3 bytes) can do 5 at a time, etc. E.g., 3 digits is "000 001 002...". These larger values are probably more than your 32/48 kB L1 CPU cache, but perhaps still in "CPU private" memory of 256..2048 kB L2.
So, if you are willing to hog a little more of the small, fast memory, you can do fewer loops, trading off fast memory for loop count. Of course, there is no way to know (often even at run-time) how much of a CPU cache hog your code should be. This is a big way in which "doing-nothing-else microbenchmarks" can mislead, actually. So, there is no perfect answer here. As usual, CPUs, OSes, and PLangs/run-times will duke it out. It seems Go's CPU package still does not standardize how a program should even query core CPU characteristics like cache sizes. I've never had good insight into why something like that never caught on, but it probably relates to "working at all" vs. "working well" (and trusting the CPU (or the compiler (or ...))).
If your PLang has good compile-time computation/pre-computation capability like Nim, it is very easy (and perhaps among the easiest examples?) to just build this array (or even all relevant arrays) at compile-time, e.g. like this little Nim thing used to format "uncertain numbers" (aka any "measurement") according to (generalized) particle data group (PDG) guidelines. If you parameterize your formatting nicely enough then there is no reason to messily muck about with strings after the fact. It would not be hard to run-time dispatch depending on cache size and "how much of the computer do we devote to formatting at one time" heuristics.
Of course, for saving in files or transmitting over the network where perfect round-trips to/from the marshalling format are desired, it's more efficient to just stay in binary, leaving formatting and parsing for "debugging modes" which often takes little more than a trivial filename extension syntax { Two's complement, the 8-bit byte, and IEEE single/double have all won. So, the only wrinkle is endian-ness, but I'm starting to think the writing on the wall is for little endian to also have won, but for network packet header type stuff. } Formally, this can change the scaling to zero-copy constant time (or at least to memcpy() time which is often almost as good). If you must use ASCII, hex floats are an option, but I'm sure Russ also knows that (and Go has hexfloats since 1.13). People just don't like to read hex. Anyway, debugging/human friendly and computer friendly things have always been somewhat at odds. Of course, people also don't like to read a bunch of meaningless precision (hence the PDG guidelines), but there are different population tolerances for different things. :-)
Wow I like the comprehensive review of prior art -- I put this in my queue to read!
But FWIW most of the internal links in the table of contents don't work (tried on 2 browsers):
Only these work:
https://research.swtch.com/fp#numbers
https://research.swtch.com/fp#scale
But links like these don't:
https://research.swtch.com/fp#unround
https://research.swtch.com/fp#fixedwidth
Yeah these don't match:
<a class=anchor href="#fixed-width_printing"><h2 id="fixed-width_printing">Fixed-Width Printing</h2>
Also maybe just me, but if I see "simple" then I skim to find the code first. The parsing does seem simple:
func ParseText(s []byte) (f float64, ok bool) {
The printing is presumably Fmt(), but that is in the "Performance" section. For presentation purposes I may like to get a teaser of all the functions you need for printing, and how long they are (in the same spirit as the table of contents)
i.e. at least for this audience, I think having the code first and later the math the justify it could help. It does look short!
Thanks for pointing out the bad internal anchors. Will fix shortly. As for the code, what you found is actually the least important code in the article. The code I would highlight starts in the "Fixed-Width Printing" section. You can also click on the file name in any of the code displays and just read the whole file on GitHub. :-)
Ah yes this is what I was looking for: https://github.com/rsc/fpfmt/blob/blog1/fpfmt.go
Yes I didn't realize the lower right of each code example links to the file
And OK I see Short() is the core of the algorithm, while Fmt() is a text wrapper