Premature Optimization is Fun Sometimes
59 points by invlpg
I think the day I no longer find exercises like this one fun to think about is the day I need to get out of programming.
Premature optimisation is fun all the time!
It's dealing with the consequences afterwards, once you've learned why that optimisation was premature, that usually isn't fun...
Exactly! I have to stop myself from doing it because I know it could come back to bite me, but I do it because it is fun.
I’m a bit confused by the 43 bits for timestamp: it sounds to me like 24 bits should be enough.
With the talk of a 512-element ring buffer, I presume it’s sending a new ping every two seconds, and tracking the most recent 17 minutes and 4 seconds of pings.
So, step one: use delta encoding relative to an ideal timer/sequence. You can easily determine when a packet should be sent based on a set last-sent time (which increments by two seconds) and the index in the ring buffer. Then, write whether it was sent exactly on time, 0.1ms late, 2.3ms late, &c.
Then, for time elapsed, I don’t suppose you need to go beyond 17 minutes and 4 seconds, since the ping will have expired after that; that’s 512 × 2s = 1024s = 10240000 × 100μs, about 23.3 bits required at that precision. Round up to 24 if you like, though you may still be able to use the remaining invalid bit patterns (6537216 of them!) for other things.
As a bonus, 24 bits allows you to increase the “sent” precision a lot, to reduce quantisation error: at microsecond precision, a ping can be sent up to about 16.7 seconds late, which should be oodles (right?).
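A minimal sketch of the delta encoding described above, in C. Everything here is an assumption for illustration: the names (`encode_sent`, `decode_sent`, `ideal_send_us`), the microsecond timebase, the choice to saturate late values rather than wrap, and treating an early send as exactly on time. It is not the original author's code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, taken from the numbers discussed in the thread. */
#define RING_SIZE        512u       /* slots in the ring buffer              */
#define PING_INTERVAL_US 2000000u   /* assumed: one ping every 2 seconds     */
#define DELTA_MAX        0xFFFFFFu  /* largest value that fits in 24 bits    */

/* Bit budget for "elapsed" at 100 us precision:
 * 512 slots * 2 s = 1024 s = 10,240,000 units, below 2^24 = 16,777,216. */

/* Ideal send time of slot `index`, given the ideal send time of slot 0. */
static uint64_t ideal_send_us(uint64_t base_us, uint32_t index)
{
    assert(index < RING_SIZE);
    return base_us + (uint64_t)index * PING_INTERVAL_US;
}

/* Store only how late the ping was sent, in microseconds.
 * Early sends clamp to 0 and very late sends saturate at DELTA_MAX. */
static uint32_t encode_sent(uint64_t base_us, uint32_t index, uint64_t actual_us)
{
    uint64_t ideal = ideal_send_us(base_us, index);
    uint64_t late = actual_us > ideal ? actual_us - ideal : 0;
    return late > DELTA_MAX ? DELTA_MAX : (uint32_t)late;
}

/* Recover the absolute send time from the slot index and the stored delta. */
static uint64_t decode_sent(uint64_t base_us, uint32_t index, uint32_t delta)
{
    return ideal_send_us(base_us, index) + delta;
}
```

With a base of 1000000 μs, slot 3 has an ideal send time of 7000000 μs, so an actual send at 7000230 μs is stored as 230. The absolute time never needs to be stored per sample, because it can be rebuilt from the slot index.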
I can’t comment on the performance effects of reducing the sample from 64 bits to 48, whether it would help or hinder. Wouldn’t surprise me if the results were different across x86 and ARM 32- and 64-bit.
AArch64, AArch32 (since ARMv6T2, I think) and modern x86-64 all have bitfield extract / insert instructions, and even without them the shift-and-mask sequences are small.
But the original size would fit very easily in the data cache of even a moderately old processor, so I doubt the memory saving would make a difference.
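For reference, the portable shift-and-mask form of bitfield extract and insert looks roughly like this in C. The helper names `extract_bits` and `insert_bits` are made up for this sketch. Compilers can often lower this pattern to a single instruction where the target has one, but that depends on the compiler version and flags, so check the generated assembly rather than assume it.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set; width == 32 is handled separately
 * because shifting a 32-bit value by 32 is undefined behaviour in C. */
static uint32_t low_mask(unsigned width)
{
    return width == 32 ? 0xFFFFFFFFu : (1u << width) - 1u;
}

/* Return `width` bits of `word`, starting at bit `lsb` (0 = least significant). */
static uint32_t extract_bits(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb <= 32 - width);
    return (word >> lsb) & low_mask(width);
}

/* Return `word` with the `width`-bit field at `lsb` replaced by `value`.
 * Bits of `value` above `width` are ignored. */
static uint32_t insert_bits(uint32_t word, uint32_t value, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb <= 32 - width);
    uint32_t mask = low_mask(width);
    return (word & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

For example, `insert_bits(0, 0xABC, 8, 12)` gives `0x000ABC00`, and `extract_bits(0x000ABC00, 8, 12)` gives back `0xABC`.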
David, you're almost certainly right on all counts (as always; I love your comments and have no idea how you manage to be so prolific both in the comment section here and in the real world).
This was a fun (and pointless) exercise in trying to minimize memory usage one bored Friday, which then turned (in the aftermath) into an attempt to reduce the instruction count of the Armv7 code generated by an old clang that we cannot update yet (yay embedded!)
Premature optimization is honestly one of my favorite things when designing a system or working in lower-level systems languages. At the very least, I have the hope that it will save me time (and memory!) in the future. The middle case is that it gives me a slightly worse headache later, figuring out why it was done that way. The worst case (and sometimes it's for the better) is that it stops me from working on the project at all, because the optimization work is such a big task when designing: "ugh, this is so convoluted, why do any of it anyway?" closes program