When Impressive Performance Gains Do Not Matter
54 points by lalitm
But many times I have seen engineers disappointed when they improve a single stage by many orders of magnitude only to see it have no effect on the overall throughput.
Worth mentioning Amdahl's law here.
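For anyone who hasn't run into it: Amdahl's law says that if a fraction p of the total time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal Python sketch, where the function name and the 5% / 1000x numbers are made up for illustration and are not from the article:

```python
def amdahl_speedup(fraction: float, stage_speedup: float) -> float:
    """Overall speedup when `fraction` of total time gets `stage_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    # The untouched part keeps its cost; the optimized part shrinks by stage_speedup.
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Hypothetical case: a stage that is 5% of the runtime, made 1000x faster.
print(round(amdahl_speedup(0.05, 1000.0), 2))  # 1.05
```

However large s gets, the result is capped at 1 / (1 - p), so a three-orders-of-magnitude win on a 5% stage still shows up as roughly a 5% improvement overall.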
Is there a glossary for all the various "laws" that exist around compsci in general? I keep hearing about new ones that I never knew existed and then forgetting them again, because they are often niche enough not to come up regularly. By which I do not mean to say that they are not valid, useful bits of generational knowledge.
There is also a Wikipedia category it seems: https://en.wikipedia.org/wiki/Category:Computer_architecture_statements
You mean this? https://deviq.com/laws/
Or this: https://github.com/dwmkerr/hacker-laws
(not affiliated)
But why was someone optimizing a part of the system that takes up an insignificant slice of the time pie? Was it poor guidance? Was it poor performance tooling? Surely people don't set out consciously to make minimal impact. It's usually a bigger issue.
Curses, I wrote a detailed example of one path to this happening.
The gist is: you're working on one problem and, while working on it, you see some function show up as significant in the sample. After looking briefly you see that it is easy to make a real improvement to the implementation. But you're working on something else, so you just think to yourself, I'll deal with that later.
When later comes around, you remember you put off fixing the easy thing that showed up in the profile, so you start working on it. Even if the fix turns out to be non-trivial, tunnel vision happens, and we like to solve puzzles/problems, so we end up spending a bunch of time on it.
But of course the actual result is irrelevant: the performance problem is minor, and you only saw it because it showed up as significant within the context of what you were looking at. That might simply mean it was proportionally significant while the absolute times weren't relevant, or it might be that you're blocked on IO, not CPU (note I mean the IO is concurrent with the computation in this case).
@kghose said below that @peter's example sounded like nerd sniping - I think my example kind of falls into that bucket as well -- "auto-nerd-snipe" maybe?
Could be nerd sniping if the joy of making the improvement dominates.
The premature optimization rule is a pretty good rule. Optimized code is often ugly, violating a bunch of good practices like modularity and DRY, so it’s good to keep it contained.
No idea. One time at Google someone proposed an optimization to part of our pipeline, and multiple people told him the thing he was optimizing was less than 0.1% of the pipeline’s compute, so the gains were by definition bounded to less than 0.1%. He ignored everyone, did the optimization anyway, and then was baffled when the overall gains were less than 0.1%.
The thing was also obviously, intuitively not that expensive to anyone with domain expertise, but we helped him measure the real performance cost anyway to dissuade him from wasting his time on the project.
That instance sounds like someone got nerd sniped. Perhaps it was just awesome to optimize that bit of code. But it’s unprofessional since the impact was quantitatively clear.
They may guess or have internalized “best practices” that they always apply. They may have seen a real problem in a limited circumstance and overgeneralized. Or they may have a misleading benchmark.
Not all systems have an easy way to show the cost of every method/process in production.
All those explanations are more or less dysfunctional. Some of them are more about the individual, some are more about the environment.
IME there's a few possible reasons:
1. You're a talented baby engineer who's been trained to solve problems, but not to evaluate which problems are worth solving. You attack every "inefficiency" you find with no regard for its end-to-end impact. If you're lucky someone teaches you better; if you're not, you stall out as the kind of senior engineer PMs keep on a tight leash.
2. You don't have good visibility into end-to-end performance. Maybe there's degradation somewhere in the system and management is on your ass about it but you can't get a mandate to implement proper monitoring, so you take a guess and tackle different components until something works. Then there's no urgency anymore and it's back to shipping user-facing features for you.
3. You know this is not a problem at the moment, but you're worried it might become one soon. If you have enough cachet (or overtime) to push it through, you won't have any immediate visible impact. Great engineers will nonetheless sometimes burn some of their political capital on this kind of project if they believe it will be important enough in the future. Worse ones will burn all of it - see point 1.
First, contrary to popular belief, many systems aren’t formed of a few bottlenecks. The most popular programming practices tend to be more or less uniformly slow (OOP with a crapton of pointers and an unholy amount of heap allocations using the general allocator comes to mind). Anything you fix anywhere is an insignificant slice of the time pie. Worst case, nothing short of a complete rewrite can fix this. Though if you’re lucky the rewrite may be done incrementally.
Second, even if you have only two bottlenecks, if they’re within one order of magnitude of each other, then fixing the worst one perfectly won’t even get you an order of magnitude. And in many cases, as the article says, you need to blow way past that to get a significant benefit.
Third, the pipeline thing: a computer is a distributed system running in parallel: a couple of CPUs, the GPU, disks, Ethernet… The speed of your process is limited by the slowest stage of your pipeline. Fix that stage, and you’re limited by the next slowest. Worst case, more than one stage is tied for slowest, and fixing only one will net you absolutely nothing.
Those were the charitable explanations I could come up with. Sometimes we just get caught up in the optimisation game and lose track of priorities. Or we just mess up.
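To put rough numbers on the second and third points above, here is a small Python sketch. The stage names, timings, and rates are invented for illustration, and it assumes the simplest possible model: serial costs just add up, and a pipeline's throughput is the minimum of its stage throughputs.

```python
def speedup_after_eliminating_worst(costs):
    """Overall speedup if the single most expensive serial cost drops to zero."""
    total = sum(costs)
    remaining = total - max(costs)
    return float("inf") if remaining == 0 else total / remaining


def pipeline_throughput(stage_rates):
    """Items per second of a pipeline whose stages all run in parallel."""
    # The slowest stage sets the pace for everything else.
    return min(stage_rates.values())


# Second point: two bottlenecks costing 5s and 1s (hypothetical numbers).
print(speedup_after_eliminating_worst([5.0, 1.0]))  # 6.0

# Third point: disk and cpu tie for slowest (hypothetical rates, items/s).
rates = {"disk": 200, "cpu": 200, "network": 1000}
before = pipeline_throughput(rates)
after = pipeline_throughput({**rates, "disk": 2000})  # disk made 10x faster
print(before, after)  # 200 200
```

In this model, two costs in a ratio of r:1 give a speedup of r + 1 when the bigger one disappears entirely, so 5:1 yields 6x at best. In the pipeline case the 10x faster disk changes nothing, because the CPU stage was just as slow.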
I personally fell into this trap a few days ago. I was debugging why Warp Terminal was slow, taking over a second to start. I asked an agent to either add timing statements throughout my .zshrc or run each step in my .zshrc to see how long it takes. I improved performance by ~90%, going from 1.6s to 0.2s.
Then I opened a new Warp Terminal. It still felt slow. Turns out Warp wraps my .zshrc with some of its own magic, and my company uses CrowdStrike, which pegs my CPU at 100% for a moment.
So, for me, I thought I found the smoking gun because 1.6s in my .zshrc felt wrong. But it's only a part of the problem.
It's often unclear how a system will behave. What parts are parallel? What parts are serial? What parts are applying backpressure? Is slowness in one part of the system a symptom or a root cause? Oftentimes, it's faster to experiment by optimizing something away and observing how the system behaves.
Of course, even if the users don't notice, it is still good to reduce computing time for software, as it can reduce costs and make scaling easier.
That depends on the complexity cost: if making the code faster comes with excessive complexity, you get a bunch of downstream consequences that hurt performance in the long run (leaving issues of maintainability, etc. aside). A later structural change may make the optimization unnecessary, but recognizing that, or removing it, requires a complete understanding of the complexity.
Basically: “it goes faster” should not be treated as a get-out-of-jail-free card for complexity or maintainability concerns.
Here’s one such example with added security implications: https://www.phoronix.com/news/Linux-7.2-MD5-Generic-Only
Haha, a two-for-one: a non-trivial implementation of something that used to be used for security :D
I am in this boat right now and we are doing multiple (small) projects where we are seeing 5 minute queries reducing to under 30 seconds. I tell the team it’s not good enough in the long term - but it is definitely an improvement and does have a big impact.
First, of course, for the customer, the wait goes from infuriating to simply annoying.
But my current focus is overall performance, not per user performance. When you optimise dozens of 5/10/30 minute processes, you significantly reduce contention with other parts of the system. 10 minutes is a very long time to be smashing the database. Everything goes faster and everyone benefits.