Tail Latency Might Matter More Than You Think (2021)
20 points by wofo
This didn’t turn out to be “more important than I think”, since I already think quite a lot about tail latency, but it’s decent advice regardless.
I tend to take worst-case latency into consideration across the board (what if every component happens to act in the worst way possible?). If you can keep even that scenario within acceptable bounds, you don’t have to worry about latency after that point.
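A minimal sketch of that kind of budgeting (every component name and number below is made up for illustration): if the sum of each component’s worst case still fits inside the target, per-request latency stops needing case-by-case analysis.

    # Hypothetical worst-case latency budget; all names and numbers are invented.
    WORST_CASE_MS = {
        "load_balancer": 5,
        "auth_check": 20,
        "app_server": 120,
        "database": 200,
        "serialization": 15,
    }
    TARGET_MS = 500

    total = sum(WORST_CASE_MS.values())
    print(f"worst case end to end: {total} ms (target: {TARGET_MS} ms)")
    # If even this pessimistic sum fits the target, latency is no longer
    # the design bottleneck.
    assert total <= TARGET_MS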
This article is all about parallel fanout and has some tricks to address it:
The Tail at Scale (2013) - https://cacm.acm.org/research/the-tail-at-scale/
(I think it somehow influenced TailScale the company, though I’m a bit fuzzy on that)
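To put rough numbers on the fanout problem the paper tackles (my sketch, not code from the article): when a request fans out to n backends and has to wait for all of them, the latency distribution gets sampled n times, so rare slowness becomes common.

    # If each backend call is "slow" (say, beyond its p99) with probability p,
    # a request that waits on n parallel backends is slow with probability
    # 1 - (1 - p)**n.
    def p_any_slow(p: float, n: int) -> float:
        return 1 - (1 - p) ** n

    print(f"{p_any_slow(0.01, 1):.2f}")    # 0.01 -> 1% of single-backend requests
    print(f"{p_any_slow(0.01, 100):.2f}")  # 0.63 -> ~63% of 100-way fanouts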
I love this article. It’s pretty wild how helpful these insights remain, more than a decade later, for anyone working on large-scale systems. I’ve found hedged requests baked in by default in some horizontally scalable databases lately (e.g. ScyllaDB), and similar strategies are being adopted at the protocol level in things like HTTP/3. Lots of timeless knowledge in there.
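For anyone who hasn’t run into it, a minimal sketch of the hedging idea (my own simplification, not ScyllaDB’s or HTTP/3’s actual implementation): send the request, and if no reply arrives within some hedge delay, fire a duplicate at another replica and take whichever answer lands first.

    import asyncio

    async def hedged_request(replicas, do_request, hedge_delay: float):
        # Issue the request to the first replica.
        tasks = [asyncio.create_task(do_request(replicas[0]))]
        done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
        if not done and len(replicas) > 1:
            # First attempt is slow; hedge to a second replica and
            # take whichever finishes first.
            tasks.append(asyncio.create_task(do_request(replicas[1])))
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for t in tasks:
            if not t.done():
                t.cancel()  # cancel the loser so it doesn't waste work
        return done.pop().result()

The paper suggests tying the hedge delay to the observed tail (e.g. the p95), which bounds the duplicate work to a few percent of requests.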
Yup, there is a lot of knowledge there. One thing that is not widely known is that Google picked up many skilled engineers and researchers from Digital / Compaq research in the early 2000s, including the two authors:
https://en.wikipedia.org/wiki/Jeff_Dean
https://en.wikipedia.org/wiki/Luiz_Andr%C3%A9_Barroso
And including the people who built AltaVista, the best search engine before Google:
https://en.wikipedia.org/wiki/Louis_Monier
https://en.wikipedia.org/wiki/Michael_Burrows_(computer_scientist)
And I’m pretty sure Mike Burrows’s Chubby was the first widely deployed distributed consensus system, based on Lamport’s Paxos. These days people reference Raft / etcd etc. (ZooKeeper is of a similar vintage, though I don’t remember the details at the moment.)
Jeff, along with Sanjay Ghemawat, built a surprising portion of Google’s early systems, including the search results server (e.g. for posting lists):
https://www.newyorker.com/magazine/2018/12/10/the-friendship-that-made-google-huge
Also, I will have to find it, but I remember looking at Baidu and Yandex open source code and thinking that it looked like “Jeff / Sanjay C++ 98”, which is basically the Google C++ style guide. Yes, they built all that stuff in C++ 98.
In particular, that code used protobufs early on, which I believe Sanjay designed and originally implemented as a Perl script. (Protobufs have suffered from design issues over 25 years, as you would imagine, even though the modern use cases are actually easier than the original ones.)
Another way to look at this is to think of services as queues, and of queues as stateful: a queue performs differently depending on what specifically is in it now, as well as what came before.
If you have two processors and your p99 is 5 seconds, one processor is completely tied up whenever one of those 5-second requests comes in. If another high-latency request arrives during that processing time, every request behind it picks up additional latency from the waiting.
This means that the tail requests always have the potential to cause the biggest issues, because queuing time very frequently dominates overall latency.
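A toy two-worker FIFO simulation of that effect (all the parameters here — arrival gap, service times, slow-request rate — are made up): on parameters like these, the queueing term typically comes out far larger than the service term, which is the point above.

    import random

    random.seed(0)
    N_WORKERS, N_REQS = 2, 100_000
    ARRIVAL_GAP_MS = 60  # one request arrives every 60 ms

    free_at = [0.0] * N_WORKERS   # time at which each worker next goes idle
    queue_ms = service_ms = 0.0
    for i in range(N_REQS):
        arrival = i * ARRIVAL_GAP_MS
        svc = 5000.0 if random.random() < 0.01 else 50.0  # 1% are 5 s "tail" jobs
        w = min(range(N_WORKERS), key=free_at.__getitem__)  # earliest-free worker
        start = max(arrival, free_at[w])
        free_at[w] = start + svc
        queue_ms += start - arrival   # time spent waiting in the queue
        service_ms += svc             # time spent actually being served

    print(f"mean service time:  {service_ms / N_REQS:.0f} ms")
    print(f"mean queueing time: {queue_ms / N_REQS:.0f} ms")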
Head-of-line blocking and concurrency limiting are big drivers of tail latency for sure. I think that’s a slightly different issue from the “multiple service calls sample the distribution multiple times” issue, but still an important one.
Incidentally, concurrency-limiting is one of the reasons I like to use average/mean as my primary summary statistic for latency: because it allows using Little’s law to calculate mean concurrency.
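Concretely (with made-up numbers), Little’s law says mean concurrency L equals arrival rate λ times mean latency W:

    # Little's law: L = lambda * W. All numbers below are assumed.
    arrival_rate = 200    # requests per second
    mean_latency = 0.150  # seconds

    mean_concurrency = arrival_rate * mean_latency
    print(mean_concurrency)  # 30.0 requests in flight on average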
(This post is from 2021; I was excited there might be a new Marc post and was confused since the title sounded familiar.)