Meet Alice. Alice is impatient
35 points by hyperpape
This is interesting, but the author doesn’t explain why this is true beyond the term “t-weighted” and an equation, neither of which makes sense to me. I wish they’d explained it more clearly for people like me who’ve long since forgotten their college statistics class.
Say you have a monitoring dashboard. There, you'd normally compute the mean latency by treating every request equally:
mean = (sum of all latencies) / (number of requests)
That's E[X]. Each request gets weight 1/N.
Alice's experience of the latency is in terms of time (which is where I think the "t-weighted" is coming from).
Say Alice makes M requests with latencies x_1, x_2, ..., x_M.
If you ask "during a randomly chosen second of Alice's total waiting time, what's the latency of the request she's waiting on?", each request i accounts for a fraction x_i / (Σ_j x_j) of that time, so longer requests contribute more.
So the time-weighted average is:
E_a[X] = Σ_i x_i * (x_i / Σ_j x_j) = Σ_i x_i^2 / Σ_j x_j
= (Σ_i x_i^2 / M) / (Σ_j x_j / M)    (dividing numerator and denominator by M)
= E[X^2] / E[X]
As a simple worked example: if there are two requests, one with latency 1s and one with latency 10s, then the plain average is (1 + 10) / 2 = 5.5s, but the time-weighted average is 1 * (1/11) + 10 * (10/11) = 101/11 ≈ 9.18s.
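For anyone who'd rather check the arithmetic in code, here is a minimal sketch of the two averages. The function names are mine, not from the article, and it assumes the latencies are positive numbers in seconds.

```python
def request_weighted_mean(latencies):
    """Plain mean: every request gets weight 1/M."""
    return sum(latencies) / len(latencies)


def time_weighted_mean(latencies):
    """Each request is weighted by its share of the total waiting time.

    Equivalent to sum(x^2) / sum(x), i.e. E[X^2] / E[X].
    """
    total = sum(latencies)
    return sum(x * (x / total) for x in latencies)


latencies = [1.0, 10.0]  # the two-request example from the comment above
print(request_weighted_mean(latencies))         # 5.5
print(round(time_weighted_mean(latencies), 2))  # 9.18
```

The single 10s request takes up 10 of the 11 seconds Alice spends waiting, so it dominates the time-weighted figure.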
This is a good explanation of how "t-weighted" can correspond to the formula in the article, thanks! However, the central claim in the article is that this formula is somehow reasonable for human perception, and that's still unclear.
A quote from Wikipedia on renewal processes, which are completely adequately explained by this diagram and the note that each S_i is some independent, random value:
A curious feature of renewal processes is that if we wait some predetermined time t and then observe how large the renewal interval containing t is, we should expect it to be typically larger than a renewal interval of average size.
And, presumably, under some set of additional assumptions, you can strengthen that "larger" to E[X^2]/E[X]. So the post's point is that the user observes at a random point in time, whereas your mean outage duration statistic observes at a random outage index (and averages over that). I think it would've been nice had this been the main point of the article.
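A quick way to see the effect is to simulate it. This is only a sketch under assumptions I'm picking for illustration: the intervals are i.i.d. exponential with mean 1, and the observer arrives at a uniformly random time over the whole span. The function name is made up.

```python
import bisect
import itertools
import random


def mean_interval_seen_at_random_time(intervals, n_samples, rng):
    """Average length of the interval containing a uniformly random time."""
    ends = list(itertools.accumulate(intervals))  # end time of each interval
    total = ends[-1]
    picked = 0.0
    for _ in range(n_samples):
        t = rng.uniform(0.0, total)
        # First interval whose end time is >= t is the one that contains t.
        idx = min(bisect.bisect_left(ends, t), len(intervals) - 1)
        picked += intervals[idx]
    return picked / n_samples


rng = random.Random(0)
intervals = [rng.expovariate(1.0) for _ in range(20_000)]  # assumed distribution

by_index = sum(intervals) / len(intervals)                  # E[X]
predicted = sum(x * x for x in intervals) / sum(intervals)  # E[X^2] / E[X]
by_time = mean_interval_seen_at_random_time(intervals, 20_000, rng)

print(f"mean by interval index:   {by_index:.3f}")
print(f"predicted E[X^2]/E[X]:    {predicted:.3f}")
print(f"mean seen at random time: {by_time:.3f}")
```

Sampling by index gives the ordinary mean, while sampling by time should land near E[X^2]/E[X], since a long interval covers more of the timeline and is more likely to contain the moment you look.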
Perhaps needs retitling to something more informative like "Latency and the inspection paradox"? The current title basically conveys 0 information 😅
This is something I've thought about as well.
I think the easiest example for grasping the problem is surveying the number of passengers on a bus by asking the passengers. Lots of people would tell you the bus was full, but not a single person would tell you it was empty. Every bus but one could be empty, and the survey would still say the buses are full. I believe this is the correct statistic if the goal is to analyze a situation from the perspective of a user.
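The bus survey can be written out the same way. The numbers here are invented for illustration: nine empty buses and one carrying 50 people, with every passenger answering the survey.

```python
def per_bus_mean(loads):
    """Operator's view: average passenger count over buses."""
    return sum(loads) / len(loads)


def surveyed_mean(loads):
    """Passengers' view: each rider reports the load of their own bus."""
    responses = [n for n in loads for _ in range(n)]  # empty buses answer nothing
    if not responses:
        return 0.0
    return sum(responses) / len(responses)


loads = [0] * 9 + [50]  # hypothetical: nine empty buses, one with 50 riders
print(per_bus_mean(loads))   # 5.0
print(surveyed_mean(loads))  # 50.0
```

The surveyed mean is again sum(n^2) / sum(n), the same shape as the latency formula, just weighted by passengers instead of by seconds.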