Big O
77 points by cgrinds
In the grey box below the bubbleSort it states:
You may see the average case described using the Greek letter “theta”, Θ, instead of the letter O. Big Theta is similar in that you can have Θ(n) and Θ(1) and so on, the only difference is that it is talking about average performance, not worst-case.
This seems to be based on a misconception. Θ has nothing to do with average run-time; it says that the runtime is bounded both from above and from below.
O(g(n)) tells you that the function grows at most as fast as g(n); Θ(g(n)) means that it grows at exactly that speed, which is a stronger statement to make.
For example, if we have a simple function to find the smallest element in an array, we could say that it is O(n), but that leaves open the possibility that some function could do it faster. If we instead say that it is Θ(n), we are saying that you cannot do it faster.
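To make the example concrete, here is a minimal sketch of such a function (my own illustration, not from the post):

```javascript
// Finding the smallest element of an array. Every element must be
// inspected at least once, so the running time is bounded below by a
// linear function as well as above: it is Θ(n), not just O(n).
function findMin(arr) {
  let min = arr[0];
  for (let i = 1; i < arr.length; i++) {
    if (arr[i] < min) min = arr[i];
  }
  return min;
}
```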
Usually if you are talking about average-case complexity you still use the normal big-O notation; see for example quicksort, which has an average case of O(n log n) but a worst-case performance of O(n^2): https://en.wikipedia.org/wiki/Quicksort
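The quicksort example is easy to reproduce. A minimal sketch with a naive first-element pivot (assumed here for illustration; real implementations pick pivots more carefully):

```javascript
// Quicksort with a first-element pivot. On random input it averages
// O(n log n) comparisons, but on already-sorted input every partition
// is maximally unbalanced, which is exactly the O(n^2) worst case.
function quickSort(arr) {
  if (arr.length <= 1) return arr;
  const [pivot, ...rest] = arr;
  const left = rest.filter(x => x < pivot);   // elements below the pivot
  const right = rest.filter(x => x >= pivot); // elements at or above it
  return [...quickSort(left), pivot, ...quickSort(right)];
}
```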
Ah, thank you! I’ve corrected that section, how does it read now?
I don’t think it’s correct in this shape either. O-notation (and theta and omega) are just a notation, they simply denote a set of related functions. I could use big-O on a physics equation for whatever reason, they have nothing to do with best-case/worst-case - that’s a property of an algorithm. We just use this notation to describe them, if you wish, the ‘type’ of best/worst/average-case of an algorithm is O-notation.
E.g. bubble sort’s best-case runtime is O(n), which is also O(n^10) (since it only means that the actual function describing the runtime of the algorithm will, after a certain n, be smaller than the n^10 function, given some constant factor). Its best case is also theta(1), which is a stronger claim, and actually no other non-constant function could be written here (though a mathematician may correct me).
Average cases can be tricky (what is an average input?), but for the worst case we can say that bubble sort is omega(1), and it is also omega(n); and it is theta(n^2), all at the same time.
As you correctly describe in your (otherwise beautiful) post, the best way to understand it is to simply visualize it: O means that the graph of the function you care about will at some point be “overtaken” by the function inside the O parens, and from that point on the latter will always be above it (though this is not mathematically rigorous).
The very best resource on the actual mathematically rigorous definition I have seen is Introduction to Algorithms by Cormen, 3rd chapter. I absolutely recommend anyone to have a read of it, because this is indeed a very often misunderstood concept.
Generally agree with your comment, but a minor nit: I’m not sure it’s correct to say that bubble sort’s best-case runtime is O(1). If we let T(n) be the best-case running time of bubble sort on an input list of size n, then T(n) is O(n), not O(1). Perhaps you are using a different definition of best-case runtime (if so, could you specify?)
That said, re: the original text in the OP, I honestly feel it might be fine as is given that it seems to be intended as a practical introduction to the topic as opposed to a mathematically rigorous treatment. Editing the post toward the latter goal without compromising its accessible lens is a bit difficult.
You are right, corrected! Thanks for catching that.
Dropping mathematical rigor is fine, but best case/average case, etc. has nothing to do with the distinction between theta and O. So either leave theta completely out of the picture, or use it in a correct context; at least that’s my opinion.
I agree that, formally, the distinction between theta and O is unrelated to best/average/worst cases. Colloquially, however, in the context of discussing how algorithms scale, I think it’s reasonable to say that O is usually used to describe the worst case, and likewise that theta is applicable when the best and worst running times have the same order of growth.
With this in mind, IMO, given that the article is broadly concerned with big O in an algorithms context (and aims to be an accessible introduction), it’s fine to just make the simpler claims “big O notation always describes the worst-case scenario” and “[theta] is used when the best and worst case have the same order of growth”. FWIW, I read these statements with an implicit “in the context of analyzing how algorithms scale asymptotically”, which might account for our difference in perspective here?
In the section about binary search, if and only if your chosen number is 69, the “correct” text reads Nice.
Outstanding work.
I’m a big fan of the bubble sort visualization that lets you see the code as it runs. That’s really cool.
It took a very long time to make it work 🫠 under the hood it parses the AST of the JavaScript and transforms the code into a sequence of generator expressions. While it does this it keeps track of what variables are in scope and uses that information to inject yield statements between every line so that the current scope is available every time you call next on the generator.
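For readers curious what the transformed output might look like, here is a hand-written sketch (hypothetical names; the real version is generated automatically from the AST, and I'm only approximating the described yield-injection idea):

```javascript
// A generator version of bubble sort with a yield "injected" between
// steps, exposing the current scope each time next() is called, so a
// visualizer can highlight the running line and show live variables.
function* bubbleSortStepper(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = 0; j < arr.length - i - 1; j++) {
      yield { i, j, arr: [...arr] }; // snapshot of scope before comparing
      if (arr[j] > arr[j + 1]) {
        [arr[j], arr[j + 1]] = [arr[j + 1], arr[j]];
        yield { i, j, arr: [...arr] }; // snapshot after the swap
      }
    }
  }
  return arr;
}
```

A driver then simply calls `next()` once per animation frame to step the algorithm.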
I’m very glad you like it 😁
I love this interactive form of education; hence, an obligatory shoutout to Nicky Case: https://ncase.me/
Please don’t use bubble sort as an example. People who don’t know any better will assume it’s a real sorting algorithm, and then go on to actually implement it in programs (I teach programming, and I promise you this is true). There is no point teaching anyone an egregiously terrible algorithm that should never be used—not even for explanatory purposes.
Sorry John, it’s too late.
Like insertion sort, bubble sort is adaptive, which can give it an advantage over algorithms like quicksort. This means that it may outperform those algorithms in cases where the list is already mostly sorted (having a small number of inversions), despite the fact that it has worse average-case time complexity. For example, bubble sort is O(n) on a list that is already sorted, while quicksort would still perform its entire O(n log n) sorting process.
From https://en.wikipedia.org/wiki/Bubble_sort.
I’m not so much into teaching algorithms, but n^2 sorting algorithms seem valuable to me; they serve as an example of what’s worse than n log n, and, as the Wikipedia snippet illustrates, that n^2 might be better than n log n in some places.
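The adaptive behaviour the Wikipedia snippet describes depends on an early-exit check. A minimal sketch (the `passes` counter is my own addition, just to make the O(n) best case observable):

```javascript
// Bubble sort with the early-exit optimization that makes it adaptive:
// if a full pass performs no swaps, the list is sorted and we stop.
// On an already-sorted list this means a single O(n) pass.
function bubbleSort(arr) {
  let n = arr.length;
  let swapped = true;
  let passes = 0; // counted only to illustrate the adaptive behaviour
  while (swapped) {
    swapped = false;
    passes++;
    for (let j = 0; j < n - 1; j++) {
      if (arr[j] > arr[j + 1]) {
        [arr[j], arr[j + 1]] = [arr[j + 1], arr[j]];
        swapped = true;
      }
    }
    n--; // the largest remaining element has bubbled to the end
  }
  return { arr, passes };
}
```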
Great work, love the visualizations.
Random note: in the binary search section, “your number” is used to describe two different concepts:
This makes this line (and the graph) confusing:
Below is a graph of the number of guesses I would need in order to figure out your number for all of the numbers between 1 and 1000.
“The numbers between 1 and 1000” here represent the upper bound of the numbers being chosen, but that’s not clear, and the graph labels the X axis “Your number” which adds to the confusion.
Thank you for the feedback! Pushed a change to that section, let me know if that’s better.
I’d say it’s still confusing because “target number” still sounds like the number actually chosen. Here’s my attempt at rewording it for whatever it’s worth:
Below is a graph of the number of guesses I would need in order to figure out any number for ranges that have an upper bound of 1 to 1000.
and I’d label the X axis of the graph something like “Number upper bound” or “Highest possible number” or “Max target number” or “Max chosen number”, etc.
Every time the upper bound of the target number doubles, […]
First time seeing a blog from you, and I just wanted to say that I love the visual design (especially the cute little non-AI dogs you can pet!).
Instant RSS subscription :)
Keep in mind that when searching, a linear search quite often outperforms a binary search. Something that is O(n) can be faster due to cache locality and the possibility of optimizing with SIMD. Not always, but quite often with arrays that fit in the cache; you also avoid branching, which helps the compiler optimize.
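For reference, minimal sketches of both searches over a sorted array; the cache-locality and SIMD effects described above only show up in real benchmarks, not in the asymptotics:

```javascript
// O(n): predictable, branch-friendly sequential access pattern.
function linearSearch(arr, target) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === target) return i;
  }
  return -1;
}

// O(log n): fewer comparisons, but the jumpy access pattern is
// cache-unfriendly and the comparisons are hard to predict.
function binarySearch(arr, target) {
  let lo = 0, hi = arr.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid] === target) return mid;
    if (arr[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}
```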
I love @pkhuong’s old post, “Binary Search Eliminates Branch Mispredictions”, which shows that a properly implemented binary search can do very well. But I expect the vector/scalar tradeoff is quite different these days.
Much love to you for this:
init() {
// Look, I know. This is a fucking monstrosity. I agree with you and I hate
// myself. However, for reasons beyond my understanding, Firefox on iOS
// would not load the hero without this convoluted sequence of waits. So
// they're staying. I don't like it any more than you do.
requestAnimationFrame(() => {
setTimeout(() => {
this.createSVG();
requestAnimationFrame(() => {
setTimeout(() => {
this.startAnimation();
}, 50);
});
}, 50);
});
}
Beautiful article, as always.
Just a couple of thoughts I had after reading it:
as you also note, the only way to actually know which algorithm is better is by measuring it. The algorithmically relevant point here is that these notations “swallow” the constant factors. In real life those constants are absolutely there, and for small n values they may change the results; very importantly, though, for big enough n the better algorithm will always end up winning. An interesting practical experiment might be writing a very optimized bubble sort in C/Rust/whatever and comparing it against a very inefficiently written merge sort in a much slower language.
my other note, which may be more important: people often think of these notations as a magic token they have to learn. I found it very enlightening when I got the task of writing the pseudocode for a simple algorithm and simply trying to write an expression for its runtime. Probably nothing new for most people, but on the slim chance it is interesting to someone, here is how it could go:
You take each line of the code and assign it a constant, assuming that the “line” takes that amount of time: e.g. this addition takes c1 time, this array-element move c2, etc. You can then simply sum these all up and get a formula. (When you have a loop, you might have to decide which runtime metric you want to calculate; e.g. the loop in bubble sort will execute a different number of times depending on the input data. But we can take an upper bound, the worst-case scenario, and multiply the sum of the inner block’s constants by that number.)
For an O(n^2) algorithm it might look like (c1+c2)*n^2 + (c3+c4)*n + c5. This is a general formula; by replacing the constants with measured values, we could even get an estimate of the actual runtime, for a given implementation in a given language. [1]
It’s just that we can replace c1+c2 with a new constant, find a bigger number (e.g. c3+c4+1), add that to it as well, and call the result c. We can easily see that c*n^2 will always be larger than the n term, so we can replace our function with an asymptotic upper bound like this. The constant c5 can be dealt with similarly, resulting in c'*n^2. The O notation then removes the constant, leaving us with the well-known formula.
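The counting exercise above can also be sketched in code: charge a (made-up) constant cost to each “line” of bubble sort and sum them over the worst-case iteration counts. The cost names below are arbitrary placeholders, not measurements:

```javascript
// Sum hypothetical per-line costs for a worst-case bubble sort run of
// size n, producing a total of the form (c1+c2)*n^2 + ... With the
// default costs this comes out to n*loop + (cmp+swap)*n*(n-1)/2.
function bubbleSortCost(n, c = { cmp: 1, swap: 1, loop: 1 }) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    total += c.loop; // outer-loop bookkeeping, once per pass
    for (let j = 0; j < n - i - 1; j++) {
      total += c.cmp;  // one comparison per inner iteration
      total += c.swap; // worst case: a swap on every iteration
    }
  }
  return total;
}
```

Plotting this for growing n shows the quadratic term dominating, regardless of what values you plug in for the constants.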
[1] It’s an interesting topic that these kinds of formulas don’t take caches, branch prediction, etc. into account, mostly because it would make them prohibitively difficult, though there are better models.