The Paxos algorithm, when presented in plain English, is very simple (2021)
10 points by eatonphil
Could someone help me understand what the consensus problem is? I've never understood it, and the formal definitions haven't helped.
The consensus problem is defined as follows: we have a group of participants. These participants must decide on a value of a register such that the following properties are satisfied:
- Agreement: The value decided by the participants must be the same for all participants.
- Integrity: Once a participant has decided, it cannot change its decision.
- Validity: The value decided by the participants must be one of the values suggested by the participants.
- Termination: A value must be eventually decided by the participants.
Here's a simple algorithm that obeys all these properties: the participants always all pick 0. It has Agreement, Validity, and Termination. Obviously that's not the right problem.
Perhaps "the values suggested by the participants" are an input to the algorithm, not an output, so the algorithm doesn't get to pick them? But then say the group is just Alice and Bob, and Alice only ever suggests 1 or 2 and Bob only ever suggests 173. Does that work, and is the algorithm allowed to settle on 1, 2, or 173, as long as they all know that's the value settled on? This would mean that if you were allowed to inject a participant who always suggests 0, the problem becomes trivial, is that right?
The input is a set of suggested values. The output is one specific value from that set. The algorithm should work for any set of input values! What you are saying is a bit like "id is a valid implementation of sort for the input 1, 2, 3".
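To make the objection concrete, here's a minimal sketch (the function names are just for illustration) showing why "always pick 0" isn't a solution: it happens to satisfy Validity when 0 was suggested, but fails for arbitrary inputs.

```python
def decide_constant(suggestions):
    # Trivial "algorithm": ignore the inputs and always pick 0.
    return 0

def valid(decision, suggestions):
    # Validity: the decided value must be one of the suggested values.
    return decision in suggestions

print(valid(decide_constant({0, 7}), {0, 7}))            # happens to hold
print(valid(decide_constant({1, 2, 173}), {1, 2, 173}))  # fails
```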
For the mathematically inclined, it is tremendously useful to get to the bottom here, to the formal statement of the problem being solved. This turns out to be a statement about the behaviors of a discrete non-deterministic system. We have a sort of automaton that can move from state A to B, C, or D, non-deterministically, in a single step. And what we are proving is that, no matter the non-deterministic choices, the automaton won't get into a bad state.
In this formalism, the non-determinism of initial suggestions and the non-determinism of subsequent state transitions are essentially the same thing, the same ∀ quantifier.
That still admits a trivial solution, though? You pick the smallest suggested value. Even if the values don't have a natural order, they likely have an in-memory representation, so you can pick the byte-wise least.
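The trivial rule being proposed looks like this sketch (assuming, crucially, that every participant somehow sees the same full set of suggestions, which is the assumption the rest of the thread knocks down):

```python
def decide_min(suggestions):
    # Deterministic rule: every participant independently picks the
    # byte-wise least value of some serialized form. No communication
    # is needed, provided everyone sees the exact same set.
    return min(suggestions, key=lambda v: repr(v).encode())

print(decide_min({3, 1, 2}))  # every participant arrives at the same answer
```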
Is there partial information going on, like not all the participants know all the suggested values?
Yup! There are different ways you can model this. For example, each participant might have its own value, or each might have a set of values, where the sets are not necessarily equal. But the most direct way is perhaps to say that the suggested values themselves arrive one by one, non-deterministically, at some of the participants.
Aha, thank you, that gets me to the point where I can e.g. understand the Wikipedia entry now, and see the hard problem here!
One more essential piece is that some participants may be faulty: they might suddenly stop responding, or be maliciously controlled. Without this stipulation, I think it gets easy again: the leader is the participant whose name (pid, whatever) is lexicographically first; the value you pick is the leader's value, which you know because the leader tells you. (Or if the leader doesn't have a suggested value, they ask around until they find one, then they tell everyone else.)
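The lexicographic-leader idea sketched above can be written out like this (a toy model, assuming no participant is faulty, and representing unknown suggestions as None):

```python
def decide(participants, suggestions):
    # participants: set of names; suggestions: name -> value (or None).
    # The leader is the participant whose name sorts first; everyone
    # adopts the leader's value. If the leader has no suggestion of its
    # own, it asks around (here: in name order) until it finds one.
    leader = min(participants)
    if suggestions.get(leader) is not None:
        return suggestions[leader]
    for p in sorted(participants):
        if suggestions.get(p) is not None:
            return suggestions[p]
    return None  # nobody has suggested anything yet

print(decide({"alice", "bob"}, {"alice": 1, "bob": 173}))     # alice leads
print(decide({"alice", "bob"}, {"alice": None, "bob": 173}))  # alice asks around
```

This works precisely because the leader is assumed to always answer; the whole difficulty of consensus appears once that assumption is dropped.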
One more essential piece is that some participants may be faulty
Perhaps a clearer way to think about this is that the network can be faulty. In the formalization, the state of the network is the set of all messages ever sent, and, on a given step, the network delivers an arbitrary message, without fairness guarantees. A "participant being faulty" is a subset of this failure mode, where the network flat-out refuses to deliver messages to/from a particular node.
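That formalization is compact enough to write down directly. A minimal sketch (class and method names are mine): the network state is the set of all messages ever sent, and each step delivers an arbitrary one of them.

```python
import random

class Network:
    # State: the set of all messages ever sent. Messages are never
    # removed, which models duplication; never picking messages
    # to/from a particular node models that node being "faulty".
    def __init__(self):
        self.sent = set()

    def send(self, src, dst, payload):
        self.sent.add((src, dst, payload))

    def step(self):
        # Non-deterministic delivery, with no fairness guarantee.
        if self.sent:
            return random.choice(sorted(self.sent))
        return None
```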
be maliciously controlled.
So this one is tricky: if you allow malice (the network can invent arbitrary messages), then you need a slightly different algorithm. It's still the same consensus at the core, but you need extra machinery to cryptographically prove the things you say as a participant, including your knowledge about other participants.
I think it gets easy again: the leader is the participant whose name (pid, whatever) is lexicographically first; the value you pick is the leader's value, which you know because the leader tells you.
Yeah, this is essentially how you "invent" consensus: you start with simple majority voting, and notice that it can lead to split ballots (one third voted for r, one third voted for g, one third voted for b). To fix that, you add a leader which first selects the value everyone votes for (so, as long as turnout is high enough, there's agreement). That breaks because any particular leader can get partitioned or crash. The next step is to duplicate the whole thing ℕ times and let each node be a leader in a separate ballot, where all ballots happen concurrently. That way, even if some ballots get stuck, some ballots do make progress (if you allow for a weak fairness guarantee, where at least half the nodes are not faulty). The final step is to arrange it so that, if two concurrent ballots complete, they do not contradict each other.
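The middle step of that construction, a single ballot whose leader pre-selects the value, can be sketched like this (a toy model; `responsive` stands in for which voters the network actually lets answer):

```python
def run_ballot(leader_value, responsive):
    # responsive: one bool per voter, True if that voter's reply reaches
    # the leader. Because the leader pre-selected the value, votes can't
    # split between candidates; the ballot either reaches a majority or
    # simply gets stuck (returns None) - which is why you then run many
    # such ballots concurrently.
    yes = sum(responsive)
    return leader_value if yes > len(responsive) // 2 else None

print(run_ballot("r", [True, True, False]))   # quorum reached
print(run_ballot("r", [True, False, False]))  # ballot stuck
```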
I think there are various ways of stating this requirement, and they're all ways to say "you can't just bake the decision into the algorithm." I think with this formulation you have to interpret it the way you said: you are forced to choose a value that was provided to the system externally (you could imagine distinguishing between the "deciding" participants and the "proposing" participants, where you are only in control of the "deciding" participants).
As an example of a different way to avoid it, the FLP paper resolves this by saying that among the two outcomes ("yes" or "no"), both outcomes must be possible given some configuration.
Here's a simple algorithm that obeys all these properties: the participants always all pick 0. It has Agreement, Validity, and Termination. Obviously that's not the right problem.
That is a proper solution to the problem; it's just not a useful one.
Not sure if it helps, but we had this as part of our Big Data Processing course. Here are my notes (the relevant sections):
https://share.note.sx/xfncghki#kHsEQyZ7pSceJGwdnc5exWFHzRF58QBibgv/UnNW5s8
Disclaimer: I do not guarantee everything here is 100% correct.
Consensus is simple, tricky, and bedeviled by poor explanations. It's great that the article starts with deconfusing FLP:
consensus algorithms do not guarantee termination, but safety is always guaranteed.
Sadly, it doesn't actually explain why Paxos works. It's like a line-by-line description of the code for binary search, without appealing to the idea of dichotomy.
At the heart of consensus there's this tricky but beautiful bit of circular inter-temporal reasoning: when a leader selects a value to propose, it carefully does it in such a way as to not conflict with any "past" or "future" leaders. The way we deal with the "past" is by requiring a quorum of voters to ignore (a finite set of) "past" leaders, making sure that none of the "past" ballots can complete unexpectedly. And we don't really need to deal with (infinite) "future", because we are "past" for any "future" leader, and we trust them to not mess up.
And this key idea is missing from the TFA.
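That value-selection rule, the one the comment above calls the heart of the matter, can be sketched as follows (names are mine; `quorum_replies` stands for the promises a leader collects from a majority of acceptors):

```python
def choose_value(own_value, quorum_replies):
    # quorum_replies: (accepted_ballot_number, accepted_value) pairs from
    # a majority of acceptors; (None, None) if an acceptor has accepted
    # nothing. By promising to this ballot, the quorum also agreed to
    # ignore all lower-numbered ("past") ballots, so those can no longer
    # complete unexpectedly.
    accepted = [(n, v) for (n, v) in quorum_replies if n is not None]
    if accepted:
        # Adopt the value of the highest-numbered accepted proposal, so
        # we cannot contradict a "past" ballot that may already have won.
        return max(accepted)[1]
    # No past ballot can complete anymore: free to propose our own value.
    return own_value
```

"Future" leaders need no special handling here: each of them will run this same rule over a quorum that overlaps ours, and will therefore pick up our value if our ballot completed.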
For positive recommendations:
I also really enjoyed this blog post on Paxos: https://maheshba.bitbucket.io/blog/2021/11/15/Paxos.html