Woe Is Me!
Due to computer issues (i.e. failing backlight on the monitor) and the lack of a save draft feature, I have subsequently lost my original article that I have been penning over the last few days (although, it has now been a month since I have returned to this saved draft) on an overview of the Byzantine Generals Problem (BGP) as it relates to distributed ledger technologies. The post was largely inspired by @steemed and his first post on the series of Steem Power and Governance -- albeit, I will admit that I have had this particular post in mind for quite some time. It seemed like a good idea at the time to write it while he was also discussing the structural governance of Steem as the subjects are tangentially related.
This new post may or may not be better, depending on your point of view. Since no one, apart from @edgeland and myself, has ever seen the draft, the only opinions that technically matter are ours (and mine really means nothing in the grand scheme of things). Unless you have a keystroke logger on my laptop, then the post is lost to the abyss (I even searched in Firefox's about:cache location to no avail). Alternatively, you could believe that the post does exist in that all possible combinations of the English alphabet with punctuation could potentially be reconstructed. I assure you that this is not how I decided to redo this topic. I hope that I still do it justice, even though I was 90% done beforehand and am afraid that I will not convey the ideas fully.
In addition, I could see how a 'saved draft' feature could become immensely useful as this has happened to me numerous times. I better copy what I have now again, before I lose it!
Introduction
The majority of the references that I will use come from Wikipedia and two early academic papers that describe the original problem. Much later academic papers exist, as the Byzantine Generals Problem is a classical information theory problem that has applications to networking and distributed computing. Applications, if you recall, are the bread and butter (no matter how pie in the sky they may sound) to obtain funding from nation-states or super-national funding agencies.
The BGP is stated in terms of the Byzantine Army attempting to attack a city. Some number of generals encircle the city and must coordinate an attack or sound a withdrawal. Each general commands his own regiment of troops.
Success is defined if all the generals agree on the same plan of action. However, no radio communication exist (this is Byzantine and Marconi -- or was it Tesla -- had not been born and as such had not discovered radio waves) and there is no guarantee that some number of generals are traitorous to the Byzantine Empire.
The generals, can, on principle, pass the messages that they've received from others to another general. Two schemes can be considered in this scenario. Either messages can be forged, i.e. General A tells General B to withdraw and General B tells General C that General A's message is to attack, or messages can be unforged, for instance, with the use of cryptographic digital signatures. We will see, in upcoming sections, how disaster can strike the Empire via backstabbing.
Byzantine Fault Tolerance, Agreement, and Failure
A situation is said to be Byzantine Fault Tolerant if for a fixed number of generals n
and a fixed number of traitors t
, that a Byzantine Agreement can be determined as to whether or not to retreat or attack. Byzantine Failure then occurs in the case of an unsuccessful attack.
We reiterate that success is defined as a full retreat or a full attack as anything else would destroy the Byzantine army due to its forces being divided.
Wikipedia gives a great overview of this problem. If one is diligent, one can follow the links to the original paper that outlines two solutions to the BGP one with oral messages and one with signed messages. We will discuss one of the solutions as well as detail how a failure occurs when 1/3 or more of the generals are treasonous.
For completeness, I invite you to check out the 2 Generals Problem originally posed Akkoyunlu, Ekanadham, and Huber, and popularized by Jim Gray, a computer scientist at Microsoft Research who is presumably lost at sea, in his "Notes on data base operating systems" (unfortunately, I do not have a copy). In 1985, Fischer, Lynch, and Paterson showed that the 2 Generals Problem cannot be solved in bounded time. Note that this does not mean that consensus cannot be reached, but not in finite time, which makes no sense from a practical point of view, so for all intents and purposes, the practical 2GP is solved.
Regarding the Generalized 2 Generals Problem, Lamport, et al, showed that one can consider all the Generals as being equal decision makers in the army, that is to say, one general is not necessarily the ranking officer, and that any random general issuing an initial order of attack or retreat with a specified time for the following day can be considered the same as the situation as a commander sending out the same type of order.
From now on, we will consider the problem as if all the Generals have the same rank, we will say that the first message is given by the "Issuing General." The goal is to understand how the other generals, after receiving a message from the Issuing General, can make a decision on whether to attack or retreat even in the presence of malicious generals.
We first make the following assumptions :
- All loyal generals obey the same order.
- If the Issuing General is loyal, then all loyal generals performs the order.
3t = n Generals has no solution
Under the above conditions, we will see how there is no solution for oral messages when the number of disloyal generals is equal to or more than 1/3 of the total generals.
An oral message is one whose contents are completely under the control of the sender, so a traitorous sender can transmit any possible message.
Oral messages are ones that are the type that one computer can send to another.
It suffices to reduce the problem to exactly 3 Generals.
The only decisions we allow are "retreat" or "attack" with conditions 1 and 2 being satisfied.
Suppose the Issuing General (we'll call him General 0 or G0) is loyal and one of the two remaining generals is traitorous. G0 sends an order of "attack" (or "retreat") to both Generals 1 and 2 (G1 and G2, respectively). Suppose G1 is malicious. G1 then sends the message to G2 that G0 told him to "retreat" (or "attack"). G2 sends the proper message to G1. For Condition 2 to be satisfied, G2 must "attack" (or "retreat").
Consider the situation where G0 is disloyal and both G1 and G2 are loyal. G0 sends "attack" to G1 and "retreat" to G2. G1 and G2 send the other a message that tells him G0's original order.
In both cases, if the traitor is consistent in his messages, G2 is incapable of knowing whether G0 or G1 is disloyal to the empire and so both scenarios appear the same to G2. As such, G2 must follow G0's order (regardless of what it is), otherwise Condition 2 is not satisfied.
In the case of both G1 and G2 being loyal, they will carry out two different orders, violating condition 1.
Therefore, it is impossible to achieve Byzantine Agreement in the case of oral messages and at least 1/3 malicious generals in the case of 3 Generals.
From the above argument, Lamport, et. al., show, by way of contradiction, that no solution exists with fewer than 3t + 1 total generals and t traitorous generals.
Assume a solution exists for 3t or fewer generals, i.e. 3t > n.
Let 3 generals simulate approximately 1/3 of the total generals, i.e. if there are 3t = n total generals, where t is the number of traitorous generals, then all the traitorous generals are simulated by the one traitorous general in the 3 Generals case, and the other generals are split between the other two loyal generals.
But since it was shown that no solution exists in the 3 Generals case, then no solution exists when there are t traitors and 3t or fewer total generals.
Note that this does not necessarily mean a solution exists for 3t + 1 total generals and t traitorous generals, only that there is no solution for 3t total generals in the presence of t traitors. We will see in the next section how Lamport, et al, provide solutions for when the number of traitors is less than 1/3 of the total number of generals.
2 Solutions with 3t < n Generals
Lamport, Shostak, and Pease give 2 different solutions when the number of traitorous generals is strictly less than 1/3.
A Solution with Oral Messages
Suppose there are 3t + 1 generals in the midst of t traitors. The generals follow some algorithm for sending messages to other generals. Assume that loyal generals correctly follow this algorithm (whatever it happens to be).
Assume the following :
- Every message that is sent is sent correctly.
- The receiver of a message knows who sent it.
- The absence of a message can be detected.
Note that 1) and 2) prevent a traitor from interfering with a message between any two generals since 1) does not allow the traitor to interfere with messages at all and 2) guarantees a traitor cannot insert extraneous messages between the two to befuddle their conversation. 3) prevents a traitor from achieving success simply by not sending a message.
Lamport, et al, initially consider the situation where any two generals have a direct link, but later on in their paper, they relax this requirement and allow some intermediary generals to relay the message.
Since a traitorous general can choose not to send an order, the default order if a general does not send a message is "retreat."
Lamport, et al, define a recursive algorithm, for all nonnegative integers t, for the Issuing General to send n - 1 messages, and then they show that OM(t) solves the BGP when n > 3t + 1 in the presence of at most t traitors.
The Recursive Oral Message (OM) Algorithm
Their algorithm assumes a function, called majority, with the following property: if a majority of values {v_1, ..., v_{n-1}} is v then majority(v_1, ..., v_{n-1}) = v.
They note that there are two choices for the value of majority(v_1, ..., v_{n-1}):
- The actual majority value amongst the v_i if it exists; otherwise the return value is "retreat,"
- The median of the v_i, assuming they come from an ordered set.
Base case, t = 0
- The Issuing General sends his value to each of the other generals.
- Each general uses the value he receives or uses "retreat" if he receives on value.
Recursion Step, t > 0
- The Issuing General Sends his value to each of the other generals.
- For each i < n, let v_i be the value General i received from the Issuing General or "retreat" if he received no message. General i acts as an Issuing General for OM(t-1) to send the value v_i to the n - 2 other generals.
- For each i < n, and each j not equal to i, let v_j be the value General i received from General j in step 2 and from OM(t-1) or else "retreat" if he received no such value. General i uses the value majority(v_1, ..., v_{n-1}).
t = 1, n = 4
Let us illustrate how this algorithm works with 1 traitor amongst a total of 4 generals and assume the traitor is General 1.
In the first step, OM(1), the Issuing General sends the message v to all three of the other generals.
Let us now see what General 2 (and General 3) will receive in the second steps.
Since General 1 is traitorous, he sends a command that is contrary to v, the command that General 0 sent. Say that the order he sends to both Generals 2 and 3 is x.
Because Generals 2 and 3 are loyal, they relay General 0's initial value v to each other.
In Step 3, General 2 has v_1 = x, and v_2 = v_3 = v. Similarly, in Step 3, General 3 has v_1 = x and v_2 = v_3 = v.
And so both the loyal generals, General 2 and General 3, obtain the correct majority value v.
What happens if the Issuing General is disloyal to the empire?
Good question! Let's find out, shall we?
Suppose that the Issuing General sends 3 arbitrary commands to each of the other 3 generals. Say that the command is x_i to General i in Step 1.
In Step 2, each of the generals relays the order from the issuing general and by the end, in Step 3, each of the 3 generals will have the correct value v = majority(x_1,x_2,x_3).
Some Analysis of OM(t)
The recursive algorithm for OM(t) given above will invoke n - 1 separate executions of OM(t-1) and each of these will invoke n - 2 separate executions of OM(t - 2), etc. This means that there will be lots of duplicate messages between any two generals depending on the number of traitorous generals.
This ambiguity can be removed if each General i attaches i to the value v_i he sends in Step 2. And as the recursive function propagates, the algorithm OM(t - k) will be called (n-1)(n-2) ... (n-k) to send a value that is prefixed by a sequence of k generals' numbers.
Putting It All Together
In order to prove the correctness of OM(t), the authors first prove the following little lemma.
Lemma 1: For any t and k, the algorithm OM(t) satisfies the property that if the Issuing General is loyal, all loyal generals follow his order.
The following theorem proves that OM(t) solves the Byzantine Generals Problem.
Theorem 1: For any t, Algorithm OM(t) satisfies both properties that all loyal generals execute the same order and that if the Issuing General is loyal, then all loyal generals follow his order.
For proofs of both Lemma 1 and Theorem 1: See paper.
A Solution with Signed Messages
Needless to say, Lamport, Shostak, and Pease considered the situation of cryptographically secure signed messages between any two generals.
I am not going to go over their discussion in detail (as I've severely have gone overboard with this topic already), but I refer all interested readers to Section 4 of their paper.
And for those interested in indirect communication paths, Section 5 outlines that scenario.
Applications to Blockchains
I assure you, dear reader, that a valid reason exists to discuss the Byzantine General's Problem, namely that consensus defined is an application that is Byzantine Fault Tolerant, i.e. a blockchain protocol that defines consensus in a way that is a solution to the Byzantine's General Problem is more robust and not subject to certain types of attacks as one that is not BFT. This is why all distributed ledger technology consensus models should be BFT.
Steem is BFT! Wazooooo!
First off, I want to note that the Steem blockchain which uses a Delegated Proof of Stake consensus model is Byzantine Fault Tolerant due to the fact that 17 out of 19 witnesses (or is it 21, with the miner and the random witness?) are required for a hardfork as 17 / 19 (and 17 / 21) > 2/3 at a particular block.
In other words, a hardfork, which updates what code is being used, i.e. a hardfork includes the code for new rules regarding the underlying dynamics of the game theoretic aspects of Steem or bug fixes, can be seen as a decision to 'attack' (move forward with the next version) or 'retreat' (stay with the current codebase) and is Byzantine Fault Tolerant.
How cool is that?!
I should also note that Bitcoin uses a 'simple' (there is nothing simple about it) longest chain wins consensus protocol combined with a costly proof of work to prevent Sybil attacks.
Other Consensus Models
In the 1989, a consensus model by the name of Paxos (after a fictional legislative consensus based on the Greek island) was developed.
The Lockstep Protocol is used in many real-time strategy games.
Another fairly recent paper on a variant of Raft consensus describes a project called Tangorea. In addition, a fork of Tangorea (and one that also uses a variant of Raft consensus) is being used in the private blockchain smart contract system Juno.
And tangles (which I find to be quite fascinating from a mathematical perspective, for more, check out iotatoken) have their own consensus protocol.
Conclusion
Consensus has a long history in the computer science literature and is a fundamental component of blockchains.
As new consensus models are constructed (or discovered, depending on your view), we should keep these in mind and understand how they can apply to distributed ledger technologies.
Finally, I would like to link to another blog post at Brave New Geek that gave a lot of insight (and links) to how to apply these techniques to distributed computing systems in practice. Many topics and academic papers were described in that post and give a little more history behind the subject of consensus. In addition, it is an easy-read (although, admittedly, the various academic papers may not be).