The first practical Byzantine Fault Tolerance (PBFT) algorithm was introduced in a 1999 academic paper by Miguel Castro and Barbara Liskov. The pBFT algorithm is robust in order to survive Byzantine faults in asynchronous environments and is optimized to have high processing speeds / low latency such that it can be used in everyday applications.
Prior to Castro and Liskov's work, existing BFT algorithms assumed that networks were synchronous and also typically had slow processing speeds, making them impractical for real world applications. With industry and government becoming increasingly reliant on internet-based services (i.e. asynchronous systems) in the late 1990's, malicious attacks and software errors could have potentially serious consequences.
The Castro-Liskov pBFT algorithm itself is a form of state machine replication, meaning that a system state is copied and distributed across multiple servers. The benefit of replicating and distributing a state machine is that it makes the system more robust, because failures such as Byzantine faults occur in isolation on one server rather than causing the entire system to fail. In other words, a system that is not replicated is only as fault tolerant as the single centralized server being used, whereas a single server failure in a replicated system won't cause the entire system to fail.
How the algorithm works can be summarized in four steps:
- A client sends a request to invoke a service operation to the primary replica.
- The primary replica sends the request to all of the involved backup replicas at the same time.
- The replicas execute the request and send a reply to the client
- The client waits for f + 1 replies from different replicas with the same result, where f is the maximum number of replicas that may be faulty in the system.
Although it was a big step forward in cryptography, the pBFT algorithm still has some shortcomings:
- pBFT is very communication heavy, meaning that the network has significant computational overhead during consensus and isn't scalable past a certain number of nodes.
- pBFT can be attacked by an adversary using a simple scheduling mechanism that can either delay or halt the consensus process.
- pBFT chooses a primary state machine replica in round-robin fashion, which requires nodes to agree on a "membership list" from which the primary can be chosen. This makes participation in a pBFT system permissioned rather than permissionless, meaning that it lacks a key property for decentralization.
Byzantine Fault Tolerance is a necessary property for decentralized systems such as blockchain networks. Blockchains are distributed ledgers which are safer than centrally stored ledgers because the networks that host them are robust enough to continue to operate even if one or more servers in the network are maliciously attacked or otherwise fail arbitrarily. However, BFT is typically only a property of the consensus mechanism used in the majority of blockchains. For example, Bitcoin's proof-of-work consensus mechanism is Byzantine fault tolerant and builds off of the pBFT algorithm, but it does not use the pBFT algorithm itself because of the shortcomings mentioned above.
A blockchain project called Hyperledger actually uses the practical byzantine fault tolerance algorithm as one of its primary consensus mechanisms. Unlike Bitcoin and most cryptocurrencies, Hyperledger is a private blockchain which doesn't prioritize being permissionless or scalable to a large quantity of nodes, which makes using the practical Byzantine Fault Tolerance algorithm, well... practical.
Another project, IoTeX, uses an advanced version of pBFT called Scalable Practical Byzantine Fault Tolerance with Short-Lived Signature Schemes, which was described in an academic paper published in CASCON in 2018. The IoTeX algorithm aims to achieve greater scalablility by reducing the amount of computational overhead required.
Introduction to Hyperledger Business Blockchain Design Philosophy and Consensus
IoTeX Published "Scalable Practical Byzantine Fault Tolerance with Short-Lived Signature Schemes"...
Practical Byzantine Fault Tolerance
Miguel Castro, Barbara Liskov
YAC: BFT Consensus Algorithm for Blockchain
Fedor Muratov, Andrei Lebedev, Nikolai Iushkevich, Bulat Nasrulin, Makoto Takemiya
Documentaries, videos and podcasts
Practical Byzantine Fault Tolerance
- Byzantine Generals' problemThe Byzantine General's problem is a war scenario in which several army battalions led by different generals must collectively agree to a strategy but don't have a reliable, trustworthy way of ensuring that the other generals will all act according to the plan.
- Byzantine fault toleranceByzantine fault tolerance is a property of a distributed system such that it can tolerate components of a system failing in arbitrary ways, processing incorrect states, rather than simply stopping or crashing.
- DecentralizationProcess of redistributing or dispersing functions, powers, people or things away from a central location or authority
- HyperledgerHyperledger is an open source effort by the Linux Foundation to create blockchain related tools and standards.
- IoTeXIoTeX is a decentralized network for the internet of things built using blockchain technology.
- Show More