Understanding how distributed nodes reach agreement
A distributed system is a collection of independent computers (nodes) that work together as a single system. Examples include databases like Google Spanner or blockchain networks like Bitcoin and Ethereum.
Since these nodes communicate over potentially unreliable networks, keeping them synchronized is challenging.
Consensus means agreement. In distributed systems, it refers to getting all nodes to agree on a single value or state, even if some nodes fail or send incorrect information.
Goal: All non-faulty nodes should eventually agree on the same result.
Example: Multiple database servers must agree on the order of transactions or the identity of the leader node.
A consensus protocol is an algorithm that ensures nodes agree on a common value (such as a transaction order, leader, or block) even in the presence of failures.
Protocol | Main Use | Key Feature |
---|---|---|
Paxos | Databases, distributed coordination | High fault tolerance |
Raft | Cluster management (e.g., etcd, Kubernetes) | Simpler to understand and implement than Paxos |
PBFT (Practical Byzantine Fault Tolerance) | Blockchain systems | Tolerates malicious (Byzantine) nodes |
Proof of Work / Proof of Stake | Blockchains | Achieve consensus over open, untrusted networks |
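To make this concrete, here is a rough sketch of the kind of interface a consensus protocol exposes to the rest of the system, regardless of whether Paxos, Raft, or PBFT sits underneath. The `Protocol` interface and its method names are hypothetical, not taken from any particular library.

```go
package consensus

// Entry is a value the cluster must agree on, e.g. a serialized transaction.
type Entry []byte

// Protocol is a hypothetical interface capturing what Paxos, Raft, or PBFT
// give the application layer: submit a value, then learn the single order
// in which every non-faulty node will apply values.
type Protocol interface {
	// Propose asks the cluster to agree on e; it may fail if, for example,
	// this node is not the leader or no majority is reachable.
	Propose(e Entry) error

	// Committed delivers entries in the agreed order. Every non-faulty
	// node observes the same sequence on this channel.
	Committed() <-chan Entry
}
```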
Let's make this intuitive with an example. Imagine several database servers that each hold a copy of the same account balance and receive updates independently. If they update separately, their balances may diverge. Consensus ensures all servers agree on the same operation order, maintaining consistency.
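Here is a toy sketch (with made-up account figures) of how replicas diverge when they apply the same operations in different orders, which is exactly what consensus prevents by fixing one global order first:

```go
package main

import "fmt"

// apply runs a sequence of balance updates in the given order.
func apply(balance float64, ops []func(float64) float64) float64 {
	for _, op := range ops {
		balance = op(balance)
	}
	return balance
}

func main() {
	deposit := func(b float64) float64 { return b + 100 }  // deposit $100
	interest := func(b float64) float64 { return b * 1.1 } // add 10% interest

	// Replica 1 applies the deposit first; replica 2 applies interest first.
	r1 := apply(1000, []func(float64) float64{deposit, interest})
	r2 := apply(1000, []func(float64) float64{interest, deposit})

	fmt.Println(r1, r2) // 1210 vs 1200: the replicas have diverged
}
```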
Consensus ensures data consistency across all nodes and provides fault tolerance: the system keeps serving correct results even when some nodes crash or become unreachable.
System: Kubernetes uses etcd (a distributed key-value store) for storing cluster state.
Consensus Protocol: Raft
How it works: every write to the cluster state goes through the current etcd leader, which appends it to its Raft log and replicates it to the followers; the write is committed (and acknowledged to the client) only after a majority of nodes have persisted it, so all nodes converge on the same state.
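From a client's point of view, this machinery is invisible: you just read and write keys. Below is a minimal sketch using the etcd Go client (go.etcd.io/etcd/client/v3), assuming an etcd server is reachable on localhost:2379; the key name is made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a local etcd endpoint (assumed to be running on :2379).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// A write is forwarded to the Raft leader, replicated to followers,
	// and only acknowledged once a majority has persisted it.
	if _, err := cli.Put(ctx, "/cluster/desired-replicas", "3"); err != nil {
		panic(err)
	}

	// The read returns the committed, agreed-upon cluster state.
	resp, err := cli.Get(ctx, "/cluster/desired-replicas")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```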
Concept | Meaning |
---|---|
Consensus | Agreement among distributed nodes on a single value or state |
Why Needed | Ensures data consistency and fault tolerance |
Common Protocols | Paxos, Raft, PBFT, PoW/PoS |
Example System | etcd (used by Kubernetes) uses Raft for consistent cluster state |
Let's see what happens when the leader fails in Raft; this is key to its reliability.
Imagine 5 nodes: A, B, C, D, E. B is the leader. If B crashes or loses network connectivity, followers stop receiving heartbeats from B.
Each follower expects regular heartbeats from the leader. If a follower hears nothing for its election timeout (e.g., 150-300 ms, randomized per node), it assumes the leader is dead; suppose C times out first and becomes a candidate.
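A minimal sketch (not etcd's or Raft's actual code) of that follower-side logic: reset a randomized election timer on every heartbeat, and switch to candidate when it fires without one.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout returns a randomized timeout in the 150-300 ms range.
// Randomization makes it unlikely that two followers become candidates
// at the same moment, which would split the vote.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(151)) * time.Millisecond
}

func main() {
	heartbeats := make(chan struct{}) // delivered by the leader in a real node
	timer := time.NewTimer(electionTimeout())

	for {
		select {
		case <-heartbeats:
			// Leader is alive: reset the timer and stay a follower.
			timer.Reset(electionTimeout())
		case <-timer.C:
			// No heartbeat within the timeout: assume the leader is dead,
			// increment the term, and start an election as a candidate.
			fmt.Println("timeout elapsed: becoming candidate, requesting votes")
			return
		}
	}
}
```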
Once C is elected leader, it starts sending heartbeats and replicating its log to the other nodes. Entries from the old leader that were never committed are discarded, while committed entries are preserved, so no acknowledged write is lost.
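The log repair step can be sketched roughly as below. This simplifies Raft's real AppendEntries consistency check (which compares prevLogIndex and prevLogTerm on each RPC) down to a single pass, and the commands are made up:

```go
package main

import "fmt"

// LogEntry is a simplified Raft log entry: the term it was created in
// plus an opaque command.
type LogEntry struct {
	Term    int
	Command string
}

// reconcile brings a follower's log in line with the new leader's log:
// the first index where the terms disagree marks the start of stale,
// uncommitted entries, which are dropped and replaced by the leader's.
// Committed entries are never in conflict, so they are always retained.
func reconcile(follower, leader []LogEntry) []LogEntry {
	i := 0
	for i < len(follower) && i < len(leader) && follower[i].Term == leader[i].Term {
		i++
	}
	return append(follower[:i], leader[i:]...)
}

func main() {
	follower := []LogEntry{{1, "x=1"}, {1, "x=2"}, {2, "x=9"}} // last entry came from the crashed leader B
	leader := []LogEntry{{1, "x=1"}, {1, "x=2"}, {3, "x=5"}}   // new leader C's log

	fmt.Println(reconcile(follower, leader))
	// [{1 x=1} {1 x=2} {3 x=5}]: the conflicting, uncommitted entry from term 2 is gone
}
```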
Even if one or two nodes fail, the cluster continues to operate normally: Raft only needs a majority to function, so in a 5-node cluster, 3 working nodes are enough.
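The majority rule is simple arithmetic, as this small sketch shows:

```go
package main

import "fmt"

// quorum returns the minimum number of nodes that must be reachable
// for the cluster to elect a leader and commit new entries.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d-node cluster: needs %d, tolerates %d failures\n",
			n, quorum(n), n-quorum(n))
	}
	// 5-node cluster: needs 3, tolerates 2 failures
}
```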
Stage | Description |
---|---|
Failure Detection | Followers stop receiving heartbeats from the leader |
Leader Election | A follower becomes a candidate and requests votes |
New Leader Elected | Majority votes decide the new leader |
Log Synchronization | Uncommitted logs are removed; committed entries retained |
System Recovery | Cluster resumes normal operation |