Replication
Dynamo stores your data on one node. That node dies. Is the data gone? Obviously not -- Dynamo replicates everything. But how it replicates, and what it does when replicas are down, is where the interesting design decisions live.
Optimistic replication
Dynamo replicates each data item on N nodes (where N is the configurable replication factor). Here's the process:
- A key is hashed to determine its coordinator node (the first node encountered moving clockwise from the key's position on the ring)
- The coordinator stores the data locally
- The coordinator replicates the data to the next N-1 clockwise nodes on the ring
This replication happens asynchronously -- the coordinator doesn't wait for all replicas to confirm before acknowledging the write. This is called optimistic replication: replicas are not guaranteed to be identical at all times, but they'll converge eventually.
The replication factor (N) is the number of nodes that store copies of each data item. With N=3, every piece of data exists on 3 different nodes. If one node fails, two others still have the data.
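The ring walk described above can be sketched in a few lines. This is a minimal illustration, assuming MD5 ring positions and one token per node; the names `ring_hash` and `replicas` are invented for this example, not Dynamo's actual API:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Map a key or node name to a position on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Illustrative ring of 5 nodes, one position each (virtual nodes come later).
nodes = ["s1", "s2", "s3", "s4", "s5"]
ring = sorted((ring_hash(n), n) for n in nodes)
positions = [pos for pos, _ in ring]

def replicas(key: str, n: int = 3) -> list[str]:
    """Coordinator = first node clockwise from the key's hash,
    then the next n-1 clockwise nodes on the ring."""
    start = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(n)]

owners = replicas("user:42")  # 3 consecutive nodes, starting at the coordinator
```

With N=3, `owners` is always three distinct consecutive nodes; the first entry is the coordinator.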
Preference list
For each key, Dynamo maintains a preference list -- the ordered list of nodes that should store replicas of that key. The preference list is determined by the consistent hashing ring, but with an important detail: it skips virtual nodes that map to the same physical node. This ensures replicas actually land on different physical machines.
Every node in Dynamo can independently compute the preference list for any key (since every node has a copy of the ring topology via gossip). This means any node can coordinate any request -- no central routing needed.
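With virtual nodes in play, the preference-list walk has to skip tokens that map to a physical node already chosen. A minimal sketch, assuming 3 physical nodes with 4 tokens each; the structure and names are illustrative, not Dynamo's implementation:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Illustrative: 3 physical nodes, 4 virtual nodes (tokens) each.
physical = ["A", "B", "C"]
ring = sorted((ring_hash(f"{p}#{v}"), p) for p in physical for v in range(4))
positions = [pos for pos, _ in ring]

def preference_list(key: str, n: int = 2) -> list[str]:
    """Walk clockwise from the key, skipping vnodes whose physical owner
    is already in the list, so replicas land on distinct machines."""
    start = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    chosen: list[str] = []
    i = start
    while len(chosen) < n:
        owner = ring[i % len(ring)][1]
        if owner not in chosen:
            chosen.append(owner)
        i += 1
    return chosen
```

Because every node gossips the same ring topology, any node can run this computation for any key and arrive at the same list.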
Sloppy quorum: handling temporary failures
Here's where Dynamo diverges from traditional quorum systems. In a strict quorum, writes must go to the key's designated replica nodes. If too few of them are reachable, the write fails (or blocks).
Dynamo uses a sloppy quorum instead: reads and writes go to the first N healthy nodes from the preference list. These might not be the key's designated replicas -- if a designated replica is down, a non-designated node steps in.
Example: With N=3, a key's preference list is [Server 1, Server 2, Server 3]. If Server 1 is down:
- The write goes to Server 2, Server 3, and Server 4 (the next healthy node)
- Server 4 stores the data with a hint noting that this data actually belongs on Server 1
- When Server 1 recovers, Server 4 forwards the data to it via hinted handoff
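The target-selection logic in the example above can be sketched as follows. This is a hypothetical helper, assuming the caller passes an extended preference list and its current view of which nodes are down; `write_targets` and the hint tuples are invented for illustration:

```python
def write_targets(pref_list: list[str], down: set[str], n: int = 3):
    """Pick the first n healthy nodes from the preference list, walking
    past downed designated replicas. Returns (node, hint) pairs, where
    hint names the designated replica a stand-in is covering for."""
    designated = pref_list[:n]
    stand_in_for = iter(d for d in designated if d in down)
    targets = []
    for node in pref_list:
        if node in down:
            continue
        hint = None if node in designated else next(stand_in_for, None)
        targets.append((node, hint))
        if len(targets) == n:
            break
    return targets

targets = write_targets(["s1", "s2", "s3", "s4", "s5"], down={"s1"})
# → [('s2', None), ('s3', None), ('s4', 's1')]
```

Server 4's entry carries the hint `'s1'`: it knows it is holding data that belongs elsewhere.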
Hinted handoff: "always writeable"
The mechanism where a healthy node temporarily accepts writes for a downed node is called hinted handoff. It's how Dynamo stays "always writeable":
- A node receives data meant for a downed node and stores it locally with a hint
- It periodically scans its hint database
- When the target node recovers (detected via gossip), it forwards the data
- After successful delivery, it deletes the hint
In the extreme case, even if only a single Dynamo node is alive, it will still accept writes and store hints for all the downed nodes. When they recover, data flows to them.
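The hint lifecycle can be sketched as a store-scan-forward loop. This is a minimal illustration under assumed names (`HintStore`, `deliver`); the real system persists hints durably and delivers them over RPC:

```python
class HintStore:
    """Per-node store of writes accepted on behalf of downed nodes."""

    def __init__(self):
        self.hints = []  # (intended_node, key, value) triples

    def store_with_hint(self, intended_node: str, key: str, value: bytes):
        """Accept a write for a downed node, remembering its real owner."""
        self.hints.append((intended_node, key, value))

    def handoff(self, alive: set[str]):
        """Periodic scan: forward each hint whose target has recovered
        (per gossip), deleting the hint on successful delivery."""
        remaining = []
        for intended, key, value in self.hints:
            if intended in alive:
                deliver(intended, key, value)  # stand-in for the real RPC
            else:
                remaining.append((intended, key, value))
        self.hints = remaining

delivered = []
def deliver(node, key, value):
    delivered.append((node, key, value))

store = HintStore()
store.store_with_hint("s1", "user:42", b"cart")
store.handoff(alive=set())   # s1 still down: hint is kept
store.handoff(alive={"s1"})  # s1 recovered: hint forwarded, then deleted
```

After the second `handoff`, the hint database is empty and Server 1 holds its data again.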
Because sloppy quorum allows writes to go to non-designated nodes, two concurrent writes to the same key can be accepted by non-overlapping sets of nodes. This means the R + W > N consistency guarantee doesn't strictly hold. Multiple conflicting versions of the same data can coexist. Dynamo handles this using vector clocks.
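A concrete way to see the broken guarantee: two writers on opposite sides of a partition each pick "the first N healthy nodes," but disagree about which nodes are healthy. The scenario below is illustrative (invented node names, a 6-node extended preference list):

```python
def first_healthy(pref: list[str], down: set[str], n: int) -> list[str]:
    """First n nodes from the preference list the caller believes are up."""
    return [node for node in pref if node not in down][:n]

pref = ["s1", "s2", "s3", "s4", "s5", "s6"]
# Writer A can reach s1-s3; writer B, partitioned away, only reaches s4-s6.
a = first_healthy(pref, down=set(), n=3)              # ['s1', 's2', 's3']
b = first_healthy(pref, down={"s1", "s2", "s3"}, n=3) # ['s4', 's5', 's6']
# The write sets are disjoint, so even R + W > N cannot force a read
# quorum to overlap both writes: conflicting versions coexist.
```

This is exactly the window where vector clocks take over: they let Dynamo detect that the two versions are concurrent rather than silently dropping one.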
The key question: "What happens when a replica is down during a write?" In most systems, this means reduced durability or a failed write. Dynamo's answer -- sloppy quorum + hinted handoff -- lets it always accept writes. The trade-off is clear: potential inconsistency (conflicting versions) in exchange for guaranteed availability. Always mention both the benefit and the cost.