Replication

Dynamo stores your data on one node. That node dies. Is the data gone? Obviously not -- Dynamo replicates everything. But how it replicates, and what it does when replicas are down, is where the interesting design decisions live.

Think first
Dynamo needs to replicate data across multiple nodes for fault tolerance. Should the coordinator wait for ALL replicas to confirm before acknowledging the write to the client? What are the trade-offs?

Optimistic replication

Dynamo replicates each data item on N nodes (where N is the configurable replication factor). Here's the process:

  1. A key is hashed to determine its coordinator node (the first node clockwise on the ring)
  2. The coordinator stores the data locally
  3. The coordinator replicates the data to the next N-1 clockwise nodes on the ring

This replication happens asynchronously -- the coordinator doesn't wait for all replicas to confirm before acknowledging the write. This is called optimistic replication: replicas are not guaranteed to be identical at all times, but they'll converge eventually.
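The write path above can be sketched in a few lines. This is a toy model for illustration only -- the ring layout, node names, and 500-position hash space are invented here, not Dynamo's actual implementation (though Dynamo does use MD5 to hash keys onto its ring):

```python
import hashlib

# Toy ring: sorted (position, node) pairs in a 500-position hash space.
RING = sorted([(100, "A"), (200, "B"), (300, "C"), (400, "D")])

def coordinator_and_replicas(key: str, n: int = 3):
    """Hash the key onto the ring, take the first node clockwise as
    coordinator, then the next n-1 clockwise nodes as replicas."""
    pos = int(hashlib.md5(key.encode()).hexdigest(), 16) % 500
    # First ring position at or after the key's position; wrap to 0 past the end.
    idx = next((i for i, (p, _) in enumerate(RING) if p >= pos), 0)
    return [RING[(idx + i) % len(RING)][1] for i in range(n)]
```

The coordinator is simply `coordinator_and_replicas(key)[0]`; the rest of the list are the replicas it pushes to asynchronously.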

Replication factor

The replication factor (N) is the number of nodes that store copies of each data item. With N=3, every piece of data exists on 3 different nodes. If one node fails, two others still have the data.

Preference list

For each key, Dynamo maintains a preference list -- the ordered list of nodes that should store replicas of that key. The preference list is determined by the consistent hashing ring, but with an important detail: it skips virtual nodes that map to the same physical node. This ensures replicas actually land on different physical machines.

Every node in Dynamo can independently compute the preference list for any key (since every node has a copy of the ring topology via gossip). This means any node can coordinate any request -- no central routing needed.
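A minimal sketch of that computation, assuming a `ring` mapping of positions to physical node names (an invented structure, not Dynamo's actual one). The key move is skipping any virtual node whose physical node is already in the list:

```python
def preference_list(ring, key_pos, n):
    """Walk clockwise from the key's position, collecting physical nodes
    but skipping virtual nodes that map to a machine already chosen."""
    positions = sorted(ring)
    # First virtual node at or after the key's position; wrap past the end.
    start = next((i for i, p in enumerate(positions) if p >= key_pos), 0)
    result = []
    for i in range(len(positions)):
        node = ring[positions[(start + i) % len(positions)]]
        if node not in result:
            result.append(node)
        if len(result) == n:
            break
    return result

# Two consecutive virtual nodes belong to "A"; the list still spans
# three distinct physical machines.
ring = {10: "A", 20: "A", 30: "B", 40: "C"}
```

Without the `if node not in result` check, a key landing before position 10 would get ["A", "A", "B"] -- two of its three "replicas" on the same machine, defeating the point of replication.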

Think first
The preference list determines which N nodes store replicas of a key. But with virtual nodes, consecutive positions on the ring might map to the same physical node. Why is this a problem, and how should the preference list handle it?

Sloppy quorum: handling temporary failures

Here's where Dynamo diverges from traditional quorum systems. In a strict quorum, writes must go to the key's designated replica nodes. If too many of them are down to reach the write quorum, the write fails (or blocks).

Dynamo uses a sloppy quorum instead: reads and writes go to the first N healthy nodes from the preference list. These might not be the key's designated replicas -- if a designated replica is down, a non-designated node steps in.

Example: With N=3, a key's preference list is [Server 1, Server 2, Server 3]. If Server 1 is down:

  1. The write goes to Server 2, Server 3, and Server 4 (the next healthy node)
  2. Server 4 stores the data with a hint noting that this data actually belongs on Server 1
  3. When Server 1 recovers, Server 4 forwards the data to it via hinted handoff
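The node-selection step above can be sketched as follows. The function name and the way hints are paired with substitutes are simplifications invented here (a real implementation would attach the hint per stored object):

```python
def sloppy_quorum_targets(pref_list, healthy, n):
    """Pick the first n healthy nodes walking down the (extended)
    preference list. A substitute node outside the top-n gets paired
    with a hint naming the designated replica it stands in for."""
    targets = []
    # Designated replicas (the top n) that are currently unreachable.
    down = [node for node in pref_list[:n] if node not in healthy]
    for node in pref_list:
        if node in healthy:
            # Simplification: hand the next pending hint to each substitute.
            hint = down.pop(0) if node not in pref_list[:n] and down else None
            targets.append((node, hint))
        if len(targets) == n:
            break
    return targets
```

Running the section's example -- preference list [S1..S5], S1 down -- yields writes to S2 and S3 with no hint, and to S4 carrying a hint for S1.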

Hinted handoff: "always writeable"

The mechanism where a healthy node temporarily accepts writes for a downed node is called hinted handoff. It's how Dynamo stays "always writeable":

  • Node receives data meant for a downed node → stores it locally with a hint
  • Periodically scans its hint database
  • When the target node recovers (detected via gossip), forwards the data
  • After successful delivery, deletes the hint

In the extreme case, even if only a single Dynamo node is alive, it will still accept writes and store hints for all the downed nodes. When they recover, data flows to them.
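The store/scan/forward/delete cycle can be sketched as a tiny class. `HintStore`, `is_healthy`, and `send` are hypothetical names for illustration; in practice the health check would come from gossip and `send` would be an RPC:

```python
class HintStore:
    """Minimal sketch of hinted handoff on one node."""

    def __init__(self):
        self.hints = []  # (intended_node, key, value) tuples

    def store_hint(self, intended_node, key, value):
        """Accept a write meant for a downed node, remembering its owner."""
        self.hints.append((intended_node, key, value))

    def handoff(self, is_healthy, send):
        """Periodic scan: forward hints whose target node has recovered,
        deleting each hint only after successful delivery."""
        remaining = []
        for node, key, value in self.hints:
            if is_healthy(node) and send(node, key, value):
                continue  # delivered -> drop the hint
            remaining.append((node, key, value))  # keep and retry later
        self.hints = remaining
```

Note the ordering: the hint is deleted only after `send` succeeds, so a crash mid-handoff at worst redelivers the data rather than losing it.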

The cost of sloppy quorum

Because sloppy quorum allows writes to go to non-designated nodes, two concurrent writes to the same key can be accepted by non-overlapping sets of nodes. This means the R + W > N consistency guarantee doesn't strictly hold. Multiple conflicting versions of the same data can coexist. Dynamo handles this using vector clocks.
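A concrete illustration of why R + W > N stops helping under sloppy quorum. The node sets below are invented for the example: with N=3 and W=2, two concurrent writes can each reach a write quorum on completely disjoint nodes, so a later R=2 read may see only one version:

```python
N, W, R = 3, 2, 2
designated = {"S1", "S2", "S3"}

# Client A's write lands on two designated replicas.
write_a = {"S1", "S2"}
# Client B, partitioned from those, is served by substitutes via sloppy quorum.
write_b = {"S4", "S5"}

both_acked = len(write_a) >= W and len(write_b) >= W
no_overlap = write_a.isdisjoint(write_b)
```

Both writes are acknowledged, yet no single node saw both -- which is exactly why Dynamo needs vector clocks to detect and reconcile the conflicting versions.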

Quiz
With N=3, W=2, and R=2 in a sloppy quorum setup, Server 1 and Server 2 are both down. A write for a key whose preference list is [Server 1, Server 2, Server 3] arrives. What happens?
Interview angle

The key question: "What happens when a replica is down during a write?" In most systems, this means reduced durability or a failed write. Dynamo's answer -- sloppy quorum + hinted handoff -- lets it always accept writes. The trade-off is clear: potential inconsistency (conflicting versions) in exchange for guaranteed availability. Always mention both the benefit and the cost.