High-Level Architecture
At a high level, Dynamo is a distributed key-value store built on a Distributed Hash Table (DHT), with each key's data replicated across multiple nodes in the cluster for high availability and fault tolerance.
- DHT -- A distributed storage system that provides key-value lookups. Any node in the DHT can efficiently retrieve the value for a given key.
- Fault tolerance -- The ability to continue operating when components fail. Dynamo achieves this through redundancy: every piece of data exists on multiple nodes.
Dynamo's architecture in one picture
Dynamo combines six techniques into a single coherent system. Here's how they fit together:
| Component | Technique | Pattern |
|---|---|---|
| Data distribution | Consistent Hashing with virtual nodes | Spread data evenly; minimize movement when nodes join/leave |
| Data replication | Optimistic replication → eventual consistency | Replicate asynchronously for speed; tolerate temporary inconsistency |
| Temporary failure handling | Sloppy quorum + Hinted handoff | Always accept writes, even when replicas are down |
| Node communication & failure detection | Gossip protocol | Decentralized -- every node knows the state of every other node |
| Conflict detection | Vector clocks | Track causal history to detect concurrent writes |
| Permanent failure recovery | Merkle trees for anti-entropy | Efficiently find and fix divergence between replicas |
Notice that every component prioritizes availability over consistency. Sloppy quorum keeps writes flowing during failures. Hinted handoff stores data for downed nodes. Optimistic replication avoids waiting for confirmations. Conflict resolution happens later, during reads. This isn't an accident -- it's the direct consequence of Amazon's business requirement that the system must always accept writes.
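The first row of the table, consistent hashing with virtual nodes, can be sketched in a few lines. This is a minimal illustration, not Dynamo's actual implementation: the class and method names are invented for this example, and a real ring would use a production hash and handle membership changes incrementally.

```python
import hashlib
from bisect import bisect_right

def _hash(key: str) -> int:
    # MD5 is used here only as a convenient, well-spread hash for the sketch
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes, vnodes=8):
        self._ring = []  # sorted list of (position, physical node) points
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=8):
        # Each physical node owns several points on the ring, which
        # evens out the key ranges and spreads rebalancing load
        for i in range(vnodes):
            self._ring.append((_hash(f"{node}#vn{i}"), node))
        self._ring.sort()

    def preference_list(self, key, n=3):
        """First n distinct physical nodes clockwise from the key's position."""
        start = bisect_right(self._ring, (_hash(key), chr(0x10FFFF)))
        result = []
        for j in range(len(self._ring)):
            _, node = self._ring[(start + j) % len(self._ring)]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result
```

Because each node appears at many ring positions, removing a node hands its key ranges to several successors instead of dumping them all on one neighbor.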
How they work together
The flow for a typical write:
- Client sends `put(key, value)` to any Dynamo node
- Consistent hashing determines which nodes should store this key
- The coordinator sends the write to the top N nodes on the preference list
- If a target node is down, a healthy node accepts the write via sloppy quorum and stores a hint
- Once W nodes confirm, the write is acknowledged to the client
- Gossip protocol keeps all nodes informed about who's alive and what ranges they own
- If a node was down, hinted handoff delivers the write when it recovers
- If a node was down for a long time, Merkle trees detect and repair divergence in the background
- If concurrent writes created conflicts, vector clocks detect them during the next read
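The sloppy-quorum steps above can be sketched as a single coordinator function. Everything here is illustrative, not Dynamo's real code: `storage` stands in for per-node stores, `is_alive` stands in for failure detection via gossip, and the hint is modeled as the intended owner stored alongside the value.

```python
def coordinate_write(key, value, preference_list, fallbacks, is_alive, storage, W=2):
    """Sloppy-quorum write sketch: write to the top N nodes of the
    preference list, substituting the next healthy fallback node (with a
    hint naming the intended owner) for each unreachable replica, and
    acknowledge once W replicas have confirmed."""
    acks = 0
    spares = iter(n for n in fallbacks if is_alive(n))
    for owner in preference_list:
        if is_alive(owner):
            storage[owner][key] = (value, None)       # normal replica, no hint
        else:
            try:
                stand_in = next(spares)
            except StopIteration:
                continue                              # no healthy stand-in left
            storage[stand_in][key] = (value, owner)   # hinted handoff
        acks += 1
        if acks >= W:
            return True                               # acknowledge the client
    return False
```

When the downed owner recovers, each stand-in scans its store for hints naming that owner and delivers the writes back, which is the hinted-handoff step in the list above.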
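The conflict-detection step in the last bullet comes down to comparing vector clocks. A minimal sketch, assuming clocks are plain `{node: counter}` dicts (the function names are invented for this example):

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b componentwise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    """Classify two versions: one supersedes the other, they are equal,
    or neither descends from the other -- a concurrent (conflicting) pair."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"   # return both versions to the client to reconcile
```

Only the "concurrent" case is a real conflict; in every other case one version causally includes the other and can silently replace it.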
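For the background-repair step, two replicas compare Merkle roots: equal roots mean the key range is in sync with a single hash exchange, and differing roots mean they descend into the tree to locate the divergent keys. A toy root computation, assuming leaves are hashes of individual key-value pairs (real trees are built per key range):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes up to a single root hash, pairing
    adjacent nodes at each level (odd node duplicated)."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because a single changed key changes only one root-to-leaf path, two replicas can find it by exchanging O(log n) hashes instead of streaming the whole range.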
Each of the following chapters dives deep into one of these components. But keep this big picture in mind -- Dynamo's power comes from how these pieces fit together, not from any single technique.
If asked to describe Dynamo's architecture, walk through this table. It shows you understand not just the individual techniques but why each was chosen and how they compose into a system that's always available for writes.