
High-Level Architecture

At a high level, Dynamo is a distributed hash table (DHT) whose data is replicated across multiple nodes for high availability and fault tolerance.

Key terms
  • DHT -- A distributed storage system that provides key-value lookups. Any node in the DHT can efficiently retrieve the value for a given key.
  • Fault tolerance -- The ability to continue operating when components fail. Dynamo achieves this through redundancy: every piece of data exists on multiple nodes.
Think first
Dynamo needs to be 'always writeable' — even when nodes fail or network partitions occur. What are the key problems a distributed key-value store must solve to achieve this?

Dynamo's architecture in one picture

Dynamo combines six techniques into a single coherent system. Here's how they fit together:

| Component | Technique | Pattern |
| --- | --- | --- |
| Data distribution | Consistent hashing with virtual nodes | Spread data evenly; minimize movement when nodes join/leave |
| Data replication | Optimistic replication → eventual consistency | Replicate asynchronously for speed; tolerate temporary inconsistency |
| Temporary failure handling | Sloppy quorum + hinted handoff | Always accept writes, even when replicas are down |
| Node communication & failure detection | Gossip protocol | Decentralized -- every node knows the state of every other node |
| Conflict detection | Vector clocks | Track causal history to detect concurrent writes |
| Permanent failure recovery | Merkle trees for anti-entropy | Efficiently find and fix divergence between replicas |
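To make the first row concrete, here is a minimal sketch of consistent hashing with virtual nodes. The class and method names, the number of virtual nodes, and the use of MD5 as the ring hash are all illustrative assumptions, not Dynamo's actual implementation:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map a string to a position on the ring (0 .. 2**32 - 1).
    # MD5 is an arbitrary choice here; any uniform hash works.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes (a sketch, not Dynamo's code)."""

    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes = vnodes_per_node
        self.ring = []  # sorted list of (ring position, physical node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several positions ("virtual nodes"),
        # which spreads its load evenly around the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        # Only the departed node's ranges move; other assignments are untouched.
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        # Walk clockwise from the key's position, collecting n distinct
        # physical nodes -- the replicas responsible for this key.
        start = bisect.bisect(self.ring, (ring_hash(key),))
        result = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result
```

Removing a node only reassigns the ranges it owned; every key not in those ranges keeps the same preference list, which is the "minimize movement" property in the table.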
The design philosophy

Notice that every component prioritizes availability over consistency. Sloppy quorum keeps writes flowing during failures. Hinted handoff stores data for downed nodes. Optimistic replication avoids waiting for confirmations. Conflict resolution happens later, during reads. This isn't an accident -- it's the direct consequence of Amazon's business requirement that the system must always accept writes.

How they work together

The flow for a typical write:

  1. Client sends put(key, value) to any Dynamo node
  2. Consistent hashing determines which nodes should store this key
  3. The coordinator sends the write to the top N nodes on the preference list
  4. If a target node is down, a healthy node accepts the write via sloppy quorum and stores a hint
  5. Once W nodes confirm, the write is acknowledged to the client
  6. Gossip protocol keeps all nodes informed about who's alive and what ranges they own
  7. If a node was down, hinted handoff delivers the write when it recovers
  8. If a node was down for a long time, Merkle trees detect and repair divergence in the background
  9. If concurrent writes created conflicts, vector clocks detect them during the next read
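Steps 2-5 of this flow can be sketched in a few lines. Everything here is a simplified model under stated assumptions: `alive` stands in for the node liveness view that gossip provides, and `store`/`hints` are plain dicts standing in for real per-node storage:

```python
def put(key, value, preference_list, alive, store, hints, n=3, w=2):
    """Sloppy-quorum write sketch: store on the first n reachable nodes
    of the preference list; leave hints for any downed 'home' replicas.
    Returns True once at least w nodes confirmed the write."""
    intended = preference_list[:n]                    # the top-N home replicas
    downed = [m for m in intended if m not in alive]  # replicas being covered for
    confirmed = 0
    for node in preference_list:
        if node not in alive:
            continue  # skip downed nodes; a later healthy node covers for them
        store[node][key] = value
        if node not in intended and downed:
            # Stand-in replica: record a hint naming the node this write was
            # meant for, so hinted handoff can deliver it on recovery.
            hints[node].append((key, value, downed.pop(0)))
        confirmed += 1  # in this sketch, every stored write confirms
        if confirmed == n:
            break
    return confirmed >= w
```

With N=3 and W=2, the client gets an acknowledgement as soon as two nodes have the write, even if one of the three home replicas is down -- that is the "always accept writes" property from the table.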
Think first
In Dynamo's write flow, what happens if one of the designated replica nodes is down when a write arrives?

Each of the following chapters dives deep into one of these components. But keep this big picture in mind -- Dynamo's power comes from how these pieces fit together, not from any single technique.
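Step 9 of the write flow -- detecting concurrent writes with vector clocks -- reduces to a single comparison. A minimal sketch, where a clock is a dict mapping a coordinator node's name to a counter (the names below are made up):

```python
def descends(a: dict, b: dict) -> bool:
    # Clock a descends from clock b if a has seen at least as many
    # events from every node that b has seen events from.
    return all(a.get(node, 0) >= count for node, count in b.items())

def concurrent(a: dict, b: dict) -> bool:
    # Neither clock descends from the other: the writes happened
    # concurrently, and the conflict must be surfaced at read time.
    return not descends(a, b) and not descends(b, a)

# Two clients update the same ancestor {"Sx": 1} via different coordinators:
v1 = {"Sx": 1, "Sy": 1}   # written through node Sy
v2 = {"Sx": 1, "Sz": 1}   # written through node Sz
```

Here `concurrent(v1, v2)` is true, so a read returns both versions and the client (or a merge rule) reconciles them -- exactly the "conflict resolution happens later, during reads" behavior described above.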

Quiz
A Dynamo node goes down for 2 hours and then recovers. Which mechanisms will help bring its data back up to date?
Interview angle

If asked to describe Dynamo's architecture, walk through this table. It shows you understand not just the individual techniques but why each was chosen and how they compose into a system that's always available for writes.