The Life of Dynamo's put() & get() Operations
Now that we understand the building blocks -- consistent hashing, replication, vector clocks -- let's trace how real put() and get() requests flow through the system.
Choosing the coordinator node
When a client wants to read or write, it first needs to reach a coordinator -- the node that will manage the operation. Dynamo supports two strategies:
| Strategy | How it works | Pros | Cons |
|---|---|---|---|
| Load balancer | Client sends request to a generic load balancer, which forwards to any Dynamo node | Client is decoupled from ring topology (simpler client) | The selected node might not be on the preference list → extra hop |
| Partition-aware client | Client maintains a copy of the ring and routes directly to the right node | Lower latency (zero-hop DHT) | Client must stay updated on ring changes; less control over load distribution |
With a partition-aware client, Dynamo is called a "zero-hop DHT" -- the client contacts the correct node directly, without any intermediate routing. This is the fastest path but requires the client to track the ring state.
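The zero-hop routing idea can be sketched in a few lines. This is a toy model, not Dynamo's actual client library: the `Ring` class and `coordinator_for` name are illustrative, and MD5 stands in for whatever hash the real system uses.

```python
import hashlib
from bisect import bisect_right

def _hash(value: str) -> int:
    # MD5 as a stand-in ring hash; any uniform hash works for the sketch.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sort node positions on the hash ring once; a partition-aware
        # client refreshes this whenever ring membership changes.
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def coordinator_for(self, key: str) -> str:
        # Walk clockwise from the key's hash to the first node -- that
        # node coordinates the request, with no intermediate hop.
        idx = bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.coordinator_for("user:42"))  # the client contacts this node directly
```

The trade-off from the table shows up directly in code: the client gets one-step routing, but only as long as its copy of `self._ring` stays in sync with the real membership.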
The consistency protocol: N, R, W
Dynamo uses configurable quorum parameters:
| Parameter | Meaning |
|---|---|
| N | Total number of replicas for each key |
| R | Minimum number of nodes that must respond to a read |
| W | Minimum number of nodes that must acknowledge a write |
The typical configuration is (N=3, W=2, R=2), which gives:
- R + W > N → guarantees overlap between read and write sets (strong consistency if you use strict quorum)
- But remember: Dynamo uses sloppy quorum, so the R + W > N guarantee doesn't strictly hold
Other configurations and their trade-offs:
| (N, W, R) | Behavior |
|---|---|
| (3, 2, 2) | Balanced -- overlapping read/write quorums with moderate latency |
| (3, 3, 1) | Fast reads, slow writes, highly durable |
| (3, 1, 3) | Fast writes, slow reads, less durable |
Latency note: The latency of any operation is determined by the slowest of the R (or W) nodes that must respond. This is why R and W are typically set to values below N -- you don't want to wait for the slowest replica.
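The quorum arithmetic above is easy to check mechanically. A minimal sketch, with configs written as (N, W, R) to match the table:

```python
def overlaps(n: int, w: int, r: int) -> bool:
    # R + W > N means every read quorum must intersect every write
    # quorum -- under a strict (non-sloppy) quorum, a read always
    # touches at least one replica that saw the latest write.
    return r + w > n

for n, w, r in [(3, 2, 2), (3, 3, 1), (3, 1, 3)]:
    print((n, w, r), "read/write sets overlap:", overlaps(n, w, r))
```

Note that all three configurations in the table satisfy R + W > N; what differs between them is which side (reads or writes) pays the latency and durability cost.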
The put() process
- The coordinator generates a new vector clock component for this write
- Saves the data locally (the coordinator is itself one of the N replicas)
- Sends the write to the top N-1 healthy nodes on the preference list
- Waits for W-1 acknowledgments (it already counts itself as one)
- Returns success to the client
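The put() steps above can be sketched as follows. This is a sequential toy model under stated assumptions: replicas are plain dicts, the "RPC" is a local write, and vector clocks are dicts mapping node to counter. A real coordinator dispatches asynchronously and stops waiting at W acknowledgments rather than writing in order.

```python
def put(coordinator, key, value, clock, replicas, n=3, w=2):
    # 1. Bump the coordinator's own entry in the vector clock.
    clock = {**clock, coordinator: clock.get(coordinator, 0) + 1}

    # 2. Store locally -- the coordinator is itself one of the N
    #    replicas, so it counts toward W.
    replicas[coordinator][key] = (value, clock)
    acks = 1

    # 3. Forward to the next N-1 nodes on the (toy) preference list.
    for node in [nd for nd in replicas if nd != coordinator][: n - 1]:
        replicas[node][key] = (value, clock)  # stand-in for an async RPC
        acks += 1

    # 4. Success once W acknowledgments (including our own) arrive.
    return acks >= w, clock

stores = {nd: {} for nd in ["node-a", "node-b", "node-c"]}
ok, clock = put("node-a", "user:42", "alice", {}, stores)
print(ok, clock)  # True {'node-a': 1}
```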
The get() process
- The coordinator sends read requests to the top N healthy nodes on the preference list
- Waits for R responses
- If all responses have the same vector clock → return the data
- If responses have conflicting versions → return all versions to the client with their vector clocks for reconciliation
- Triggers read repair in the background if any node returned stale data
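The reconciliation step in get() can be sketched under the same toy model: collect (value, vector clock) pairs from R replicas, drop any version whose clock is dominated by another, and return whatever survives. The `descends` helper name is illustrative.

```python
def descends(a: dict, b: dict) -> bool:
    # Clock a descends from b if a is >= b component-wise:
    # the version stamped with a supersedes the one stamped with b.
    return all(a.get(node, 0) >= count for node, count in b.items())

def get(key, replicas, r=2):
    # Gather responses from R replicas (the first R stores, in this toy).
    responses = []
    for store in list(replicas.values())[:r]:
        if key in store:
            responses.append(store[key])

    # Keep only versions not dominated by some other response.
    # One survivor => consistent read; several => concurrent versions,
    # returned to the client for reconciliation.
    return [
        (v, c) for v, c in responses
        if not any(c != c2 and descends(c2, c) for _, c2 in responses)
    ]

stores = {
    "node-a": {"k": ("v2", {"node-a": 2})},
    "node-b": {"k": ("v1", {"node-a": 1})},  # stale replica
}
print(get("k", stores))  # only the v2 version survives
```

A stale replica (like `node-b` here) is exactly what the background read repair would then overwrite with the winning version.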
Under the hood: the state machine
Each client request creates a state machine on the coordinator node. This state machine handles the complete lifecycle:
- Identify the responsible nodes for the key
- Send requests to those nodes
- Wait for the minimum required responses (with timeout)
- If too few replies → fail the request
- Gather all data versions, perform reconciliation
- Package and return the response to the client
- After returning: wait briefly for any late responses, then trigger read repair on any nodes that returned stale data
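The lifecycle above can be expressed as a small state machine. The state names and transitions below paraphrase the listed steps; they are not Dynamo's actual implementation.

```python
from enum import Enum, auto

class State(Enum):
    ROUTE = auto()        # identify the responsible nodes for the key
    SEND = auto()         # send requests to those nodes
    WAIT = auto()         # wait for the minimum responses (with timeout)
    RECONCILE = auto()    # gather versions, perform reconciliation
    RESPOND = auto()      # package and return the response
    READ_REPAIR = auto()  # after responding: repair stale replicas
    FAILED = auto()       # too few replies before the timeout

def step(state: State, enough_replies: bool = True) -> State:
    # One transition per lifecycle step; WAIT branches on whether the
    # quorum was met before the timeout.
    transitions = {
        State.ROUTE: State.SEND,
        State.SEND: State.WAIT,
        State.WAIT: State.RECONCILE if enough_replies else State.FAILED,
        State.RECONCILE: State.RESPOND,
        State.RESPOND: State.READ_REPAIR,
    }
    return transitions[state]
```

Modeling the request as an explicit state machine is what lets the coordinator keep listening for late responses after it has already replied to the client: RESPOND is not a terminal state.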
The "read-your-writes" optimization
A subtle but important detail: when a put() follows a get(), the coordinator for the write is chosen as the node that responded fastest to the preceding read. This node likely already has the data in its cache, which increases the chance that the client will see their own write on a subsequent read ("read-your-writes" consistency).
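The selection heuristic amounts to one line. A toy illustration with fabricated latency numbers; the function name is hypothetical:

```python
def pick_write_coordinator(read_latencies_ms: dict) -> str:
    # The replica that answered the last get() fastest most likely
    # has the freshest copy warm, so route the follow-up put() to it.
    return min(read_latencies_ms, key=read_latencies_ms.get)

latencies = {"node-a": 9.2, "node-b": 3.1, "node-c": 14.7}
print(pick_write_coordinator(latencies))  # node-b
```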
Failure modes
- All N nodes are down: Write fails entirely. This is extremely rare with N=3 and geographic distribution, but it's the availability limit.
- Fewer than W nodes respond within timeout: Write fails. Increasing W improves durability but increases the chance of this failure.
- Network partition during write: Sloppy quorum kicks in -- writes go to healthy non-designated nodes. Conflicting versions may emerge, resolved by vector clocks on the next read.
- Vector clock grows too large: Dynamo truncates it, potentially losing causal information. A known weakness.
When explaining Dynamo's request flow, emphasize the sequence: hash the key → find coordinator → replicate to N nodes → wait for W/R acknowledgments → handle conflicts via vector clocks → trigger read repair. This shows you understand the complete end-to-end flow, not just individual components.