The Life of Dynamo's put() & get() Operations

Now that we understand the building blocks -- consistent hashing, replication, vector clocks -- let's trace how real put() and get() requests flow through the system.

Think first
In a fully decentralized system like Dynamo, any node can receive a client request. But not every node is responsible for every key. How should the system route requests to the right node, and what are the trade-offs?

Choosing the coordinator node

When a client wants to read or write, it first needs to reach a coordinator -- the node that will manage the operation. Dynamo supports two strategies:

| Strategy | How it works | Pros | Cons |
|---|---|---|---|
| Load balancer | Client sends request to a generic load balancer, which forwards to any Dynamo node | Client is decoupled from ring topology (simpler client) | The selected node might not be on the preference list → extra hop |
| Partition-aware client | Client maintains a copy of the ring and routes directly to the right node | Lower latency (zero-hop DHT) | Client must stay updated on ring changes; less control over load distribution |
Zero-hop DHT

With a partition-aware client, Dynamo is called a "zero-hop DHT" -- the client contacts the correct node directly, without any intermediate routing. This is the fastest path but requires the client to track the ring state.
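The routing a partition-aware client performs can be sketched as a binary search over a client-side copy of the ring. This is a minimal illustration, not Dynamo's actual client library; the node names and ring positions are made up:

```python
import bisect
import hashlib

def key_hash(key: str) -> int:
    """Hash a key onto the ring (Dynamo uses MD5's 128-bit output)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class RingView:
    """Client-side copy of the ring, enabling zero-hop routing (sketch)."""
    def __init__(self, nodes: dict):
        # nodes: name -> ring position; keep positions sorted for bisect
        self._positions = sorted(nodes.values())
        self._owner = {pos: name for name, pos in nodes.items()}

    def coordinator(self, key: str) -> str:
        """The first node clockwise from the key's hash owns the key."""
        h = key_hash(key)
        i = bisect.bisect_right(self._positions, h) % len(self._positions)
        return self._owner[self._positions[i]]

# Hypothetical 3-node ring
ring = RingView({"node-a": 10, "node-b": 2**126, "node-c": 2**127})
print(ring.coordinator("user:42"))  # client contacts this node directly
```

The cost of the zero hop is visible here: the moment a node joins or leaves, every client's `RingView` is stale until it refreshes its ring state.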

The consistency protocol: N, R, W

Dynamo uses configurable quorum parameters:

| Parameter | Meaning |
|---|---|
| N | Total number of replicas for each key |
| R | Minimum number of nodes that must respond to a read |
| W | Minimum number of nodes that must acknowledge a write |

The typical configuration is (N=3, W=2, R=2), which gives:

  • R + W > N → guarantees overlap between read and write sets (strong consistency if you use strict quorum)
  • But remember: Dynamo uses sloppy quorum, so the R + W > N guarantee doesn't strictly hold

Other configurations and their trade-offs:

| Config (N, W, R) | Behavior |
|---|---|
| (3, 2, 2) | Balanced -- strong consistency with moderate latency |
| (3, 3, 1) | Fast reads, slow writes, highly durable |
| (3, 1, 3) | Fast writes, slow reads, less durable |

Latency note: The latency of any operation is determined by the slowest of the R (or W) nodes that must respond. This is why R and W are typically set less than N -- you don't want to wait for the slowest replica.
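The quorum arithmetic behind these configurations can be checked mechanically. A small sketch (the function name and dict keys are illustrative, and this is strict-quorum reasoning -- Dynamo's sloppy quorum weakens both guarantees in practice):

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """Classify an (N, W, R) configuration under strict-quorum reasoning."""
    return {
        # Every read set intersects every write set -> reads see the latest write
        "read_sees_latest_write": r + w > n,
        # No two write quorums can complete without overlapping
        "no_conflicting_write_quorums": w > n / 2,
    }

for cfg in [(3, 2, 2), (3, 3, 1), (3, 1, 3)]:
    print(cfg, quorum_properties(*cfg))
```

Note that all three configurations above satisfy R + W > N; they differ in where the latency and durability costs land, not in the overlap guarantee.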

The put() process

  1. The coordinator generates a new vector clock component for this write
  2. Saves the data locally (the coordinator is itself one of the N replicas)
  3. Sends the write to the top N-1 healthy nodes on the preference list
  4. Waits for W-1 acknowledgments (it already counts itself as one)
  5. Returns success to the client
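The five steps above can be sketched in a few lines. The `Replica` class and `put` signature are invented for illustration -- they are not Dynamo's real API -- and the ack counting mirrors step 4 (the coordinator counts its own local write):

```python
class Replica:
    """Minimal stand-in for a Dynamo node (illustrative only)."""
    def __init__(self, name: str, healthy: bool = True):
        self.name, self.healthy, self.data = name, healthy, {}

    def store(self, key, value, clock):
        self.data[key] = (value, clock)

    def replicate(self, key, value, clock) -> bool:
        if not self.healthy:
            return False              # a down node never acks
        self.store(key, value, clock)
        return True

def put(coordinator, replicas, key, value, context, w):
    """Sketch of put(). `replicas` are the next N-1 healthy nodes on the
    preference list; `context` carries the clock from a preceding get()."""
    clock = dict(context.get("vector_clock", {}))
    clock[coordinator.name] = clock.get(coordinator.name, 0) + 1  # step 1
    coordinator.store(key, value, clock)                          # step 2
    acks = 1                                # coordinator counts itself
    for node in replicas:                   # step 3: fan out to N-1 nodes
        acks += node.replicate(key, value, clock)
    return acks >= w                        # steps 4-5: success at W acks
```

With one replica down, `put(a, [b, c_down], "k", "v", {}, w=2)` still succeeds: the coordinator plus one healthy replica reach W=2.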

The get() process

  1. The coordinator sends read requests to the top N healthy nodes on the preference list
  2. Waits for R responses
  3. If all responses have the same vector clock → return the data
  4. If responses have conflicting versions → return all versions to the client with their vector clocks for reconciliation
  5. Triggers read repair in the background if any node returned stale data
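The reconciliation in steps 3-4 boils down to a causality check on vector clocks: a version is dropped only if some other response strictly descends from it. A minimal sketch (replicas modeled as plain dicts; function names are invented):

```python
def descends(a: dict, b: dict) -> bool:
    """True if vector clock `a` is causally at-or-after `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def get(replicas, key, r):
    """Sketch of get(): collect R responses, then keep only versions
    that no other response causally supersedes.
    Each replica is a dict mapping key -> (value, clock)."""
    responses = []
    for store in replicas:                # steps 1-2: query, stop at R
        if key in store:
            responses.append(store[key])
        if len(responses) == r:
            break
    survivors = []
    for value, clock in responses:        # steps 3-4: reconcile
        dominated = any(descends(other, clock) and other != clock
                        for _, other in responses)
        if not dominated and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors                      # >1 entry = conflict for client
```

If one response carries clock `{a: 2}` and another `{a: 1}`, only the first survives; clocks `{a: 1}` and `{b: 1}` are concurrent, so both come back to the client for reconciliation.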

Under the hood: the state machine

Each client request creates a state machine on the coordinator node. This state machine handles the complete lifecycle:

  1. Identify the responsible nodes for the key
  2. Send requests to those nodes
  3. Wait for the minimum required responses (with timeout)
  4. If too few replies → fail the request
  5. Gather all data versions, perform reconciliation
  6. Package and return the response to the client
  7. After returning: wait briefly for any late responses, then trigger read repair on any nodes that returned stale data
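The lifecycle above can be summarized as a linear state machine with one failure branch. This is a deliberate simplification -- the real coordinator drives these transitions with asynchronous events and timeouts -- and all names here are illustrative:

```python
from enum import Enum, auto

class Step(Enum):
    FIND_NODES = auto()    # 1: identify responsible nodes
    SEND = auto()          # 2: send requests
    WAIT = auto()          # 3: wait for minimum responses (with timeout)
    FAIL = auto()          # 4: too few replies
    RECONCILE = auto()     # 5: gather versions, reconcile
    RESPOND = auto()       # 6: return response to client
    READ_REPAIR = auto()   # 7: repair stale replicas after responding

def request_lifecycle(replies: int, required: int) -> list:
    """Trace the states a request passes through (sketch)."""
    trace = [Step.FIND_NODES, Step.SEND, Step.WAIT]
    if replies < required:
        return trace + [Step.FAIL]
    return trace + [Step.RECONCILE, Step.RESPOND, Step.READ_REPAIR]
```

The key detail the trace preserves: read repair happens after `RESPOND`, so fixing stale replicas never adds latency to the client's request.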
Think first
During a get() operation, the coordinator receives responses from multiple replicas. Some responses might have different vector clocks. How should the coordinator handle this before responding to the client?

The "read-your-writes" optimization

A subtle but important detail: when a put() follows a get(), the coordinator for the write is chosen as the node that responded fastest to the preceding read. This node likely already has the data in its cache, which increases the chance that the client will see their own write on a subsequent read ("read-your-writes" consistency).
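The selection rule itself is tiny -- pick the minimum over the observed read latencies. A hypothetical helper to make it concrete:

```python
def next_write_coordinator(read_latencies_ms: dict) -> str:
    """Pick the replica that answered the preceding get() fastest; it most
    likely has the freshest version cached. (Hypothetical helper, not
    Dynamo's real API.)"""
    return min(read_latencies_ms, key=read_latencies_ms.get)

print(next_write_coordinator({"node-a": 4.2, "node-b": 1.1, "node-c": 2.7}))
# node-b answered the read fastest, so it coordinates the follow-up write
```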

What could go wrong?
  • All N nodes are down: Write fails entirely. This is extremely rare with N=3 and geographic distribution, but it's the availability limit.
  • Fewer than W nodes respond within timeout: Write fails. Increasing W improves durability but increases the chance of this failure.
  • Network partition during write: Sloppy quorum kicks in -- writes go to healthy non-designated nodes. Conflicting versions may emerge, resolved by vector clocks on the next read.
  • Vector clock grows too large: Dynamo truncates it, potentially losing causal information. A known weakness.
Quiz
You configure a Dynamo cluster with N=3, R=1, W=3. What is the practical impact of this configuration?
Interview angle

When explaining Dynamo's request flow, emphasize the sequence: hash the key → find coordinator → replicate to N nodes → wait for W/R acknowledgments → handle conflicts via vector clocks → trigger read repair. This shows you understand the complete end-to-end flow, not just individual components.