15 Vector Clocks
Two clients write to the same key at the same time, on different nodes. Both writes succeed. Now you have two versions of the data. Which one is "correct"? Is one newer than the other, or are they genuinely conflicting?
This is one of the hardest problems in distributed systems, and wall clocks can't solve it.
Background
On a single machine, ordering events is trivial: use the system clock. Write at t=1, then write at t=2 -- the second write is clearly newer.
In a distributed system, this breaks down because of clock skew. Different machines have different clocks, and those clocks drift apart over time. Even with NTP synchronization, clocks can differ by milliseconds or more. This means:
- Time
ton Node A does not mean "before" timet+1on Node B - Two events that appear simultaneous might not be
- Two events that appear sequential might actually be concurrent
Network Time Protocol synchronizes clocks across machines, but can't guarantee perfect synchronization. Typical NTP accuracy is 1-10ms on a LAN, and can be worse across WANs. For ordering events, milliseconds of uncertainty is enough to make timestamp-based ordering unreliable.
Without reliable timestamps, how do you determine causal ordering -- whether one event happened before, after, or concurrently with another?
Definition
Use vector clocks to track causal history across distributed nodes. A vector clock lets you determine whether two versions of data are:
- Causally ordered (one happened before the other -- safe to discard the older one)
- Concurrent (neither happened before the other -- conflict resolution needed)
How it works
A vector clock is a list of (node, counter) pairs -- one entry per node that has written to the data.
On write at Node N:
- Increment N's counter in the vector clock
- Attach the updated vector clock to the data
Comparing two vector clocks (V1 and V2):
- If every counter in V1 ≤ the corresponding counter in V2: V1 is an ancestor of V2 (V2 is newer, V1 can be discarded)
- If every counter in V2 ≤ the corresponding counter in V1: V2 is an ancestor of V1 (V1 is newer)
- If neither condition holds (some counters are greater in V1, some in V2): the versions are concurrent -- they represent conflicting writes that need reconciliation
Example walkthrough:
| Step | Event | Vector clock |
|---|---|---|
| 1 | Node A writes value "v1" | [A:1] |
| 2 | Node A updates to "v2" | [A:2] |
| 3 | Nodes B and C both read [A:2] and write concurrently | |
| 3a | Node B writes "v3" | [A:2, B:1] |
| 3b | Node C writes "v4" | [A:2, C:1] |
| 4 | A read sees both versions | [A:2, B:1] vs [A:2, C:1] -- concurrent! |
In step 4, neither version is an ancestor of the other (B:1 > C:0 but C:1 > B:0), so these are concurrent writes that must be reconciled.
Vector clocks work exactly like Git's commit history. If one branch is a direct descendant of another, Git auto-merges (fast-forward). If two branches diverged, Git can't auto-merge -- you need to resolve the conflict manually. Vector clocks detect the same situations for distributed data.
Conflict resolution strategies
When vector clocks detect a conflict, something must resolve it:
| Strategy | How it works | Used by |
|---|---|---|
| Client resolution | Return all conflicting versions to the client; the client merges them | Dynamo (shopping cart: union of all items) |
| Last-write-wins (LWW) | Use wall-clock timestamps as tiebreaker; latest timestamp wins | Cassandra |
| CRDTs | Use conflict-free data structures that can always be merged automatically | Riak |
Last-write-wins is simpler but dangerous: if two clients write different values at nearly the same time, one write is silently discarded. This is acceptable for some workloads (counters, timestamps) but devastating for others (shopping carts, user profiles). Cassandra chose this trade-off; Dynamo did not.
Examples
Dynamo
Dynamo is the canonical example of vector clocks in production. Each put() includes a context (containing the vector clock from the previous get()). Dynamo uses this to detect concurrent writes. When conflicts are detected, all versions are returned to the client during get(), and the client must merge them.
For detailed mechanics, see Vector Clocks and Conflicting Data.
Vector clocks answer the interview question: "How do you handle concurrent writes in an eventually consistent system?" The key insight: vector clocks don't resolve conflicts -- they detect them. Resolution is a separate problem (client-side merging, LWW, CRDTs). Always distinguish detection from resolution in your answer.