15 Vector Clocks

Two clients write to the same key at the same time, on different nodes. Both writes succeed. Now you have two versions of the data. Which one is "correct"? Is one newer than the other, or are they genuinely conflicting?

This is one of the hardest problems in distributed systems, and wall clocks can't solve it.

Think first

Two clients write different values to the same key on different nodes simultaneously. Both writes succeed. You cannot use wall-clock timestamps because of clock skew. How would you determine whether one write happened before the other, or whether they are genuinely concurrent conflicts?

Background

On a single machine, ordering events is trivial: use the system clock. Write at t=1, then write at t=2 -- the second write is clearly newer.

In a distributed system, this breaks down because of clock skew. Different machines have different clocks, and those clocks drift apart over time. Even with NTP synchronization, clocks can differ by milliseconds or more. This means:

Time t on Node A does not mean "before" time t+1 on Node B
Two events that appear simultaneous might not be
Two events that appear sequential might actually be concurrent

NTP

Network Time Protocol synchronizes clocks across machines, but can't guarantee perfect synchronization. Typical NTP accuracy is 1-10ms on a LAN, and can be worse across WANs. For ordering events, milliseconds of uncertainty is enough to make timestamp-based ordering unreliable.

Without reliable timestamps, how do you determine causal ordering -- whether one event happened before, after, or concurrently with another?

Definition

Use vector clocks to track causal history across distributed nodes. A vector clock lets you determine whether two versions of data are:

Causally ordered (one happened before the other -- safe to discard the older one)
Concurrent (neither happened before the other -- conflict resolution needed)

How it works

A vector clock is a list of (node, counter) pairs -- one entry per node that has written to the data.

On write at Node N:

Increment N's counter in the vector clock
Attach the updated vector clock to the data

Comparing two vector clocks (V1 and V2):

If every counter in V1 ≤ the corresponding counter in V2: V1 is an ancestor of V2 (V2 is newer, V1 can be discarded)
If every counter in V2 ≤ the corresponding counter in V1: V2 is an ancestor of V1 (V1 is newer)
If neither condition holds (some counters are greater in V1, some in V2): the versions are concurrent -- they represent conflicting writes that need reconciliation

Example walkthrough:

Step	Event	Vector clock
1	Node A writes value "v1"	`[A:1]`
2	Node A updates to "v2"	`[A:2]`
3	Nodes B and C both read `[A:2]` and write concurrently
3a	Node B writes "v3"	`[A:2, B:1]`
3b	Node C writes "v4"	`[A:2, C:1]`
4	A read sees both versions	`[A:2, B:1]` vs `[A:2, C:1]` -- concurrent!

In step 4, neither version is an ancestor of the other (B:1 > C:0 but C:1 > B:0), so these are concurrent writes that must be reconciled.

The Git analogy

Vector clocks work exactly like Git's commit history. If one branch is a direct descendant of another, Git auto-merges (fast-forward). If two branches diverged, Git can't auto-merge -- you need to resolve the conflict manually. Vector clocks detect the same situations for distributed data.

Conflict resolution strategies

When vector clocks detect a conflict, something must resolve it:

Strategy	How it works	Used by
Client resolution	Return all conflicting versions to the client; the client merges them	Dynamo (shopping cart: union of all items)
Last-write-wins (LWW)	Use wall-clock timestamps as tiebreaker; latest timestamp wins	Cassandra
CRDTs	Use conflict-free data structures that can always be merged automatically	Riak

LWW silently loses data

Last-write-wins is simpler but dangerous: if two clients write different values at nearly the same time, one write is silently discarded. This is acceptable for some workloads (counters, timestamps) but devastating for others (shopping carts, user profiles). Cassandra chose this trade-off; Dynamo did not.

Examples

Dynamo

Dynamo is the canonical example of vector clocks in production. Each put() includes a context (containing the vector clock from the previous get()). Dynamo uses this to detect concurrent writes. When conflicts are detected, all versions are returned to the client during get(), and the client must merge them.

For detailed mechanics, see Vector Clocks and Conflicting Data.

Interview angle

Vector clocks answer the interview question: "How do you handle concurrent writes in an eventually consistent system?" The key insight: vector clocks don't resolve conflicts -- they detect them. Resolution is a separate problem (client-side merging, LWW, CRDTs). Always distinguish detection from resolution in your answer.

Quiz

Cassandra uses last-write-wins (LWW) with wall-clock timestamps instead of vector clocks. Two clients write to the same key at nearly the same time. Client A's clock is 5ms ahead of Client B's. What happens?

Both writes are preserved and returned to the client for reconciliation.

Client A's write always wins regardless of which write was 'logically' last, because LWW uses wall-clock timestamps that are subject to clock skew -- Client B's write is silently discarded even though it may have been the more recent logical update.

Cassandra detects the clock skew and adjusts timestamps automatically.

The write order doesn't matter because both writes are eventually consistent.

Background​

Definition​

How it works​

Conflict resolution strategies​

Examples​

Dynamo​