The Consistency Spectrum: From Eventually Consistent to Linearizable
Consistency is not a binary choice. The seven systems in this course span the entire consistency spectrum, and each chose its position for specific, defensible reasons. Understanding why each system landed where it did -- not just what consistency it offers -- is what separates a surface-level answer from a deep one in an interview.
The spectrum at a glance
Eventual consistency: Dynamo
Position: Furthest toward availability on the spectrum.
The business reason: Dynamo was built for Amazon's shopping cart. The core insight was brutally pragmatic: a customer who sees a slightly stale cart still buys things; a customer who sees an error page leaves. Amazon quantified this -- even brief unavailability correlated with measurable revenue loss.
How it works: Dynamo uses vector clocks to track causality between writes. When concurrent writes create divergent versions, Dynamo stores all versions and presents them to the application during the next read. The application (not the database) decides how to reconcile. For shopping carts, Amazon uses a "merge all items" strategy -- the union of conflicting carts.
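The version-tracking and cart-merge behavior can be sketched in a few lines. This is a toy model, not Amazon's implementation: a vector clock is a dict mapping node to counter, and two versions are concurrent when neither clock dominates the other.

```python
# Toy sketch of Dynamo-style version reconciliation (illustrative, not
# Amazon's code). Each version is a (vector_clock, value) pair.

def dominates(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions):
    """Discard versions dominated by another; if more than one survives,
    the writes were concurrent and the application must merge them."""
    return [
        (clock, value) for clock, value in versions
        if not any(dominates(other, clock) and other != clock
                   for other, _ in versions)
    ]

def merge_carts(survivors):
    """The shopping-cart strategy from the text: union of conflicting carts."""
    merged = set()
    for _, cart in survivors:
        merged |= cart
    return merged
```

Two replicas that each accepted a write during a partition produce clocks like `{"A": 2, "B": 0}` and `{"A": 1, "B": 1}`; neither dominates, so both carts survive and the application merges them.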
The cost: Application-level conflict resolution is complex and error-prone. Every application team must understand distributed systems well enough to write correct merge logic. This is a significant operational burden that Cassandra later tried to reduce.
See: Dynamo's vector clocks and conflicting data
Tunable consistency: Cassandra
Position: Spans a range -- from eventually consistent to strongly consistent per-query -- depending on the R and W parameters chosen by the client.
The business reason: Cassandra was built at Facebook for inbox search, then evolved into a general-purpose database. Different use cases within the same cluster need different consistency levels. A notification counter can tolerate staleness; a financial ledger cannot.
How it works: Cassandra uses quorum reads and writes with configurable consistency levels. The key formula:
- W + R > N guarantees strong consistency (any read quorum overlaps with any write quorum)
- W + R ≤ N allows eventual consistency for lower latency
Where N is the replication factor, W is the number of write acknowledgments required, and R is the number of replicas read.
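The quorum overlap rule is simple enough to express directly (a minimal sketch using the N, W, R definitions above):

```python
# The quorum overlap rule: if W + R > N, every read quorum shares at least
# one replica with every write quorum, so a read always sees the latest write.

def is_strongly_consistent(n, w, r):
    """True when any read quorum must overlap any write quorum."""
    return w + r > n

def quorum(n):
    """Cassandra's QUORUM level: floor(N/2) + 1 replicas."""
    return n // 2 + 1
```

With N = 5, QUORUM/QUORUM gives W = R = 3, and 3 + 3 > 5, so the overlap guarantee holds; ONE/ONE gives 1 + 1 ≤ 5 and only eventual consistency.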
| Cassandra Setting | W | R | Consistency | Latency | Use Case |
|---|---|---|---|---|---|
| ONE/ONE | 1 | 1 | Eventual | Lowest | Analytics, logging |
| QUORUM/QUORUM | N/2+1 | N/2+1 | Strong | Medium | General purpose |
| ALL/ONE | N | 1 | Strong | Highest write | Rare writes, frequent reads |
| LOCAL_QUORUM | Majority in local DC | Majority in local DC | Strong within DC | Low within DC | Multi-datacenter apps |
The advancement over Dynamo: Cassandra uses last-write-wins (LWW) with timestamps as the default conflict resolution, eliminating the need for application-level merge logic. This is simpler but introduces a different problem: clock skew between nodes can cause "later" writes to overwrite "earlier" ones incorrectly.
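The clock-skew hazard is easy to demonstrate with a toy LWW resolver (timestamps here are plain numbers supplied by each writer, standing in for node wall clocks):

```python
# Sketch of last-write-wins resolution and the clock-skew hazard described
# above. Each value is a (timestamp, data) pair; the larger timestamp wins.

def lww_merge(a, b):
    """Return whichever version carries the later timestamp."""
    return a if a[0] >= b[0] else b

# A node with a clock running fast stamps an *earlier* write with a *later*
# timestamp, so its stale value beats the write that actually came second.
skewed_first_write = (1005, "old value")  # written first, clock 5s ahead
later_write = (1002, "new value")         # written second, correct clock
winner = lww_merge(skewed_first_write, later_write)
```

Here `winner` is the stale value: LWW trusts timestamps, so a fast clock silently overwrites newer data.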
See: Cassandra consistency levels
Relaxed consistency: GFS
Position: Stronger than eventual, but weaker than traditional strong consistency. GFS defines a nuanced model with three states: consistent, defined, and undefined.
The business reason: GFS serves Google's batch processing infrastructure. Throughput on multi-gigabyte files matters far more than read-your-writes semantics. The applications (MapReduce, web crawlers) are designed to tolerate duplicate and out-of-order records.
How it works: GFS distinguishes between:
- Record append (atomic, at-least-once): the primary assigns an offset and all replicas write there. If a replica fails mid-write, the append is retried, potentially creating duplicates -- but each successful append is defined (all replicas agree on its content at that offset).
- Random write (concurrent): concurrent writes by multiple clients to the same region produce consistent-but-undefined data. Replicas agree on the bytes, but the content is an unpredictable interleaving.
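Because record append is at-least-once, readers must tolerate duplicates. The GFS paper notes that applications embed their own identifiers and checksums in records; a reader-side filter along those lines might look like this (the record framing here is hypothetical):

```python
# Sketch of a GFS-style reader coping with at-least-once record append.
# Records carry an application-level ID (hypothetical framing); a retried
# append can leave the same record at two offsets, so the reader filters.

def read_records(raw_records):
    """Drop duplicate records by application-level ID, keeping the first
    occurrence in file order."""
    seen = set()
    out = []
    for record_id, payload in raw_records:
        if record_id in seen:
            continue  # duplicate left behind by a retried append
        seen.add(record_id)
        out.append(payload)
    return out
```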
Strong consistency: BigTable, HDFS, Kafka
These three systems provide strong consistency but scoped to different units -- a tablet, a file, or a partition.
BigTable: strong per-tablet
The business reason: BigTable serves Google's latency-sensitive applications (web search indexing, Google Earth, Google Finance). Users querying a table must see the latest data. Stale search results are acceptable for web search (seconds-old index is fine), but the index build itself must be strongly consistent to avoid corruption.
How it works: Each tablet (a range of rows) is served by exactly one tablet server at a time. All reads and writes for a tablet go through that single server, which provides strong consistency trivially -- there is only one writer. Chubby ensures that the assignment of tablets to servers is itself consistent (no two servers believe they own the same tablet).
The single-server-per-tablet model means BigTable's consistency is a consequence of its architecture rather than a distributed consensus protocol running per operation.
See: BigTable components, Working with tablets
HDFS: strong per-file (single-writer)
The business reason: HDFS stores input and output for MapReduce and other batch processing frameworks. Each file is written once by a single writer and read many times. The single-writer model eliminates write conflicts entirely.
How it works: The NameNode controls all metadata and block placement. A file being written has a single client holding a lease. Writes are pipelined to DataNodes in a chain, and a write is acknowledged only after all replicas confirm. Once a file is closed, it is immutable. This write-once-read-many model provides strong consistency without any consensus protocol during normal writes.
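The pipelined, all-replicas-ack write can be modeled in a few lines. This is a toy model of the behavior described above, not the HDFS wire protocol; the class and method names are illustrative.

```python
# Toy model of an HDFS pipeline write: the block travels down a chain of
# DataNodes, and the client's write succeeds only if every replica persists it.

class DataNode:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.blocks = []

    def write(self, block):
        """Persist the block locally; return False on failure."""
        if self.healthy:
            self.blocks.append(block)
        return self.healthy

def pipeline_write(block, datanodes):
    """Forward the block through the chain; any failure aborts the write,
    so an acknowledged write is on every replica."""
    for node in datanodes:
        if not node.write(block):
            return False
    return True
```

A healthy three-node chain acknowledges the write with the block on all three replicas; one sick node in the chain fails the whole write, which is exactly the strong-consistency behavior: no acknowledged write exists on only some replicas.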
Kafka: strong per-partition
The business reason: Kafka was built for LinkedIn's activity stream and operational metrics. Within a single partition, messages must be strictly ordered -- if a user likes a post and then unlikes it, consumers must see those events in order or the derived state will be wrong.
How it works: Each partition has a single leader broker that handles all reads and writes. The leader maintains a high-water mark -- the last offset that has been replicated to all in-sync replicas (ISR). Consumers only see messages up to the high-water mark, ensuring they never read data that could be lost if the leader fails.
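The high-water mark rule reduces to a `min` over the in-sync replicas' log-end offsets (a sketch of the rule as described above, not Kafka's broker code):

```python
# Sketch of the high-water mark rule: consumers may read only up to the
# smallest log-end offset among the in-sync replicas.

def high_water_mark(isr_log_end_offsets):
    """The last offset replicated to every in-sync replica."""
    return min(isr_log_end_offsets)

def visible_messages(log, isr_log_end_offsets):
    """Messages past the high-water mark stay hidden, so a leader failure
    can never expose data a newly elected leader might not have."""
    return log[:high_water_mark(isr_log_end_offsets)]
```

If the leader has 4 messages but one follower has replicated only 3, consumers see 3: the fourth message would be lost if the leader died and that follower took over.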
| Kafka Setting | Guarantee | Trade-off |
|---|---|---|
| acks=0 | No guarantee (fire-and-forget) | Fastest |
| acks=1 | Leader has written | Leader failure can lose data |
| acks=all, min.insync.replicas=2 | Replicated to all in-sync replicas (at least 2) | Strongest, higher latency |
Linearizable consistency: Chubby
Position: Furthest toward consistency on the spectrum.
The business reason: Chubby is a distributed lock service. If two processes both believe they hold the same lock, the invariant that the lock is supposed to protect is violated. Stale reads are not just inconvenient -- they are correctness failures. A DNS entry served from a stale cache might direct traffic to a decommissioned server. A master election with inconsistent lock state might produce two masters.
How it works: Chubby uses Paxos consensus among typically five replicas. All reads and writes go through the elected master, which is the only replica that can serve requests. The Paxos protocol ensures that all replicas agree on the sequence of operations, even in the presence of failures. Client sessions include a grace period to handle master failover without immediately invalidating all locks.
See: Chubby sessions and events, Locks and sequencers
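The sequencer mechanism referenced above guards against a stale lock holder. The Chubby paper describes a sequencer as a token naming the lock and its acquisition generation; a minimal sketch of the idea (field names illustrative):

```python
# Sketch of Chubby's sequencer idea: a lock holder passes a token naming the
# lock and its generation number, and downstream servers reject tokens from
# any earlier, now-stale holder.

class LockService:
    def __init__(self, lock_name):
        self.lock_name = lock_name
        self.generation = 0
        self.holder = None

    def acquire(self, client):
        """Grant the lock and return a sequencer for this acquisition."""
        self.generation += 1
        self.holder = client
        return (self.lock_name, self.generation)

    def validate(self, sequencer):
        """A server checks a request's sequencer before acting on it."""
        name, generation = sequencer
        return name == self.lock_name and generation == self.generation
```

If client A's session expires and client B acquires the lock, A's old sequencer carries a stale generation number and is rejected, so two processes can never both act as lock holder.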
The PACELC extension
The CAP theorem tells us what happens during a network partition: choose between consistency and availability. But partitions are rare. What about normal operation?
The PACELC theorem extends CAP: if there is a Partition, choose Availability or Consistency; Else (normal operation), choose Latency or Consistency.
| System | During Partition (PAC) | Normal Operation (ELC) | PACELC Classification |
|---|---|---|---|
| Dynamo | Availability | Latency | PA/EL |
| Cassandra | Availability (default) | Latency (tunable) | PA/EL (default) |
| GFS | Consistency | Consistency | PC/EC |
| HDFS | Consistency | Consistency | PC/EC |
| BigTable | Consistency | Consistency | PC/EC |
| Kafka | Consistency (acks=all) | Consistency | PC/EC |
| Chubby | Consistency | Consistency | PC/EC |
Notice the pattern: all leader-based systems in this course choose consistency, while the peer-to-peer systems choose availability. This is not a coincidence -- it is a direct consequence of the architecture. A system with a single leader naturally provides consistency (one authoritative copy) and naturally sacrifices availability when the leader is unreachable. A peer-to-peer system naturally provides availability (any node can serve requests) and naturally sacrifices consistency (multiple copies may diverge).
Practical guidance: choosing a consistency level
When an interviewer asks you to design a system, the consistency choice should flow from three questions:
- What is the cost of a stale read? (Inconvenience? Data corruption? Financial loss?)
- What is the cost of unavailability? (User retry? Lost revenue? Safety hazard?)
- What is the read/write ratio and latency budget?
Map the answers to the spectrum:
- Stale reads are harmless, unavailability is costly: eventual consistency (Dynamo model)
- Depends on the operation: tunable consistency (Cassandra model)
- Stale reads are costly, unavailability is tolerable: strong consistency (BigTable/Kafka model)
- Stale reads are correctness violations: linearizable (Chubby model)
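This mapping can be phrased as a small decision function. The categories come from the questions above; the coarse cost labels are illustrative, not a formal taxonomy:

```python
# Sketch of the consistency-level decision above. Costs are coarse labels:
# "harmless", "costly", or "correctness" (a stale read breaks an invariant).

def choose_consistency(stale_read_cost, unavailability_cost):
    if stale_read_cost == "correctness":
        return "linearizable (Chubby model)"
    if stale_read_cost == "costly":
        return "strong (BigTable/Kafka model)"
    if unavailability_cost == "costly":
        return "eventual (Dynamo model)"
    # Mixed or per-operation needs: let each query pick its own level.
    return "tunable (Cassandra model)"
```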
Decision table
| If your application... | Choose | Example system model |
|---|---|---|
| Can merge conflicting writes | Eventual | Shopping cart (Dynamo) |
| Has mixed read/write patterns with varying freshness needs | Tunable | Social media with analytics + messaging (Cassandra) |
| Does large sequential writes, single writer | Relaxed / strong single-writer | Log processing (GFS, HDFS) |
| Needs strict ordering within a stream | Strong per-partition | Event streaming, change data capture (Kafka) |
| Needs mutual exclusion or coordination | Linearizable | Leader election, distributed locking (Chubby) |
The hidden cost of strong consistency
Strong consistency is not free. Every system that chooses it pays a price:
- Latency: Every write must be acknowledged by multiple replicas (or the single authoritative server) before returning. In GFS, the pipeline write to three chunkservers adds latency. In Chubby, Paxos requires a majority round-trip.
- Availability: If the leader or a quorum of replicas is unreachable, the system cannot serve writes (and often cannot serve reads either). GFS's single master is a well-known availability bottleneck.
- Throughput: Serializing all operations through a single leader limits write throughput to what one node can handle. BigTable mitigates this by partitioning into tablets (one leader per tablet), and Kafka mitigates it by partitioning into partitions (one leader per partition). But within a single tablet or partition, throughput is bounded by one machine.
The peer-to-peer systems avoid all three costs -- any node can handle any request, there is no quorum delay, and throughput scales with node count. The price they pay is the complexity of conflict resolution.