16 CAP Theorem

The CAP theorem is probably the most cited -- and most misunderstood -- concept in distributed systems. It sounds simple, but the implications are subtle and shape every major design decision in the systems we study in this course.

Think first

A network partition splits your distributed database into two halves. Clients on both sides are still sending reads and writes. You have to make a choice: do you reject some requests to maintain consistency, or serve all requests knowing some may return stale data?

Background

In a distributed system, your data lives on multiple machines connected by a network. Networks are unreliable -- cables get cut, switches fail, data centers lose connectivity. When this happens, parts of your system can't talk to each other. This is called a network partition, and it's not a theoretical concern -- it will happen in any system that runs long enough at scale.

The question becomes: when your system is partitioned, what do you sacrifice?

Definition

The CAP theorem (proposed by Eric Brewer in 2000, proven by Gilbert and Lynch in 2002) states that a distributed system can provide at most two of these three guarantees simultaneously:

Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time. It's as if you had a single copy of the data.

Availability (A): Every request to a non-failing node receives a response -- no errors, no timeouts. The system is always usable.

Partition tolerance (P): The system continues to operate despite network partitions -- messages being lost or delayed between nodes.

Why it's really a two-choice problem

Here's what most explanations get wrong: this isn't "pick any two from a menu." In any real distributed system, partitions will happen. You don't get to choose "no partitions" -- the network will eventually fail. So partition tolerance isn't optional. It's a given.

That means the real choice is:

When a network partition occurs, do you sacrifice consistency or availability?

Choice	What happens during a partition	Systems that make this choice
CP (Consistency + Partition tolerance)	The system refuses to respond to requests it can't guarantee are consistent. Some requests get errors or timeouts.	BigTable, HBase, MongoDB, Chubby/ZooKeeper
AP (Availability + Partition tolerance)	The system responds to every request, but some responses may return stale data. Different nodes may return different values.	Dynamo, Cassandra, CouchDB, Riak

The "CA" trap

You'll sometimes see "CA systems" mentioned -- systems that have consistency and availability but not partition tolerance. In a truly distributed system, this doesn't exist. A "CA" system is just a single-node database (like a traditional RDBMS that doesn't replicate). The moment you distribute data across a network, partitions become possible, and you must choose.

What CAP actually means in practice

CAP is about behavior during a partition, not during normal operation. When everything is healthy, you can absolutely have both consistency and availability. The theorem only constrains you during failures.

This is a crucial distinction. Most of the time, your system is running fine. CAP tells you what your system does in the worst case -- and that's exactly what interviewers want to hear you reason about.

Common misconception

CAP doesn't mean you get zero consistency in an AP system or zero availability in a CP system. It's a spectrum. Dynamo (AP) still resolves most inconsistencies quickly. BigTable (CP) is still highly available in practice -- it just might reject requests during partitions. The question is: what's the guarantee under the worst conditions?

Examples from this course

Dynamo (AP): Amazon's philosophy is "the shopping cart must always be writable." During a partition, Dynamo keeps accepting writes on both sides. This creates conflicting versions, which are reconciled later using vector clocks. The customer's experience is uninterrupted, but they might briefly see stale data.

BigTable (CP): Google chose strong consistency for BigTable. Every read returns the latest write, guaranteed. During a partition, some tablets may become unavailable until the partition heals. Google's use case (indexing, analytics) values correctness over constant availability.

Cassandra (tunable): Cassandra is interesting because it lets you choose per-query. With a quorum read (R + W > N), you get consistency. With lower consistency levels, you get higher availability. This makes Cassandra a powerful case study for understanding CAP as a spectrum, not a binary choice.

Interview angle

When asked about CAP in an interview, don't just recite the definition. Show that you understand:

Partitions are inevitable, so it's really "C or A during partitions"
It's about guarantees during failure, not normal operation
Real systems treat it as a spectrum, not a binary switch (see PACELC for the full picture)
The right choice depends on the business requirement -- there's no universally "correct" answer

CAP doesn't tell the whole story

CAP is a useful mental model, but it's incomplete. It says nothing about what happens when there is no partition -- yet real systems still have to choose between latency and consistency during normal operation. That's where the PACELC theorem comes in, extending CAP to cover the full picture.

Quiz

Cassandra lets you choose consistency level per query (ONE, QUORUM, ALL). A developer uses consistency level ONE for all reads and writes. During a network partition, what guarantee does the application have?

Strong consistency, because Cassandra handles partitions internally.

The application has no consistency guarantee -- reads may return stale data even during normal operation (not just during partitions), because consistency level ONE means R + W does not exceed N and there is no guaranteed overlap between write and read replica sets.

Eventual consistency with a guarantee that all replicas converge within seconds.

The same guarantee as using consistency level ALL, just with higher performance.

Background​

Definition​

Why it's really a two-choice problem​

What CAP actually means in practice​

Examples from this course​

CAP doesn't tell the whole story​