Introduction: System Design Patterns
If Part 1 of this course is about studying specific houses, Part 2 is about understanding the bricks, beams, and blueprints they're all built from.
Every distributed system -- no matter how different it looks on the surface -- is assembled from a surprisingly small set of recurring techniques. Dynamo uses consistent hashing for data distribution. So does Cassandra. Kafka uses write-ahead logs for durability. So does GFS. BigTable uses Bloom filters to avoid unnecessary disk reads. So does Cassandra.
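To give a flavor of how small these building blocks can be, here is a minimal Bloom filter sketch in Python. The class and method names are illustrative, not taken from any of these systems, and real implementations size the bit array and hash count from a target false-positive rate:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets k positions in a bit array.
    Lookups may return false positives, but never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as an arbitrarily long bit array

    def _positions(self, item: str):
        # Derive k independent positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True only if every position is set; "maybe present" vs "definitely absent".
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

This is how a store like Cassandra or BigTable can skip a disk read entirely: consult the in-memory filter first, and only touch the SSTable when `might_contain` says the key may be there.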
These techniques keep showing up because distributed systems keep facing the same fundamental problems:
- How do you spread data across machines? (Consistent Hashing)
- How do you agree on the state of the world? (Quorum, Leader and Follower)
- How do you survive crashes? (Write-ahead Log, Checksum)
- How do you detect failures? (Heartbeat, Gossip Protocol, Phi Accrual)
- How do you handle conflicts? (Vector Clocks, Read Repair, Merkle Trees)
- How do you reason about trade-offs? (CAP Theorem, PACELC)
We call these System Design Patterns -- reusable solutions to the problems that come up in every distributed system. Knowing them gives you a vocabulary for system design conversations and a toolkit for interviews.
The Patterns
| # | Pattern | One-line summary |
|---|---|---|
| 1 | Bloom Filters | Probabilistically check set membership without reading data |
| 2 | Consistent Hashing | Distribute data so adding/removing nodes moves minimal keys |
| 3 | Quorum | Require a minimum number of nodes to agree on reads/writes |
| 4 | Leader and Follower | Elect one node to coordinate, others replicate |
| 5 | Write-ahead Log | Log every change before applying it, so you can recover |
| 6 | Segmented Log | Split logs into segments to manage size and enable cleanup |
| 7 | High-Water Mark | Track how far replication has progressed |
| 8 | Lease | Grant time-limited ownership to prevent conflicts |
| 9 | Heartbeat | Periodic signals to prove a node is alive |
| 10 | Gossip Protocol | Spread information by telling random neighbors |
| 11 | Phi Accrual Failure Detection | Probabilistic failure detection that adapts to network conditions |
| 12 | Split-brain | When a network partition leaves two nodes each believing they're the leader |
| 13 | Fencing | Prevent stale leaders from corrupting data |
| 14 | Checksum | Detect data corruption with hash verification |
| 15 | Vector Clocks | Track causal ordering across distributed writes |
| 16 | CAP Theorem | Under a network partition, you must choose between consistency and availability |
| 17 | PACELC Theorem | Even without partitions, there's a latency vs. consistency trade-off |
| 18 | Hinted Handoff | Temporarily store writes meant for a downed node |
| 19 | Read Repair | Fix stale replicas when you detect inconsistency during reads |
| 20 | Merkle Trees | Efficiently detect which data differs between replicas |
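To make pattern #2 concrete, here is a minimal consistent-hash ring in Python. This is a sketch under simplifying assumptions -- one point per node and no virtual nodes, which real systems add to balance load -- and every name in it is illustrative:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (md5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Nodes and keys are hashed onto the same ring; a key belongs to the
    first node clockwise from its hash. Adding or removing a node only
    reassigns the keys in that node's arc -- everything else stays put."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        hashes = [h for h, _ in self._ring]
        idx = bisect.bisect(hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (_hash(node), node))

    def remove_node(self, node: str) -> None:
        self._ring.remove((_hash(node), node))
```

The payoff is the "minimal keys move" property from the table: after `add_node`, the only keys whose owner changes are the ones now landing on the new node, instead of the near-total reshuffle a `hash(key) % num_nodes` scheme would cause.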
How Patterns Connect to Systems
Each pattern page includes Examples showing which real systems use it. Here's a preview of how densely connected these patterns are:
| Pattern | Dynamo | Cassandra | BigTable | Kafka | GFS | HDFS | Chubby |
|---|---|---|---|---|---|---|---|
| Consistent Hashing | ✔ | ✔ | | | | | |
| Quorum | ✔ | ✔ | | | | | |
| Write-ahead Log | | ✔ | ✔ | ✔ | ✔ | | |
| Leader and Follower | | | ✔ | ✔ | ✔ | ✔ | ✔ |
| Gossip Protocol | ✔ | ✔ | | | | | |
| Vector Clocks | ✔ | | | | | | |
| Bloom Filters | | ✔ | ✔ | | | | |
| Heartbeat | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Checksum | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |
| Lease | | | | | ✔ | ✔ | ✔ |
Notice how the same patterns appear in system after system. That's the whole point -- learn the pattern once, recognize it everywhere.
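Many of these patterns also boil down to remarkably little math once you see them. Quorum, for example, is just an overlap condition: if a write waits for W replicas and a read consults R replicas out of N, then any R + W > N guarantees the two sets intersect. A sketch, using the N=3, R=2, W=2 configuration the Dynamo paper describes as a common choice:

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read quorum must intersect every write quorum,
    so a successful read is guaranteed to see the latest acknowledged
    write on at least one of the replicas it contacts."""
    return r + w > n


def writes_conflict_free(n: int, w: int) -> bool:
    """True when two concurrent writes cannot both succeed on disjoint
    replica sets, because each needs a majority."""
    return w > n / 2
```

With N=3, W=2, R=2 both conditions hold; drop to W=1, R=1 and stale reads become possible -- exactly the latency-versus-consistency dial that PACELC describes.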