# How to Choose: A Decision Framework
## The problem behind the problem
In a system design interview, the hardest part is not remembering how Dynamo works or how Kafka partitions data. The hardest part is hearing a set of requirements and knowing which system's architecture to reach for. This page gives you a framework for that decision.
Every distributed system in this course makes a set of fundamental tradeoffs. Understanding those tradeoffs -- not memorizing architectures -- is what lets you design new systems from first principles.
## The decision flowchart
Start with the most fundamental question: what kind of data are you storing, and how will it be accessed?
## The comparison table
Use this table to compare systems side by side once you have narrowed the candidates.
| Dimension | Dynamo | Cassandra | BigTable | Kafka | GFS/HDFS | Chubby/ZK |
|---|---|---|---|---|---|---|
| Primary use case | Key-value store | Wide-column store | Wide-column store | Event streaming | File storage | Coordination |
| CAP choice | AP | AP (tunable) | CP | CP (per partition) | CP | CP |
| Consistency | Eventual | Tunable | Strong | Strong (per partition) | Strong (metadata) | Strong |
| Data model | Key -> blob | Row key -> column families | Row key -> column families | Topic -> partitions -> messages | File -> chunks | Path -> small data |
| Write pattern | Random key-value | Random by row key | Random by row key | Append-only log | Append-only file | Rare writes |
| Read pattern | Point lookup by key | Row key + range scan | Row key + range scan | Sequential scan by offset | Sequential file read | Point read by path |
| Architecture | Decentralized ring | Decentralized ring | Master + tablet servers | Controller + brokers | Master + chunk servers | Leader + replicas |
| Single point of failure? | No | No | Master (mitigated) | Controller (mitigated) | Master (mitigated) | Leader (mitigated) |
| Typical data size | KBs per value | KBs-MBs per row | KBs-MBs per row | Bytes-KBs per message | MBs-GBs per file | Bytes-KBs per node |
| Scale | Millions of keys | Billions of rows | Billions of rows | Millions of msgs/sec | Petabytes of files | Small metadata |
## Three fundamental tradeoffs
Nearly every design decision in an interview reduces to one of these three tradeoffs.
### 1. Consistency vs. availability (CAP/PACELC)
This is the first fork in the road. Ask yourself: if a network partition happens, should the system keep serving (possibly stale) responses, or should it refuse to serve until the partition heals?
- Choose AP (Dynamo, Cassandra) when: a stale read or an occasionally lost write is an inconvenience, not a correctness violation. Shopping carts, social media feeds, user preferences.
- Choose CP (Chubby, ZooKeeper, BigTable) when: inconsistency causes correctness violations. Locks, leader election, financial transactions, metadata coordination.
- Choose tunable (Cassandra) when: different operations in the same system have different consistency needs.
Deep dive: CAP Theorem | PACELC Theorem
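The tunable option above follows a simple arithmetic rule: with N replicas, a read quorum R and a write quorum W give strong consistency whenever R + W > N, because any read quorum must then overlap any write quorum in at least one replica. A minimal sketch of that rule (parameter names are illustrative, not Cassandra's actual API):

```python
# Quorum overlap rule behind tunable consistency (Cassandra-style).
# N = replicas, W = acks required per write, R = replicas consulted per read.

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Reads see the latest acknowledged write iff every read quorum
    overlaps every write quorum, i.e. R + W > N."""
    return r + w > n

# N=3: QUORUM writes + QUORUM reads (W=2, R=2) overlap -> strong;
# ONE/ONE (W=1, R=1) does not -> fast but eventually consistent.
assert is_strongly_consistent(3, 2, 2)
assert not is_strongly_consistent(3, 1, 1)
```

The same arithmetic explains per-operation tuning: a latency-critical write can use W=1 as long as the reads that must be fresh pay for it with a larger R.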
### 2. Centralized vs. decentralized architecture
- Centralized (single master): GFS, HDFS, BigTable, Chubby. Simpler to reason about, strong consistency is easier, but the master is a potential bottleneck and single point of failure.
- Decentralized (peer-to-peer): Dynamo, Cassandra. No single point of failure, but consistency is harder and conflict resolution is the application's problem.
The pattern: systems that need strong consistency tend toward centralized architectures (it is easier to be consistent when one node is the authority). Systems that need maximum availability tend toward decentralized architectures.
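How decentralized systems stay coherent without a master is easiest to see in gossip, the membership mechanism Dynamo and Cassandra use. A toy push-gossip round, purely illustrative (real systems gossip versioned node state with anti-entropy, not raw sets):

```python
import random

# Toy push-gossip for cluster membership: each round, every node pushes
# its current view to one random peer, which merges it. No coordinator.

def gossip_round(views: dict[str, set[str]]) -> None:
    for node in list(views):
        peer = random.choice([n for n in views if n != node])
        views[peer] |= views[node]  # peer merges everything the pusher knows

views = {n: {n} for n in ("a", "b", "c", "d")}  # each node knows only itself
full = set(views)
for rounds in range(1, 100):
    gossip_round(views)
    if all(v == full for v in views.values()):
        break  # every node has learned the full membership
```

Information spreads epidemically, converging in O(log N) rounds in expectation, which is exactly the property that lets these systems drop the single master.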
### 3. Storage engine: log-structured vs. page-oriented
- Log-structured (LSM tree): Cassandra, BigTable, Kafka. Writes are sequential appends -- extremely fast. Reads may need to merge data from multiple levels. Best for write-heavy workloads.
- Page-oriented (B-tree): Traditional databases. Writes require random I/O to update pages in place. Reads are fast (single lookup). Best for read-heavy workloads.
The systems in this course overwhelmingly use log-structured storage because they are designed for write-heavy, large-scale workloads.
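The log-structured write path can be sketched in a few lines. This is a toy, assuming a dict memtable and list-of-tuples segments; real engines add a write-ahead log, compaction, and Bloom filters:

```python
# Minimal LSM-tree sketch: writes land in an in-memory memtable; when it
# fills, it is flushed as an immutable sorted segment. Reads check the
# memtable first, then segments newest-to-oldest. Illustrative only.

class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict[str, str] = {}
        self.segments: list[list[tuple[str, str]]] = []  # newest last
        self.limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value            # fast in-memory write
        if len(self.memtable) >= self.limit:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}                # flush as a sorted segment

    def get(self, key: str):
        if key in self.memtable:              # freshest data first
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:              # real engines binary-search here
                if k == key:
                    return v
        return None

db = TinyLSM()
db.put("user:1", "alice")
db.put("user:2", "bob")      # second put triggers a flush
db.put("user:1", "carol")    # newer value shadows the flushed one
assert db.get("user:1") == "carol"
```

Note the asymmetry the bullet points describe: `put` never touches old data in place, while `get` may have to consult several sorted runs.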
## Pattern-to-system mapping
When you identify that a problem needs a specific pattern, this table tells you which systems use it.
| Pattern | Dynamo | Cassandra | BigTable | Kafka | GFS/HDFS | Chubby/ZK |
|---|---|---|---|---|---|---|
| Consistent Hashing | Yes | Yes | -- | -- | -- | -- |
| Quorum | Yes | Yes | -- | -- | -- | -- |
| Vector Clocks | Yes | -- | -- | -- | -- | -- |
| Gossip Protocol | Yes | Yes | -- | -- | -- | -- |
| Write-Ahead Log | -- | Yes | Yes | Yes | Yes | Yes |
| Bloom Filters | -- | Yes | Yes | -- | -- | -- |
| Segmented Log | -- | -- | -- | Yes | -- | -- |
| High-Water Mark | -- | -- | -- | Yes | -- | -- |
| Leader and Follower | -- | -- | Yes | Yes | Yes | Yes |
| Heartbeat | -- | -- | Yes | Yes | Yes | Yes |
| Lease | -- | -- | -- | -- | Yes | Yes |
| Fencing | -- | -- | -- | -- | Yes | Yes |
| Split Brain | -- | -- | -- | Yes | Yes | Yes |
| Hinted Handoff | Yes | Yes | -- | -- | -- | -- |
| Merkle Trees | Yes | Yes | -- | -- | -- | -- |
| Read Repair | Yes | Yes | -- | -- | -- | -- |
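Several of these patterns are small enough to sketch directly. Consistent hashing, which Dynamo and Cassandra share, places nodes and keys on a hash ring so that a key belongs to the first node clockwise from its hash, and adding a node remaps only neighboring keys. A toy version (node names and the vnode count are illustrative):

```python
import bisect
import hashlib

# Consistent-hashing sketch: nodes get several "virtual" points on a hash
# ring to smooth load; a key is owned by the first point at or after its
# hash, wrapping around. Illustrative, not any system's production code.

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes: int = 8):
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[i][1]  # first node clockwise from the key's hash

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.owner("user:42")
# Adding or removing one node moves only the keys adjacent to its points,
# roughly 1/N of the keyspace -- the property the pattern exists for.
```

The same ring structure is what quorum, hinted handoff, and read repair operate over: the N replicas for a key are simply the next N distinct nodes clockwise.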
## How to use this in an interview
When you hear a design problem in an interview, follow this process:
1. Identify the data model: Is it key-value? Wide-column? A stream of events? Files? Small metadata?
2. Identify the consistency requirement: Can you tolerate stale reads? Can you tolerate lost writes? Is exactly-once critical?
3. Identify the access pattern: Point lookups? Range scans? Sequential reads? Append-only writes?
4. Identify the scale: How much data? How many operations per second? How many machines?
5. Map to a reference architecture: Use the flowchart and table above to find the closest match.
6. Adapt: No real problem maps perfectly to one system. Combine patterns from multiple systems. Explain your tradeoffs.
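Step 5 of that process can be caricatured as a lookup keyed on your answers to steps 1 and 2. The category strings here are informal shorthand for the comparison table, not a formal taxonomy:

```python
# Hedged sketch of "map to a reference architecture" as a table lookup.
# Keys are (data model, consistency) pairs mirroring the comparison table;
# the labels are illustrative, not an established classification.

CANDIDATES = {
    ("key-value", "eventual"): ["Dynamo"],
    ("wide-column", "tunable"): ["Cassandra"],
    ("wide-column", "strong"): ["BigTable"],
    ("event-stream", "strong"): ["Kafka"],
    ("files", "strong"): ["GFS/HDFS"],
    ("coordination", "strong"): ["Chubby/ZooKeeper"],
}

def reference_architectures(data_model: str, consistency: str) -> list[str]:
    """Return the closest reference systems, or [] when nothing matches
    and you should combine patterns from several systems instead."""
    return CANDIDATES.get((data_model, consistency), [])

assert reference_architectures("event-stream", "strong") == ["Kafka"]
assert reference_architectures("key-value", "strong") == []  # adapt instead
```

The empty-list case is the interesting one: a strongly consistent key-value store matches no single row of the table, which is your cue to combine patterns (say, a ring for partitioning with leader-and-follower per partition) and defend the tradeoff.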