Summary: Cassandra

The big picture

Cassandra is a hybrid: Dynamo's architecture powering BigTable's data model. It takes the peer-to-peer, leaderless replication strategy from Dynamo and combines it with the column-family storage model and SSTable-based storage engine from BigTable. The result is a system that offers the best of both worlds -- decentralized scalability with rich, structured data access.

What makes Cassandra distinctive is its tunable consistency. Unlike Dynamo (which defaults to eventual consistency) or BigTable (which enforces strong consistency), Cassandra lets you choose per-query where you land on the consistency-availability spectrum. This flexibility is why it's adopted so widely -- different parts of the same application can make different trade-offs.
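Tunable consistency reduces to quorum arithmetic: with replication factor N, read consistency level R, and write consistency level W, a read is guaranteed to overlap the latest write whenever R + W > N. A minimal sketch of that rule (the helper function is invented for illustration, not part of any Cassandra API):

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Reads see the latest write when read and write quorums must intersect."""
    return r + w > n

# N=3 replicas: QUORUM reads + QUORUM writes (2 + 2 > 3) -> strong consistency
print(is_strongly_consistent(3, 2, 2))   # True
# ONE read + ONE write (1 + 1 <= 3) -> eventual consistency, lower latency
print(is_strongly_consistent(3, 1, 1))   # False
```

This is the knob Cassandra exposes per query: a latency-sensitive dashboard might read at ONE while a billing write uses QUORUM, both against the same table.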

How Cassandra uses system design patterns

| Problem | Pattern | How Cassandra uses it |
| --- | --- | --- |
| Distributing data across nodes | Consistent Hashing | Ring topology with virtual nodes for even distribution |
| Ensuring write durability | Write-ahead Log | Every write goes to the commit log before the memtable |
| Managing log size | Segmented Log | Commit log is split into segments, truncated after flush to SSTables |
| Tuning read/write consistency | Quorum | Configurable R, W, and N per query |
| Spreading cluster state | Gossip Protocol | Every second, nodes gossip about membership, load, and schema |
| Detecting node failures | Phi Accrual Failure Detection | Adaptive detection that learns from network conditions |
| Handling temporary failures | Hinted Handoff | Healthy nodes store writes for downed nodes |
| Repairing stale replicas | Read Repair | Stale replicas updated during read operations |
| Avoiding unnecessary disk reads | Bloom Filters | Each SSTable has a Bloom filter to skip non-matching lookups |
| Distinguishing pre/post restart state | Generation Clock (split-brain avoidance) | Generation number incremented on restart, included in gossip |
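The first pattern in the table is easy to sketch. Consistent hashing with virtual nodes maps each physical node to many tokens on a ring, so keys spread evenly and adding a node moves only a small slice of data. A simplified illustration (class and method names are invented for this sketch):

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring with virtual nodes (a simplified sketch)."""

    def __init__(self, nodes, vnodes=8):
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            for i in range(vnodes):
                # Each physical node owns many tokens, smoothing distribution.
                bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key: str) -> str:
        """Walk clockwise from the key's token to the first vnode at or after it."""
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]

# Example: three nodes, eight vnodes each -> 24 tokens on the ring
ring = Ring(["n1", "n2", "n3"])
print(ring.owner("sensor-42"))  # one of n1/n2/n3, stable for this key
```

Replication in Cassandra extends this by also writing to the next N-1 distinct nodes clockwise from the owner; that detail is omitted here.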

Cassandra's DNA: what it took from each parent

| Component | From Dynamo | From BigTable |
| --- | --- | --- |
| Architecture | Peer-to-peer, no leader | |
| Partitioning | Consistent hashing + vnodes | |
| Replication | Quorum-based | |
| Failure detection | Gossip protocol | |
| Failure handling | Hinted handoff | |
| Data model | | Column families, sparse rows |
| Storage engine | | MemTable → SSTable flush |
| On-disk format | | SSTables with Bloom filters |
| Compaction | | Merge SSTables to reclaim space |

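The BigTable-derived write path can be sketched end to end: append to a commit log for durability, buffer the write in a memtable, and flush sorted, immutable runs to disk when the memtable fills. A toy illustration (not Cassandra's actual engine; names and the flush threshold are invented):

```python
class MiniLSM:
    """Toy log-structured write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []    # durability: every write lands here first
        self.memtable = {}      # in-memory buffer, last write per key wins
        self.sstables = []      # immutable sorted runs on "disk", newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))   # write-ahead log entry
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Dump the memtable as a sorted, immutable run.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # the flushed segment can now be truncated

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if key in sstable:
                return sstable[key]
        return None
```

Compaction (merging SSTables and dropping shadowed versions) and the per-SSTable Bloom filter that short-circuits `get` are left out to keep the sketch small.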
What Cassandra dropped

Cassandra uses last-write-wins (LWW) instead of Dynamo's vector clocks: concurrent writes to the same key are resolved by timestamp, and the "loser" is silently discarded. The API is simpler, but silent data loss is possible. For Cassandra's typical workloads (time-series data, event logs), this trade-off is acceptable.
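The rule itself is one line: among the versions the replicas hold, keep the one with the highest timestamp. A sketch, assuming a hypothetical (timestamp, value) tuple format:

```python
def resolve_lww(versions):
    """Last-write-wins: keep the value with the highest timestamp.
    Every concurrent write with an older timestamp is silently discarded."""
    return max(versions, key=lambda versioned: versioned[0])[1]

# Two replicas accepted different concurrent writes for the same key:
print(resolve_lww([(1700000001, "temp=21"), (1700000005, "temp=22")]))  # temp=22
```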

Quick reference card

| Property | Value |
| --- | --- |
| Type | Wide-column NoSQL database |
| CAP classification | AP (tunable toward CP) |
| Consistency model | Tunable -- per-query consistency levels |
| Data model | Row key → column families → columns (sparse) |
| Partitioning | Consistent hashing with virtual nodes |
| Replication | Configurable replication factor and consistency level |
| Conflict resolution | Last-write-wins (timestamp-based) |
| Failure detection | Gossip + Phi Accrual Failure Detector |
| Storage engine | MemTable → SSTable (log-structured merge) |
| Open source | Yes (Apache) |

Design Challenge

Design a time-series metrics store for IoT

You need to design a time-series metrics store for an IoT platform with 10,000 sensors, each reporting 100 metrics every second -- producing 1 million writes per second. Queries retrieve metrics by sensor ID and time range. The system must tolerate full data center failures without downtime.
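One common starting point -- offered here as a sketch, not the expected answer -- is to bucket the partition key by sensor and time window, so no partition grows without bound while a time-range query simply enumerates the buckets it spans (function names and the one-hour bucket width are assumptions to be tuned to the workload):

```python
def partition_key(sensor_id: str, epoch_seconds: int, bucket_hours: int = 1):
    """One partition per sensor per time bucket, so partitions stay bounded."""
    return (sensor_id, epoch_seconds // (bucket_hours * 3600))

def buckets_for_range(sensor_id: str, start: int, end: int, bucket_hours: int = 1):
    """All partitions a time-range query for one sensor must touch."""
    width = bucket_hours * 3600
    return [(sensor_id, b) for b in range(start // width, end // width + 1)]

# A reading at t=7200s lands in sensor s1's third hourly bucket:
print(partition_key("s1", 7200))            # ('s1', 2)
# A two-hour range query touches three hourly partitions:
print(buckets_for_range("s1", 0, 7200))     # [('s1', 0), ('s1', 1), ('s1', 2)]
```

Pairing this with a multi-data-center replication strategy and quorum levels scoped to the local data center is the usual route to surviving a full data center failure.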
References and further reading