Summary: Cassandra
The big picture
Cassandra is a hybrid: Dynamo's architecture powering BigTable's data model. It takes the peer-to-peer, leaderless replication strategy from Dynamo and combines it with the column-family data model and SSTable-based storage engine from BigTable. The result pairs Dynamo's decentralized scalability with BigTable's structured, column-oriented data access.
What makes Cassandra distinctive is its tunable consistency. Unlike Dynamo (which defaults to eventual consistency) or BigTable (which enforces strong consistency), Cassandra lets you choose per-query where you land on the consistency-availability spectrum. This flexibility is a large part of its appeal: different parts of the same application can make different trade-offs, for example fast, loosely consistent writes for event logging and quorum reads for account data.
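The per-query trade-off is commonly reasoned about with the overlap rule R + W > N: if the number of replicas read (R) plus the number written (W) exceeds the replication factor (N), every read touches at least one replica that saw the latest acknowledged write. The sketch below only illustrates that arithmetic. The function names are hypothetical, and the labels "strong" and "eventual" are a simplification that ignores failures, hints, and repair.

```python
def quorum(n: int) -> int:
    """Majority of n replicas, the size a QUORUM-style level waits for."""
    return n // 2 + 1


def replicas_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read set must intersect every write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3  # replication factor (assumed for the example)
    combos = [
        ("read ONE / write ONE", 1, 1),
        ("read QUORUM / write QUORUM", quorum(n), quorum(n)),
        ("read ONE / write ALL", 1, n),
    ]
    for name, r, w in combos:
        kind = "strong" if replicas_overlap(n, r, w) else "eventual"
        print(f"{name}: R={r} W={w} N={n} -> {kind}")
```

With N = 3, reading and writing at ONE leaves room for a stale read, while QUORUM on both sides guarantees overlap at the cost of waiting for two replicas per operation.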
How Cassandra uses system design patterns
| Problem | Pattern | How Cassandra uses it |
|---|---|---|
| Distributing data across nodes | Consistent Hashing | Ring topology with virtual nodes for even distribution |
| Ensuring write durability | Write-ahead Log | Every write goes to the commit log before the memtable |
| Managing log size | Segmented Log | Commit log is split into segments, truncated after flush to SSTables |
| Tuning read/write consistency | Quorum | Replication factor N is set per keyspace; read (R) and write (W) consistency levels are chosen per query |
| Spreading cluster state | Gossip Protocol | Every second, nodes gossip about membership, load, and schema |
| Detecting node failures | Phi Accrual Failure Detection | Adaptive detection that learns from network conditions |
| Handling temporary failures | Hinted Handoff | Healthy nodes store writes for downed nodes |
| Repairing stale replicas | Read Repair | Stale replicas updated during read operations |
| Avoiding unnecessary disk reads | Bloom Filters | Each SSTable has a Bloom filter to skip non-matching lookups |
| Distinguishing pre/post restart state | Split-brain (Generation clock) | Generation number incremented on restart, included in gossip |
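To make the first row of the table concrete, here is a minimal sketch of a consistent-hashing ring with virtual nodes. It is an illustration under stated assumptions, not Cassandra's implementation: the `Ring` class and `token` function are hypothetical names, MD5 is used only because it ships with Python (Cassandra's default partitioner uses Murmur3), and replica placement ignores racks and datacenters.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several tokens (virtual nodes) on the ring,
        # which spreads its share of the key space across many small ranges.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, replication_factor: int = 3):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_right(self._tokens, token(key))
        distinct = {node for _, node in self._entries}
        wanted = min(replication_factor, len(distinct))
        found = []
        for offset in range(len(self._entries)):
            _, node = self._entries[(start + offset) % len(self._entries)]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas(key))
```

Because each node holds many small token ranges, adding or removing a node moves only the ranges adjacent to its tokens rather than reshuffling every key.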
Cassandra's DNA: what it took from each parent
| Component | From Dynamo | From BigTable |
|---|---|---|
| Architecture | Peer-to-peer, no leader | |
| Partitioning | Consistent hashing + vnodes | |
| Replication | Quorum-based | |
| Failure detection | Gossip protocol | |
| Hinted handoff | ✔ | |
| Data model | | Column families, sparse rows |
| Storage engine | | MemTable → SSTable flush |
| On-disk format | | SSTables with Bloom filters |
| Compaction | | Merge SSTables to reclaim space |
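The BigTable-derived storage path (commit log, memtable, SSTable flush, compaction) can be sketched in a few lines. This is a toy in-memory model under loud assumptions: `TinyLSM` and its methods are hypothetical names, Python lists stand in for on-disk files, the flush threshold is a simple key count, and Bloom filters, tombstones, and log segmentation are left out.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stand-in for the append-only commit log on disk
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the memtable into an immutable, sorted SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, keeping only the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest to newest, later values overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
        db.write(key, value)
    print("sstables before compaction:", len(db.sstables))
    db.compact()
    print("sstables after compaction:", len(db.sstables))
    print("a =", db.read("a"))
```

Writes never modify an existing SSTable, so an overwritten key leaves an obsolete copy behind in an older run. Compaction is what reclaims that space.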
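Cassandra uses last-write-wins instead of vector clocks. When two writes to the same column race, the one with the higher timestamp wins and the other is discarded without notifying the client. The API is simpler because clients never have to reconcile conflicting versions, but silent data loss is possible, especially when clocks are skewed. For Cassandra's typical workloads (time-series, event logs), which mostly append new rows rather than update existing ones, this is usually acceptable.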
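A minimal sketch of last-write-wins reconciliation shows how the losing write disappears. The `Cell` type and `reconcile` function are hypothetical names for illustration, and the tie-break on equal timestamps (larger value wins) is an assumption of this sketch, chosen only so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: keep the cell with the higher timestamp."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: break the tie deterministically (assumed rule).
    return a if a.value >= b.value else b


if __name__ == "__main__":
    # Two clients update the same column at nearly the same moment.
    # Client B's clock runs slightly behind, so its write carries a lower
    # timestamp even though it may have happened later in real time.
    from_client_a = Cell(value="shipping=express", timestamp=1_700_000_000_500)
    from_client_b = Cell(value="shipping=standard", timestamp=1_700_000_000_200)

    winner = reconcile(from_client_a, from_client_b)
    print("kept:", winner.value)
    # Client B's write is dropped and neither client is told about it.
```

The function is symmetric in its arguments, so replicas converge on the same value regardless of the order in which they see the two writes.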
Quick reference card
| Property | Value |
|---|---|
| Type | Wide-column NoSQL database |
| CAP classification | AP (tunable toward CP) |
| Consistency model | Tunable -- per-query consistency levels |
| Data model | Row key → column families → columns (sparse) |
| Partitioning | Consistent hashing with virtual nodes |
| Replication | Configurable replication factor and consistency level |
| Conflict resolution | Last-write-wins (timestamp-based) |
| Failure detection | Gossip + Phi Accrual Failure Detector |
| Storage engine | MemTable → SSTable (log-structured merge) |
| Open source | Yes (Apache) |