Compaction
SSTables are immutable -- great for write performance, but immutability means updates and deletes create new entries rather than modifying existing ones. Over time, you accumulate many SSTables full of redundant and obsolete data, and reads slow down because every one of them must be checked.
Compaction is the process of merging multiple SSTables into fewer, cleaner ones.
What compaction does
During compaction:
- Multiple SSTables are merged into a single new SSTable
- Keys are deduplicated -- only the latest version is kept
- Tombstones (delete markers) older than the grace period (gc_grace_seconds) are removed
- A new index is created over the merged data
- The old SSTables are deleted
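The merge-and-clean steps above can be sketched in a few lines. This is a simplified model, not Cassandra's implementation: each SSTable is a key-sorted list of `(key, timestamp, value)` entries, a `None` value marks a tombstone, and the `gc_grace_seconds` default mirrors Cassandra's (10 days, in seconds).

```python
import heapq
import time

def compact(sstables, gc_grace_seconds=864000, now=None):
    """Merge several key-sorted SSTables into one: keep only the newest
    version of each key, and purge tombstones older than gc_grace_seconds.
    A teaching sketch, not Cassandra's actual compaction code."""
    now = time.time() if now is None else now
    # Merge all runs ordered by (key, newest timestamp first).
    merged = heapq.merge(*sstables, key=lambda e: (e[0], -e[1]))
    out, last_key = [], object()
    for key, ts, value in merged:
        if key == last_key:
            continue  # older version of a key we already emitted: drop it
        last_key = key
        if value is None and now - ts > gc_grace_seconds:
            continue  # expired tombstone: safe to remove entirely
        out.append((key, ts, value))
    return out
```

Note that a recent tombstone survives compaction: it must stick around until the grace period passes, so that a replica that missed the delete cannot resurrect the row.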
Benefits:
- Fewer SSTables to scan during reads → faster reads
- Obsolete data removed → disk space reclaimed
- Fresher Bloom filters → more accurate filtering
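The first benefit is easiest to see from the read path itself: a read consults the MemTable, then each SSTable from newest to oldest, using the Bloom filter to skip files that definitely don't contain the key. A minimal sketch, with the Bloom filter modeled as a plain set (a real filter can false-positive, never false-negative):

```python
def read(key, memtable, sstables):
    """Sketch of the read path: MemTable first, then SSTables newest to
    oldest. Each SSTable is a (bloom, data) pair; the Bloom filter lets us
    skip a disk read when it reports the key is definitely absent."""
    if key in memtable:
        return memtable[key]
    for bloom, data in sstables:  # ordered newest first
        if key not in bloom:
            continue  # filter says "definitely not here": no disk I/O
        if key in data:
            return data[key]
    return None
```

Every extra SSTable is another filter check and, on a hit, another disk read -- which is exactly why compacting down to fewer SSTables speeds reads up.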
Compaction strategies
| Strategy | Best for | How it works |
|---|---|---|
| Size-Tiered (STCS) | Write-heavy, general workloads | Triggers when multiple SSTables of similar size exist. Groups and merges them. Default strategy. |
| Leveled (LCS) | Read-heavy workloads | Organizes SSTables into levels, each 10x larger than the previous. Guarantees at most one SSTable per partition per level → predictable read performance. |
| Time-Window (TWCS) | Time-series data | Groups SSTables by time window. Compacts within windows. Ideal for data that's immutable after a time period and can be bulk-deleted by dropping entire windows. |
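Size-tiered selection can be sketched as a bucketing pass: group SSTables whose sizes fall within a tolerance band around a bucket's running average, and nominate any bucket holding at least `min_threshold` tables for compaction. The parameter names and defaults below echo STCS (`bucket_low`, `bucket_high`, `min_threshold=4`) but the logic is deliberately simplified:

```python
def size_tiered_candidates(sstable_sizes, bucket_low=0.5, bucket_high=1.5,
                           min_threshold=4):
    """Simplified sketch of size-tiered bucket selection: an SSTable joins a
    bucket when its size lies within [avg*bucket_low, avg*bucket_high] of the
    bucket's running average size."""
    buckets = []  # each bucket: [running_avg, [member_sizes]]
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = bucket[0]
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])  # no similar bucket: start one
    # Only buckets with enough similarly-sized tables trigger a compaction.
    return [members for avg, members in buckets if len(members) >= min_threshold]
```

With sizes like `[10, 11, 12, 10, 100, 105]`, the four ~10-unit tables form a candidate bucket while the two ~100-unit tables wait for more peers -- the "triggers when multiple SSTables of similar size exist" behavior from the table above.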
Why writes are sequential
Every operation in Cassandra's write path -- commit log append, MemTable insert, SSTable flush -- is sequential I/O. No random seeks, no read-before-write. This is the primary reason writes are so fast.
Compaction is the deferred cost: it reorganizes data in the background using sequential I/O. You pay for the reorganization eventually, but you never pay for it during the initial write. This amortization is why Cassandra can sustain hundreds of thousands of writes per second.
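The whole write path fits in a short sketch: append to the commit log for durability, insert into the in-memory MemTable, and flush the MemTable as a sorted, immutable SSTable once it passes a threshold. Class and field names here are illustrative, and the commit log is modeled as an in-memory list standing in for an append-only file:

```python
class WritePath:
    """Sketch of Cassandra's sequential write path. Both durable writes
    (commit log append, SSTable flush) are appends -- no random I/O."""

    def __init__(self, flush_threshold=4):
        self.commit_log = []   # stand-in for an append-only log file
        self.memtable = {}     # in-memory, sorted on flush
        self.sstables = []     # each flush adds one immutable sorted run
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # sequential append #1
        self.memtable[key] = value             # memory insert, no disk seek
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sequential append #2: stream the MemTable out in sorted key order.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs log replay
```

Notice what never happens: no existing file is modified, and no write reads old data first. The sorting and merging that a B-tree would do per-write is deferred to flush and compaction, which is the amortization described above.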
When asked "How does Cassandra achieve such high write throughput?", the answer is: all writes are sequential appends (commit log + SSTable flush), and compaction amortizes the reorganization cost in the background. The trade-off: reads are more complex (must merge across MemTable + multiple SSTables) and compaction consumes background I/O.