Anatomy of Cassandra's Read Operation
Reads in Cassandra are more complex than writes. A partition's data might be in the MemTable, in one or more of many SSTables, or split between the MemTable and several SSTables. Cassandra uses several layers of caching and indexing to keep these reads efficient.
Caching layers
| Cache | What it stores | Trade-off |
|---|---|---|
| Row cache | Complete rows for hot data | Highest read speed; highest memory cost |
| Key cache | Partition key → SSTable offset mappings | Low memory cost; a hit skips the partition index lookup |
| Chunk cache | Uncompressed SSTable data blocks | Speeds up frequently accessed SSTable regions |
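
The trade-offs in the table come down to how much memory each entry costs against a fixed budget. As a rough sketch, here is a capacity-bounded cache with least-recently-used (LRU) eviction. The LRU policy and the `LRUCache` class are illustrative assumptions, not Cassandra's actual cache implementations, whose eviction policies differ by cache and by version.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Capacity-bounded cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # counted in entries here, not bytes
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: object) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Capacity is counted in entries for simplicity. A real cache budgets bytes, which is where the cheap entries of a key cache (a key and an offset) and the expensive entries of a row cache (whole rows) diverge: the same memory holds far more keys than rows.
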
Reading from MemTable
Data in the MemTable is kept sorted by partition key (in token order) and, within each partition, by clustering columns. A read locates the partition with an ordered, logarithmic-time search (comparable to a binary search), then returns the matching rows. Fast and straightforward.
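
A minimal sketch of the sorted lookup, assuming a toy model: integer partition and clustering keys, rows held in a sorted Python list, and `bisect` standing in for Cassandra's real in-memory structures. `ToyMemtable` is a hypothetical name, and overwriting a whole row is a simplification of Cassandra's cell-level merging.

```python
import bisect
from typing import Dict, List, Tuple

Row = Dict[str, object]


class ToyMemtable:
    def __init__(self) -> None:
        self._keys: List[Tuple[int, int]] = []  # sorted (partition, clustering)
        self._rows: List[Row] = []

    def write(self, partition: int, clustering: int, row: Row) -> None:
        key = (partition, clustering)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._rows[i] = row  # same key: the newer write replaces the row
        else:
            self._keys.insert(i, key)  # insert at the sorted position
            self._rows.insert(i, row)

    def read_partition(self, partition: int) -> List[Row]:
        # (partition,) sorts before every (partition, clustering) tuple,
        # so bisect_left lands on the partition's first row.
        i = bisect.bisect_left(self._keys, (partition,))
        rows = []
        while i < len(self._keys) and self._keys[i][0] == partition:
            rows.append(self._rows[i])  # rows come out in clustering order
            i += 1
        return rows
```
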
Reading from SSTables
This is where it gets interesting. There can be many SSTables per table, and the data might be in any of them.
Step 1: Bloom filter check
Each SSTable has a Bloom filter that answers: "Could this key be in this SSTable?" If the Bloom filter says no, the read skips that SSTable entirely. A "no" is always correct, while a "yes" can be a false positive, so the filter safely avoids expensive disk reads for SSTables that definitely don't contain the key.
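
The Bloom filter check can be sketched as a toy filter. This assumes SHA-256 with double hashing to derive the bit positions (Cassandra uses its own Murmur3-based hashing) and hypothetical sizing parameters `num_bits` and `num_hashes`.

```python
import hashlib
from typing import List


class BloomFilter:
    """Toy Bloom filter: k bit positions per key over an m-bit array.

    False positives are possible; false negatives are not.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> List[int]:
        # Derive k positions from one digest via double hashing: h1 + i * h2.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False only if some bit is unset, i.e. the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))
```

A read would call `might_contain` on each SSTable's filter and skip any SSTable that returns `False`; `True` only means "maybe", so the read continues to the index.
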
Step 2: Index lookup
Each SSTable has two index structures:
- Partition Index file (on disk) -- sorted partition keys mapped to SSTable offsets
- Partition Index Summary (in memory) -- a sampled index of the partition index for faster lookups
To find key=12:
- Check the summary to find the relevant range → offset 32 in the partition index
- Jump to offset 32 in the partition index and scan forward to the entry for key=12 → SSTable offset 3914
- Jump to SSTable offset 3914 → read the data
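
The summary → partition index → data path can be modeled in a few lines. In this sketch, list positions stand in for byte offsets in the on-disk index file, the summary samples every `sampling`-th index entry, and `ToySSTableIndex` is a hypothetical name. Only key 12 and offset 3914 come from the walkthrough; the other keys and offsets are made up.

```python
import bisect
from typing import List, Optional, Tuple


class ToySSTableIndex:
    """Partition index plus a sampled in-memory summary (toy model)."""

    def __init__(self, entries: List[Tuple[int, int]], sampling: int) -> None:
        # entries: (partition_key, data_file_offset), kept sorted by key.
        self.index = sorted(entries)
        self.sampling = sampling
        # Summary: every `sampling`-th index entry as (key, position in index).
        self.summary_pos = list(range(0, len(self.index), sampling))
        self.summary_keys = [self.index[p][0] for p in self.summary_pos]

    def find_data_offset(self, key: int) -> Optional[int]:
        # 1. Summary (in memory): last sampled key <= key, via binary search.
        s = bisect.bisect_right(self.summary_keys, key) - 1
        if s < 0:
            return None  # key sorts before every partition in this SSTable
        start = self.summary_pos[s]
        # 2. Partition index: scan forward at most one sampling interval.
        for k, data_offset in self.index[start:start + self.sampling]:
            if k == key:
                return data_offset  # 3. the caller reads the data file here
            if k > key:
                break  # passed the key's sorted position: not present
        return None
```

For example, `ToySSTableIndex([(3, 0), (7, 1200), (12, 3914), (20, 5000), (31, 7100)], sampling=2).find_data_offset(12)` returns 3914: the summary (keys 3, 12 and 31) narrows the search to one interval, and the index scan finds the exact entry.
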
Shortcut: Key cache hit
If the partition key is in the key cache, we skip the index lookup entirely and jump straight to the SSTable offset.
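
The key cache shortcut amounts to checking a map before falling back to the full index path. A minimal sketch, assuming a plain unbounded `dict` for the cache and a caller-supplied `index_lookup` function (both hypothetical); a real key cache is size-bounded and its entries are scoped to a specific SSTable.

```python
from typing import Callable, Dict, Optional


def lookup_offset(
    key: int,
    key_cache: Dict[int, int],
    index_lookup: Callable[[int], Optional[int]],
) -> Optional[int]:
    """Return the SSTable data offset for `key`, consulting the key cache first."""
    cached = key_cache.get(key)
    if cached is not None:
        return cached  # hit: skip the summary and partition index entirely
    offset = index_lookup(key)  # miss: take the summary -> partition index path
    if offset is not None:
        key_cache[key] = offset  # populate so the next read takes the shortcut
    return offset
```
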
The complete read workflow
- Row cache → if hit, return immediately
- Bloom filter → skip SSTables that definitely don't have the key
- Key cache → if hit, jump directly to the SSTable offset
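- On a key cache miss: partition index summary → partition index → SSTable offset (the full index lookup path)
- Merge the MemTable data with the data read from every SSTable that may hold the key; for each column, the write with the newest timestamp wins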
- Update caches with the accessed data
- Return the latest version to the client
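
Putting the steps together, the following toy read path follows the workflow in order. Everything here is a simplifying assumption: `ToySSTable`, `merge` and `read` are hypothetical names, an exact membership check stands in for the Bloom filter, list positions stand in for file offsets, and deletions (tombstones), row cache invalidation on writes, and cache size limits are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cells = Dict[str, Tuple[int, object]]  # column -> (write_timestamp, value)


@dataclass
class ToySSTable:
    name: str
    data: List[Tuple[int, Cells]]  # "data file": (partition_key, cells) entries

    def might_contain(self, key: int) -> bool:
        # Stand-in for the Bloom filter: exact check, so no false positives here.
        return any(k == key for k, _ in self.data)

    def index_lookup(self, key: int) -> Optional[int]:
        # Stand-in for summary -> partition index: a list position as the "offset".
        for offset, (k, _) in enumerate(self.data):
            if k == key:
                return offset
        return None


def merge(into: Cells, cells: Cells) -> None:
    """Reconcile column by column: the newest write timestamp wins."""
    for col, (ts, val) in cells.items():
        if col not in into or ts > into[col][0]:
            into[col] = (ts, val)


def read(
    key: int,
    memtable: Dict[int, Cells],
    sstables: List[ToySSTable],
    row_cache: Dict[int, Cells],
    key_cache: Dict[Tuple[str, int], int],
) -> Cells:
    if key in row_cache:
        return row_cache[key]  # row cache hit: return immediately

    result: Cells = {}
    merge(result, memtable.get(key, {}))  # MemTable contribution
    for sst in sstables:
        if not sst.might_contain(key):
            continue  # Bloom filter: key definitely not in this SSTable
        offset = key_cache.get((sst.name, key))  # key cache hit skips the index
        if offset is None:
            offset = sst.index_lookup(key)  # full index lookup path
            if offset is None:
                continue  # with a real Bloom filter: a false positive
            key_cache[(sst.name, key)] = offset
        merge(result, sst.data[offset][1])  # read the partition and merge it

    if result:
        row_cache[key] = result  # update the row cache for the next read
    return result  # latest version of each column
```

Reconciling per column rather than per row means a newer write to one column (in the MemTable, say) does not hide older but still current columns stored in an SSTable.
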
Cassandra's read path shows how multiple optimization layers stack: Bloom filters eliminate unnecessary disk reads, the key cache skips index lookups, and the row cache bypasses everything. When designing a read-heavy system, describe this layered approach: each layer catches a different class of reads, so only the reads that no earlier layer can answer or rule out pay the full cost of disk access.