Anatomy of Cassandra's Read Operation

Reads in Cassandra are more complex than writes. The requested data might be in the MemTable, in one or more SSTables, or spread across both. Cassandra uses several layers of caching and indexing to make reads efficient.

Caching layers

Cache	What it stores	Trade-off
Row cache	Complete rows for hot data	Highest read speed; highest memory cost
Key cache	Partition key → SSTable offset mappings	Low memory cost; big read performance improvement
Chunk cache	Uncompressed SSTable data blocks	Speeds up frequently accessed SSTable regions

Reading from MemTable

Data in the MemTable is sorted by partition key and clustering columns. A read performs a binary search on the partition key, then returns the matching row. Fast and straightforward.
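The lookup described above can be sketched in a few lines. This is a simplified illustration, not Cassandra's actual MemTable: the sorted list, the sample rows, and the name `memtable_get` are all assumptions made for the example, and integer keys stand in for real partition keys.

```python
import bisect

# Hypothetical in-memory table: (partition_key, row) pairs kept sorted by key.
memtable = [
    (3, {"name": "ada"}),
    (12, {"name": "grace"}),
    (27, {"name": "linus"}),
]
# Parallel list of keys so bisect can search it directly.
keys = [k for k, _ in memtable]

def memtable_get(partition_key):
    """Binary-search the sorted keys; return the matching row or None."""
    i = bisect.bisect_left(keys, partition_key)
    if i < len(keys) and keys[i] == partition_key:
        return memtable[i][1]
    return None
```

Because the keys are already sorted, the search takes O(log n) comparisons and never touches disk.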

Think first

A Cassandra table might have dozens of SSTables on disk. A read for a single partition key might need to check all of them. What data structure could you use to quickly determine whether an SSTable definitely does NOT contain a given key, without reading the SSTable?

Reading from SSTables

This is where it gets interesting. There can be many SSTables per table, and the data might be in any of them.

Step 1: Bloom filter check

Each SSTable has a Bloom filter that answers: "Could this key be in this SSTable?" If the Bloom filter says no, skip the SSTable entirely. This avoids expensive disk reads for SSTables that definitely don't contain the key. A yes can be a false positive, so it only means the SSTable still has to be checked.
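To make the "definitely not" guarantee concrete, here is a minimal Bloom filter sketch. It is an illustration under stated assumptions: the class name, the bit-array size, the number of hash functions, and the use of SHA-256 with a seed prefix are choices made for this example, not Cassandra's implementation.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # If any position is still 0, the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))
```

A read would call `might_contain(key)` for each SSTable and open only those that return `True`. Every key that was added always returns `True`; a key that was never added usually returns `False`, which is what lets the read skip that SSTable.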

Step 2: Index lookup

Each SSTable has two index structures:

Partition Index file (on disk) -- sorted partition keys mapped to SSTable offsets
Partition Index Summary (in memory) -- a sampled index of the partition index for faster lookups

To find key=12:

Check the summary to find the relevant range → offset 32 in the partition index
Jump to offset 32 in the partition index → SSTable offset 3914
Jump to SSTable offset 3914 → read the data
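The three steps above can be sketched as a two-level lookup. Everything here is hypothetical apart from the key 12, index offset 32, and SSTable offset 3914 taken from the example: the other entries, the sampling interval of two, and the function name are assumptions, and plain integers stand in for real partition keys and file offsets.

```python
import bisect

# Hypothetical partition index "file":
# (index_offset, partition_key, sstable_offset), sorted by partition key.
PARTITION_INDEX = [
    (0, 2, 0),
    (16, 7, 1650),
    (32, 12, 3914),
    (48, 15, 5120),
    (64, 21, 6980),
    (80, 30, 9001),
]

# In-memory summary: a sample of every second index entry,
# stored as (partition_key, index_offset).
SUMMARY = [(key, off) for off, key, _ in PARTITION_INDEX[::2]]

def find_sstable_offset(key):
    """Summary -> partition index -> SSTable offset, or None if absent."""
    summary_keys = [k for k, _ in SUMMARY]
    # Last sampled key that is <= the target tells us where to start scanning.
    i = bisect.bisect_right(summary_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first index entry
    start = SUMMARY[i][1]
    for index_offset, k, sstable_offset in PARTITION_INDEX:
        if index_offset < start:
            continue  # before the range the summary pointed at
        if k == key:
            return sstable_offset
        if k > key:
            break  # passed the spot where the key would be
    return None
```

The summary keeps only a fraction of the index in memory, so the scan of the on-disk index starts near the right entry instead of at the beginning of the file.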

Shortcut: Key cache hit

If the partition key is in the key cache, we skip the index lookup entirely and jump straight to the SSTable offset.
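A small sketch shows how a key cache short-circuits the index lookup. The `KeyCache` class, its LRU eviction policy, its capacity, and the `locate` helper are all assumptions made for illustration; the index lookup is passed in as a plain function so the example stays self-contained.

```python
from collections import OrderedDict

class KeyCache:
    """Toy LRU cache: (sstable_id, partition_key) -> SSTable offset."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, sstable_id, key):
        k = (sstable_id, key)
        if k not in self.entries:
            return None
        self.entries.move_to_end(k)  # mark as most recently used
        return self.entries[k]

    def put(self, sstable_id, key, offset):
        k = (sstable_id, key)
        self.entries[k] = offset
        self.entries.move_to_end(k)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

def locate(cache, sstable_id, key, index_lookup):
    """Return the SSTable offset, using the index only on a cache miss."""
    offset = cache.get(sstable_id, key)
    if offset is None:
        offset = index_lookup(key)  # slow path: summary -> partition index
        if offset is not None:
            cache.put(sstable_id, key, offset)
    return offset
```

The first read of a key pays for the index lookup and populates the cache; repeated reads of the same key in the same SSTable go straight to the stored offset.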

Think first

Cassandra has three caches: row cache, key cache, and chunk cache. If you had limited memory, which one cache would you keep and why?

The complete read workflow

Row cache → if hit, return immediately
Bloom filter → skip SSTables that definitely don't have the key
Key cache → if hit, jump directly to the SSTable offset
On a key cache miss, take the full index lookup path: partition index summary → partition index → SSTable
Merge results from MemTable + all matching SSTables
Update caches with the accessed data
Return the latest version to the client
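The workflow above can be condensed into one function. This is a deliberately simplified sketch: dictionaries stand in for the row cache, the MemTable, and SSTable contents, a Python set stands in for each Bloom filter, the key cache and index lookup are collapsed into a dictionary access, and whole rows are reconciled by a single `ts` timestamp field. All of these names and structures are assumptions for illustration.

```python
def read(key, row_cache, memtable, sstables):
    """Return the newest version of a row, or None if the key is unknown."""
    # Row cache: a hit bypasses everything else.
    if key in row_cache:
        return row_cache[key]

    candidates = []

    # The MemTable may hold the most recent write.
    if key in memtable:
        candidates.append(memtable[key])

    for sstable in sstables:
        # Bloom filter stand-in: skip SSTables that cannot contain the key.
        if key not in sstable["bloom"]:
            continue
        # Key cache / index lookup stand-in: fetch the row from this SSTable.
        row = sstable["rows"].get(key)
        if row is not None:
            candidates.append(row)

    if not candidates:
        return None

    # Merge: keep the version with the highest write timestamp.
    latest = max(candidates, key=lambda r: r["ts"])
    row_cache[key] = latest  # update the cache for the next read
    return latest

# Hypothetical sample data: one key written twice, newest copy in the MemTable.
sample_memtable = {12: {"ts": 30, "value": "new"}}
sample_sstables = [
    {"bloom": {12}, "rows": {12: {"ts": 10, "value": "old"}}},
    {"bloom": {7}, "rows": {7: {"ts": 5, "value": "other"}}},
]
sample_row_cache = {}
```

Calling `read(12, sample_row_cache, sample_memtable, sample_sstables)` consults the MemTable and the first SSTable, skips the second, and keeps the version with the larger timestamp. A second call for the same key is answered from the row cache.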

Interview angle

Cassandra's read path shows how multiple optimization layers stack: Bloom filters eliminate unnecessary disk reads, the key cache skips index lookups, and the row cache bypasses everything. When designing a read-heavy system, describe this layered approach: each layer catches a different class of reads, and only reads that get past every layer pay for the full index lookup and disk read.

Quiz

Your Cassandra table has 50 SSTables due to a high write rate and infrequent compaction. You notice read latency is increasing. Which optimization would have the MOST impact?

Increase the row cache size to store more complete rows in memory.

Trigger compaction to reduce the number of SSTables that reads must merge.

Add more nodes to the cluster to distribute the read load.

Switch from QUORUM to ONE consistency level for reads.
