
Anatomy of Cassandra's Read Operation

Reads in Cassandra are more complex than writes. The requested data might be in the MemTable, in one or more SSTables, or spread across both. Cassandra uses several layers of caching and indexing to make reads efficient.

Caching layers

| Cache | What it stores | Trade-off |
|---|---|---|
| Row cache | Complete rows for hot data | Highest read speed; highest memory cost |
| Key cache | Partition key → SSTable offset mappings | Low memory cost; big read performance improvement |
| Chunk cache | Uncompressed SSTable data blocks | Speeds up frequently accessed SSTable regions |

Reading from MemTable

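Data in the MemTable is kept sorted by partition key and, within a partition, by clustering columns. A read locates the partition key with a logarithmic-time search (conceptually a binary search), then returns the matching rows. Because everything is in memory, this is fast and straightforward.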
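A minimal sketch of that lookup, assuming the MemTable can be modelled as a list of entries kept sorted by (partition key, clustering key). The `MemTable` class and its `put` and `get_partition` methods are hypothetical names for illustration; Cassandra's real in-memory structure is more involved.

```python
import bisect


class MemTable:
    """Toy MemTable: rows kept sorted by (partition_key, clustering_key)."""

    def __init__(self):
        self._keys = []    # sorted list of (partition_key, clustering_key)
        self._values = []  # row values, aligned with self._keys

    def put(self, partition_key, clustering_key, value):
        key = (partition_key, clustering_key)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value        # overwrite the existing row
        else:
            self._keys.insert(i, key)      # insert while keeping sort order
            self._values.insert(i, value)

    def get_partition(self, partition_key):
        """Binary-search to the partition's first row, then scan its rows."""
        # (partition_key,) sorts before every (partition_key, clustering_key).
        i = bisect.bisect_left(self._keys, (partition_key,))
        rows = []
        while i < len(self._keys) and self._keys[i][0] == partition_key:
            rows.append((self._keys[i][1], self._values[i]))
            i += 1
        return rows
```

Rows come back in clustering order because the underlying list is already sorted, so no extra sort is needed at read time.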

Think first
A Cassandra table might have dozens of SSTables on disk. A read for a single partition key might need to check all of them. What data structure could you use to quickly determine whether an SSTable definitely does NOT contain a given key, without reading the SSTable?

Reading from SSTables

This is where it gets interesting. There can be many SSTables per table, and the data might be in any of them.

Step 1: Bloom filter check

Each SSTable has a Bloom filter that answers: "Could this key be in this SSTable?" If the Bloom filter says no, skip the SSTable entirely. A "no" is definitive; a "yes" may be a false positive, so the SSTable still has to be checked. This avoids expensive disk reads for SSTables that definitely don't contain the key.
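The Bloom filter check can be sketched as follows. This is a toy filter for illustration, not Cassandra's implementation: the `BloomFilter` class, the `sstables_to_read` helper, the use of SHA-256, and the default sizes are all assumptions of the sketch.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(key, sstables):
    """Keep only the SSTables whose filter cannot rule the key out.

    sstables is a list of (name, BloomFilter) pairs.
    """
    return [name for name, bloom in sstables if bloom.might_contain(key)]
```

With 50 SSTables on disk, a read for one key would typically touch only the few whose filters answer "possibly present".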

Step 2: Index lookup

Each SSTable has two index structures:

  1. Partition Index file (on disk) -- sorted partition keys mapped to SSTable offsets
  2. Partition Index Summary (in memory) -- a sampled index of the partition index for faster lookups

To find key=12:

  1. Check the summary to find the relevant range → offset 32 in the partition index
  2. Jump to offset 32 in the partition index → SSTable offset 3914
  3. Jump to SSTable offset 3914 → read the data
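The three steps above can be sketched as a two-level lookup. The layout is a simplification: `PARTITION_INDEX`, `SUMMARY`, and `find_sstable_offset` are hypothetical names, and every value other than key 12, index offset 32, and SSTable offset 3914 is made up to fill out the example.

```python
import bisect

# Partition index (on disk in reality): entries sorted by partition key.
# Each entry is (offset within the index file, partition key, SSTable offset).
PARTITION_INDEX = [
    (0, 2, 0),
    (16, 9, 1807),
    (32, 11, 3016),
    (48, 12, 3914),
    (64, 19, 5450),
    (80, 22, 7010),
]

# Summary (in memory): every second index entry, as (partition key, index offset).
SUMMARY = [(2, 0), (11, 32), (19, 64)]


def find_sstable_offset(key, summary, partition_index):
    """Return the SSTable offset for key, or None if the key is absent."""
    sampled_keys = [k for k, _ in summary]
    # Step 1: find the last sampled key <= key; it marks where to start.
    i = bisect.bisect_right(sampled_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first key in this SSTable
    start_offset = summary[i][1]
    # Step 2: scan the partition index from that offset onward.
    for index_offset, partition_key, sstable_offset in partition_index:
        if index_offset < start_offset:
            continue  # a real reader would seek here instead of skipping
        if partition_key == key:
            return sstable_offset  # Step 3: the caller reads data from here
        if partition_key > key:
            break  # keys are sorted, so the key is not in this SSTable
    return None
```

For key 12, the summary narrows the search to index offset 32, and the scan from there yields SSTable offset 3914, matching the walk-through above.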

Shortcut: Key cache hit

If the partition key is in the key cache, we skip the index lookup entirely and jump straight to the SSTable offset.
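A rough sketch of the shortcut, assuming the key cache maps (SSTable, partition key) pairs to offsets. `KeyCache`, `locate`, and the `index_lookup` callback are hypothetical names, and the LRU eviction policy is an assumption of this sketch.

```python
from collections import OrderedDict


class KeyCache:
    """Toy key cache: (sstable_id, partition_key) -> SSTable offset."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first

    def get(self, sstable_id, partition_key):
        cache_key = (sstable_id, partition_key)
        if cache_key not in self._entries:
            return None
        self._entries.move_to_end(cache_key)  # mark as recently used
        return self._entries[cache_key]

    def put(self, sstable_id, partition_key, offset):
        cache_key = (sstable_id, partition_key)
        self._entries[cache_key] = offset
        self._entries.move_to_end(cache_key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def locate(sstable_id, partition_key, cache, index_lookup):
    """Return (offset, path taken), trying the key cache before the index."""
    offset = cache.get(sstable_id, partition_key)
    if offset is not None:
        return offset, "key cache hit"
    offset = index_lookup(partition_key)  # summary -> partition index
    if offset is not None:
        cache.put(sstable_id, partition_key, offset)
    return offset, "index lookup"
```

The first read of a key pays for the index lookup; repeated reads of the same key then resolve from memory.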

Think first
Cassandra has three caches: row cache, key cache, and chunk cache. If you had limited memory, which one cache would you keep and why?

The complete read workflow

  1. Row cache → if hit, return immediately
  2. Bloom filter → skip SSTables that definitely don't have the key
  3. Key cache → if hit, jump directly to the SSTable offset
  4. On a key cache miss: partition index summary → partition index → SSTable offset (the full index lookup path)
  5. Merge results from MemTable + all matching SSTables
  6. Update caches with the accessed data
  7. Return the latest version to the client
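The merge in step 5 can be sketched as a last-write-wins reconciliation per column. This sketch assumes each source (the MemTable or one SSTable) yields a dict of column name to (write timestamp, value), and it ignores deletions. `merge_latest` is a hypothetical helper name.

```python
def merge_latest(sources):
    """Merge one row's fragments, keeping the newest value per column.

    sources is an iterable of dicts: column name -> (write_timestamp, value).
    """
    merged = {}
    for cells in sources:
        for column, (timestamp, value) in cells.items():
            # Keep the cell only if it is newer than what we have so far.
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    # Drop the timestamps before returning the row to the caller.
    return {column: value for column, (_, value) in merged.items()}
```

Because the comparison happens per column, a row returned to the client can combine a column from the MemTable with another column from an older SSTable.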

Interview angle

Cassandra's read path shows how multiple optimization layers stack: Bloom filters eliminate unnecessary disk reads, the key cache skips index lookups, and the row cache bypasses everything. When designing a read-heavy system, describe this layered approach: each layer catches a different class of reads, and only reads that miss at every layer go to disk.

Quiz
Your Cassandra table has 50 SSTables due to a high write rate and infrequent compaction. You notice read latency is increasing. Which optimization would have the MOST impact?