SSTable

Every Tablet in BigTable ultimately lives on disk as a set of SSTables (Sorted String Tables). Understanding the SSTable format is essential because it determines how BigTable achieves fast reads from disk -- often with a single disk seek.

Think first

You need a file format for storing sorted key-value data on disk that supports fast lookups (ideally one disk seek) and efficient range scans. The file should never be modified after creation. How would you design it?

How Tablets are stored in GFS

BigTable persists data in GFS using the SSTable file format:

An SSTable is a persistent, ordered, immutable map from keys to values (both arbitrary byte strings).
Each Tablet is stored as a sequence of SSTables in GFS.
An SSTable consists of a sequence of data blocks (typically 64 KB each).

Reading from an SSTable

A block index at the end of the SSTable maps keys to block offsets. When the SSTable is opened, this index loads into memory.

Step	Operation
1	Binary search the in-memory index to find the target block
2	Single disk seek to read that block

Two loading strategies exist:

Strategy	Trade-off
Load entire SSTable into memory	No disk seeks for lookups, but higher memory usage
Load only the index	One disk seek per lookup, lower memory footprint

SSTables expose two operations:

Get: retrieve the value for a given key
Iterate: scan values over a key range

Immutability

Once written to GFS, an SSTable is read-only. New data produces a new SSTable; old SSTables are garbage-collected when no longer needed. This immutability provides:

No synchronization needed for concurrent reads
Simpler Tablet splitting -- SSTables can be shared across Tablets
Clean garbage collection -- the GC handles permanent removal of deleted or stale data

Interview angle

SSTable immutability is the foundation of the Log-Structured Merge Tree (LSM) pattern used by BigTable, HBase, Cassandra, RocksDB, and LevelDB. In an interview, explain the key insight: writes are always sequential (fast), while reads may need to merge across multiple SSTables (mitigated by Bloom Filters and compaction).

Table vs. Tablet vs. SSTable

Concept	Description
Table	The logical container; composed of multiple Tablets
Tablet	A contiguous row range; the unit of distribution
SSTable	An immutable on-disk file; the unit of storage

Key relationships:

Multiple Tablets make up a Table.
SSTables can be shared by multiple Tablets.
Tablets do not overlap; SSTables can overlap.

MemTable and the write path

To avoid writing a new SSTable for every mutation, BigTable buffers recent writes in a MemTable -- an in-memory, mutable, sorted buffer:

New writes go into the MemTable.
When the MemTable reaches a size threshold, it is frozen and flushed to GFS as a new SSTable.
A fresh MemTable is created for subsequent writes.

Every mutation is also written to a commit log on GFS -- a Write-Ahead Log that contains redo records for recovery if a Tablet server crashes before flushing the MemTable.

For reads, BigTable merges data from both the MemTable and SSTables. Since both are sorted, the merge is efficient.

warning

The MemTable is the only mutable data structure in the storage path. If a Tablet server crashes before flushing, the MemTable contents are lost -- the commit log is the sole recovery mechanism. This is why the Write-Ahead Log pattern is non-negotiable in LSM-based systems.

Quiz

What would happen if BigTable made SSTables mutable -- allowing in-place updates instead of always creating new SSTables?

Write performance would improve because you wouldn't need compaction.

The system would need complex locking for concurrent reads and writes, tablet splitting would become much harder since SSTables could no longer be shared, and garbage collection would require tracking individual cell references instead of whole files.

Read performance would improve because you'd always find the latest version in one file.

There would be no significant impact since SSTables are just a storage detail.

How Tablets are stored in GFS​

Reading from an SSTable​

Immutability​

Table vs. Tablet vs. SSTable​

MemTable and the write path​

How Tablets are stored in GFS

Reading from an SSTable

Immutability

Table vs. Tablet vs. SSTable

MemTable and the write path