The Life of BigTable's Read & Write Operations

What happens, step by step, when a client writes a row to BigTable? What about when it reads one? The write path prioritizes durability and speed (append-only log + in-memory buffer), while the read path prioritizes freshness (merge across MemTable and SSTables). Both paths bypass the master entirely.

Think first
When a tablet server receives a write, it must both persist the data durably and make it available for reads. In what order should it write to the commit log, update the in-memory buffer, and acknowledge the client -- and why does the order matter?

Write request

When a Tablet server receives a write:

Step  Action
1     Validate that the request is well-formed
2     Authorize the sender via ACLs stored in Chubby
3     Append the mutation to the commit log in GFS (Write-Ahead Log pattern)
4     Insert the mutation into the in-memory MemTable
5     Acknowledge success to the client
6     In the background: when the MemTable reaches a size threshold, flush it to an SSTable in GFS; merge SSTables during compaction

Interview angle

Notice the order: commit log first, then MemTable, then acknowledge. The client gets an ACK only after the mutation is durable on GFS. This guarantees that no acknowledged write is lost, even if the Tablet server crashes immediately after responding. This is the textbook Write-Ahead Log pattern -- a must-know for any storage system interview.
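The ordering of steps 3 to 5 can be sketched in a few lines. This is a toy model, not BigTable's implementation: the class name `TabletServerSketch`, the JSON-lines log format, and the use of a local file in place of GFS are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TabletServerSketch:
    """Toy write path: commit log first, then MemTable, then ACK (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        # A real MemTable is a sorted structure; a dict is enough for this sketch.
        self.memtable = {}
        self._replay_log()

    def _replay_log(self):
        # Recovery: rebuild the MemTable from every mutation that reached the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def write(self, key, value):
        # Step 3: append to the commit log and force it to stable storage.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 4: only now apply the mutation in memory.
        self.memtable[key] = value
        # Step 5: acknowledge; the mutation is already durable at this point.
        return "ACK"


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
server = TabletServerSketch(log_path)
ack = server.write("row1", "v1")
server.write("row2", "v2")

# Simulate a crash: the in-memory state is gone, only the log survives.
recovered = TabletServerSketch(log_path)
```

Because every acknowledged write was logged before the ACK was returned, the `recovered` instance rebuilds the same MemTable contents by replaying the log.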

Read request

When a Tablet server receives a read:

Step  Action
1     Validate the request and authorize the sender
2     Check the cache (Scan Cache and Block Cache) for a hit
3     Read the MemTable for the most recent mutations
4     Consult SSTable indexes (loaded in memory) to identify relevant SSTables
5     Merge rows from MemTable and SSTables to produce the final result

Since both the MemTable and SSTables are sorted by key, the merge operation is efficient -- it behaves like the merge step of merge sort: each source is scanned once, in order, with no re-sorting.
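A minimal sketch of that merge, assuming each source is a list of `(key, value)` pairs already sorted by key and that newer sources shadow older ones. The function name `merged_scan` and the data layout are hypothetical; the standard-library `heapq.merge` does the k-way merge.

```python
import heapq


def merged_scan(memtable, sstables):
    """Yield (key, value) pairs in key order; the newest version of a key wins.

    memtable: list of (key, value) sorted by key (the newest data).
    sstables: list of such lists, ordered newest first.
    """
    sources = [memtable] + list(sstables)
    # Tag each entry with its source age (0 = newest) so that, for equal keys,
    # the newest version is the first one the merge produces.
    tagged = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    last_key = object()  # sentinel that equals no real key
    for key, age, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # an older version of a key that was already emitted
        last_key = key
        yield key, value


memtable = [("b", "b-new")]
sstables = [
    [("a", "a1"), ("b", "b-old")],  # newer SSTable
    [("b", "b-oldest"), ("c", "c1")],  # older SSTable
]
rows = list(merged_scan(memtable, sstables))
```

Here `rows` holds one entry per key, with `"b"` resolved to the MemTable's value, and no source is ever re-sorted.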

Warning

A read may need to consult every SSTable that makes up a Tablet, because versions of a key can be spread across several files until compaction merges them. This is the main read cost of LSM-based storage. BigTable mitigates it with Bloom Filters (skip SSTables that definitely don't contain the key), caching (avoid repeated disk reads), and compaction (reduce the number of SSTables).
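To illustrate how a Bloom filter lets a read skip SSTables, here is a small sketch. The `BloomFilter` class, its sizing (`num_bits`, `num_hashes`), the SHA-256-based hashing, and the helper `sstables_to_read` are illustrative assumptions, not BigTable's actual filter design.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(key, sstables):
    """Return the names of SSTables whose filter cannot rule the key out."""
    return [name for name, bloom in sstables if bloom.might_contain(key)]


older, newer = BloomFilter(), BloomFilter()
for row_key in ("row1", "row2"):
    older.add(row_key)
newer.add("row3")

candidates = sstables_to_read("row3", [("sst-older", older), ("sst-newer", newer)])
```

`candidates` always includes `"sst-newer"`, since a filter never reports a stored key as absent. It will usually exclude `"sst-older"`, although a false positive remains possible, in which case the read simply checks one extra file.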

Quiz
What would happen if BigTable acknowledged writes to the client BEFORE appending to the commit log (reversing steps 3 and 5 in the write path)?