The Life of BigTable's Read & Write Operations
What happens, step by step, when a client writes a row to BigTable? What about when it reads one? The write path prioritizes durability and speed (an append-only log plus an in-memory buffer), while the read path prioritizes freshness (a merge across the MemTable and SSTables). Both paths bypass the master entirely: clients talk directly to Tablet servers.
Write request
When a Tablet server receives a write:
| Step | Action |
|---|---|
| 1 | Validate the request is well-formed |
| 2 | Authorize the sender via ACLs stored in Chubby |
| 3 | Append the mutation to the commit log in GFS (Write-Ahead Log pattern) |
| 4 | Insert the mutation into the in-memory MemTable |
| 5 | Acknowledge success to the client |
| 6 | Periodically flush MemTables to SSTables; merge SSTables during compaction |
Notice the order: commit log first, then MemTable, then acknowledge. The client gets an ACK only after the mutation is durable on GFS. This guarantees that no acknowledged write is lost, even if the Tablet server crashes immediately after responding. This is the textbook Write-Ahead Log pattern -- a must-know for any storage system interview.
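The steps above can be sketched in a few lines. This is a hypothetical illustration, not BigTable's actual code: a Python list stands in for the GFS-backed commit log, a dict for the MemTable, and validation/ACL checks are elided.

```python
import json

class TabletServerWriteSketch:
    """Illustrative write path: log first, MemTable second, ACK last."""

    def __init__(self):
        self.commit_log = []   # stands in for the commit log on GFS
        self.memtable = {}     # in-memory buffer of recent mutations

    def write(self, row_key, value):
        # Steps 1-2 (validation, ACL check via Chubby) elided to one guard
        if not isinstance(row_key, str):
            raise ValueError("malformed request")
        # Step 3: append the mutation to the commit log FIRST (WAL pattern)
        self.commit_log.append(json.dumps({"key": row_key, "value": value}))
        # Step 4: only then apply it to the MemTable
        self.memtable[row_key] = value
        # Step 5: acknowledge -- the mutation is already durable
        return "ACK"

    def recover(self):
        # After a crash, replaying the log rebuilds the lost MemTable,
        # which is why no acknowledged write can be lost.
        self.memtable = {}
        for entry in self.commit_log:
            mutation = json.loads(entry)
            self.memtable[mutation["key"]] = mutation["value"]
```

The ordering is the whole point: if the server crashed between steps 3 and 5, `recover()` would replay the log and the write would still survive, even though the client never got an ACK.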
Read request
When a Tablet server receives a read:
| Step | Action |
|---|---|
| 1 | Validate the request and authorize the sender |
| 2 | Check the cache (Scan Cache and Block Cache) for a hit |
| 3 | Read the MemTable for the most recent mutations |
| 4 | Consult SSTable indexes (loaded in memory) to identify relevant SSTables |
| 5 | Merge rows from MemTable and SSTables to produce the final result |
Since both the MemTable and SSTables are sorted by key, the merge operation is efficient -- it behaves like a merge step in merge-sort.
A read may need to touch every SSTable that makes up a Tablet if the key exists in multiple files (due to updates and compaction lag). This is the main cost of LSM-based storage. BigTable mitigates this with Bloom Filters (skip SSTables that definitely don't contain the key), caching (avoid repeated disk reads), and compaction (reduce the number of SSTables).
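To make the Bloom Filter mitigation concrete, here is a toy version (illustrative only; BigTable's real filters are tuned per SSTable). The key property: false positives are possible, but false negatives are not, so a "no" answer lets the read path skip an SSTable without ever missing data.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per key over a fixed bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, key):
        # Derive num_hashes independent positions from a salted hash
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # All bits set -> "maybe here"; any bit clear -> "definitely not"
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```

On the read path, each SSTable's filter is consulted before touching disk: if `might_contain(row_key)` is false, that SSTable is skipped entirely, which is what keeps reads cheap even when a Tablet has accumulated many SSTables between compactions.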