Summary: BigTable
The big picture
BigTable is Google's answer to a specific problem: how do you store petabytes of structured data and serve reads in milliseconds, across thousands of machines? Its approach is to build a wide-column store on top of two existing infrastructure layers -- GFS for durable storage and Chubby for coordination.
What makes BigTable instructive is its layered architecture. It doesn't reinvent file storage or distributed consensus -- it delegates those problems to GFS and Chubby, respectively, and focuses on what it does uniquely well: providing a structured data model with fast random reads over massive datasets. The design principle generalizes: compose proven infrastructure instead of rebuilding every layer from scratch.
Architecture at a glance
| Component | Role | Depends on |
|---|---|---|
| Master | Assigns tablets to tablet servers, monitors load, handles schema changes | Chubby (for master election) |
| Tablet servers | Serve reads/writes for their assigned tablets | GFS (for SSTable and commit log storage) |
| Chubby | Master election, schema storage, tablet server discovery, access control | Paxos (internal) |
| GFS | Stores SSTables and commit logs durably | ChunkServers |
The data path
Write path:
- Write goes to the tablet server's commit log (write-ahead log on GFS)
- Data is inserted into an in-memory MemTable (sorted by key)
- When the MemTable reaches a size threshold, it's flushed to GFS as an immutable SSTable
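
The three write steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not BigTable's implementation: in-memory lists stand in for GFS files, `TabletSketch` and `flush_threshold` are hypothetical names, the threshold counts entries rather than bytes, and the log is simply cleared after a flush.

```python
class TabletSketch:
    """Toy single-tablet write path; Python lists stand in for GFS files."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # stand-in for the write-ahead log on GFS
        self.memtable = {}     # in-memory buffer; sorted when flushed
        self.sstables = []     # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold  # hypothetical: entry count, not bytes

    def write(self, key, value):
        # Step 1: make the write durable in the commit log first.
        self.commit_log.append((key, value))
        # Step 2: apply it to the MemTable.
        self.memtable[key] = value
        # Step 3: flush once the MemTable is large enough.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the MemTable contents as a sorted, immutable run.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: everything logged so far is now in an SSTable,
        # so the log entries are no longer needed for recovery.
        self.commit_log.clear()


tablet = TabletSketch(flush_threshold=2)
tablet.write("row-b", "1")
tablet.write("row-a", "2")   # second write triggers a flush
print(tablet.sstables)
```

The ordering is the point of the sketch: because the log append happens before the MemTable update, a crash between the two steps loses nothing that was acknowledged.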
Read path:
- Check the MemTable first (most recent data)
- Check Bloom filters on SSTables to skip files that definitely don't contain the key
- Read matching SSTables from GFS (or from cache)
- Merge results across all sources
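
A minimal sketch of this read path follows, assuming a single-key lookup where the newest value wins. The `BloomFilter`, `SSTable`, and `read` names are hypothetical, the filter parameters are arbitrary, and a Python dict stands in for an on-disk SSTable. For a single key, "merging" reduces to returning the first hit when sources are scanned newest to oldest.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable key/value run with its own Bloom filter."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.bloom = BloomFilter()
        for key in self.entries:
            self.bloom.add(key)

    def get(self, key):
        return self.entries.get(key)


def read(key, memtable, sstables):
    """Look up key; `sstables` is ordered oldest to newest."""
    if key in memtable:                  # most recent data lives here
        return memtable[key]
    for sstable in reversed(sstables):   # newest SSTable first
        if not sstable.bloom.might_contain(key):
            continue                     # definitely absent: skip the read
        value = sstable.get(key)
        if value is not None:
            return value
    return None


older = SSTable({"a": "old", "b": "b1"})
newer = SSTable({"a": "new"})
print(read("a", {"c": "c1"}, [older, newer]))
```

A false positive only costs one wasted SSTable lookup; correctness never depends on the filter.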
How BigTable uses system design patterns
| Problem | Pattern | How BigTable uses it |
|---|---|---|
| Surviving tablet server crashes | Write-ahead Log | Commit log stored on GFS; replayed during recovery |
| Monitoring tablet servers | Heartbeat | Each tablet server holds a lock in Chubby; the master periodically checks lock status to detect failed servers |
| Coordinating the cluster | Leader and Follower | Single master assigns and balances tablets across tablet servers |
| Avoiding unnecessary disk reads | Bloom Filters | Per-SSTable Bloom filters skip files that don't contain the target row/column pair |
| Verifying data integrity | Checksum | SSTable blocks are checksummed to detect corruption |
| Master election and discovery | Lease (via Chubby) | The master and tablet servers hold Chubby locks tied to sessions with time-bound leases; an expired session releases the lock |
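
The heartbeat and lease rows work together: a server stays "live" only while it keeps renewing a time-bound lease, and the master reassigns tablets from servers whose lease has lapsed. The sketch below illustrates that idea under simplifying assumptions. `LeaseRegistry` and `reassign_tablets` are hypothetical names, not Chubby's API; timestamps are passed in explicitly so the behavior is deterministic; and the round-robin placement is an arbitrary choice.

```python
class LeaseRegistry:
    """Hypothetical stand-in for lease-backed sessions (not Chubby's API)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}  # server id -> time at which its lease lapses

    def renew(self, server_id, now):
        # A heartbeat extends the lease from the current time.
        self.expiry[server_id] = now + self.lease_seconds

    def live_servers(self, now):
        return sorted(s for s, t in self.expiry.items() if t > now)


def reassign_tablets(assignments, registry, now):
    """Return a new tablet -> server map with dead servers' tablets moved."""
    live = registry.live_servers(now)
    if not live:
        raise RuntimeError("no live tablet servers")
    result = {}
    moved = 0
    for tablet, server in sorted(assignments.items()):
        if server in live:
            result[tablet] = server
        else:
            # Arbitrary policy for the sketch: spread orphans round-robin.
            result[tablet] = live[moved % len(live)]
            moved += 1
    return result


registry = LeaseRegistry(lease_seconds=10)
registry.renew("ts1", now=0)
registry.renew("ts2", now=0)
registry.renew("ts1", now=8)   # ts1 keeps heartbeating; ts2 goes silent
print(reassign_tablets({"t1": "ts1", "t2": "ts2"}, registry, now=12))
```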
BigTable vs. Cassandra: same model, opposite architectures
| Dimension | BigTable | Cassandra |
|---|---|---|
| Data model | Wide-column (column families) | Wide-column (column families) |
| Architecture | Single master (centralized) | Peer-to-peer (decentralized) |
| Consistency | Strong (CP) | Tunable (AP by default) |
| Partitioning | Range-based (tablets) | Consistent hashing (vnodes) |
| Coordination | Chubby (Paxos) | Gossip protocol |
| Conflict resolution | N/A (strong consistency) | Last-write-wins |
This comparison is gold for interviews. If asked "How would you design a wide-column store?", you can present both approaches and discuss the trade-offs: BigTable's single master simplifies consistency but is a central point of coordination that can become a bottleneck. Cassandra's peer-to-peer design avoids a single coordinator, so capacity grows by adding nodes, but it makes consistency harder. Neither is "better" -- they optimize for different requirements.
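
The partitioning row is the easiest difference to show in code. The sketch below contrasts a range-based lookup with a consistent-hashing ring. All names (`range_lookup`, `build_ring`, `ring_lookup`) are hypothetical, SHA-256 is used only as a convenient stand-in hash, and neither function reflects the actual BigTable or Cassandra code paths.

```python
import bisect
import hashlib


def range_lookup(split_keys, servers, row_key):
    """Range partitioning: split_keys[i] is the first row key of tablet i + 1.

    Tablet 0 covers everything below split_keys[0].
    """
    return servers[bisect.bisect_right(split_keys, row_key)]


def ring_token(value):
    # Stand-in hash for the sketch; any uniform hash would do.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def build_ring(nodes, vnodes=8):
    """Consistent hashing: each node owns several points (vnodes) on a ring."""
    return sorted((ring_token(f"{node}#{i}"), node)
                  for node in nodes for i in range(vnodes))


def ring_lookup(ring, row_key):
    tokens = [token for token, _ in ring]
    # First vnode clockwise from the key's token, wrapping around the ring.
    idx = bisect.bisect_right(tokens, ring_token(row_key)) % len(ring)
    return ring[idx][1]


servers = ["ts1", "ts2", "ts3"]
# Adjacent row keys land on the same tablet, which keeps range scans cheap.
print(range_lookup(["g", "p"], servers, "com.example/a"))
print(range_lookup(["g", "p"], servers, "com.example/b"))
# Hashing scatters adjacent keys, which balances load but breaks locality.
ring = build_ring(["n1", "n2", "n3"])
print(ring_lookup(ring, "com.example/a"), ring_lookup(ring, "com.example/b"))
```

Range partitioning preserves key order, so scans over a key prefix touch few tablets, at the cost of hot spots when writes cluster in one range. Hashing spreads writes evenly but gives up ordered scans.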
Quick reference card
| Property | Value |
|---|---|
| Type | Wide-column NoSQL store |
| CAP classification | CP -- strongly consistent |
| Data model | (row key, column family:qualifier, timestamp) → value |
| Partitioning | Range-based tablet splitting |
| Storage engine | MemTable → SSTable (log-structured merge tree) |
| Underlying storage | GFS (SSTables stored as GFS files) |
| Coordination | Chubby (master election, schema, discovery) |
| Atomicity | Per-row (no cross-row transactions) |
| Open source | No (HBase is the open-source equivalent) |
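
The data model row reads more easily as code. The following is a minimal in-memory sketch of the `(row key, column family:qualifier, timestamp) → value` map, assuming versions are kept newest first and a read returns the latest version at or before a requested timestamp. The `Table` class and its methods are hypothetical and are not a client API.

```python
from collections import defaultdict


class Table:
    """Toy multi-versioned map: (row, family:qualifier, timestamp) -> value."""

    def __init__(self, column_families):
        # Column families are declared up front; qualifiers are free-form.
        self.column_families = set(column_families)
        # row key -> column -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, row_key, column, timestamp, value):
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column, timestamp=None):
        """Return the newest value, or the newest at or before `timestamp`."""
        versions = self.rows.get(row_key, {}).get(column, [])
        for ts, value in versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = Table(column_families=["contents", "anchor"])
table.put("com.cnn.www", "contents:", 3, "<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, "<html>v5</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
print(table.get("com.cnn.www", "contents:"))               # newest version
print(table.get("com.cnn.www", "contents:", timestamp=4))  # version as of t=4
```

Everything for one row key lives under a single entry in `rows`, which mirrors why atomicity is per-row: a single-row update touches one tablet, while a cross-row update may span several.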
References and further reading
- BigTable paper -- the original 2006 paper
- SSTable and LSM Tree internals
- Apache HBase -- open-source BigTable clone
- Dynamo paper -- the other major influence on Cassandra