Summary: BigTable

The big picture

BigTable is Google's answer to a specific problem: how do you store petabytes of structured data and serve reads in milliseconds, across thousands of machines? The answer: build a wide-column store on top of two existing infrastructure layers -- GFS for durable storage and Chubby for coordination.

What makes BigTable instructive is its layered architecture. It doesn't reinvent file storage or distributed consensus -- it delegates those problems to GFS and Chubby, respectively, and focuses on what it does uniquely well: providing a structured data model with fast random reads over massive datasets. This is a powerful design principle: build on existing infrastructure rather than building everything from scratch.

Architecture at a glance

| Component | Role | Depends on |
| --- | --- | --- |
| Master | Assigns tablets to tablet servers, monitors load, handles schema changes | Chubby (for master election) |
| Tablet servers | Serve reads/writes for their assigned tablets | GFS (for SSTable storage) |
| Chubby | Master election, schema storage, tablet server discovery, access control | Paxos (internal) |
| GFS | Stores SSTables and commit logs durably | Chunkservers |

The data path

Write path:

  1. Write goes to the tablet server's commit log (write-ahead log on GFS)
  2. Data is inserted into an in-memory MemTable (sorted by key)
  3. When the MemTable reaches a size threshold, it's flushed to GFS as an immutable SSTable
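The three steps above can be sketched as a toy tablet server. Everything here is illustrative (class and attribute names are invented, and plain Python lists stand in for GFS files), but the ordering matters: the commit log is appended to *before* the MemTable is updated, so a crash after step 1 can always be replayed.

```python
class TabletServerSketch:
    """Toy sketch of BigTable's write path; names are illustrative."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []            # stands in for the write-ahead log on GFS
        self.memtable = {}              # in-memory; sorted by key at flush time
        self.sstables = []              # each flush produces one immutable SSTable
        self.memtable_limit = memtable_limit

    def write(self, row_key, value):
        # 1. Append to the commit log first (durability before acknowledgment).
        self.commit_log.append((row_key, value))
        # 2. Apply the mutation to the MemTable.
        self.memtable[row_key] = value
        # 3. Flush once the MemTable crosses its size threshold.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # An SSTable is an immutable, key-sorted file; a sorted tuple models that.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

ts = TabletServerSketch(memtable_limit=2)
ts.write("row1", "a")
ts.write("row2", "b")    # crosses the threshold and triggers a flush
print(len(ts.sstables))  # → 1
print(ts.memtable)       # → {}
```

Note that a real flush writes the SSTable to GFS and then truncates the commit log; the sketch keeps both in memory only to show the sequencing.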

Read path:

  1. Check the MemTable first (most recent data)
  2. Check Bloom filters on SSTables to skip files that definitely don't contain the key
  3. Read matching SSTables from GFS (or from cache)
  4. Merge results across all sources
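A minimal sketch of that read path, under the simplifying assumption that each source holds at most one version per key (so "merge" reduces to "newest source wins"). Dicts stand in for SSTable files, and the Bloom-filter check from step 2 is marked as a comment:

```python
def read(row_key, memtable, sstables_newest_first):
    """Toy read path: MemTable first, then SSTables from newest to oldest."""
    if row_key in memtable:
        return memtable[row_key]          # most recent data always wins
    for sstable in sstables_newest_first:  # each dict stands in for a GFS file
        # A real tablet server consults the SSTable's Bloom filter here and
        # skips files that definitely do not contain the key.
        if row_key in sstable:
            return sstable[row_key]
    return None

memtable = {"row3": "newest"}
sstables = [{"row1": "new"}, {"row1": "old", "row2": "x"}]
print(read("row1", memtable, sstables))  # → new
print(read("row3", memtable, sstables))  # → newest
```

In the real system each key maps to multiple timestamped versions, so the merge combines results across all sources rather than stopping at the first hit; the precedence order (MemTable, then newer SSTables) is the part this sketch preserves.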

How BigTable uses system design patterns

| Problem | Pattern | How BigTable uses it |
| --- | --- | --- |
| Surviving tablet server crashes | Write-ahead Log | Commit log stored on GFS; replayed during recovery |
| Monitoring tablet servers | Heartbeat | Master monitors tablet server health via Chubby sessions |
| Coordinating the cluster | Leader and Follower | Single master assigns and balances tablets across tablet servers |
| Avoiding unnecessary disk reads | Bloom Filters | Per-SSTable Bloom filters skip files that don't contain the target row |
| Verifying data integrity | Checksum | SSTable blocks are checksummed to detect corruption |
| Master election and discovery | Lease (via Chubby) | Chubby sessions with time-bound leases for tablet server registration |
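The Bloom filter pattern in the table deserves a closer look, since it is what keeps read latency low despite data being spread across many SSTables. Below is a generic Bloom filter sketch (not BigTable's actual implementation; sizes and hash scheme are arbitrary choices for illustration). The key property: a negative answer is definitive, so the SSTable read can be skipped, while a positive answer may occasionally be a false alarm.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Derive k independent bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key is definitely absent: skip this SSTable entirely.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row/alpha")
print(bf.might_contain("row/alpha"))  # → True
print(bf.might_contain("row/zzzz"))   # almost certainly False (definitely absent)
```

A tablet server builds one such filter per SSTable at flush time; on a point read it checks the filters first and only fetches from GFS the (usually one) file that might hold the row.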

BigTable vs. Cassandra: same model, opposite architectures

| Dimension | BigTable | Cassandra |
| --- | --- | --- |
| Data model | Wide-column (column families) | Wide-column (column families) |
| Architecture | Single master (centralized) | Peer-to-peer (decentralized) |
| Consistency | Strong (CP) | Tunable (AP by default) |
| Partitioning | Range-based (tablets) | Consistent hashing (vnodes) |
| Coordination | Chubby (Paxos) | Gossip protocol |
| Conflict resolution | N/A (strong consistency) | Last-write-wins |
Interview insight

This comparison is gold for interviews. If asked "How would you design a wide-column store?", you can present both approaches and discuss the trade-offs: BigTable's single master simplifies consistency but is a potential bottleneck and single point of coordination. Cassandra's peer-to-peer design scales writes more smoothly but makes consistency harder to reason about. Neither is "better" -- they optimize for different requirements.

Quick reference card

| Property | Value |
| --- | --- |
| Type | Wide-column NoSQL store |
| CAP classification | CP -- strongly consistent |
| Data model | (row key, column family:qualifier, timestamp) → value |
| Partitioning | Range-based tablet splitting |
| Storage engine | MemTable → SSTable (log-structured merge tree) |
| Underlying storage | GFS (SSTables stored as GFS files) |
| Coordination | Chubby (master election, schema, discovery) |
| Atomicity | Per-row (no cross-row transactions) |
| Open source | No (HBase is the open-source equivalent) |
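The data model row in the card is worth unpacking: logically, BigTable is a sorted map from (row key, column family:qualifier, timestamp) to an uninterpreted value. A sketch using nested dicts, with sample data patterned after the paper's Webtable example (the row key, qualifiers, and timestamps here are invented for illustration):

```python
# (row key) → (column family:qualifier) → (timestamp) → value
table = {
    "com.example.www/index.html": {       # rows are kept sorted lexicographically
        "contents:": {                    # family "contents", empty qualifier
            1700000002: "<html>v2</html>",
            1700000001: "<html>v1</html>",
        },
        "anchor:referrer.example.org": {  # family "anchor", qualifier = linking site
            1700000000: "Example link text",
        },
    },
}

def lookup(table, row, column, as_of=None):
    """Return the newest value at or before `as_of` (latest version if None)."""
    versions = table[row][column]
    candidates = [ts for ts in versions if as_of is None or ts <= as_of]
    return versions[max(candidates)]

print(lookup(table, "com.example.www/index.html", "contents:"))
# → <html>v2</html>
```

Because rows are sorted by key and atomicity is per-row, all versions of all columns in one row live together on a single tablet, which is what makes single-row reads and writes cheap and consistent.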
Design Challenge

Design a web crawler's URL database

You need to design a database for a web crawler that stores billions of URLs. For each URL, you need to store the page content, metadata (title, language, status code), and a history of past crawls. The system must support fast lookups by URL for deduplication and batch analytics for computing page rank.
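One hint without giving the design away: in a range-partitioned store, row key choice decides which rows land on the same tablet. A classic trick (used in the BigTable paper's own web table) is to reverse the hostname so all pages of a site cluster into adjacent rows. The helper below is a hypothetical sketch, not part of any real API:

```python
from urllib.parse import urlsplit

def crawl_row_key(url):
    """Reverse the hostname so one site's pages become adjacent rows.

    Adjacent rows fall on the same tablet(s), which makes both per-URL
    dedup lookups and whole-site scans efficient. Illustrative only.
    """
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return f"{reversed_host}{parts.path or '/'}"

print(crawl_row_key("https://maps.google.com/index.html"))
# → com.google.maps/index.html
```

Crawl history then maps naturally onto the timestamp dimension: store each fetch as a new timestamped version of the content and metadata columns, and let per-column-family version limits garbage-collect old crawls.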
