Partitioning and High-level Architecture
A single BigTable instance can store petabytes of data. How does it break that data into manageable pieces and distribute them across thousands of machines? The answer is Tablets -- contiguous ranges of rows that serve as the unit of distribution and load balancing.
Table partitioning
A BigTable cluster stores multiple tables, and each table is split into Tablets (typically 100--200 MB each).
| Property | Detail |
|---|---|
| Content | A contiguous range of rows, split at row boundaries |
| Initial state | Each table starts as a single Tablet |
| Splitting | Automatic when a Tablet reaches ~100--200 MB |
| Serves as | The unit of distribution, load balancing, and fault isolation |
| Ownership | Each Tablet is assigned to exactly one Tablet server |
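The splitting rule in the table above can be sketched as follows. This is an illustrative model, not BigTable's implementation: the names `Tablet` and `maybe_split` are hypothetical, and the key point is that a split happens only at a row boundary, producing two contiguous Tablets.

```python
# Hypothetical sketch of Tablet splitting; names are illustrative,
# not BigTable's actual API.

SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024  # upper end of the ~100-200 MB range

class Tablet:
    def __init__(self, start_key, end_key, rows):
        self.start_key = start_key  # inclusive ("" means unbounded below)
        self.end_key = end_key      # exclusive ("" means unbounded above)
        self.rows = rows            # sorted list of (row_key, size_bytes)

    def size(self):
        return sum(size for _, size in self.rows)

    def maybe_split(self):
        """Split into two contiguous Tablets at a row boundary if too large."""
        if self.size() < SPLIT_THRESHOLD_BYTES or len(self.rows) < 2:
            return [self]
        mid = len(self.rows) // 2
        split_key = self.rows[mid][0]  # splits happen only at row boundaries
        left = Tablet(self.start_key, split_key, self.rows[:mid])
        right = Tablet(split_key, self.end_key, self.rows[mid:])
        return [left, right]
```

Note that the two halves share a boundary key: the left Tablet's exclusive end is the right Tablet's inclusive start, so the sorted key space stays fully covered.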
Because the table is sorted by row key, reads over short row ranges touch only a small number of Tablets. This makes row key design critical -- keys with high locality yield efficient range scans.
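Because Tablet boundaries partition a sorted key space, locating the Tablets for a row range reduces to binary search over Tablet start keys. A minimal sketch (the variable names and inclusive-range convention are assumptions for illustration):

```python
import bisect

# Sorted start keys of each Tablet; Tablet i covers
# [tablet_starts[i], tablet_starts[i+1]), with "" as the unbounded low end.
tablet_starts = ["", "row100", "row200", "row300"]

def tablets_for_scan(start_row, end_row):
    """Indices of the Tablets overlapping the inclusive range [start_row, end_row]."""
    first = bisect.bisect_right(tablet_starts, start_row) - 1
    last = bisect.bisect_right(tablet_starts, end_row) - 1
    return list(range(first, last + 1))
```

A short scan such as `tablets_for_scan("row150", "row180")` touches a single Tablet, which is why keys with high locality make range reads cheap.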
BigTable uses range-based partitioning, not consistent hashing. This means sequential row keys land on the same Tablet, enabling fast range scans -- but it also creates hotspot risk if many writes target the same key prefix. Contrast this with Consistent Hashing used by Dynamo and Cassandra, which distributes writes evenly but makes range scans expensive.
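The trade-off can be made concrete with two toy partitioners (boundary keys and partition counts here are arbitrary choices for illustration): under range partitioning, sequential keys pile onto one partition; under hash partitioning, the same keys scatter.

```python
import hashlib

NUM_PARTITIONS = 4

def range_partition(key, boundaries=("g", "n", "t")):
    """Range partitioning: sorted keys map to contiguous partitions."""
    for i, boundary in enumerate(boundaries):
        if key < boundary:
            return i
    return len(boundaries)

def hash_partition(key):
    """Hash partitioning: nearby keys scatter across partitions."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

sequential = ["user001", "user002", "user003"]
# Range: all sequential keys land on one partition -- a write hotspot,
# but a range scan over them reads a single partition.
# Hash: the same keys likely spread out -- writes balance,
# but a range scan must contact many partitions.
```

This is exactly the hotspot risk the text describes: a monotonically increasing key prefix (timestamps, sequential IDs) concentrates all writes on one Tablet.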
High-level architecture
A BigTable cluster has three major components:
| Component | Role |
|---|---|
| Client library | Linked into every client application; locates Tablets and communicates with Tablet servers directly |
| Master server (one) | Assigns Tablets to Tablet servers, manages metadata operations, handles load balancing |
| Tablet servers (many) | Serve reads and writes for their assigned Tablets |
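The master's assignment duty can be sketched as a simple least-loaded policy. This is a hypothetical illustration of the idea, not BigTable's actual balancing algorithm:

```python
# Illustrative sketch: the master assigns each unassigned Tablet to the
# least-loaded live Tablet server. Names are hypothetical.

def assign_tablets(unassigned, server_loads):
    """unassigned: list of Tablet ids.
    server_loads: dict mapping server name -> number of Tablets it serves.
    Returns a dict mapping Tablet id -> chosen server, updating loads."""
    assignments = {}
    for tablet in unassigned:
        # Pick the server currently serving the fewest Tablets.
        server = min(server_loads, key=server_loads.get)
        assignments[tablet] = server
        server_loads[server] += 1
    return assignments
```

The invariant from the first table holds throughout: each Tablet maps to exactly one server.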
BigTable builds on four pieces of Google infrastructure:
| Infrastructure | Purpose |
|---|---|
| GFS | Stores data files (SSTables) and commit logs durably |
| SSTable | Persistent, ordered, immutable map from keys to values -- the on-disk format |
| Chubby | Distributed lock service for master election, schema storage, Tablet server discovery |
| Cluster Scheduling System | Schedules, monitors, and manages the BigTable cluster |
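The SSTable contract -- persistent, ordered, immutable -- can be modeled in a few lines. This toy version holds data in memory; real SSTables are on-disk files with a block index, but the interface is the same: point lookups and range scans over a sorted, never-mutated map.

```python
import bisect

class SSTable:
    """Toy model of an SSTable: an immutable, ordered map from keys to values."""

    def __init__(self, items):
        items = sorted(items)               # ordered once, never modified
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, key):
        """Point lookup via binary search; None if absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """All (key, value) pairs with start <= key < end, in order."""
        i = bisect.bisect_left(self._keys, start)
        j = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[i:j], self._values[i:j]))
```

Immutability is what lets SSTables live safely in GFS: once written, a file is only ever read or deleted, never edited in place.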
The master is not on the data path. Clients never talk to the master for reads or writes -- they go directly to Tablet servers. The master handles only metadata and coordination. This separation is what prevents the single master from becoming a bottleneck, following the Leader and Follower pattern.
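The separation of control path and data path can be sketched from the client's side. The names below are illustrative; the essential shape is that the client resolves a row key against cached Tablet locations and sends the request straight to the owning Tablet server, with no master involvement:

```python
import bisect

# Illustrative sketch of the client-side data path; names are hypothetical.

class Client:
    def __init__(self, tablet_locations):
        # tablet_locations: sorted list of (start_key, server_name) pairs,
        # cached client-side after an initial metadata lookup.
        self._starts = [start for start, _ in tablet_locations]
        self._servers = [server for _, server in tablet_locations]

    def server_for(self, row_key):
        """Binary-search the cached locations for the owning Tablet server."""
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._servers[i]

    def read(self, row_key, servers):
        # Data path: straight to the Tablet server. The master is not contacted.
        return servers[self.server_for(row_key)].get(row_key)
```

Since every read and write follows this path, master load grows with metadata churn (Tablet assignments, splits) rather than with request volume -- which is why one master suffices.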