Partitioning and High-level Architecture

A single BigTable instance can store petabytes of data. How does it break that data into manageable pieces and distribute them across thousands of machines? The answer is Tablets -- contiguous ranges of rows that serve as the unit of distribution and load balancing.

Think first
You need to split a petabyte-scale table across thousands of machines. Would you use hash-based partitioning (like consistent hashing) or range-based partitioning? What are the trade-offs of each approach?

Table partitioning

A BigTable cluster stores multiple tables, and each table is split into Tablets (typically 100--200 MB each).

| Property | Detail |
| --- | --- |
| Content | A contiguous range of rows, split at row boundaries |
| Initial state | Each table starts as a single Tablet |
| Splitting | Automatic when a Tablet reaches ~100--200 MB |
| Unit of | Distribution, load balancing, and fault isolation |
| Ownership | Each Tablet is assigned to exactly one Tablet server |

Because the table is sorted by row key, reads over short row ranges touch only a small number of Tablets. This makes row key design critical -- keys with high locality yield efficient range scans.
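Because Tablets are contiguous key ranges, locating the Tablet that owns a row is just a binary search over the sorted split points. A minimal sketch of the idea (the boundary keys and server names here are invented for illustration):

```python
import bisect

# Hypothetical split points. Each boundary ends one Tablet's range:
# Tablet 0 = (-inf, "g"), Tablet 1 = ["g", "n"), Tablet 2 = ["n", "t"),
# Tablet 3 = ["t", +inf). Server names are illustrative.
TABLET_BOUNDARIES = ["g", "n", "t"]
TABLET_SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3"]

def tablet_for(row_key: str) -> str:
    """Binary-search the sorted boundaries to find the Tablet owning row_key."""
    return TABLET_SERVERS[bisect.bisect_right(TABLET_BOUNDARIES, row_key)]

def tablets_for_scan(start_key: str, end_key: str) -> list[str]:
    """A range scan only touches Tablets whose ranges overlap [start, end)."""
    lo = bisect.bisect_right(TABLET_BOUNDARIES, start_key)
    hi = bisect.bisect_right(TABLET_BOUNDARIES, end_key)
    return TABLET_SERVERS[lo:hi + 1]
```

Note how a scan over a short key range (`tablets_for_scan("p", "q")`) resolves to a single Tablet, which is exactly why row keys with high locality make range scans cheap.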

Interview angle

BigTable uses range-based partitioning, not consistent hashing. This means sequential row keys land on the same Tablet, enabling fast range scans -- but it also creates hotspot risk if many writes target the same key prefix. Contrast this with Consistent Hashing used by Dynamo and Cassandra, which distributes writes evenly but makes range scans expensive.
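The trade-off can be made concrete with a toy comparison. Under range partitioning, sequential keys fall into one partition; under hash partitioning, the same keys typically scatter, so a range scan must fan out to many partitions. (The partition counts and key format below are illustrative, and modulo over a hash stands in for consistent hashing.)

```python
import bisect
import hashlib

def range_partition(key: str, boundaries: list[str]) -> int:
    """Range-based placement: binary search over sorted split points."""
    return bisect.bisect_right(boundaries, key)

def hash_partition(key: str, n_partitions: int) -> int:
    """Hash-based placement (simplified): hash the key, mod partition count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_partitions

# Five sequential row keys, as a hot write pattern might produce.
keys = [f"user#{i:04d}" for i in range(100, 105)]

# Range partitioning keeps all five together (good scans, hotspot risk).
range_parts = [range_partition(k, ["user#0500"]) for k in keys]

# Hash partitioning spreads them out (even load, expensive scans).
hash_parts = [hash_partition(k, 8) for k in keys]
```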

High-level architecture

A BigTable cluster has three major components:

| Component | Role |
| --- | --- |
| Client library | Linked into every client application; handles communication with BigTable |
| Master server (one) | Assigns Tablets to Tablet servers, manages metadata operations, handles load balancing |
| Tablet servers (many) | Serve reads and writes for their assigned Tablets |

BigTable builds on four pieces of Google infrastructure:

| Infrastructure | Purpose |
| --- | --- |
| GFS | Stores data files (SSTables) and commit logs durably |
| SSTable | Persistent, ordered, immutable map from keys to values -- the on-disk format |
| Chubby | Distributed lock service for master election, schema storage, Tablet server discovery |
| Cluster Scheduling System | Schedules, monitors, and manages the BigTable cluster |
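The SSTable's contract -- an ordered, immutable key-to-value map -- is simple enough to sketch. Real SSTables live on disk in GFS with a block index; this in-memory sketch only shows the lookup and scan behavior that the sorted, immutable layout enables:

```python
import bisect

class SSTable:
    """Minimal in-memory sketch of an SSTable: immutable, sorted key->value map."""

    def __init__(self, entries: dict[str, str]):
        # Sort once at construction; the table never changes afterward.
        self._keys = sorted(entries)
        self._values = [entries[k] for k in self._keys]

    def get(self, key: str):
        """Point lookup via binary search over the sorted keys."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: str, end: str):
        """Ordered iteration over [start, end) -- cheap because keys are sorted."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Immutability is what lets many readers share an SSTable without locks: once written, a file is only ever read or deleted, never modified.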
warning

The master is not on the data path. Clients never talk to the master for reads or writes -- they go directly to Tablet servers. The master handles only metadata and coordination. This separation is what prevents the single master from becoming a bottleneck, following the Leader and Follower pattern.
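One way to picture this separation is a client that resolves a row's Tablet location once (via a metadata lookup, which in BigTable goes through Chubby and the METADATA table, not the master), caches it, and then talks straight to the Tablet server. All names below are hypothetical stand-ins:

```python
class TabletServer:
    """Stub Tablet server holding its assigned rows in memory."""
    def __init__(self):
        self.rows = {}
    def write(self, key, value):
        self.rows[key] = value
    def read(self, key):
        return self.rows.get(key)

class Client:
    """Routing sketch: reads and writes go directly to the owning Tablet
    server. Nothing on this path ever contacts the master."""
    def __init__(self, locate_tablet):
        self._locate = locate_tablet   # row_key -> TabletServer (metadata lookup)
        self._cache = {}               # location cache; avoids repeated lookups

    def _server(self, key):
        if key not in self._cache:
            self._cache[key] = self._locate(key)
        return self._cache[key]

    def write(self, key, value):
        self._server(key).write(key, value)   # direct RPC to the Tablet server

    def read(self, key):
        return self._server(key).read(key)
```

Because the location cache lives in the client library, the common case needs zero coordination traffic: the master only matters when Tablets move.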

Quiz
What would happen if BigTable routed all client read and write requests through the master server instead of directly to tablet servers?