Partitioning and High-level Architecture

A single BigTable instance can store petabytes of data. How does it break that data into manageable pieces and distribute them across thousands of machines? The answer is Tablets -- contiguous ranges of rows that serve as the unit of distribution and load balancing.

Think first
You need to split a petabyte-scale table across thousands of machines. Would you use hash-based partitioning (like consistent hashing) or range-based partitioning? What are the trade-offs of each approach?

Table partitioning

A BigTable cluster stores multiple tables, and each table is split into Tablets (typically 100--200 MB each).

| Property | Detail |
| --- | --- |
| Content | A contiguous range of rows, split at row boundaries |
| Initial state | Each table starts as a single Tablet |
| Splitting | Automatic when a Tablet reaches ~100--200 MB |
| Unit of | Distribution, load balancing, and fault isolation |
| Ownership | Each Tablet is assigned to exactly one Tablet server |

Because the table is sorted by row key, reads over short row ranges touch only a small number of Tablets. This makes row key design critical -- keys with high locality yield efficient range scans.
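Because Tablets are contiguous key ranges, locating the Tablet that owns a row is just a binary search over the sorted split points. A minimal sketch of the idea (the boundary keys and server names here are invented for illustration):

```python
import bisect

# Hypothetical split points. Each boundary ends one Tablet's range:
# Tablet 0 = (-inf, "g"), Tablet 1 = ["g", "n"), Tablet 2 = ["n", "t"),
# Tablet 3 = ["t", +inf). Server names are illustrative.
TABLET_BOUNDARIES = ["g", "n", "t"]
TABLET_SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3"]

def tablet_for(row_key: str) -> str:
    """Binary-search the sorted boundaries to find the Tablet owning row_key."""
    return TABLET_SERVERS[bisect.bisect_right(TABLET_BOUNDARIES, row_key)]

def tablets_for_scan(start_key: str, end_key: str) -> list[str]:
    """A range scan only touches Tablets whose ranges overlap [start, end)."""
    lo = bisect.bisect_right(TABLET_BOUNDARIES, start_key)
    hi = bisect.bisect_right(TABLET_BOUNDARIES, end_key)
    return TABLET_SERVERS[lo:hi + 1]
```

Note how a scan over a short key range (`tablets_for_scan("p", "q")`) resolves to a single Tablet, which is exactly why row keys with high locality make range scans cheap.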

Interview angle

BigTable uses range-based partitioning, not consistent hashing. This means sequential row keys land on the same Tablet, enabling fast range scans -- but it also creates hotspot risk if many writes target the same key prefix. Contrast this with Consistent Hashing used by Dynamo and Cassandra, which distributes writes evenly but makes range scans expensive.
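The trade-off can be made concrete with a toy comparison. Under range partitioning, sequential keys fall into one partition; under hash partitioning, the same keys typically scatter, so a range scan must fan out to many partitions. (The partition counts and key format below are illustrative, and modulo over a hash stands in for consistent hashing.)

```python
import bisect
import hashlib

def range_partition(key: str, boundaries: list[str]) -> int:
    """Range-based placement: binary search over sorted split points."""
    return bisect.bisect_right(boundaries, key)

def hash_partition(key: str, n_partitions: int) -> int:
    """Hash-based placement (simplified): hash the key, mod partition count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_partitions

# Five sequential row keys, as a hot write pattern might produce.
keys = [f"user#{i:04d}" for i in range(100, 105)]

# Range partitioning keeps all five together (good scans, hotspot risk).
range_parts = [range_partition(k, ["user#0500"]) for k in keys]

# Hash partitioning spreads them out (even load, expensive scans).
hash_parts = [hash_partition(k, 8) for k in keys]
```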

High-level architecture

A BigTable cluster has three major components:

| Component | Role |
| --- | --- |
| Client library | Linked into every client application; handles communication with BigTable |
| Master server (one) | Assigns Tablets to Tablet servers, manages metadata operations, handles load balancing |
| Tablet servers (many) | Serve reads and writes for their assigned Tablets |

BigTable builds on four pieces of Google infrastructure:

| Infrastructure | Purpose |
| --- | --- |
| GFS | Stores data files (SSTables) and commit logs durably |
| SSTable | Persistent, ordered, immutable map from keys to values -- the on-disk format |
| Chubby | Distributed lock service for master election, schema storage, Tablet server discovery |
| Cluster Scheduling System | Schedules, monitors, and manages the BigTable cluster |
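The SSTable's contract -- an ordered, immutable key-to-value map -- is simple enough to sketch. Real SSTables live on disk in GFS with a block index; this in-memory sketch only shows the lookup and scan behavior that the sorted, immutable layout enables:

```python
import bisect

class SSTable:
    """Minimal in-memory sketch of an SSTable: immutable, sorted key->value map."""

    def __init__(self, entries: dict[str, str]):
        # Sort once at construction; the table never changes afterward.
        self._keys = sorted(entries)
        self._values = [entries[k] for k in self._keys]

    def get(self, key: str):
        """Point lookup via binary search over the sorted keys."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: str, end: str):
        """Ordered iteration over [start, end) -- cheap because keys are sorted."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Immutability is what lets many readers share an SSTable without locks: once written, a file is only ever read or deleted, never modified.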
warning

The master is not on the data path. Clients never talk to the master for reads or writes -- they go directly to Tablet servers. The master handles only metadata and coordination. This separation is what prevents the single master from becoming a bottleneck, following the Leader and Follower pattern.
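One way to picture this separation is a client that resolves a row's Tablet location once (via a metadata lookup, which in BigTable goes through Chubby and the METADATA table, not the master), caches it, and then talks straight to the Tablet server. All names below are hypothetical stand-ins:

```python
class TabletServer:
    """Stub Tablet server holding its assigned rows in memory."""
    def __init__(self):
        self.rows = {}
    def write(self, key, value):
        self.rows[key] = value
    def read(self, key):
        return self.rows.get(key)

class Client:
    """Routing sketch: reads and writes go directly to the owning Tablet
    server. Nothing on this path ever contacts the master."""
    def __init__(self, locate_tablet):
        self._locate = locate_tablet   # row_key -> TabletServer (metadata lookup)
        self._cache = {}               # location cache; avoids repeated lookups

    def _server(self, key):
        if key not in self._cache:
            self._cache[key] = self._locate(key)
        return self._cache[key]

    def write(self, key, value):
        self._server(key).write(key, value)   # direct RPC to the Tablet server

    def read(self, key):
        return self._server(key).read(key)
```

Because the location cache lives in the client library, the common case needs zero coordination traffic: the master only matters when Tablets move.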

Quiz
What would happen if BigTable routed all client read and write requests through the master server instead of directly to tablet servers?