Working with Tablets
Tablets move between servers constantly due to load balancing, server failures, and cluster scaling. So given a row key, how does a client find the right Tablet server? BigTable solves this with a three-level metadata hierarchy, analogous to a B+ tree.
A B+ tree is an m-ary tree where each node has a variable (often large) number of children.
Tablet location hierarchy
BigTable stores Tablet locations in a special METADATA table. Each row in this table maps a Tablet to its serving Tablet server:
```
METADATA row:
  Key:  table id + end row
  Data: Tablet server location
```
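Because each METADATA key ends with a Tablet's *end* row, the Tablet covering a given row key is simply the first entry whose key is greater than or equal to `(table id, row key)` in sorted order. A minimal in-memory sketch (the server names and ranges are made up for illustration):

```python
import bisect

# Hypothetical METADATA rows, keyed by (table id, end row), kept sorted.
# Each entry's end row is the last row that Tablet serves.
metadata = sorted([
    (("users", "g"), "tserver-1:9000"),     # rows up to and including "g"
    (("users", "p"), "tserver-2:9000"),     # rows after "g" up to "p"
    (("users", "\xff"), "tserver-3:9000"),  # rows after "p" to the end
])
keys = [k for k, _ in metadata]

def locate(table_id: str, row_key: str) -> str:
    """Return the server for the Tablet whose range contains row_key."""
    i = bisect.bisect_left(keys, (table_id, row_key))
    return metadata[i][1]

print(locate("users", "alice"))    # tserver-1:9000
print(locate("users", "mallory"))  # tserver-2:9000
```

The end-row trick is why a single sorted scan (or binary search) is all a lookup needs: no per-Tablet start keys are stored, because each Tablet implicitly starts where the previous one ends.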
The METADATA table itself can grow large, so BigTable splits it into two levels; counting the data Tablets themselves, a lookup traverses three levels in total:
| Level | Name | Content | Splitting |
|---|---|---|---|
| Level 0 | Meta-0 Tablet | One row per Meta-1 Tablet | Never splits -- stored in a Chubby file |
| Level 1 | Meta-1 Tablets | One row per data Tablet | Can split across multiple Tablet servers |
| Level 2 | Data Tablets | Actual user data | Split automatically at ~100--200 MB |
Lookup flow
- Client reads a Chubby file that holds the location of the Meta-0 Tablet.
- Meta-0 Tablet points to the appropriate Meta-1 Tablet.
- Meta-1 Tablet reveals the location of the target data Tablet.
The tree depth is always three -- guaranteeing bounded lookup cost. The client library caches Tablet locations and prefetches nearby metadata entries to minimize round trips.
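The lookup flow above can be sketched as a small client-side locator. The Chubby path and the RPC helpers (`chubby_read`, `read_row`) are stand-ins for the real client library, which is not public; the real client also caches per-Tablet *ranges* and prefetches neighboring entries rather than caching individual row keys as done here:

```python
class TabletLocator:
    """Hypothetical sketch of the three-level lookup with a client cache."""

    def __init__(self, chubby_read, read_row):
        self._chubby_read = chubby_read  # chubby_read(path) -> Meta-0 location
        self._read_row = read_row        # read_row(server, key) -> row value
        self._cache = {}                 # (table_id, row_key) -> server

    def locate(self, table_id, row_key):
        cached = self._cache.get((table_id, row_key))
        if cached:
            return cached                # cache hit: zero metadata round trips
        meta0 = self._chubby_read("/bigtable/root-tablet")     # level 0
        meta1 = self._read_row(meta0, ("METADATA", table_id))  # level 1
        server = self._read_row(meta1, (table_id, row_key))    # level 2
        self._cache[(table_id, row_key)] = server
        return server
```

A cold lookup costs three reads (Chubby, Meta-0, Meta-1); a warm one costs zero, which is why aggressive caching makes the common case cheap.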
This three-level indirection is a classic pattern for scalable metadata lookup. Compare it to DNS (root -> TLD -> authoritative) or HDFS (NameNode -> block locations). The key insight: by caching aggressively at the client, most lookups avoid hitting the metadata hierarchy entirely. When asked "how would you locate a shard?", this is a strong answer pattern.
Assigning Tablets
Each Tablet is assigned to exactly one Tablet server at any time. The master tracks:
- The set of live Tablet servers
- The Tablet-to-server mapping
- Any unassigned Tablets
Tablet server registration
When a Tablet server starts, it creates a uniquely named file in Chubby's servers directory and acquires an exclusive lock on it. This signals to the master that the server is alive and ready.
Master startup sequence
When the master restarts (via the Cluster Management System):
| Step | Action |
|---|---|
| 1 | Acquire a unique master lock in Chubby (prevents multiple masters -- Split Brain protection) |
| 2 | Scan Chubby's servers directory to discover live Tablet servers |
| 3 | Query each live Tablet server to learn its current Tablet assignments |
| 4 | Scan the METADATA table to find the full set of Tablets; assign any unassigned Tablets to servers with capacity |
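The four steps above can be sketched as one startup routine. Every collaborator here (the Chubby handle, the per-server assignment query, the METADATA scan, the assignment call) is a stand-in, not a real API:

```python
def master_startup(chubby, servers_dir, query_assignments, all_tablets, assign):
    """Hypothetical sketch of the master's restart sequence."""
    # Step 1: take the unique master lock (split-brain protection).
    if not chubby.acquire_lock("/bigtable/master", exclusive=True):
        raise RuntimeError("another master already holds the lock")
    # Step 2: discover live Tablet servers from the servers directory.
    live = chubby.list_dir(servers_dir)
    # Step 3: ask each live server what it is currently serving.
    assigned = set()
    for srv in live:
        assigned |= set(query_assignments(srv))
    # Step 4: anything in METADATA but not reported by a server is unassigned.
    for tablet in all_tablets() - assigned:
        assign(tablet)
```

Note the ordering: the master learns the *actual* state from the servers first (step 3) and only then reconciles against the *intended* state from METADATA (step 4), so a restart never double-assigns a Tablet that is already being served.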
Monitoring Tablet servers
BigTable uses Chubby's servers directory as a service registry. Each Tablet server holds an exclusive lock on its file in this directory.
| Event | Master's response |
|---|---|
| New file appears in servers directory | A new Tablet server is available; assign Tablets to it |
| Lock is lost on an existing file | Master tries to acquire the lock itself |
| Master acquires the lock | Chubby is healthy; the Tablet server has failed. Master deletes the file and reassigns its Tablets |
| Master cannot acquire the lock | Chubby itself may be having issues; master backs off |
When a Tablet server loses its lock:
- It stops serving Tablets immediately.
- It attempts to reacquire the lock (handles transient network issues).
- If the file has been deleted (by the master), the Tablet server terminates itself.
The master's lock-check mechanism is subtle. The master does not immediately assume a Tablet server is dead when its lock disappears -- it first verifies that Chubby itself is healthy by trying to acquire the lock. This prevents false positives during Chubby outages.
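The lock-check can be sketched as a single decision function. The `chubby` calls are stand-ins for the real client API:

```python
def handle_lost_lock(chubby, server_file, reassign_tablets):
    """Hypothetical sketch of the master's response to a lost server lock."""
    if chubby.try_acquire_lock(server_file):
        # The master itself could take the lock, so Chubby is reachable and
        # the Tablet server genuinely lost it: treat the server as dead.
        # Deleting the file makes the old server terminate itself if it
        # ever comes back and finds its file gone.
        chubby.delete_file(server_file)
        reassign_tablets(server_file)
        return "server-failed"
    # The master could not take the lock either: Chubby itself may be
    # unhealthy, so back off instead of declaring the server dead.
    return "chubby-suspect"
```

The key property: the master only ever declares a server dead after a *positive* signal (winning the lock), never on the mere absence of one, which is what prevents false positives during Chubby outages.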
Load balancing
The master maintains a global view of the cluster:
- Available Tablet servers and their current load
- The full list of Tablets the cluster must serve
Using this information, the master periodically rebalances Tablets across servers. This is conceptually similar to how GFS rebalances chunks across ChunkServers -- a centralized coordinator with a global view makes better placement decisions than decentralized approaches.
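As an illustration only, here is a greedy rebalancer that evens out Tablet *counts*; the real master weighs request load and server capacity, not just counts:

```python
def rebalance(assignments):
    """Hypothetical greedy sketch: repeatedly move one Tablet from the
    most-loaded server to the least-loaded one until the spread between
    them is at most one Tablet. Mutates `assignments` in place.

    assignments: dict mapping server name -> list of Tablet ids.
    Returns the list of (tablet, from_server, to_server) moves made.
    """
    moves = []
    while True:
        hot = max(assignments, key=lambda s: len(assignments[s]))
        cold = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[hot]) - len(assignments[cold]) <= 1:
            return moves
        tablet = assignments[hot].pop()
        assignments[cold].append(tablet)
        moves.append((tablet, hot, cold))
```

Because the master has the global view, it can compute such a plan in one place and execute a small number of moves, rather than having servers negotiate pairwise as a decentralized scheme would.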