
Working with Tablets

Tablets move between servers constantly -- due to load balancing, server failures, and cluster scaling. So given a row key, how does a client find the right Tablet server? BigTable solves this with a three-level metadata hierarchy, analogous to a B+ tree.

B+ Tree

A B+ tree is an m-ary tree where each node has a variable (often large) number of children. Internal nodes hold only keys used for routing; all data lives in the leaves. BigTable's hierarchy mirrors this: the metadata levels route lookups, and the data Tablets are the leaves.

Think first
Given that tablets move between servers constantly, how would you design a lookup mechanism so that clients can find the right tablet server for any row key without contacting the master on every request?

Tablet location hierarchy

BigTable stores Tablet locations in a special METADATA table. Each row in this table maps a Tablet to its serving Tablet server:

METADATA:
Key: table id + end row
Data: tablet server location
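Because METADATA keys are "table id + end row", the rows sort so that a simple binary search finds the Tablet covering any row key: the first METADATA key whose end row is at or beyond the target row identifies the Tablet. A minimal sketch (the table contents, server names, and the assumption that a Tablet's end row is inclusive are all illustrative):

```python
import bisect

# Sorted METADATA rows: key = (table id, end row), value = tablet server location.
# Three hypothetical tablets of a "users" table; "\xff" marks the last tablet.
metadata = [
    (("users", "g"), "ts-1"),     # rows up to "g"
    (("users", "p"), "ts-2"),     # rows after "g" up to "p"
    (("users", "\xff"), "ts-3"),  # remaining rows
]
keys = [k for k, _ in metadata]

def locate(table_id, row_key):
    # First METADATA key with end row >= row_key covers this row
    # (assumes end rows are inclusive).
    i = bisect.bisect_left(keys, (table_id, row_key))
    return metadata[i][1]

print(locate("users", "alice"))  # ts-1
print(locate("users", "henry"))  # ts-2
print(locate("users", "zed"))    # ts-3
```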

The METADATA table itself can grow large, so BigTable splits it into two levels:

Level | Name | Content | Splitting
Level 0 | Meta-0 Tablet | One row per Meta-1 Tablet | Never splits; its location is stored in a Chubby file
Level 1 | Meta-1 Tablets | One row per data Tablet | Can split across multiple Tablet servers
Level 2 | Data Tablets | Actual user data | Split automatically at ~100--200 MB
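A back-of-envelope calculation shows why three levels are enough. Using the BigTable paper's figures (each METADATA row is roughly 1 KB, and metadata Tablets are capped at 128 MB), two metadata levels address 2^34 Tablets:

```python
# Capacity of the three-level hierarchy, using the paper's rough figures:
# ~1 KB per METADATA row, 128 MB cap on metadata tablets.
rows_per_meta_tablet = (128 * 2**20) // 2**10      # 2**17 rows per metadata tablet
tablets_addressable = rows_per_meta_tablet ** 2    # Meta-0 x Meta-1 fan-out = 2**34
bytes_addressable = tablets_addressable * 128 * 2**20  # with 128 MB data tablets

print(tablets_addressable == 2**34)  # True
print(bytes_addressable == 2**61)    # True -- far beyond any single cluster's needs
```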

Lookup flow

  1. Client reads a Chubby file that holds the location of the Meta-0 Tablet.
  2. Meta-0 Tablet points to the appropriate Meta-1 Tablet.
  3. Meta-1 Tablet reveals the location of the target data Tablet.

The tree depth is always three -- guaranteeing bounded lookup cost. The client library caches Tablet locations and prefetches nearby metadata entries to minimize round trips.
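The lookup flow above can be sketched as a small client with a location cache. The path names and RPC helpers (`chubby_read`, `rpc_read`) are stand-ins for the real Chubby and tablet-server interfaces, and the cache here is per row key for simplicity (a real client caches row ranges):

```python
class LocationCache:
    """Three-hop tablet location lookup with a client-side cache (sketch)."""

    def __init__(self, chubby_read, rpc_read):
        self.chubby_read = chubby_read  # chubby_read(path) -> Meta-0 location
        self.rpc_read = rpc_read        # rpc_read(server, key) -> row value
        self.cache = {}                 # (table_id, row_key) -> tablet server
        self.hops = 0                   # metadata round trips, for illustration

    def locate(self, table_id, row_key):
        key = (table_id, row_key)
        if key in self.cache:
            return self.cache[key]      # cache hit: zero metadata round trips
        # Hop 1: Chubby file holds the Meta-0 location.
        meta0 = self.chubby_read("/bigtable/root-tablet"); self.hops += 1
        # Hop 2: Meta-0 points to the right Meta-1 tablet.
        meta1 = self.rpc_read(meta0, ("METADATA0", table_id, row_key)); self.hops += 1
        # Hop 3: Meta-1 reveals the data tablet's server.
        loc = self.rpc_read(meta1, (table_id, row_key)); self.hops += 1
        self.cache[key] = loc
        return loc
```

With fake RPCs, the first lookup costs exactly three hops and repeat lookups cost zero, which is the whole point of the bounded depth plus caching.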

Interview angle

This three-level indirection is a classic pattern for scalable metadata lookup. Compare it to DNS (root -> TLD -> authoritative) or HDFS (NameNode -> block locations). The key insight: by caching aggressively at the client, most lookups avoid hitting the metadata hierarchy entirely. When asked "how would you locate a shard?", this is a strong answer pattern.

Assigning Tablets

Each Tablet is assigned to exactly one Tablet server at any time. The master tracks:

  • The set of live Tablet servers
  • The Tablet-to-server mapping
  • Any unassigned Tablets

Tablet server registration

When a Tablet server starts, it creates a uniquely named file in Chubby's servers directory and acquires an exclusive lock on it. This signals to the master that the server is alive and ready.
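Registration is two operations against the lock service. A hypothetical sketch, assuming a Chubby-like client with `create_file` and `acquire_lock` primitives (these names are illustrative, not the real Chubby API):

```python
import uuid

def register_tablet_server(chubby, server_id=None):
    """Create a uniquely named file in the servers directory and lock it."""
    server_id = server_id or uuid.uuid4().hex   # unique file name
    path = f"/bigtable/servers/{server_id}"
    chubby.create_file(path)     # appear in the servers directory
    chubby.acquire_lock(path)    # exclusive lock signals "alive and ready"
    return path
```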

Master startup sequence

When the master restarts (via the Cluster Management System):

Step | Action
1 | Acquire a unique master lock in Chubby (prevents multiple masters -- split-brain protection)
2 | Scan Chubby's servers directory to discover live Tablet servers
3 | Query each live Tablet server to learn its current Tablet assignments
4 | Scan the METADATA table to find the full set of Tablets; assign any unassigned Tablets to servers with capacity
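The four steps translate almost directly into code. A sketch against assumed interfaces (`chubby`, `query_assignments`, `scan_metadata` are all illustrative stand-ins; the least-loaded placement policy is a simplification):

```python
def master_startup(chubby, servers_dir, query_assignments, scan_metadata):
    chubby.acquire_lock("/bigtable/master")        # 1: single-master guarantee
    live = chubby.list_dir(servers_dir)            # 2: discover live servers
    load = {s: set() for s in live}
    assigned = set()
    for s in live:                                 # 3: learn current assignments
        for t in query_assignments(s):
            assigned.add(t)
            load[s].add(t)
    unassigned = set(scan_metadata()) - assigned   # 4: find unassigned tablets
    for t in sorted(unassigned):
        target = min(load, key=lambda s: len(load[s]))  # least-loaded server
        load[target].add(t)
    return load
```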

Monitoring Tablet servers

BigTable uses Chubby's servers directory as a service registry. Each Tablet server holds an exclusive lock on its file in this directory.

Event | Master's response
New file appears in the servers directory | A new Tablet server is available; assign Tablets to it
Lock is lost on an existing file | Master tries to acquire the lock itself
Master acquires the lock | Chubby is healthy; the Tablet server has failed. Master deletes the file and reassigns its Tablets
Master cannot acquire the lock | Chubby itself may be having issues; master backs off

When a Tablet server loses its lock:

  1. It stops serving Tablets immediately.
  2. It attempts to reacquire the lock (handles transient network issues).
  3. If the file has been deleted (by the master), the Tablet server terminates itself.
Warning

The master's lock-check mechanism is subtle. The master does not immediately assume a Tablet server is dead when its lock disappears -- it first verifies that Chubby itself is healthy by trying to acquire the lock. This prevents false positives during Chubby outages.
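The decision boils down to one conditional: the master declares the Tablet server dead only if it can itself acquire the lock, proving Chubby is reachable. A sketch with assumed `try_acquire`/`delete` primitives and injected callbacks (all names illustrative):

```python
def on_lock_lost(chubby, server_file, reassign_tablets, back_off):
    """Master's reaction when a tablet server's lock disappears."""
    if chubby.try_acquire(server_file):
        # Chubby answered us, so it is healthy: the tablet server really
        # lost its session. Delete the file and reassign its tablets.
        chubby.delete(server_file)
        reassign_tablets(server_file)
    else:
        # We cannot acquire the lock either: Chubby itself may be unhealthy.
        # Declaring the server dead now would be a false positive; back off.
        back_off()
```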

Load balancing

The master maintains a global view of the cluster:

  • Available Tablet servers and their current load
  • The full list of Tablets the cluster must serve

Using this information, the master periodically rebalances Tablets across servers. This is conceptually similar to how GFS rebalances chunks across ChunkServers -- a centralized coordinator with a global view makes better placement decisions than decentralized approaches.
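A toy version of such a rebalancing pass, using Tablet count as the load metric (real systems weigh request rates and Tablet sizes; this is only to show the centralized-view idea):

```python
def rebalance(load):
    """Greedily move tablets from the most- to the least-loaded server.

    load: dict mapping server name -> list of tablets it serves (mutated).
    Returns the list of (tablet, source, destination) moves made.
    """
    moves = []
    while True:
        hot = max(load, key=lambda s: len(load[s]))
        cold = min(load, key=lambda s: len(load[s]))
        if len(load[hot]) - len(load[cold]) <= 1:
            return moves  # balanced within one tablet
        t = load[hot].pop()
        load[cold].append(t)
        moves.append((t, hot, cold))
```

The greedy loop is only possible because one coordinator sees every server's load at once; decentralized schemes have to approximate this view.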

Quiz
What would happen if BigTable used a flat metadata table (one level) instead of its three-level hierarchy for tablet location lookups?