
BigTable Characteristics

What properties make BigTable suitable for Google-scale workloads, and how does it compare to other distributed databases? Understanding these characteristics helps you decide when BigTable (or an alternative such as HBase or Cassandra) is the right tool -- and when it isn't.

Think first
BigTable uses centralized coordination (single master) while Dynamo uses decentralized coordination (peer-to-peer). Under what circumstances would you choose one approach over the other?

BigTable performance characteristics

| Characteristic | Detail |
| --- | --- |
| Distributed multi-level map | Runs across thousands of machines with data partitioned into Tablets |
| Horizontally scalable | Add nodes without downtime or manual rebalancing; achieves linear scalability on commodity hardware |
| Fault-tolerant | Data replicated via GFS across multiple ChunkServers on different racks |
| Durable | All data persisted to GFS with Write-Ahead Log guarantees |
| Centralized coordination | Single master maintains data consistency and a global view of cluster state (Leader and Follower pattern) |
| Separated control and data planes | Clients talk to the master for metadata only; all data reads/writes go directly to Tablet servers |
Interview angle

The separation of control and data planes is the single most important architectural decision in BigTable. It allows the master to be a single point of coordination without becoming a single point of contention. When designing your own system in an interview, always consider: "Can I separate metadata operations from data operations so they scale independently?"
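The control/data-plane split described above can be made concrete with a small sketch. This is an illustrative model, not BigTable's actual API: the class names (`Master`, `TabletServer`, `Client`) and the in-process dictionaries standing in for RPCs are all invented for the example. The key point it demonstrates is that the master serves only tablet *locations*, which clients cache, while row reads go straight to Tablet servers.

```python
# Hypothetical sketch of control/data plane separation.
# All names and structures here are invented for illustration.

class Master:
    """Control plane: answers 'which server owns this row range?' only."""
    def __init__(self, assignments):
        # Maps (start_key, end_key) row ranges to a Tablet server address.
        self.assignments = assignments

    def locate(self, row_key):
        for (start, end), server in self.assignments.items():
            if start <= row_key < end:
                return (start, end), server
        raise KeyError(row_key)


class TabletServer:
    """Data plane: serves the actual row data for its tablet."""
    def __init__(self, rows):
        self.rows = rows

    def read(self, row_key):
        return self.rows.get(row_key)


class Client:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers        # address -> TabletServer
        self.location_cache = {}      # cached metadata: range -> address

    def read(self, row_key):
        # Try cached tablet locations first -- no master involvement.
        for (start, end), addr in self.location_cache.items():
            if start <= row_key < end:
                return self.servers[addr].read(row_key)
        # Cache miss: one control-plane call, then cache the result.
        rng, addr = self.master.locate(row_key)
        self.location_cache[rng] = addr
        # Data-plane call goes directly to the Tablet server.
        return self.servers[addr].read(row_key)
```

Because the cache absorbs repeat lookups, the master's load grows with the number of tablets and clients, not with the volume of reads and writes -- which is exactly why it can coordinate without becoming a contention point.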

Dynamo vs. BigTable

These two systems represent fundamentally different approaches to distributed storage:

| Category | Dynamo | BigTable |
| --- | --- | --- |
| Architecture | Decentralized -- every node has equal responsibilities | Centralized -- master handles metadata, Tablet servers handle data |
| Data model | Key-value | Multidimensional sorted map (wide-column) |
| Security | No built-in fine-grained access control | Access rights at column-family level |
| Partitioning | Consistent hashing with virtual nodes | Range-based Tablets (contiguous row ranges) |
| Replication | Sloppy quorum -- each item replicated to N nodes | GFS chunk replication across ChunkServers |
| CAP stance | AP -- prioritizes availability | CP -- prioritizes consistency |
| Operations | By individual key | By key range (efficient scans) |
| Storage | Pluggable storage engine | SSTables in GFS |
| Membership | Gossip protocol | Master-initiated via Chubby |
Warning

"Dynamo vs. BigTable" is not about which is better -- it's about which trade-offs your application needs. If you need range scans and strong consistency, BigTable wins. If you need write availability during network partitions, Dynamo wins. Interviewers want you to articulate why, not pick a winner.

Systems inspired by BigTable

BigTable's design influenced an entire generation of NoSQL databases:

| System | Relationship to BigTable |
| --- | --- |
| HBase | Most direct open-source clone; runs on HDFS instead of GFS |
| Hypertable | Open-source C++ implementation; abstracts the file system layer to work with HDFS, GlusterFS, or CloudStore via a broker process |
| Cassandra | Hybrid architecture -- uses BigTable's data model (SSTables, MemTables, column families) on top of Dynamo's infrastructure (consistent hashing, gossip, decentralized coordination) |
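The SSTable/MemTable storage model that Cassandra inherited from BigTable can be sketched in a few lines. This is a deliberately stripped-down toy (no write-ahead log, no bloom filters, no compaction; `Store` and its parameters are invented names): writes land in an in-memory sorted structure, which is periodically frozen into immutable on-disk runs, and reads consult newest data first.

```python
# Toy sketch of the BigTable/Cassandra write path (assumptions:
# no WAL, no compaction, flush triggered by entry count).

class Store:
    def __init__(self, memtable_limit=2):
        self.memtable = {}    # in-memory, mutable, absorbs all writes
        self.sstables = []    # immutable sorted runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable into a sorted, immutable run.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Freshest data lives in the memtable.
        if key in self.memtable:
            return self.memtable[key]
        # Then search SSTables newest-first so later writes win.
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Nothing in this structure depends on *where* the tablets live: the same write path works behind a single master (BigTable, HBase) or on every node of a gossip-based ring (Cassandra), which is the decomposition the next callout highlights.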
Interview angle

Cassandra's lineage is the ultimate interview talking point for distributed systems. It proves that data models and architectures are independent design dimensions. You can take BigTable's wide-column model and run it on a Dynamo-style ring -- or take Dynamo's key-value model and run it on a master-based architecture. Understanding this decomposition shows deep architectural thinking.

Quiz
You are designing a new distributed database that needs both efficient range scans (like BigTable) and high write availability during network partitions (like Dynamo). What architectural trade-off would you face?