System APIs
What operations does BigTable expose to clients? The API is deliberately narrow -- no SQL, no joins, no multi-row transactions. This simplicity is a design choice that enables BigTable to scale to thousands of machines.
BigTable provides two categories of operations: metadata and data.
Metadata operations
Metadata operations manage the structure of the system:
- Create and delete tables
- Create and delete column families
- Change cluster, table, and column family metadata (e.g., access control rights)
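These operations can be pictured with a minimal in-memory sketch. The class and method names (`AdminClient`, `create_table`, and so on) are illustrative assumptions, not BigTable's actual client interface:

```python
# Hypothetical sketch of BigTable's metadata operations.
# Names are illustrative; the real API differs.

class AdminClient:
    def __init__(self):
        self.tables = {}  # table name -> set of column family names

    def create_table(self, name):
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = set()

    def delete_table(self, name):
        del self.tables[name]

    def create_column_family(self, table, family):
        self.tables[table].add(family)

    def delete_column_family(self, table, family):
        self.tables[table].discard(family)

admin = AdminClient()
admin.create_table("webtable")
admin.create_column_family("webtable", "anchor")
admin.create_column_family("webtable", "contents")
```

Note that column families are created up front as part of the schema, while individual columns within a family are created implicitly on write.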
Data operations
Data operations read and write cell values:
| Operation | Description |
|---|---|
| Set() | Write cells in a row |
| DeleteCells() | Delete specific cells in a row |
| DeleteRow() | Delete all cells in a row |
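A toy in-memory table makes the semantics of these three mutations concrete. The `Table` class and `(family, qualifier)` cell keys below are assumptions for illustration:

```python
# Hypothetical in-memory model of BigTable's write operations.
# Cells are keyed by (column family, qualifier); names are illustrative.

class Table:
    def __init__(self):
        self.rows = {}  # row key -> {(family, qualifier): value}

    def set(self, row_key, column, value):
        # Set(): write a cell in a row.
        self.rows.setdefault(row_key, {})[column] = value

    def delete_cells(self, row_key, columns):
        # DeleteCells(): delete specific cells in a row.
        row = self.rows.get(row_key, {})
        for col in columns:
            row.pop(col, None)

    def delete_row(self, row_key):
        # DeleteRow(): delete all cells in a row.
        self.rows.pop(row_key, None)

t = Table()
t.set("com.cnn.www", ("anchor", "cnnsi.com"), "CNN")
t.set("com.cnn.www", ("contents", ""), "<html>...")
t.delete_cells("com.cnn.www", [("anchor", "cnnsi.com")])
```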
Read operations
- Each row read is atomic.
- Reads can target a single row, a range of rows, or all rows.
- Results can be restricted to specific column families or individual columns.
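The three read patterns above can be sketched with one helper over an in-memory row map. BigTable's real scanner interface differs; the function, its parameters, and the sample data here are assumptions for illustration:

```python
# Hypothetical read helper: rows are kept sorted by key, reads can target
# a single key, a [start, end) range, or everything, and results can be
# restricted to specific column families.

rows = {
    "com.cnn.www":    {("contents", ""): "<html>A", ("anchor", "cnnsi.com"): "CNN"},
    "com.nyt.www":    {("contents", ""): "<html>B"},
    "org.apache.www": {("contents", ""): "<html>C"},
}

def read_rows(rows, start=None, end=None, families=None):
    """Yield (key, cells) for keys in [start, end), optionally
    restricted to the given column families."""
    for key in sorted(rows):
        if start is not None and key < start:
            continue
        if end is not None and key >= end:
            break
        cells = {col: val for col, val in rows[key].items()
                 if families is None or col[0] in families}
        yield key, cells
```

A range scan restricted to the `contents` family, for example: `list(read_rows(rows, start="com.", end="com.z", families={"contents"}))` returns only the two `com.*` rows, each stripped down to its `contents` cells.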
Transactions and batch operations
| Capability | Scope |
|---|---|
| Single-row transactions | Atomic read-modify-write on one row key |
| Cross-row batch writes | Batch interface (no transactional guarantee) |
| Integer counters | Cells can be used as atomic counters |
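Single-row atomicity can be modeled with a per-row lock: read-modify-write sequences and counter increments are serialized against other writers of the same row, but nothing coordinates across rows. The class and method names below are illustrative assumptions, not BigTable's API:

```python
import threading

# Sketch of single-row transactions: a per-row lock makes
# read-modify-write and counter increments atomic on one row key.

class Row:
    def __init__(self):
        self.cells = {}  # (family, qualifier) -> value
        self._lock = threading.Lock()

    def read_modify_write(self, column, fn):
        # Atomic with respect to other writers of this same row.
        with self._lock:
            self.cells[column] = fn(self.cells.get(column))

    def increment(self, column, delta=1):
        # Integer cells doubling as atomic counters.
        with self._lock:
            self.cells[column] = self.cells.get(column, 0) + delta

row = Row()
row.increment(("stats", "pageviews"))
row.increment(("stats", "pageviews"), 5)
```

Because the lock lives inside a single row, there is no mechanism here (or in BigTable) to make an update spanning two rows atomic, which is exactly the limitation the next paragraph discusses.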
BigTable does not support multi-row transactions. If your design requires atomicity across multiple rows, BigTable is the wrong tool. Google later built Spanner to fill this gap.
Integration points
- BigTable can serve as both an input source and output target for MapReduce jobs.
- Clients can write Sawzall scripts for server-side data processing (transform, filter, aggregate) before network transfer -- reducing data movement.
The "no multi-row transactions" constraint is a favorite interview topic. It forces you to think about row key design -- if data that must be updated atomically lives in different rows, your schema is wrong. This is why BigTable applications embed related data in a single row with multiple column families rather than normalizing across rows like a relational database.
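To make the schema point concrete: instead of normalizing a user's profile and settings into separate rows (relational style), a BigTable schema puts them under one row key in different column families, so a single-row write covers both atomically. The row key format and cell names below are hypothetical:

```python
# Denormalized schema sketch: data that must change together
# shares one row key, split across column families.
row_key = "user#ada"
row = {
    ("profile", "name"):   "Ada",
    ("profile", "email"):  "ada@example.com",
    ("settings", "theme"): "light",
}

def apply_mutation(row, updates):
    # Models a single-row write: because everything lives in one row,
    # all the updates take effect together.
    row.update(updates)

# Email and theme change in one atomic single-row mutation --
# impossible if they lived in two separate rows.
apply_mutation(row, {("profile", "email"): "ada@newmail.com",
                     ("settings", "theme"): "dark"})
```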