System APIs
What operations does BigTable expose to clients? The API is deliberately narrow -- no SQL, no joins, no multi-row transactions. This simplicity is a design choice that enables BigTable to scale to thousands of machines.
BigTable provides two categories of operations: metadata and data.
Metadata operations
Metadata operations manage the structure of the system:
- Create and delete tables
- Create and delete column families
- Change cluster, table, and column family metadata (e.g., access control rights)
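These operations can be pictured with a minimal in-memory sketch. The class and method names (`AdminClient`, `create_table`, and so on) are illustrative assumptions, not BigTable's actual client interface:

```python
# Hypothetical sketch of BigTable's metadata operations.
# Names are illustrative; the real API differs.

class AdminClient:
    def __init__(self):
        self.tables = {}  # table name -> set of column family names

    def create_table(self, name):
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = set()

    def delete_table(self, name):
        del self.tables[name]

    def create_column_family(self, table, family):
        self.tables[table].add(family)

    def delete_column_family(self, table, family):
        self.tables[table].discard(family)

admin = AdminClient()
admin.create_table("webtable")
admin.create_column_family("webtable", "anchor")
admin.create_column_family("webtable", "contents")
```

Note that column families are created up front as part of the schema, while individual columns within a family are created implicitly on write.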
Data operations
Data operations read and write cell values:
| Operation | Description |
|---|---|
| Set() | Write cells in a row |
| DeleteCells() | Delete specific cells in a row |
| DeleteRow() | Delete all cells in a row |
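A toy in-memory table makes the semantics of these three mutations concrete. The `Table` class and `(family, qualifier)` cell keys below are assumptions for illustration:

```python
# Hypothetical in-memory model of BigTable's write operations.
# Cells are keyed by (column family, qualifier); names are illustrative.

class Table:
    def __init__(self):
        self.rows = {}  # row key -> {(family, qualifier): value}

    def set(self, row_key, column, value):
        # Set(): write a cell in a row.
        self.rows.setdefault(row_key, {})[column] = value

    def delete_cells(self, row_key, columns):
        # DeleteCells(): delete specific cells in a row.
        row = self.rows.get(row_key, {})
        for col in columns:
            row.pop(col, None)

    def delete_row(self, row_key):
        # DeleteRow(): delete all cells in a row.
        self.rows.pop(row_key, None)

t = Table()
t.set("com.cnn.www", ("anchor", "cnnsi.com"), "CNN")
t.set("com.cnn.www", ("contents", ""), "<html>...")
t.delete_cells("com.cnn.www", [("anchor", "cnnsi.com")])
```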
Read operations
- Each row read is atomic.
- Reads can target a single row, a range of rows, or all rows.
- Results can be restricted to specific column families or individual columns.
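The three read patterns above can be sketched with one helper over an in-memory row map. BigTable's real scanner interface differs; the function, its parameters, and the sample data here are assumptions for illustration:

```python
# Hypothetical read helper: rows are kept sorted by key, reads can target
# a single key, a [start, end) range, or everything, and results can be
# restricted to specific column families.

rows = {
    "com.cnn.www":    {("contents", ""): "<html>A", ("anchor", "cnnsi.com"): "CNN"},
    "com.nyt.www":    {("contents", ""): "<html>B"},
    "org.apache.www": {("contents", ""): "<html>C"},
}

def read_rows(rows, start=None, end=None, families=None):
    """Yield (key, cells) for keys in [start, end), optionally
    restricted to the given column families."""
    for key in sorted(rows):
        if start is not None and key < start:
            continue
        if end is not None and key >= end:
            break
        cells = {col: val for col, val in rows[key].items()
                 if families is None or col[0] in families}
        yield key, cells
```

A range scan restricted to the `contents` family, for example: `list(read_rows(rows, start="com.", end="com.z", families={"contents"}))` returns only the two `com.*` rows, each stripped down to its `contents` cells.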
Transactions and batch operations
| Capability | Scope |
|---|---|
| Single-row transactions | Atomic read-modify-write on one row key |
| Cross-row batch writes | Batch interface (no transactional guarantee) |
| Integer counters | Cells can be used as atomic counters |
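Single-row atomicity can be modeled with a per-row lock: read-modify-write sequences and counter increments are serialized against other writers of the same row, but nothing coordinates across rows. The class and method names below are illustrative assumptions, not BigTable's API:

```python
import threading

# Sketch of single-row transactions: a per-row lock makes
# read-modify-write and counter increments atomic on one row key.

class Row:
    def __init__(self):
        self.cells = {}  # (family, qualifier) -> value
        self._lock = threading.Lock()

    def read_modify_write(self, column, fn):
        # Atomic with respect to other writers of this same row.
        with self._lock:
            self.cells[column] = fn(self.cells.get(column))

    def increment(self, column, delta=1):
        # Integer cells doubling as atomic counters.
        with self._lock:
            self.cells[column] = self.cells.get(column, 0) + delta

row = Row()
row.increment(("stats", "pageviews"))
row.increment(("stats", "pageviews"), 5)
```

Because the lock lives inside a single row, there is no mechanism here (or in BigTable) to make an update spanning two rows atomic, which is exactly the limitation the next paragraph discusses.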
BigTable does not support multi-row transactions. If your design requires atomicity across multiple rows, BigTable is the wrong tool. Google later built Spanner to fill this gap.
Integration points
- BigTable can serve as both an input source and output target for MapReduce jobs.
- Clients can write Sawzall scripts for server-side data processing (transform, filter, aggregate) before network transfer -- reducing data movement.
The "no multi-row transactions" constraint is a favorite interview topic. It forces you to think about row key design -- if data that must be updated atomically lives in different rows, your schema is wrong. This is why BigTable applications embed related data in a single row with multiple column families rather than normalizing across rows like a relational database.
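To make the schema point concrete: instead of normalizing a user's profile and settings into separate rows (relational style), a BigTable schema puts them under one row key in different column families, so a single-row write covers both atomically. The row key format and cell names below are hypothetical:

```python
# Denormalized schema sketch: data that must change together
# shares one row key, split across column families.
row_key = "user#ada"
row = {
    ("profile", "name"):   "Ada",
    ("profile", "email"):  "ada@example.com",
    ("settings", "theme"): "light",
}

def apply_mutation(row, updates):
    # Models a single-row write: because everything lives in one row,
    # all the updates take effect together.
    row.update(updates)

# Email and theme change in one atomic single-row mutation --
# impossible if they lived in two separate rows.
apply_mutation(row, {("profile", "email"): "ada@newmail.com",
                     ("settings", "theme"): "dark"})
```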