System APIs

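What operations does BigTable expose to clients? The API is deliberately narrow -- no SQL, no joins, no multi-row transactions. This is a design choice: every atomic operation is confined to a single row, and a row is served by one machine at a time, so no request needs cross-machine coordination. That is what lets BigTable scale to thousands of machines.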

Think first
If you were designing an API for a database that must scale to thousands of machines, what operations would you deliberately exclude -- and why?

BigTable provides two categories of operations: metadata and data.

Metadata operations

Metadata operations manage the structure of the system:

  • Create and delete tables
  • Create and delete column families
  • Change cluster, table, and column family metadata (e.g., access control rights)
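
To make the metadata operations concrete, here is a minimal in-memory sketch. `ToyCluster` and all of its method names are hypothetical and exist only for illustration; they are not the real BigTable client API, and the access control model is reduced to a set of reader names per column family.

```python
class ToyCluster:
    """In-memory stand-in for cluster metadata (hypothetical, not a real client)."""

    def __init__(self):
        # table name -> {column family name -> metadata dict}
        self.tables = {}

    def create_table(self, name):
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = {}

    def delete_table(self, name):
        del self.tables[name]

    def create_column_family(self, table, family):
        families = self.tables[table]
        if family in families:
            raise ValueError(f"column family {family!r} already exists")
        families[family] = {"readers": set()}

    def delete_column_family(self, table, family):
        del self.tables[table][family]

    def grant_read(self, table, family, principal):
        # Simplified access control: record who may read this family.
        self.tables[table][family]["readers"].add(principal)


cluster = ToyCluster()
cluster.create_table("webtable")
cluster.create_column_family("webtable", "anchor")
cluster.create_column_family("webtable", "contents")
cluster.grant_read("webtable", "anchor", "crawler")
cluster.delete_column_family("webtable", "contents")
print(sorted(cluster.tables["webtable"]))  # ['anchor']
```

Note that none of these calls touch cell data: they only change which tables and column families exist and who may access them.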

Data operations

Data operations read and write cell values. The write operations are:

| Operation | Description |
| --- | --- |
| Set() | Write cells in a row |
| DeleteCells() | Delete specific cells in a row |
| DeleteRow() | Delete all cells in a row |
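
The three write operations can be sketched against a toy in-memory table. This is an illustration under simplifying assumptions, not the real client library: `ToyTable` and its snake_case method names are hypothetical, columns are plain `family:qualifier` strings, and cell timestamps (versions) are left out entirely. The row key and values are made-up sample data.

```python
class ToyTable:
    """Toy table: row key -> {"family:qualifier" -> value}. No timestamps."""

    def __init__(self):
        self.rows = {}

    def set(self, row_key, cells):
        # Write (or overwrite) one or more cells in a single row.
        self.rows.setdefault(row_key, {}).update(cells)

    def delete_cells(self, row_key, columns):
        # Delete specific cells; other cells in the row are untouched.
        row = self.rows.get(row_key, {})
        for column in columns:
            row.pop(column, None)

    def delete_row(self, row_key):
        # Delete every cell in the row.
        self.rows.pop(row_key, None)


t = ToyTable()
t.set("com.cnn.www", {
    "anchor:www.c-span.org": "CNN",
    "anchor:www.abc.com": "ABC",
})
t.delete_cells("com.cnn.www", ["anchor:www.abc.com"])
print(t.rows["com.cnn.www"])    # {'anchor:www.c-span.org': 'CNN'}
t.delete_row("com.cnn.www")
print("com.cnn.www" in t.rows)  # False
```

Every call names exactly one row key, which is the property that keeps each write local to one row.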

Read operations

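  • Each single-row read is atomic, regardless of how many columns it touches. A scan over many rows is atomic per row, not as a whole.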
  • Reads can target a single row, a range of rows, or all rows.
  • Results can be restricted to specific column families or individual columns.
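
The read options above can be sketched as one scan function over a toy in-memory table. The function name `scan`, its parameters, and the sample rows are assumptions made for this example; the sketch shows row-range selection and column filtering only and does not model atomicity or cell versions.

```python
def scan(rows, start=None, end=None, families=None, columns=None):
    """Yield (row_key, cells) in sorted key order for start <= key < end.

    families / columns restrict which cells are returned; None means no filter.
    Rows with no matching cells are skipped.
    """
    for key in sorted(rows):
        if start is not None and key < start:
            continue
        if end is not None and key >= end:
            break
        cells = {
            col: val
            for col, val in rows[key].items()
            if (families is None or col.split(":", 1)[0] in families)
            and (columns is None or col in columns)
        }
        if cells:
            yield key, cells


rows = {
    "com.cnn.www": {"anchor:www.c-span.org": "CNN", "contents:": "<html>cnn</html>"},
    "com.example.www": {"anchor:www.cnn.com": "Example"},
    "org.wikipedia.www": {"contents:": "<html>wiki</html>"},
}

# Range of rows: every key starting with "com." ("/" sorts right after ".").
com_anchors = list(scan(rows, start="com.", end="com/", families={"anchor"}))

# All rows, restricted to one individual column.
all_contents = list(scan(rows, columns={"contents:"}))

print([key for key, _ in com_anchors])   # ['com.cnn.www', 'com.example.www']
print([key for key, _ in all_contents])  # ['com.cnn.www', 'org.wikipedia.www']
```

Because row keys are kept in sorted order, a prefix scan like `"com."` to `"com/"` touches one contiguous key range instead of the whole table.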

Transactions and batch operations

| Capability | Scope |
| --- | --- |
| Single-row transactions | Atomic read-modify-write on one row key |
| Cross-row batch writes | Batch interface (no transactional guarantee) |
| Integer counters | Cells can be used as atomic counters |

Warning

BigTable does not support multi-row transactions. If your design requires atomicity across multiple rows, BigTable is the wrong tool. Google later built Spanner to fill this gap.
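
The three capabilities in the table can be sketched with a toy table that takes a lock per row. This is an illustration only: `ToyTable`, `read_modify_write`, `increment`, and `batch_write` are hypothetical names, and the per-row `threading.Lock` is a stand-in for whatever the real server does to make single-row operations atomic.

```python
import threading


class ToyTable:
    """Toy table with one lock per row (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = {}
        self._row_locks = {}
        self._guard = threading.Lock()  # protects the lock map itself

    def _lock_for(self, row_key):
        with self._guard:
            return self._row_locks.setdefault(row_key, threading.Lock())

    def read_modify_write(self, row_key, update):
        # Single-row transaction: update(cells) runs while holding the row lock.
        with self._lock_for(row_key):
            update(self.rows.setdefault(row_key, {}))

    def increment(self, row_key, column, delta=1):
        # Atomic counter built on the single-row transaction.
        result = []

        def bump(cells):
            cells[column] = cells.get(column, 0) + delta
            result.append(cells[column])

        self.read_modify_write(row_key, bump)
        return result[0]

    def batch_write(self, mutations):
        # Cross-row batch: each row is atomic, the batch as a whole is not.
        for row_key, new_cells in mutations:
            self.read_modify_write(row_key, lambda cells, nc=new_cells: cells.update(nc))


table = ToyTable()


def worker():
    for _ in range(500):
        table.increment("user#42", "stats:visits")


threads = [threading.Thread(target=worker) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(table.rows["user#42"]["stats:visits"])  # 2000
```

In this sketch `batch_write` releases each row lock before taking the next one, so a reader can observe some rows of the batch updated and others not. That is the gap the warning above describes.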

Integration points

  • BigTable can serve as both an input source and output target for MapReduce jobs.
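  • Clients can supply Sawzall scripts that run on the servers to transform, filter, or summarize data before it crosses the network, which reduces data movement. In the published design these scripts can read from BigTable but cannot write back into it.

Interview angle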

The "no multi-row transactions" constraint is a favorite interview topic. It forces you to think about row key design -- if data that must be updated atomically lives in different rows, your schema is wrong. This is why BigTable applications embed related data in a single row with multiple column families rather than normalizing across rows like a relational database.
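
The schema consequence can be sketched with a toy example from a different domain: an order and its line items. Everything here is hypothetical (the helper names, the row keys, the column names, and the simulated crash), and a plain dict stands in for the table. The only property it relies on is the one stated above: a write to one row is all-or-nothing, while writes to two rows are independent.

```python
def apply_row_mutation(rows, row_key, changes):
    """All-or-nothing update of ONE row: build the new row, then swap it in."""
    updated = dict(rows.get(row_key, {}))
    updated.update(changes)
    rows[row_key] = updated


def write_single_row(rows):
    # Summary and line item live in the same row, in two column families.
    apply_row_mutation(rows, "order#1001", {
        "summary:item_count": 1,
        "items:sku-7": "1 x widget",
    })


def write_split_rows(rows, fail_midway=False):
    # Summary and line item live in different rows: two independent writes.
    apply_row_mutation(rows, "order#1001", {"summary:item_count": 1})
    if fail_midway:
        raise ConnectionError("simulated crash between the two row writes")
    apply_row_mutation(rows, "order#1001#sku-7", {"items:detail": "1 x widget"})


single = {}
write_single_row(single)
print(sorted(single["order#1001"]))  # ['items:sku-7', 'summary:item_count']

split = {}
try:
    write_split_rows(split, fail_midway=True)
except ConnectionError:
    pass
print(sorted(split))  # ['order#1001']
```

After the simulated crash, the split design is left with a summary that counts one item and no row for that item. The single-row design cannot end up in that state, because both cells are part of the same row mutation.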

Quiz
You are designing a social media application on BigTable where a user's profile and their latest posts must always be updated atomically. What would happen if you stored the profile in one row and each post in a separate row?