Skip to main content

Dynamo Introduction

Design a distributed key-value store that is highly available, highly scalable, and completely decentralized.

Core concepts
  • High availability -- the system keeps serving requests even when nodes fail. No downtime, no "please try again later."
  • Scalability -- throw more machines at the problem and get proportional improvement. No re-architecture needed.
  • Decentralized -- no single node is special. No leader, no coordinator, no single point of failure.

Why Dynamo matters

It's Black Friday at Amazon. Millions of customers are adding items to their shopping carts simultaneously. A few servers go down -- because at this scale, hardware failure isn't a possibility, it's a certainty. What happens to those shopping carts?

This is the problem Amazon built Dynamo to solve. The insight that drove the entire design: a customer who can't add an item to their cart is a lost sale. Availability isn't a nice-to-have -- it's revenue. Amazon observed that even brief outages directly correlated with fewer customers served, and therefore chose to build a system that is always writable, even at the cost of occasionally returning stale data.

Think about that trade-off: Amazon would rather show you a slightly outdated cart than show you an error page. Most inconsistencies get resolved in the background, and the customer never notices.

Interview insight

Dynamo is one of the best examples of business requirements driving system design. In an interview, when you're asked to design a system, always start with: "What matters more -- consistency or availability?" Dynamo shows what happens when you go all-in on availability.

What is Dynamo?

Dynamo is a highly available key-value store that Amazon built for internal use. Many Amazon services -- shopping cart, bestseller lists, sales rank, product catalog -- only need simple primary-key access to data. A relational database would be overkill for these use cases and would actually limit the scalability and availability these services need.

Dynamo provides a flexible design where applications choose their own trade-off point between availability and consistency.

Don't confuse these

Dynamo (the subject of this chapter) is Amazon's internal system, described in their 2007 paper. DynamoDB is the commercial managed service Amazon later built, inspired by Dynamo's design but with significant differences. They are not the same thing.

Background

In CAP theorem terms, Dynamo is an AP system -- it chooses availability and partition tolerance over strong consistency. When a network partition happens, Dynamo keeps serving reads and writes on both sides of the partition, and reconciles later.

The Dynamo design was hugely influential. It directly inspired:

If you understand Dynamo deeply, you already understand the foundation of half the NoSQL databases in production today.

Design goals

GoalWhat it meansWhy it matters
Highly availableThe system always responds, even during failuresDowntime = lost revenue for Amazon
ScalableAdd a node, get proportional capacity increaseMust handle Black Friday-scale traffic spikes
DecentralizedNo leader node, no single point of failureA leader is a bottleneck and a risk
Eventually consistentWrites propagate asynchronously; reads may see stale data brieflyThe price of being always-writable
The key trade-off

Dynamo uses optimistic replication -- it accepts writes without waiting for all replicas to confirm. This makes writes fast and highly available, but means different nodes can temporarily have different versions of the same data. Dynamo resolves these conflicts later, during reads.

Dynamo's use cases

Dynamo works well when:

  • You need simple key-value access (no complex queries, no joins)
  • Availability matters more than consistency (e.g., a shopping cart should always be writable)
  • You need predictable low-latency at massive scale

Dynamo is a poor fit when:

  • You need strong consistency (e.g., financial transactions, inventory counts)
  • You need complex queries across multiple keys
  • Your data model requires relations between entities
Interview angle

When an interviewer asks "Why not just use a relational database?", the answer for Dynamo's use case is clear: Amazon's services only need primary-key lookups. A relational DB would impose coordination overhead (for joins, transactions, schemas) that hurts availability and scalability without providing any benefit these services actually need.

System APIs

Dynamo exposes just two operations -- radical simplicity by design:

  • get(key) -- Returns the object(s) associated with the key. If there are conflicting versions (because of concurrent writes), it returns all of them along with a context that carries version metadata. The application decides which version wins.

  • put(key, context, object) -- Stores the object for the given key. The context (from a previous get) is passed back so Dynamo can track version lineage -- think of it as a cookie that helps Dynamo know which version you're updating.

Dynamo treats both keys and values as opaque byte arrays (typically under 1 MB). It applies MD5 hashing to the key to produce a 128-bit identifier, which determines which nodes are responsible for storing that key.

Why MD5?

MD5 produces a uniform distribution of hash values, which means keys get spread evenly across nodes. The specific hash function doesn't matter much -- what matters is that it distributes well and is fast to compute. Dynamo doesn't use MD5 for security, just for distribution.

Think first
Amazon needs a data store for its shopping cart. The store must always accept writes, even during network partitions or node failures. What trade-off is Amazon implicitly making, and what problems will this create?
Quiz
A Dynamo cluster has N=3, W=2, R=2. One of the three replica nodes goes down. What happens to write operations?

What's next

In the following chapters, we'll walk through Dynamo's architecture piece by piece:

Each of these maps directly to a system design pattern you'll see in Part 2 of this course.