Design a Message Broker That Can Replay History
The problem
Your company runs a microservices architecture for a ride-sharing platform. Dozens of services need to communicate: the ride-matching service emits events when a ride is assigned, the pricing service needs those events to compute fares, the analytics service needs them for dashboards, the fraud-detection service needs them for real-time scoring, and the data warehouse needs them for batch processing.
Traditional message queues (like RabbitMQ) delete messages once consumed. But your fraud-detection team just deployed a new ML model and needs to reprocess the last 3 days of ride events through it. Your data warehouse team runs nightly batch jobs that consume the same events the real-time services already processed. And when a consumer crashes and restarts, it needs to pick up exactly where it left off.
The requirements:
- 2 million messages per second aggregate throughput
- 7-day message retention -- messages are not deleted after consumption
- Multiple independent consumer groups reading the same stream at different speeds
- Replay capability -- any consumer can rewind to any point in the retention window
- Ordering guarantees within a logical partition (e.g., all events for one ride stay in order)
- At-least-once delivery with support for exactly-once semantics where needed
- Horizontal scalability -- add more machines to increase throughput
Key requirements to identify
| Requirement | What it implies |
|---|---|
| Messages retained after consumption | This is a log, not a queue -- append-only, immutable |
| Multiple independent consumers | Consumers maintain their own offsets (positions) in the log |
| Replay capability | The log must be seekable by offset or timestamp |
| 2M messages/sec | Must partition the log across many machines for parallel I/O |
| Ordering within a partition | All writes for a partition go through a single leader as append-only sequential writes |
| Fault tolerance | Each partition must be replicated across multiple brokers |
| Exactly-once semantics | Need idempotent producers and transactional writes |
The design approach
Forget everything you know about traditional message queues. Those systems track which messages have been delivered to which consumer, maintain per-message acknowledgment state, and delete messages once acknowledged. That model breaks down at 2M messages/sec with 7-day retention.
Instead, think of the message broker as a distributed commit log. Each topic is split into partitions, and each partition is an ordered, append-only sequence of messages. A message's position in the partition is its offset -- a monotonically increasing integer. Producers append to the end; consumers read sequentially from any offset they choose.
This design pays off in several ways. Writes are sequential appends -- the fastest I/O pattern a disk supports. Reads are sequential scans. There is no per-message delivery state to track. Retention is just deleting old log segments by timestamp. Replay is just seeking to an earlier offset.
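The offset model can be sketched in a few lines. This is a toy in-memory version (names like `PartitionLog` are illustrative; a real broker persists the log to disk), but it captures the core contract: append returns an offset, and any consumer can read from any offset.

```python
# Toy in-memory sketch of one partition as an append-only log.
class PartitionLog:
    def __init__(self):
        self._messages = []  # list index doubles as the offset

    def append(self, message):
        """Producers always append to the end."""
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages=100):
        """Consumers read sequentially from any offset they choose."""
        return self._messages[offset:offset + max_messages]

log = PartitionLog()
for event in ["ride_assigned", "ride_started", "ride_completed"]:
    log.append(event)

replay = log.read(0)  # rewind to the beginning: full replay
resume = log.read(2)  # crash recovery: resume from the last committed offset
```

Note that the broker never mutates or deletes individual messages here; "consuming" is purely a read at an offset the consumer tracks itself.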
For partitioning, you hash a message key (e.g., ride_id) to determine the partition. All messages with the same key go to the same partition, preserving order. Different keys may go to different partitions, enabling parallelism.
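The key-to-partition mapping must be a stable hash: Python's built-in `hash()` is salted per process, so it cannot be used. Kafka's default partitioner uses murmur2; in this sketch md5 stands in for it.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the message key (e.g., ride_id), modulo partition count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by "ride-42" lands on the same partition, so all events
# for that ride stay in order; other rides hash to other partitions in parallel.
p = partition_for("ride-42", 6)
```

One caveat worth knowing: changing `num_partitions` changes the mapping, which is why repartitioning a keyed topic breaks ordering assumptions.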
For fault tolerance, each partition has one leader replica and multiple follower replicas. All reads and writes go through the leader. Followers pull data from the leader and stay in sync. The set of followers that are fully caught up is the In-Sync Replica set (ISR). A message is considered "committed" only when all ISR members have it.
How the industry solved it
LinkedIn built Apache Kafka to solve exactly this problem. Kafka treats messaging as a distributed commit log problem rather than a message queue problem, and this architectural choice is the source of its scalability.
Start here: Kafka Introduction
Architecture overview
The commit log
Every partition is a structured commit log stored as a series of segment files on disk. Each segment is an append-only file. Old segments are deleted when they fall outside the retention window. This is Kafka's core abstraction.
Deep dive: Kafka Deep Dive and the Segmented Log pattern
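Segmentation is what makes 7-day retention cheap: expiry drops whole segment files, never individual messages. A minimal sketch (in-memory, with explicit timestamps; real segments are disk files rolled by size or age):

```python
import time

class SegmentedLog:
    """Toy partition stored as time-stamped segments."""
    def __init__(self, messages_per_segment=2):
        self.segments = []  # list of (created_at, [messages])
        self.messages_per_segment = messages_per_segment

    def append(self, message, now=None):
        now = time.time() if now is None else now
        if not self.segments or len(self.segments[-1][1]) >= self.messages_per_segment:
            self.segments.append((now, []))  # roll a new segment
        self.segments[-1][1].append(message)

    def enforce_retention(self, retention_seconds, now=None):
        now = time.time() if now is None else now
        # Retention is coarse-grained: drop whole expired segments.
        self.segments = [(ts, msgs) for ts, msgs in self.segments
                         if now - ts < retention_seconds]

seg_log = SegmentedLog(messages_per_segment=2)
seg_log.append("e1", now=100)
seg_log.append("e2", now=100)
seg_log.append("e3", now=200)  # segment full -> rolls a second segment
seg_log.enforce_retention(retention_seconds=150, now=300)
# The segment created at t=100 has expired and is dropped; "e3" survives.
```

Because deletion is a file unlink rather than a per-message scan, retention cost is independent of message volume.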
Consumer groups
A consumer group is a set of consumers that cooperate to consume a topic. Each partition is assigned to exactly one consumer in the group. If you have 6 partitions and 3 consumers in a group, each consumer handles 2 partitions. If a consumer dies, its partitions are rebalanced to the remaining consumers.
Different consumer groups are completely independent. Group A (real-time processing) and Group B (batch analytics) each maintain their own offsets and consume at their own pace. This is how Kafka supports multiple independent consumers on the same data.
Deep dive: Consumer Groups
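The assignment rule above can be sketched as a round-robin assignor. Real Kafka assignors (range, sticky, cooperative) are more sophisticated, but the invariant is the same: each partition goes to exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch: each partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

before = assign_partitions(range(6), ["c1", "c2", "c3"])  # 2 partitions each
after = assign_partitions(range(6), ["c1", "c2"])         # rebalance after c3 dies
```

Running the same function with the surviving membership is, in effect, what a rebalance does: the dead consumer's partitions are redistributed, and the new owners resume from the group's committed offsets.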
Replication and the high-water mark
Each partition is replicated across multiple brokers. The leader handles all reads and writes. Followers pull data from the leader. The high-water mark is the offset of the last message replicated to all ISR members. Consumers can only read up to the high-water mark -- this prevents them from reading data that might be lost if the leader fails before replication completes.
Deep dive: Kafka Workflow and the High-Water Mark pattern
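The high-water mark rule reduces to a minimum over the ISR. In this sketch, offsets are "log end offsets" (the next offset each replica would write); the names are illustrative, not Kafka's API.

```python
def high_water_mark(leader_end_offset, isr_follower_end_offsets):
    # Nothing past the slowest in-sync replica is considered committed.
    return min([leader_end_offset] + list(isr_follower_end_offsets))

# Leader has written offsets 0..9; ISR followers have replicated 0..7 and 0..8.
hwm = high_water_mark(10, [8, 9])
# Consumers may read only offsets below hwm (here 0..7). Offsets 8 and 9 are
# not yet on every ISR member and could be lost if the leader fails now.
```

This is also why shrinking the ISR (removing a lagging follower) can advance the high-water mark: the minimum is taken over fewer, faster replicas.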
Coordination
Kafka relies on ZooKeeper (or in newer versions, KRaft) for controller election, broker membership, and topic metadata. The controller broker is responsible for partition leader election when a broker fails.
Deep dive: Role of ZooKeeper | Controller Broker
Key patterns used
| Pattern | Why it is needed | Reference |
|---|---|---|
| Segmented Log | Store the commit log as a series of files for efficient retention and cleanup | Pattern |
| High-Water Mark | Prevent consumers from reading unreplicated (potentially lost) data | Pattern |
| Leader and Follower | Ensure a single writer per partition for ordering; replicate for fault tolerance | Pattern |
| Write-Ahead Log | The commit log itself IS a write-ahead log for the message data | Pattern |
| Heartbeat | Detect broker failures and trigger leader re-election | Pattern |
| Split Brain | Prevent two brokers from believing they are the leader of the same partition | Pattern |
Delivery semantics
Kafka supports three delivery guarantees, and understanding when to use each is critical:
- At-most-once: The consumer commits its offset before processing. Fastest, but a crash between commit and processing loses messages.
- At-least-once: The consumer processes, then commits. No messages are lost, but a crash between processing and commit produces duplicates on restart.
- Exactly-once: Uses idempotent producers and transactional writes. Strongest guarantee, at the cost of extra coordination overhead.
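The at-least-once duplicate scenario is worth simulating once to internalize. `FakeConsumer` below is a stand-in for a real consumer, not an actual API; the point is purely the ordering of `process` and `commit`.

```python
# Toy simulation of why process-then-commit yields duplicates after a crash.
class FakeConsumer:
    def __init__(self, log):
        self.log = log
        self.committed = 0  # next offset to deliver after a restart

    def poll(self):
        if self.committed < len(self.log):
            return self.committed, self.log[self.committed]
        return None

    def commit(self, offset):
        self.committed = offset + 1

processed = []
consumer = FakeConsumer(["fare_event"])

# At-least-once: process first, commit second.
offset, msg = consumer.poll()
processed.append(msg)  # processed...
# ...but the consumer crashes here, before consumer.commit(offset) runs.

# After restart, the uncommitted message is delivered again: a duplicate.
offset, msg = consumer.poll()
processed.append(msg)
consumer.commit(offset)
```

Flipping the order (commit before process) would instead drop the message on a crash, which is exactly the at-most-once trade.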
Deep dive: Kafka Delivery Semantics
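The idempotent-producer half of exactly-once can be sketched as broker-side deduplication: each producer attaches a monotonically increasing sequence number, and the broker drops anything it has already accepted. This is illustrative only; Kafka's real protocol also scopes sequences per partition and producer epoch.

```python
# Toy sketch of deduplication by (producer_id, sequence number).
class DedupLog:
    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> highest sequence accepted

    def append(self, producer_id, seq, message):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # retry of an already-appended message: drop it
        self.messages.append(message)
        self.last_seq[producer_id] = seq
        return True

broker = DedupLog()
broker.append("producer-1", 0, "ride_assigned")
broker.append("producer-1", 0, "ride_assigned")  # network retry: deduplicated
broker.append("producer-1", 1, "ride_started")
```

This is what lets a producer safely retry on timeout: a retried send reuses the same sequence number, so a duplicate never lands in the log.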