Design a Message Broker That Can Replay History
The problem
Your company runs a microservices architecture for a ride-sharing platform. Dozens of services need to communicate: the ride-matching service emits events when a ride is assigned, the pricing service needs those events to compute fares, the analytics service needs them for dashboards, the fraud-detection service needs them for real-time scoring, and the data warehouse needs them for batch processing.
Traditional message queues (like RabbitMQ) delete messages once consumed. But your fraud-detection team just deployed a new ML model and needs to reprocess the last 3 days of ride events through it. Your data warehouse team runs nightly batch jobs that consume the same events the real-time services already processed. And when a consumer crashes and restarts, it needs to pick up exactly where it left off.
The requirements:
- 2 million messages per second aggregate throughput
- 7-day message retention -- messages are not deleted after consumption
- Multiple independent consumer groups reading the same stream at different speeds
- Replay capability -- any consumer can rewind to any point in the retention window
- Ordering guarantees within a logical partition (e.g., all events for one ride stay in order)
- At-least-once delivery with support for exactly-once semantics where needed
- Horizontal scalability -- add more machines to increase throughput
Key requirements to identify
| Requirement | What it implies |
|---|---|
| Messages retained after consumption | This is a log, not a queue -- append-only, immutable |
| Multiple independent consumers | Consumers maintain their own offsets (positions) in the log |
| Replay capability | The log must be seekable by offset or timestamp |
| 2M messages/sec | Must partition the log across many machines for parallel I/O |
| Ordering within a partition | All writes for a partition go through a single leader as append-only sequential writes |
| Fault tolerance | Each partition must be replicated across multiple brokers |
| Exactly-once semantics | Need idempotent producers and transactional writes |
The design approach
Forget everything you know about traditional message queues. Those systems track which messages have been delivered to which consumer, maintain per-message acknowledgment state, and delete messages once acknowledged. That model breaks down at 2M messages/sec with 7-day retention.
Instead, think of the message broker as a distributed commit log. Each topic is split into partitions, and each partition is an ordered, append-only sequence of messages. A message's position in the partition is its offset -- a monotonically increasing integer. Producers append to the end; consumers read sequentially from any offset they choose.
This design pays off in several ways. Writes are sequential appends -- the fastest I/O pattern a disk supports. Reads are sequential scans. There is no per-message delivery state to track. Retention is just deleting old log segments by timestamp. Replay is just seeking to an earlier offset.
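The offset model can be sketched in a few lines. This is a toy in-memory version (names like `PartitionLog` are illustrative; a real broker persists the log to disk), but it captures the core contract: append returns an offset, and any consumer can read from any offset.

```python
# Toy in-memory sketch of one partition as an append-only log.
class PartitionLog:
    def __init__(self):
        self._messages = []  # list index doubles as the offset

    def append(self, message):
        """Producers always append to the end."""
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages=100):
        """Consumers read sequentially from any offset they choose."""
        return self._messages[offset:offset + max_messages]

log = PartitionLog()
for event in ["ride_assigned", "ride_started", "ride_completed"]:
    log.append(event)

replay = log.read(0)  # rewind to the beginning: full replay
resume = log.read(2)  # crash recovery: resume from the last committed offset
```

Note that the broker never mutates or deletes individual messages here; "consuming" is purely a read at an offset the consumer tracks itself.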
For partitioning, you hash a message key (e.g., ride_id) to determine the partition. All messages with the same key go to the same partition, preserving order. Different keys may go to different partitions, enabling parallelism.
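The key-to-partition mapping must be a stable hash: Python's built-in `hash()` is salted per process, so it cannot be used. Kafka's default partitioner uses murmur2; in this sketch md5 stands in for it.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the message key (e.g., ride_id), modulo partition count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by "ride-42" lands on the same partition, so all events
# for that ride stay in order; other rides hash to other partitions in parallel.
p = partition_for("ride-42", 6)
```

One caveat worth knowing: changing `num_partitions` changes the mapping, which is why repartitioning a keyed topic breaks ordering assumptions.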
For fault tolerance, each partition has one leader replica and multiple follower replicas. All reads and writes go through the leader. Followers pull data from the leader and stay in sync. The set of followers that are fully caught up is the In-Sync Replica set (ISR). A message is considered "committed" only when all ISR members have it.
How the industry solved it
LinkedIn built Apache Kafka to solve exactly this problem. Kafka treats messaging as a distributed commit log problem rather than a message queue problem, and this architectural choice is the source of its scalability.
Start here: Kafka Introduction
Architecture overview
The commit log
Every partition is a structured commit log stored as a series of segment files on disk. Each segment is an append-only file. Old segments are deleted when they fall outside the retention window. This is Kafka's core abstraction.
Deep dive: Kafka Deep Dive and the Segmented Log pattern
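Segmentation is what makes 7-day retention cheap: expiry drops whole segment files, never individual messages. A minimal sketch (in-memory, with explicit timestamps; real segments are disk files rolled by size or age):

```python
import time

class SegmentedLog:
    """Toy partition stored as time-stamped segments."""
    def __init__(self, messages_per_segment=2):
        self.segments = []  # list of (created_at, [messages])
        self.messages_per_segment = messages_per_segment

    def append(self, message, now=None):
        now = time.time() if now is None else now
        if not self.segments or len(self.segments[-1][1]) >= self.messages_per_segment:
            self.segments.append((now, []))  # roll a new segment
        self.segments[-1][1].append(message)

    def enforce_retention(self, retention_seconds, now=None):
        now = time.time() if now is None else now
        # Retention is coarse-grained: drop whole expired segments.
        self.segments = [(ts, msgs) for ts, msgs in self.segments
                         if now - ts < retention_seconds]

seg_log = SegmentedLog(messages_per_segment=2)
seg_log.append("e1", now=100)
seg_log.append("e2", now=100)
seg_log.append("e3", now=200)  # segment full -> rolls a second segment
seg_log.enforce_retention(retention_seconds=150, now=300)
# The segment created at t=100 has expired and is dropped; "e3" survives.
```

Because deletion is a file unlink rather than a per-message scan, retention cost is independent of message volume.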
Consumer groups
A consumer group is a set of consumers that cooperate to consume a topic. Each partition is assigned to exactly one consumer in the group. If you have 6 partitions and 3 consumers in a group, each consumer handles 2 partitions. If a consumer dies, its partitions are rebalanced to the remaining consumers.
Different consumer groups are completely independent. Group A (real-time processing) and Group B (batch analytics) each maintain their own offsets and consume at their own pace. This is how Kafka supports multiple independent consumers on the same data.
Deep dive: Consumer Groups
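The assignment rule above can be sketched as a round-robin assignor. Real Kafka assignors (range, sticky, cooperative) are more sophisticated, but the invariant is the same: each partition goes to exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch: each partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

before = assign_partitions(range(6), ["c1", "c2", "c3"])  # 2 partitions each
after = assign_partitions(range(6), ["c1", "c2"])         # rebalance after c3 dies
```

Running the same function with the surviving membership is, in effect, what a rebalance does: the dead consumer's partitions are redistributed, and the new owners resume from the group's committed offsets.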
Replication and the high-water mark
Each partition is replicated across multiple brokers. The leader handles all reads and writes. Followers pull data from the leader. The high-water mark is the offset of the last message replicated to all ISR members. Consumers can only read up to the high-water mark -- this prevents them from reading data that might be lost if the leader fails before replication completes.
Deep dive: Kafka Workflow and the High-Water Mark pattern
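The high-water mark rule reduces to a minimum over the ISR. In this sketch, offsets are "log end offsets" (the next offset each replica would write); the names are illustrative, not Kafka's API.

```python
def high_water_mark(leader_end_offset, isr_follower_end_offsets):
    # Nothing past the slowest in-sync replica is considered committed.
    return min([leader_end_offset] + list(isr_follower_end_offsets))

# Leader has written offsets 0..9; ISR followers have replicated 0..7 and 0..8.
hwm = high_water_mark(10, [8, 9])
# Consumers may read only offsets below hwm (here 0..7). Offsets 8 and 9 are
# not yet on every ISR member and could be lost if the leader fails now.
```

This is also why shrinking the ISR (removing a lagging follower) can advance the high-water mark: the minimum is taken over fewer, faster replicas.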
Coordination
Kafka relies on ZooKeeper (or in newer versions, KRaft) for controller election, broker membership, and topic metadata. The controller broker is responsible for partition leader election when a broker fails.
Deep dive: Role of ZooKeeper | Controller Broker
Key patterns used
| Pattern | Why it is needed | Reference |
|---|---|---|
| Segmented Log | Store the commit log as a series of files for efficient retention and cleanup | Pattern |
| High-Water Mark | Prevent consumers from reading unreplicated (potentially lost) data | Pattern |
| Leader and Follower | Ensure a single writer per partition for ordering; replicate for fault tolerance | Pattern |
| Write-Ahead Log | The commit log itself IS a write-ahead log for the message data | Pattern |
| Heartbeat | Detect broker failures and trigger leader re-election | Pattern |
| Split Brain | Prevent two brokers from believing they are the leader of the same partition | Pattern |
Delivery semantics
Kafka supports three delivery guarantees, and understanding when to use each is critical:
- At-most-once: The consumer commits its offset before processing. Fastest, but a crash between commit and processing loses messages.
- At-least-once: The consumer processes, then commits. No messages are lost, but a crash between processing and commit produces duplicates on restart.
- Exactly-once: Uses idempotent producers and transactional writes. Strongest guarantee, at the cost of extra coordination overhead.
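The at-least-once duplicate scenario is worth simulating once to internalize. `FakeConsumer` below is a stand-in for a real consumer, not an actual API; the point is purely the ordering of `process` and `commit`.

```python
# Toy simulation of why process-then-commit yields duplicates after a crash.
class FakeConsumer:
    def __init__(self, log):
        self.log = log
        self.committed = 0  # next offset to deliver after a restart

    def poll(self):
        if self.committed < len(self.log):
            return self.committed, self.log[self.committed]
        return None

    def commit(self, offset):
        self.committed = offset + 1

processed = []
consumer = FakeConsumer(["fare_event"])

# At-least-once: process first, commit second.
offset, msg = consumer.poll()
processed.append(msg)  # processed...
# ...but the consumer crashes here, before consumer.commit(offset) runs.

# After restart, the uncommitted message is delivered again: a duplicate.
offset, msg = consumer.poll()
processed.append(msg)
consumer.commit(offset)
```

Flipping the order (commit before process) would instead drop the message on a crash, which is exactly the at-most-once trade.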
Deep dive: Kafka Delivery Semantics
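The idempotent-producer half of exactly-once can be sketched as broker-side deduplication: each producer attaches a monotonically increasing sequence number, and the broker drops anything it has already accepted. This is illustrative only; Kafka's real protocol also scopes sequences per partition and producer epoch.

```python
# Toy sketch of deduplication by (producer_id, sequence number).
class DedupLog:
    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> highest sequence accepted

    def append(self, producer_id, seq, message):
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # retry of an already-appended message: drop it
        self.messages.append(message)
        self.last_seq[producer_id] = seq
        return True

broker = DedupLog()
broker.append("producer-1", 0, "ride_assigned")
broker.append("producer-1", 0, "ride_assigned")  # network retry: deduplicated
broker.append("producer-1", 1, "ride_started")
```

This is what lets a producer safely retry on timeout: a retried send reuses the same sequence number, so a duplicate never lands in the log.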