Messaging Systems Introduction
Design a distributed messaging system that reliably transfers messages between services at high throughput.
The problem
Imagine a log aggregation service receiving hundreds of log entries per second from dozens of microservices. Three challenges immediately arise:
- Traffic spikes: The service handles 500 messages/second normally. What happens during a deployment when it spikes to 5,000? How do you buffer without losing data?
- Coupling: If producers send directly to consumers, every producer must know each consumer's address, protocol, and data format. Adding a new consumer means modifying producers, which quickly becomes a tightly coupled nightmare.
- Failures: What happens to in-flight messages when the service goes down?
A messaging system addresses all three by sitting between producers and consumers. It buffers traffic spikes, decouples components, and persists messages until they are safely consumed.
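The buffering argument can be made concrete with a back-of-the-envelope sketch using the 500 and 5,000 messages/second figures above. The consumer capacity (1,000 messages/second), the spike duration (10 seconds), and the function name `spike_backlog` are assumptions chosen for illustration. The sketch also assumes constant rates and an initially empty queue.

```python
def spike_backlog(normal_rate, spike_rate, consumer_rate, spike_seconds):
    """Return (peak_backlog, drain_seconds) for one traffic spike.

    Hypothetical helper: rates are in messages/second and are assumed
    constant, starting from an empty queue.
    """
    if consumer_rate <= normal_rate:
        raise ValueError("consumers must outpace normal traffic to drain a backlog")
    # During the spike, the queue grows by whatever consumers cannot keep up with.
    peak_backlog = max(0, spike_rate - consumer_rate) * spike_seconds
    # After the spike, consumers drain the backlog using their spare capacity.
    drain_seconds = peak_backlog / (consumer_rate - normal_rate)
    return peak_backlog, drain_seconds


peak, drain = spike_backlog(normal_rate=500, spike_rate=5_000,
                            consumer_rate=1_000, spike_seconds=10)
print(peak, drain)  # 40000 80.0
```

Under these assumed numbers, the broker must hold about 40,000 messages at the peak. Consumers then need roughly 80 seconds of spare capacity to catch up. Without a buffer, those messages would have to be dropped or pushed back onto the producers.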
What is a messaging system?
A messaging system transfers data between services asynchronously. Producers send messages without knowing (or caring) who consumes them. Consumers process messages at their own pace. This decoupling is the key architectural benefit.
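Here is a minimal in-process sketch of this asynchronous hand-off. Python's standard `queue.Queue` stands in for the broker, and a consumer thread processes messages more slowly than the producer emits them. The function names and the `None` end-of-stream sentinel are illustrative assumptions. A real messaging system would run as a separate process and persist messages.

```python
import queue
import threading
import time


def producer(channel: queue.Queue, count: int) -> None:
    # The producer only knows about the channel, not about who consumes.
    for i in range(count):
        channel.put(f"log entry {i}")
    channel.put(None)  # sentinel (an assumption of this sketch): no more messages


def consumer(channel: queue.Queue, processed: list) -> None:
    # The consumer pulls messages at its own pace.
    while True:
        message = channel.get()
        if message is None:
            break
        time.sleep(0.001)  # simulate slow processing
        processed.append(message)


channel: queue.Queue = queue.Queue()
processed: list = []
worker = threading.Thread(target=consumer, args=(channel, processed))
worker.start()
producer(channel, 5)  # returns without waiting for the consumer to finish
worker.join()
print(processed)
```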
Two messaging models
Queue (point-to-point)
Each message is delivered to only one of the consumers reading from the queue, and once consumed it is removed. This makes queues a good fit for distributing work across multiple workers, but it also means two consumers can never read the same message.
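The following toy sketch models point-to-point delivery with an in-memory deque and two competing workers. `PointToPointQueue` and the job names are hypothetical. For simplicity, this sketch removes a message as soon as it is received. Many real brokers instead wait for an acknowledgment, so that a crashed worker's message can be redelivered.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message is handed to only one consumer."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self):
        # Removing the message on receipt means no other consumer can see it.
        return self._messages.popleft() if self._messages else None


work = PointToPointQueue()
for job in ["resize-1", "resize-2", "resize-3", "resize-4"]:
    work.send(job)

# Two workers take turns pulling from the same queue until it is empty.
assignments = {"worker-a": [], "worker-b": []}
while True:
    progressed = False
    for worker, jobs in assignments.items():
        job = work.receive()
        if job is not None:
            jobs.append(job)
            progressed = True
    if not progressed:
        break

print(assignments)
# {'worker-a': ['resize-1', 'resize-3'], 'worker-b': ['resize-2', 'resize-4']}
```

Every job ends up with exactly one worker, which is what makes this model suitable for spreading work.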
Publish-subscribe (pub-sub)
Messages are organized into topics. Publishers send to a topic; all subscribers to that topic receive every message. Multiple consumers can independently read the same messages.
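For contrast, here is a toy pub-sub broker in which every subscriber to a topic receives its own copy of each message. `PubSubBroker`, its methods, and the topic and subscriber names are hypothetical. In this simplified version, a subscriber only receives messages published after it subscribes.

```python
from collections import defaultdict


class PubSubBroker:
    """Toy in-memory pub-sub broker: every subscriber gets its own copy."""

    def __init__(self) -> None:
        # topic -> {subscriber -> list of messages not yet polled}
        self._topics = defaultdict(dict)

    def subscribe(self, topic: str, subscriber: str) -> None:
        self._topics[topic].setdefault(subscriber, [])

    def publish(self, topic: str, message: str) -> None:
        # Fan out: append the message to every current subscriber's inbox.
        for inbox in self._topics[topic].values():
            inbox.append(message)

    def poll(self, topic: str, subscriber: str) -> list:
        # Drain only this subscriber's inbox; other subscribers are unaffected.
        inbox = self._topics[topic][subscriber]
        messages = list(inbox)
        inbox.clear()
        return messages


broker = PubSubBroker()
broker.subscribe("logs", "indexer")
broker.subscribe("logs", "alerting")
broker.publish("logs", "service-a: started")
broker.publish("logs", "service-b: error")

print(broker.poll("logs", "indexer"))   # ['service-a: started', 'service-b: error']
print(broker.poll("logs", "alerting"))  # ['service-a: started', 'service-b: error']
```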
The messaging system (the broker) stores messages, decouples publishers from subscribers, and provides fault tolerance by persisting messages until consumed.
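Many messaging systems expose queues and pub-sub as separate abstractions. Kafka unifies them through consumer groups. Consumers within the same group split a topic's partitions between them, so each message is processed by only one member of the group (queue behavior). Meanwhile, every separate group receives all messages (pub-sub behavior). And unlike traditional brokers, which typically delete messages once they are acknowledged, Kafka retains messages after consumption for a configurable retention period, so consumers can replay from any retained offset.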
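The consumer-group idea can be illustrated with a single-process toy model. In this sketch, each partition is an append-only list that is never deleted on read, and each group tracks its own offsets. Within a group, each partition is assigned to one member. `RetainedTopic`, `poll_group`, and `seek` are hypothetical names, not the Kafka client API. The `crc32` partitioner is a stand-in for Kafka's own key hashing, and real Kafka also handles rebalancing, durable offset commits, and retention limits, all of which this sketch omits.

```python
import zlib
from collections import defaultdict


class RetainedTopic:
    """Toy model of a Kafka-style topic (illustrative, not Kafka's API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = defaultdict(int)  # (group, partition) -> next offset to read

    def produce(self, key: str, value: str) -> None:
        # Deterministic key hash so one key always lands in the same partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)

    def poll_group(self, group: str, members: list) -> dict:
        """Deliver new messages to one group, splitting partitions among members."""
        delivered = {member: [] for member in members}
        for p, log in enumerate(self.partitions):
            owner = members[p % len(members)]  # one owner per partition per group
            start = self.offsets[(group, p)]
            delivered[owner].extend(log[start:])
            self.offsets[(group, p)] = len(log)  # advance only this group's offset
        return delivered

    def seek(self, group: str, partition: int, offset: int) -> None:
        # Replay: rewind a group's position; the data is still in the log.
        self.offsets[(group, partition)] = offset


topic = RetainedTopic(num_partitions=2)
for i in range(6):
    topic.produce(key=f"service-{i % 3}", value=f"entry-{i}")

# Queue behavior: two members of "indexers" split the partitions between them.
indexers = topic.poll_group("indexers", ["idx-1", "idx-2"])
# Pub-sub behavior: a separate group independently receives every message.
auditors = topic.poll_group("auditors", ["audit-1"])

# Replay: rewind "auditors" on every partition and read everything again.
for p in range(2):
    topic.seek("auditors", p, 0)
replayed = topic.poll_group("auditors", ["audit-1"])
```

In this model, a group with more members than partitions leaves the extra members idle. This mirrors how Kafka caps a group's parallelism at the partition count.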
Why use a messaging system?
| Benefit | How |
|---|---|
| Traffic buffering | Absorb spikes by queuing messages until consumers catch up |
| Guaranteed delivery | Persist messages so they survive producer/consumer failures |
| Architectural decoupling | Producers and consumers don't need to know about each other |
| Scalability | Add consumers to increase throughput without touching producers |