Kafka Workflow
Let's trace the end-to-end flow of messages through Kafka in both messaging models.
Pub-sub workflow
- Producer publishes a message to a topic
- Broker stores the message in the appropriate partition (key-based or round-robin)
- Consumer subscribes to the topic and receives the current offset
- Consumer polls Kafka at regular intervals for new messages
- Kafka delivers new messages to the consumer
- Consumer processes the message and sends an acknowledgment (offset commit)
- Kafka advances the offset (stored in ZooKeeper or an internal __consumer_offsets topic)
- Repeat: the consumer keeps polling for new messages
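The steps above can be sketched as a toy broker. This is a minimal illustration, not Kafka's actual API: the names Broker, publish, poll, and commit are invented here, and the key hash stands in for Kafka's partitioner.

```python
import zlib

class Broker:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]
        self.committed = {}   # committed offsets: partition -> next offset to read
        self._rr = 0          # round-robin counter for keyless messages

    def publish(self, value, key=None):
        if key is not None:
            # Key-based: the same key always lands in the same partition
            p = zlib.crc32(key.encode()) % len(self.partitions)
        else:
            # Round-robin for messages without a key
            p = self._rr % len(self.partitions)
            self._rr += 1
        self.partitions[p].append(value)
        return p

    def poll(self, partition, offset, max_records=10):
        # Pull model: the consumer asks for records starting at its own offset
        return self.partitions[partition][offset:offset + max_records]

    def commit(self, partition, offset):
        # Offset commit: acknowledge how far this consumer has processed
        self.committed[partition] = offset

broker = Broker()
p = broker.publish("order-created", key="user-42")
records = broker.poll(p, offset=0)      # -> ['order-created']
broker.commit(p, offset=len(records))   # acknowledge; next poll starts at 1
```

Note that the broker never tracks "delivered" per message; it only stores one committed integer per partition, which is what makes offsets so cheap.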
Key detail: Consumers pull messages from Kafka (Kafka doesn't push). This lets each consumer control its own pace. If a consumer falls behind, it simply has more messages waiting; Kafka doesn't slow down other consumers.
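To see why pulling isolates consumers from each other, consider two consumers with independent offsets on the same log (a sketch with invented names; real clients track this via committed offsets):

```python
log = [f"msg-{i}" for i in range(10)]   # one partition's append-only log
offsets = {"fast": 0, "slow": 0}        # each consumer owns its position

def poll(consumer, max_records):
    start = offsets[consumer]
    batch = log[start:start + max_records]
    offsets[consumer] = start + len(batch)   # commit after processing
    return batch

poll("fast", 10)   # the fast consumer drains everything
poll("slow", 2)    # the slow consumer takes a small batch
lag = {c: len(log) - off for c, off in offsets.items()}
# lag == {'fast': 0, 'slow': 8}: the slow consumer's backlog is just a number
```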
Replay capability: Since offsets are just numbers, a consumer can rewind to any offset and re-process old messages. This is useful for recomputing derived data, debugging, or recovering from processing errors.
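Because the offset is just an integer index into the log, rewinding is nothing more than resetting that integer and polling again, as this sketch shows:

```python
log = ["evt-0", "evt-1", "evt-2", "evt-3"]

offset = len(log)        # consumer is fully caught up
offset = 1               # rewind: seek back to offset 1
replayed = log[offset:]  # re-deliver everything from that point on
# replayed == ['evt-1', 'evt-2', 'evt-3']
```

With a real client this is a seek operation (for example, kafka-python exposes a seek method on the consumer); the broker does nothing special, since the old records were never deleted on consumption.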
Consumer group workflow
The same flow, but with work distribution:
- Producer publishes to a topic (same as above)
- A single consumer subscribes with a group ID
- When a second consumer joins with the same group ID, Kafka switches to shared mode:
- Each partition is assigned to exactly one consumer in the group
- Messages are distributed, not duplicated
- If the number of consumers exceeds partitions, excess consumers wait idle as standbys
- If a consumer leaves or crashes, its partitions are rebalanced to remaining consumers
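The assignment and rebalance rules above can be sketched with a simple round-robin assignor. This is illustrative only: real Kafka uses pluggable assignment strategies (range, round-robin, sticky), and the function name here is invented.

```python
def assign(partitions, consumers):
    """Distribute partitions round-robin over the sorted members of one group."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

partitions = [0, 1, 2]

assign(partitions, ["c1", "c2"])
# {'c1': [0, 2], 'c2': [1]} -- each partition has exactly one owner

assign(partitions, ["c1", "c2", "c3", "c4"])
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []} -- c4 is an idle standby

# c2 crashes: the rebalance is just a fresh assignment over the survivors
assign(partitions, ["c1", "c3", "c4"])
# {'c1': [0], 'c3': [1], 'c4': [2]} -- c4 takes over real work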
Same topic, same group ID → queue (messages distributed). Same topic, different group IDs → pub-sub (messages broadcast to each group). This is the only configuration difference between the two models.
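That rule follows from delivery state being tracked per group, not per consumer. A minimal sketch (one partition, invented names) makes the difference concrete:

```python
from collections import defaultdict

log = ["m0", "m1", "m2", "m3"]
group_offsets = defaultdict(int)   # one committed offset per group ID

def poll(group, max_records=2):
    start = group_offsets[group]
    batch = log[start:start + max_records]
    group_offsets[group] += len(batch)
    return batch

# Same group ID -> queue: two polls from group "billing" split the stream
a = poll("billing")    # ['m0', 'm1']
b = poll("billing")    # ['m2', 'm3']

# Different group ID -> pub-sub: "audit" has its own offset and sees everything
c = poll("audit", 4)   # ['m0', 'm1', 'm2', 'm3']
```

The broker stores one extra integer per group; no messages are copied, which is why adding a new subscriber group is cheap.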