
Kafka Introduction

Why Kafka matters

In 2010, LinkedIn had a problem that every fast-growing tech company eventually faces: data was everywhere, and nothing could keep up.

User activity events, application metrics, server logs, database changelogs -- all of this data needed to flow between dozens of internal systems. The existing solutions (traditional message queues, direct database-to-database transfers, custom ETL pipelines) were breaking under the load. They were too slow, too fragile, or couldn't handle the volume.

LinkedIn needed something fundamentally different: a system that could ingest millions of events per second, store them durably, and let multiple independent consumers read the same data stream at their own pace. Traditional message queues (like RabbitMQ) deleted messages after delivery. Databases weren't designed for high-throughput streaming. Nothing fit.

So LinkedIn built Kafka -- and in doing so, created an entirely new category of infrastructure: the distributed commit log.

Interview insight

Kafka appears in system design interviews constantly -- not always by name, but by pattern. Any time an interviewer asks you to design a system that needs to "decouple producers from consumers," "handle bursty traffic," or "enable real-time analytics alongside batch processing," they're describing Kafka's use case. Understanding Kafka gives you the vocabulary for all of these.

What is Kafka?

Apache Kafka is a distributed, durable, fault-tolerant streaming platform. At its core, it's a system that:

  1. Takes in streams of messages from applications (producers)
  2. Stores them reliably on a cluster of machines (brokers)
  3. Delivers them to applications that process them (consumers)
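The three roles above can be sketched with a toy in-memory broker. This is purely illustrative (none of these names are Kafka's real API); it shows the flow of messages from producer to broker to a pull-based consumer:

```python
from collections import defaultdict

class ToyBroker:
    """Illustrative stand-in for a Kafka broker: one append-only list per topic."""
    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        """Producers hand messages to the broker, which appends them."""
        self.topics[topic].append(message)

    def read(self, topic, offset):
        """Pull-based read: return all messages at or after `offset`."""
        return self.topics[topic][offset:]

broker = ToyBroker()

# 1. A producer sends messages to a topic.
broker.append("page-views", {"user": "alice", "page": "/home"})
broker.append("page-views", {"user": "bob", "page": "/search"})

# 2. The broker stores them. 3. A consumer pulls at its own pace,
# tracking its own read position (offset) independently of other consumers.
analytics_offset = 0
batch = broker.read("page-views", analytics_offset)
analytics_offset += len(batch)

print(len(batch))        # messages delivered in this poll
print(analytics_offset)  # this consumer is now caught up at offset 2
```

Because the broker never deletes what the consumer has read, a second consumer with its own offset could read the same two messages independently.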

What makes Kafka different from a traditional message queue:

| Traditional message queue | Kafka |
| --- | --- |
| Messages deleted after consumption | Messages persisted on disk, retained for a configurable period |
| One consumer gets each message | Multiple consumer groups can independently read the same data |
| Optimized for message delivery | Optimized for throughput -- millions of messages per second |
| Push-based delivery | Pull-based -- consumers read at their own pace |
| No replay | Consumers can replay from any point in the stream |

The commit log: Kafka's big idea

At a high level, Kafka is a distributed commit log (closely related to the write-ahead logs used by databases). This is an append-only data structure that stores a sequence of records:

  • Records are always appended to the end of the log
  • Once written, records are immutable -- they cannot be modified, and they are only removed when the retention period expires
  • Reading always happens sequentially, from old to new

This simplicity is Kafka's superpower. Because all writes are sequential appends and all reads are sequential scans, Kafka achieves remarkable throughput by exploiting sequential disk I/O -- which, on modern hardware, can be faster than random-access memory operations.
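The commit log abstraction is simple enough to sketch in a few lines. This is a toy model of the data structure, not Kafka's internals: every append returns the record's offset, and any reader can scan forward from any offset, as many times as it likes:

```python
class CommitLog:
    """Toy append-only sequence of records, addressed by offset."""
    def __init__(self):
        self._records = []

    def append(self, record):
        """Writes only ever go to the end; the assigned offset is returned."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Reads are sequential scans from a starting offset; records
        already in the log are never modified."""
        return list(self._records[offset:])

log = CommitLog()
first = log.append("event-0")   # offset 0
log.append("event-1")           # offset 1
log.append("event-2")           # offset 2

# Replay is free: rewind to any offset and read forward again.
assert log.read_from(first) == ["event-0", "event-1", "event-2"]
assert log.read_from(2) == ["event-2"]
```

Note that a reader is just an offset into the log, which is why adding another independent consumer costs the log nothing.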

The performance insight

Kafka stores everything on disk, yet achieves throughput measured in millions of messages per second. How? Sequential disk I/O. A modern disk can write 600 MB/s sequentially but only about 100 KB/s randomly. By constraining its data structure to append-only sequential access, Kafka turns disk from a bottleneck into an advantage. This is why Kafka can retain messages for days or weeks with negligible performance impact.
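You can see the two access patterns side by side with a small micro-benchmark. Exact numbers vary enormously with hardware, filesystem, and OS caching (on an SSD with a warm page cache the gap may be small), so this sketch only illustrates the patterns rather than proving a ratio:

```python
import os
import random
import tempfile
import time

N_BLOCKS, BLOCK = 1024, 4096  # 4 MiB total, kept small so the demo runs quickly
payload = b"x" * BLOCK

def sequential_write(path):
    """Append-only pattern: every write lands at the current end of file."""
    with open(path, "wb") as f:
        for _ in range(N_BLOCKS):
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())

def random_write(path):
    """Random pattern: preallocate, then write the same blocks in shuffled order."""
    with open(path, "wb") as f:
        f.truncate(N_BLOCKS * BLOCK)
        offsets = list(range(N_BLOCKS))
        random.shuffle(offsets)
        for i in offsets:
            f.seek(i * BLOCK)
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())

with tempfile.TemporaryDirectory() as d:
    seq_path, rnd_path = os.path.join(d, "seq"), os.path.join(d, "rnd")
    t0 = time.perf_counter(); sequential_write(seq_path); seq_t = time.perf_counter() - t0
    t0 = time.perf_counter(); random_write(rnd_path); rnd_t = time.perf_counter() - t0
    same_size = os.path.getsize(seq_path) == os.path.getsize(rnd_path)

print(f"sequential: {seq_t:.4f}s, random: {rnd_t:.4f}s, same bytes written: {same_size}")
```

Both functions write identical bytes; only the order of the writes differs, which is exactly the constraint Kafka imposes on itself.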

Kafka use cases

| Use case | How Kafka fits | Example |
| --- | --- | --- |
| Metrics & monitoring | Distributed services push operational metrics to Kafka topics; monitoring systems consume and aggregate | Datadog-style dashboards pulling from Kafka streams |
| Log aggregation | Collect logs from hundreds of services into a unified stream | Replacing scattered log files with a central, queryable log pipeline |
| Stream processing | Raw data flows through multiple transformation stages via topic-to-topic processing | Enriching clickstream data before loading into a data warehouse |
| Event sourcing | Use Kafka as the system of record -- every state change is an immutable event | Order lifecycle events: created → paid → shipped → delivered |
| Website activity tracking | Kafka's original use case at LinkedIn -- capture every page view, search, and click | Real-time analytics, A/B test measurement, recommendation engines |
| Change data capture | Database changes published to Kafka for downstream systems | Keeping a search index in sync with a primary database |

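The event-sourcing row is worth making concrete: the system never stores "current state" directly; it derives state by replaying the immutable event log. A toy sketch (plain Python, not a Kafka API) using the order lifecycle from the table:

```python
# Immutable order lifecycle events, in the order they were appended to the log.
events = [
    {"order_id": 42, "type": "created"},
    {"order_id": 42, "type": "paid"},
    {"order_id": 42, "type": "shipped"},
]

def current_state(event_log, order_id):
    """Replay the log from the beginning; the last event for the order wins."""
    state = None
    for e in event_log:
        if e["order_id"] == order_id:
            state = e["type"]
    return state

print(current_state(events, 42))  # -> shipped
```

Appending a `delivered` event moves the derived state forward without touching any earlier record, and the full history remains available for audit or for rebuilding other views.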
Kafka vs. traditional messaging

Kafka isn't just a "faster RabbitMQ." It's a fundamentally different abstraction. A message queue is about delivering messages. Kafka is about storing a log of events that multiple systems can independently consume. This makes it the backbone for event-driven architectures.

What's next

In the following chapters, we'll explore: