19 Read Repair

A node was down for an hour. It missed several writes. Hinted handoff delivered some of them, but not all (maybe the hint window expired). Now the node is back up and serving reads -- with stale data. How do you detect and fix this, without running a separate repair process?

Think first
You're already reading from multiple replicas to satisfy a quorum. Some replicas have stale data. Instead of running a separate repair process, how could you fix the stale replicas using the data you already have from the read operation?

Background

In eventually consistent systems, replicas can drift apart. Hinted handoff handles short-term failures, but it's not comprehensive -- hints can be lost, expire, or cover only a subset of missed writes. You need another mechanism to detect and fix stale replicas, ideally without adding a separate background process.

The clever insight: you're already reading from multiple replicas to satisfy the quorum. Why not compare them while you're at it?

Definition

During a read operation, the system reads from multiple replicas and compares their responses. If any replica has stale data, the system pushes the latest version to it as part of handling that read, typically in the background so the client is not kept waiting. This "repair during read" is called read repair.

How it works

  1. Client sends a read request
  2. The coordinator reads the full data from one replica and a digest (checksum) from the others
  3. If digests match → all replicas are in sync, return the data
  4. If digests don't match → read full data from all replicas, determine the newest version
  5. Return the newest version to the client
  6. Asynchronously push the newest version to any replicas that had stale data
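The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular database's implementation: the `Replica`, `Versioned`, and `read_with_repair` names are hypothetical, replicas are in-memory objects rather than remote nodes, the newest version is assumed to be the one with the highest timestamp, and the repair write happens inline instead of asynchronously.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumption: the highest timestamp is the newest version


class Replica:
    """Hypothetical in-memory replica; a real one would sit behind an RPC."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def read(self, key):
        return self.store.get(key)

    def digest(self, key):
        # Checksum of the stored record, cheaper to ship than the record itself.
        return hashlib.sha256(repr(self.store.get(key)).encode()).hexdigest()

    def write(self, key, record):
        # Only accept a record newer than what is already stored.
        current = self.store.get(key)
        if current is None or record.timestamp > current.timestamp:
            self.store[key] = record


def read_with_repair(key, replicas):
    """Return (newest record, names of the replicas that were repaired)."""
    first, others = replicas[0], replicas[1:]
    data = first.read(key)                    # step 2: full data from one replica
    expected = first.digest(key)
    if all(r.digest(key) == expected for r in others):
        return data, []                       # step 3: digests match
    # Step 4: mismatch, so read full data everywhere and pick the newest.
    responses = {r: r.read(key) for r in replicas}
    present = [rec for rec in responses.values() if rec is not None]
    newest = max(present, key=lambda rec: rec.timestamp)
    repaired = []
    for replica, record in responses.items():
        if record != newest:
            replica.write(key, newest)        # step 6, done inline here rather than async
            repaired.append(replica.name)
    return newest, repaired                   # step 5


a, b, c = Replica("a"), Replica("b"), Replica("c")
for r in (a, b, c):
    r.write("user:1", Versioned("old", 1))
for r in (a, b):
    r.write("user:1", Versioned("new", 2))    # replica c missed this write

result, repaired = read_with_repair("user:1", [a, b, c])
print(result.value, repaired)                 # prints: new ['c']
```

A second call to `read_with_repair` on the same key takes the fast path in step 3, because all three digests now match.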

Optimization: probabilistic read repair

Comparing all replicas on every read is expensive. When the consistency level is less than ALL, the coordinator needs responses from only some of the replicas, so checking the rest is extra work. Many systems therefore perform read repair probabilistically at these levels -- for example, on only 10% of reads. This reduces overhead while still repairing stale replicas gradually over time.
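One way to picture the probabilistic variant is as a per-read coin flip that decides whether the coordinator contacts every replica or only as many as the consistency level requires. The sketch below assumes a hypothetical `repair_chance` knob and a hypothetical `replicas_to_contact` helper; neither is taken from a real system.

```python
import random


def replicas_to_contact(replicas, required, repair_chance, rng=random):
    """Pick the replicas a coordinator queries for one read.

    required: how many responses the consistency level needs.
    repair_chance: hypothetical knob in [0.0, 1.0]; 0.1 means roughly 10% of reads.
    """
    if required >= len(replicas):
        return list(replicas)            # consistency level ALL: always compare everyone
    if rng.random() < repair_chance:
        return list(replicas)            # this read doubles as a repair pass
    return list(replicas[:required])     # normal read: just enough replicas


rng = random.Random(7)                   # seeded so the run is repeatable
replicas = ["a", "b", "c"]
full = sum(len(replicas_to_contact(replicas, 2, 0.1, rng)) == 3 for _ in range(10_000))
print(f"{full / 10_000:.1%} of quorum reads also checked the third replica")
```

With `repair_chance` set to 0.1, roughly one quorum read in ten pays for the extra comparison, and the remaining reads stay as cheap as a plain quorum read.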

| Full read repair (every read) | Probabilistic read repair |
| --- | --- |
| Every read repairs inconsistencies immediately | Repairs happen gradually over time |
| Higher read latency (extra comparisons) | Lower overhead per read |
| Used when consistency level = ALL | Used when consistency level < ALL |

The key insight

Read repair is lazy -- it only fixes data that's actually being read. Hot data (frequently accessed) gets repaired quickly. Cold data (rarely accessed) might remain stale for a long time. For cold data, you need Merkle trees to proactively find and fix divergence.
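A rough back-of-the-envelope calculation shows how wide the gap between hot and cold data can be. The sketch below treats each read as an independent chance to trigger repair, which is a simplifying assumption. The function name and the example read rates are illustrative only.

```python
def expected_seconds_until_repair(reads_per_second, repair_chance):
    """Mean wait until some read triggers repair, under the independence assumption."""
    if reads_per_second <= 0 or repair_chance <= 0:
        return float("inf")              # never read, or never repaired: stays stale
    expected_reads = 1 / repair_chance   # mean of a geometric distribution
    return expected_reads / reads_per_second


hot = expected_seconds_until_repair(reads_per_second=100, repair_chance=0.1)
cold = expected_seconds_until_repair(reads_per_second=1 / 86_400, repair_chance=0.1)
print(f"hot key:  about {hot:.1f} seconds")
print(f"cold key: about {cold / 86_400:.0f} days")
```

Under these assumptions, a key read 100 times per second is repaired in about a tenth of a second on average, while a key read once a day waits about ten days. A key that is never read is never repaired by this mechanism at all.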

The three layers of anti-entropy

| Mechanism | When it runs | What it fixes | Speed |
| --- | --- | --- | --- |
| Hinted Handoff | During write (proactive) | Temporary node failures | Immediate (when node recovers) |
| Read Repair | During read (reactive) | Stale replicas for accessed data | On next read |
| Merkle Trees | Background process (proactive) | All divergence, including cold data | Eventually |

These three mechanisms form a layered defense: hinted handoff catches most temporary failures, read repair fixes stale data as it's accessed, and Merkle trees sweep up everything else in the background.

Examples

Cassandra

Cassandra implements both full and probabilistic read repair. In versions before 4.0, the read_repair_chance table option sets the probability of triggering read repair on reads below the ALL consistency level; the option was removed in 4.0. At ALL consistency, every replica is read, so read repair runs whenever the replicas disagree.

Dynamo

Dynamo uses read repair as part of its anti-entropy strategy. During reads, the coordinator compares responses and pushes updates to stale replicas. This works together with Merkle tree-based background synchronization.

Interview angle

Read repair is the answer to "How do you fix stale replicas without a separate repair process?" The key insight: since you're already reading from multiple replicas for quorum, compare them and fix discrepancies on the spot. Mention it alongside hinted handoff and Merkle trees as three complementary anti-entropy mechanisms -- interviewers love seeing you understand the layered approach.

Quiz
Read repair only fixes data that is actively being read. What problem does this create for data that is written once and rarely accessed again (cold data)?