19 Read Repair
A node was down for an hour. It missed several writes. Hinted handoff delivered some of them, but not all (maybe the hint window expired). Now the node is back up and serving reads -- with stale data. How do you detect and fix this, without running a separate repair process?
Background
In eventually consistent systems, replicas can drift apart. Hinted handoff handles short-term failures, but it's not comprehensive -- hints can be lost, expire, or cover only a subset of missed writes. You need another mechanism to detect and fix stale replicas, ideally without adding a separate background process.
The clever insight: you're already reading from multiple replicas to satisfy the quorum. Why not compare them while you're at it?
Definition
During a read operation, the system reads from multiple replicas and compares their responses. If any replica returns stale data, the system pushes the latest version to it (synchronously or in the background, depending on the system). This "repair during read" is called read repair.
How it works
- Client sends a read request
- The coordinator reads the full data from one replica and a digest (checksum) from the others
- If digests match → all replicas are in sync, return the data
- If digests don't match → read full data from all replicas, determine the newest version
- Return the newest version to the client
- Asynchronously push the newest version to any replicas that had stale data
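Here is a minimal Python sketch of this flow. It assumes a simple last-write-wins scheme where each record carries a version number (e.g., a write timestamp); `Replica`, `Versioned`, and `coordinator_read` are illustrative names for this sketch, not any real system's API.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Versioned:
    """A value paired with a version -- here, a last-write-wins timestamp."""
    value: str
    version: int

class Replica:
    """Hypothetical in-memory replica, just enough to illustrate the flow."""
    def __init__(self) -> None:
        self.store: dict[str, Versioned] = {}

    def read(self, key: str) -> Versioned | None:
        return self.store.get(key)

    def digest(self, key: str) -> str:
        # Checksum of value+version: cheap to compute and ship over the wire.
        rec = self.store.get(key)
        payload = f"{rec.value}:{rec.version}" if rec else ""
        return hashlib.sha256(payload.encode()).hexdigest()

    def write(self, key: str, rec: Versioned) -> None:
        current = self.store.get(key)
        if current is None or rec.version > current.version:
            self.store[key] = rec

def coordinator_read(key: str, replicas: list[Replica]) -> Versioned | None:
    # Step 1: full data from one replica, digests from the rest.
    data = replicas[0].read(key)
    local_digest = replicas[0].digest(key)
    remote_digests = [r.digest(key) for r in replicas[1:]]

    # Step 2: if all digests match, the replicas agree -- return immediately.
    if all(d == local_digest for d in remote_digests):
        return data

    # Step 3: mismatch -- fetch full data everywhere, pick the newest version.
    records = [r.read(key) for r in replicas]
    newest = max((rec for rec in records if rec is not None),
                 key=lambda rec: rec.version)

    # Step 4: read repair -- push the newest version to every stale replica.
    # (Real systems typically do this asynchronously; inline here for clarity.)
    for replica, rec in zip(replicas, records):
        if rec is None or rec.version < newest.version:
            replica.write(key, newest)

    return newest
```

A quick trace: write version 2 to two of three replicas and leave the third stale; the next `coordinator_read` detects the digest mismatch, returns version 2, and backfills the straggler as a side effect.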
Optimization: probabilistic read repair
Comparing all replicas on every read is expensive. When the consistency level is less than ALL, many systems perform read repair probabilistically -- for example, only on 10% of reads. This reduces overhead while still gradually repairing stale replicas over time.
| Full read repair (every read) | Probabilistic read repair |
|---|---|
| Every read repairs inconsistencies immediately | Repairs happen gradually over time |
| Higher read latency (extra comparisons) | Lower overhead per read |
| Used when consistency level = ALL | Used when consistency level < ALL |
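The probabilistic variant is essentially a coin flip in front of the comparison path. A simplified sketch, reusing the hypothetical `coordinator_read`, `Replica`, and `Versioned` from above; real systems still compare the replicas contacted for the quorum even on the fast path, which this sketch omits for brevity:

```python
import random

READ_REPAIR_CHANCE = 0.1  # trigger a full comparison on ~10% of reads

def read_with_probabilistic_repair(key: str, replicas: list[Replica],
                                   quorum: int) -> Versioned | None:
    if random.random() < READ_REPAIR_CHANCE:
        # Slow path: compare all replicas and repair any stragglers.
        return coordinator_read(key, replicas)
    # Fast path: contact only enough replicas to satisfy the quorum.
    records = [r.read(key) for r in replicas[:quorum]]
    return max((rec for rec in records if rec is not None),
               key=lambda rec: rec.version, default=None)
```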
Read repair is lazy -- it only fixes data that's actually being read. Hot data (frequently accessed) gets repaired quickly. Cold data (rarely accessed) might remain stale for a long time. For cold data, you need Merkle trees to proactively find and fix divergence.
The three layers of anti-entropy
| Mechanism | When it runs | What it fixes | Speed |
|---|---|---|---|
| Hinted Handoff | During write (proactive) | Temporary node failures | Immediate (when node recovers) |
| Read Repair | During read (reactive) | Stale replicas for accessed data | On next read |
| Merkle Trees | Background process (proactive) | All divergence, including cold data | Eventually |
These three mechanisms form a layered defense: hinted handoff catches most temporary failures, read repair fixes stale data as it's accessed, and Merkle trees sweep up everything else in the background.
Examples
Cassandra
Cassandra performs blocking read repair whenever the digests from the replicas contacted for a read disagree; at ALL consistency, every replica participates, so any mismatch gets repaired on the spot. Versions before 4.0 also exposed a read_repair_chance setting: the probability of triggering an additional read repair across all replicas on reads below ALL. Cassandra 4.0 removed this probabilistic form, keeping only blocking read repair.
Dynamo
Dynamo uses read repair as part of its anti-entropy strategy. During reads, the coordinator compares responses and pushes updates to stale replicas. This works together with Merkle tree-based background synchronization.
Read repair is the answer to "How do you fix stale replicas without a separate repair process?" The key insight: since you're already reading from multiple replicas for quorum, compare them and fix discrepancies on the spot. Mention it alongside hinted handoff and Merkle trees as one of three complementary anti-entropy mechanisms; interviewers love seeing you understand the layered approach.