Replication

Like Dynamo, Cassandra replicates data to multiple nodes for fault tolerance. If one node goes down, replicas on other nodes serve the request. Two settings control replication:

Replication factor

The replication factor is the number of copies of each piece of data the cluster keeps. With a replication factor of 3, every row is stored on three different nodes. Each keyspace can have its own replication factor.

Think first
If you replicate data to consecutive nodes on the ring, what failure scenario are you still vulnerable to? How would you address it in a multi-data-center deployment?

Replication strategy

The first replica goes to the node that owns the partition key's hash range. The remaining replicas are placed by walking clockwise around the ring, but exactly which nodes they land on depends on the strategy:

Simple replication strategy

For single-data-center clusters. Replicas go to consecutive nodes clockwise on the ring. Simple and predictable, but it ignores rack and data center topology.
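A minimal sketch of this placement rule (illustrative only: the node names and token values are made up, and real Cassandra uses 128-bit Murmur3 tokens and virtual nodes rather than one token per node):

```python
from bisect import bisect_left

# Toy ring: (token, node) pairs sorted by token. A node owns the range from
# the previous token (exclusive) up to its own token (inclusive).
ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]

def simple_replicas(key_token, ring, rf):
    """SimpleStrategy-style placement: first replica on the owning node,
    the rest on the next rf-1 nodes walking clockwise around the ring."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

# A key hashing to token 30 falls in C's range; replicas continue clockwise.
print(simple_replicas(30, ring, 3))  # ['C', 'D', 'A']
```

Note that nodes C and D might sit in the same rack; SimpleStrategy does not care, which is exactly its weakness.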

Network topology strategy

For multi-data-center clusters. You specify a replication factor per data center. Within each data center, replicas are placed by walking the ring clockwise until reaching a node in a different rack. This guards against rack-level failures, since power, cooling, and network switches tend to fail per rack.
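The rack-aware walk can be sketched for a single data center as follows (a simplified illustration with invented node and rack names; the real NetworkTopologyStrategy additionally handles multiple DCs and virtual nodes):

```python
from bisect import bisect_left

ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
racks = {"A": "rack1", "B": "rack1", "C": "rack2", "D": "rack2"}

def rack_aware_replicas(key_token, ring, racks, rf):
    """Walk clockwise from the owning node, preferring nodes in racks that
    don't yet hold a replica; fill any remaining slots in ring order."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    order = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]
    chosen, used_racks = [], set()
    for node in order:              # first pass: at most one replica per rack
        if len(chosen) < rf and racks[node] not in used_racks:
            chosen.append(node)
            used_racks.add(racks[node])
    for node in order:              # second pass: only if rf > number of racks
        if len(chosen) < rf and node not in chosen:
            chosen.append(node)
    return chosen

# Token 30 -> owner C (rack2). D shares C's rack, so the walk skips to A
# before falling back to D for the third replica.
print(rack_aware_replicas(30, ring, racks, 3))  # ['C', 'A', 'D']
```

Compare with the consecutive-node result [C, D, A]: the rack-aware walk puts the second replica on A so the first two copies never share a rack.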

Interview angle

When asked "How would you replicate data across data centers?", describe network topology strategy: different replication factors per DC, rack-aware placement within each DC. This shows you understand that failures are correlated by physical location.

Quiz
You have a 6-node cluster across 2 data centers (3 nodes each), using NetworkTopologyStrategy with RF=3 per DC. One entire data center goes offline. What happens to reads with LOCAL_QUORUM consistency?