Gossiper

In a decentralized system with no master, how does each node learn about every other node's state? Cassandra's answer: the Gossiper -- its implementation of the gossip protocol.

Think first

In a 1000-node cluster with no central coordinator, how would you ensure every node eventually learns about a new node joining? What is the time complexity of information propagation?

How Cassandra gossips

Every second, each node:

Picks 1 to 3 random nodes
Exchanges state information -- what nodes it knows about, their status, their load, their schema version
Merges received information with its own, keeping the most recent version (each gossip message includes a version number)

This means any cluster-wide change (node joining, node failing, schema update) propagates to all nodes in O(log N) rounds.

Generation number: Each node stores a generation number that increments on every restart. This number is included in gossip messages so other nodes can distinguish a node's current state from its pre-restart state. If a gossip message arrives with a higher generation number, the receiver knows the sender restarted and updates its view accordingly.

Seed nodes: Like Dynamo, Cassandra uses designated seed nodes to prevent logical partitions. Seeds have no special operational role -- they're just well-known bootstrap points that new nodes can use to discover the rest of the cluster.

Think first

Why can't Cassandra use a simple fixed-timeout heartbeat (e.g., 'if no heartbeat in 5 seconds, declare the node dead') for failure detection in a multi-data-center cluster?

Node failure detection: Phi Accrual

Cassandra doesn't use simple heartbeat timeouts to declare nodes dead. Instead, it uses the Phi Accrual Failure Detector -- an adaptive algorithm that:

Tracks historical heartbeat arrival times for each node
Outputs a suspicion level (φ) rather than a binary alive/dead signal
Adapts automatically to network conditions -- a node on a slow network gets a more lenient threshold

When φ exceeds the configured threshold (default: 8), the node is declared down. This dramatically reduces false positives compared to fixed timeouts, especially in clusters spanning multiple data centers with varying network latencies.

Interview angle

When discussing failure detection, mention that Cassandra uses Phi Accrual instead of fixed-timeout heartbeats. The key insight: binary alive/dead decisions with fixed timeouts force you to choose between fast detection (many false positives) and accuracy (slow detection). Phi Accrual gives you both by adapting to observed network behavior.

Quiz

A Cassandra cluster spans two data centers connected by a WAN link. The WAN link becomes congested, doubling cross-DC latency for 30 seconds before recovering. With the Phi Accrual Failure Detector, what happens to cross-DC node status?

All cross-DC nodes are immediately marked as down because heartbeats arrive late.

The suspicion level (phi) rises but likely stays below the threshold, so nodes remain marked as up. If congestion persisted much longer, phi would eventually cross the threshold.

Nothing happens because gossip uses TCP, which guarantees delivery regardless of latency.

Cassandra switches to a fallback heartbeat mechanism for cross-DC communication.

How Cassandra gossips​

Node failure detection: Phi Accrual​

How Cassandra gossips

Node failure detection: Phi Accrual