Going Decentralized — Gossip, Rumors, and the SWIM Protocol

Part 3 of 4 in the Heartbeats & Failure Detection series:

Is This Thing On? — Building a Heartbeat System from Scratch
The Trade-off Maze — Push vs Pull, and Why "Is It Dead?" Is Harder Than It Sounds
Going Decentralized — Gossip, Rumors, and the SWIM Protocol (you are here)
Getting Smart — Phi Accrual and the End of Hardcoded Timeouts

The quorum idea from the last post has a scaling problem: every node checking every other node is N-squared connections. At 1000 nodes, that's a million health checks per round. So how do you get cluster-wide awareness without cluster-wide communication?

The answer is surprisingly human: gossip.

Spreading Rumors

You know how rumors work in an office. You don't announce it to everyone — you tell 2 or 3 people. They each tell 2 or 3 people. Those people tell more people. Within a few hours, everyone knows.

Gossip protocols work the same way. Each node, every few seconds, picks a random subset of peers (say 2-3) and exchanges information with them. The information spreads exponentially: 3 → 9 → 27 → 81. That's O(log N) rounds to reach the whole cluster. A thousand nodes? About 7 rounds. A million? Around 13.

And here's the key: each node's work per round is constant. You always talk to 2-3 peers, whether the cluster has 10 nodes or 10,000. The cluster scales; your per-node cost doesn't.

No Central Monitor

This changes the architecture fundamentally. There's no central monitor anymore. Every node is both a monitor and a participant. Each node maintains its own membership table — a map of every node it knows about and their status.

When two nodes gossip, they exchange their entire membership tables and merge them. "Here's everything I know about everyone. What do you know?" Both sides learn from each other in a single exchange. Over a few rounds, the whole cluster converges to the same view.

This eliminates the "who watches the watcher" problem entirely. The cluster watches itself.

The Bootstrap Problem: Seed Nodes

But wait — when a node starts up, its membership table only contains itself. Who does it gossip with? You can't discover peers from nothing.

The solution is a seed node. When a new node starts, it's told about at least one existing node. Think of it like joining a party where you don't know anyone — you need one introduction. After that, you meet people through people.

In practice: the first node starts alone. The spawner waits for it to report its port, then tells every subsequent node "here's where to find the seed." Each new node gossips with the seed, gets its membership table, and instantly learns about everyone the seed knows.

The seed isn't special after bootstrap. It's just a regular peer. No extra authority, no single point of failure.

The False Rumor Problem

Node-17 health-checks node-42 and gets no response. It starts gossiping: "node-42 is dead." The rumor spreads. Within a few rounds, the whole cluster thinks node-42 is down.

But what if it was just the network between node-17 and node-42? Node-42 is fine — it's happily serving traffic. One node's bad network connection just triggered a cluster-wide false alarm.

This is where the naive gossip model breaks down. Binary states (alive/dead) are too aggressive. One bad check, one rumor, one cascade.

Three States: Alive, Suspect, Dead

The fix: add a buffer. Instead of going straight from alive to dead, pass through suspect first. This is the core idea behind the SWIM protocol (Scalable Weakly-consistent Infection-style Process Group Membership), used by Consul and Serf.

The flow:

Node-17 can't reach node-42 → marks it as suspect, not dead
Node-17 asks a few random peers: "can you check node-42 for me?"
If those peers also can't reach node-42 → declared dead, gossip begins
If a peer says "node-42 is fine" → suspicion cleared, false alarm avoided

Notice how this combines everything from the previous posts: secondary confirmation (ask someone else), quorum (multiple confirmations needed), and gossip (spread the result efficiently). The SWIM protocol is all three ideas fused into one design.

Suspect Rounds and the Symmetry Problem

In the implementation, I tracked suspectRounds — a counter that increments each time a health check fails. After 5 consecutive failures (the threshold), the node is declared dead.

But then I thought about recovery. If a dead node comes back and responds to a health check, should it immediately go back to "alive"? My first instinct was yes — it responded, it's alive.

Then I thought harder. What if it's flapping — coming up for a second, crashing, coming up, crashing? If one alive signal immediately overrides 5 dead signals, you'd oscillate between dead and alive every round, triggering expensive responses (rerouting, rebalancing) each time.

The solution: symmetric recovery. If it takes 5 rounds to declare dead, it takes 5 successful rounds to declare alive again. Each successful health check decrements suspectRounds by 1. The node has to prove stability before it's trusted again.

This also raised a subtle bug: what happens if a dead node gets picked 200 times while it's down? suspectRounds grows to 200. Now it needs 200 successful rounds to recover — punished for being picked too often while dead. The fix: cap suspectRounds at the threshold. Once you're dead after 5 rounds, the counter stops growing. Recovery always takes exactly 5 rounds, regardless of how long the node was down.

The Consistency Window

There's a trade-off inherent to gossip: not everyone knows at the same time. Broadcasting tells everyone in one round. Gossip takes several rounds. During that window, some nodes know about a failure and others don't.

This can cause conflicting views. Node-42 dies. The death rumor starts spreading. But node-42 restarts before the rumor reaches everyone. Now half the cluster thinks it's dead, half knows it's alive. Conflicting information circulating simultaneously.

The resolution: timestamps on gossip messages. "Node-42 is dead (as of T=100)" vs "node-42 is alive (as of T=105)." When merging membership tables, the newer timestamp wins, regardless of who delivered it. Eventually, all nodes converge to the same view. This is called eventual convergence — it's not instant, but it's guaranteed.

The Implementation

Each node runs an Express server with two endpoints:

GET /health — "are you alive?" Returns a simple yes.
POST /gossip — "here's my membership table." Merges the incoming table, returns its own.

Every 2 seconds (one gossip round), each node:

Updates its own timestamp (proof it's alive)
Picks one random peer from the membership table
Health-checks that peer
Exchanges membership tables

The membership table merge is simple: for each entry, keep whichever has the newer lastUpdatedAt. No built-in Map.merge in JavaScript — a 5-line custom function.

Testing it: spin up 3 nodes. All discover each other within 2 rounds via the seed node. Kill one node by PID. Watch the suspect rounds climb: 1, 2, 3, 4, 5 → dead. The surviving nodes agree. Gossip works.

What We Built

Gossip for scalable information spreading — O(log N), constant per-node cost
Seed nodes for bootstrapping — one introduction, then you know everyone
Three states (alive/suspect/dead) instead of two — the suspect buffer prevents false rumors
SWIM protocol — secondary confirmation + quorum + gossip in one design
Symmetric recovery — proving stability before being trusted again
Eventual convergence — timestamps resolve conflicting information

But gossip and SWIM use fixed thresholds — 5 rounds, same for everyone. What if a node in the same datacenter and a node across the ocean need different thresholds? What if the "right" threshold changes as network conditions change? Next up: letting the system figure it out on its own.

← Previous: The Trade-off Maze — Push vs Pull, and Why "Is It Dead?" Is Harder Than It Sounds

Next → Getting Smart — Phi Accrual and the End of Hardcoded Timeouts