Hi,
I was going through the CockroachDB gossip protocol implementation.
One thing I am trying to understand in the CockroachDB gossip design is why it maintains client connections to specific cluster nodes.
In other designs, such as SWIM-style gossip or Cassandra, the implementation relies on choosing a random cluster node for every gossip round.
Was there a specific reason for not choosing a random node for each gossip round? I can see that long-lived connections allow maintaining per-client versions, which avoids an extra message round to exchange version numbers and determine the delta of gossip state (e.g. Cassandra needs a three-way handshake to pass version numbers, and HashiCorp memberlist does a full state sync periodically without any kind of versioning). I am not sure whether that was the only reason, so I wanted to confirm.
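To make the per-client versioning point concrete, here is a minimal Python sketch of how I understand the idea. It is not CockroachDB's actual code: the names Node, add_info, delta_for and sent_high_water are hypothetical, and the addresses are made up. The sender stamps each info with a local counter and remembers, per long-lived peer, the highest stamp it has already sent, so it can compute a delta without first asking the peer for its versions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical gossip node that tracks what each fixed peer has already been sent."""
    name: str
    clock: int = 0                                        # local, monotonically increasing stamp
    infos: dict = field(default_factory=dict)             # key -> (value, stamp)
    sent_high_water: dict = field(default_factory=dict)   # peer name -> highest stamp sent

    def add_info(self, key, value):
        # Every local update gets a fresh stamp.
        self.clock += 1
        self.infos[key] = (value, self.clock)

    def delta_for(self, peer):
        # Only infos stamped after the peer's high-water mark are sent,
        # so no extra round trip is needed to learn the peer's versions.
        high_water = self.sent_high_water.get(peer, 0)
        delta = {
            key: value
            for key, (value, stamp) in self.infos.items()
            if stamp > high_water
        }
        self.sent_high_water[peer] = self.clock
        return delta


a = Node("a")
a.add_info("node:1:addr", "10.0.0.1:26257")
a.add_info("node:2:addr", "10.0.0.2:26257")
print(a.delta_for("b"))   # first exchange with b: both infos
print(a.delta_for("b"))   # nothing new since the last exchange: {}
a.add_info("node:3:addr", "10.0.0.3:26257")
print(a.delta_for("b"))   # only the new info
```

With a randomly chosen peer on every round, the sender would have no such per-peer mark to rely on, which is where I assume the extra version exchange or the periodic full sync in the other designs comes from.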
Won't this design be more vulnerable to network partitions?
I also see that node liveness is persisted using the standard Raft-backed persistence for key ranges, but node address resolution still seems to rely on node addresses being gossiped. In particular, the getLivenessLocked method seems to consult only the in-memory gossip state. Is the persisted liveness not consulted?
Thanks,
Unmesh