2 nodes configuration

Simple Stuff

Nov 16, 2022, 11:20:35 AM

to Serf

Hi. Our team is evaluating whether to use Serf to detect and react to failures in a simple 2-node configuration, initially in active/passive mode.

We see Serf as a possible alternative to more traditional solutions for this use case, such as keepalived, which require a strong and efficient network link between the two servers.

This would be interesting for us because, as a second step, we could also use Serf to detect the availability of other clusters reachable over an unreliable network connection, without implementing another, dedicated membership solution.

So, has anybody had a positive experience using Serf in a 2-node configuration? Or is there any reason why Serf would not fit such a deployment?

Dan Upton

Nov 17, 2022, 12:20:18 PM

to Serf

Hi, thanks for your interest in Serf!

Although Serf could certainly be used in a configuration with only two nodes, most of the benefits of its gossip-style membership are only realized in larger deployments.
If the nodes are relatively static (i.e. have fixed IP addresses) then Serf wouldn't provide much value over a more traditional heartbeat-based solution with only two members.

That said, if you were planning to make use of Serf's other features, such as by using its event-handler system to automate failover, or would just prefer to avoid implementing your own heartbeat system, the Serf binary could be useful.
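As a rough illustration of the event-handler approach, here is a minimal sketch of a handler script for a two-node active/passive pair. It relies on Serf passing the event type in the `SERF_EVENT` environment variable and, for membership events, writing one tab-separated line per affected member (name, address, role, tags) to the handler's stdin. The peer name `node-a` and the `promote_to_active` function are hypothetical placeholders, not part of Serf.

```python
#!/usr/bin/env python3
"""Sketch of a Serf event handler for a two-node active/passive pair."""
import os
import sys

PEER_NAME = "node-a"  # hypothetical: Serf node name of the other server


def parse_members(payload):
    """Parse the tab-separated member lines Serf writes to stdin."""
    members = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        fields += [""] * (4 - len(fields))  # pad missing trailing columns
        name, address, role, tags = fields[:4]
        members.append(
            {"name": name, "address": address, "role": role, "tags": tags}
        )
    return members


def should_promote(event, members, peer_name=PEER_NAME):
    """Promote only when the failed member is the peer we are backing up."""
    if event != "member-failed":
        return False
    return any(m["name"] == peer_name for m in members)


def promote_to_active():
    # Hypothetical placeholder: start services, claim a virtual IP, etc.
    print("peer failed: promoting this node to active")


def main():
    event = os.environ.get("SERF_EVENT", "")
    if not event:
        return  # not invoked by Serf, nothing to do
    members = parse_members(sys.stdin.read())
    if should_promote(event, members):
        promote_to_active()


if __name__ == "__main__":
    main()
```

A script like this could be registered with something like `serf agent -event-handler "member-failed=/path/to/handler.py"` (the path is a placeholder). Keep in mind that with only two nodes, a network partition looks the same as a peer crash from either side, so both nodes could try to promote themselves; this sketch does not address that.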

You also mentioned future interconnectivity with other clusters over an unreliable network. We've certainly seen Serf (and the underlying memberlist library) used to implement membership over higher latency links - this is how Consul's WAN federation works, for example.

For this, you'd need to tune the various parameters (e.g. timeouts, probe interval) to find the right balance between timely failure detection and handling transient connection issues. We've only really deployed Serf in situations where the network is basically reliable with occasional hiccups, rather than systems where members are frequently offline for extended periods. Your mileage may vary!
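To make that trade-off concrete, here is a back-of-the-envelope sketch. It assumes the base suspicion timeout is the suspicion multiplier times max(1, log10(n)) times the probe interval, which is an assumption based on a reading of the memberlist source rather than a documented guarantee. Real detection time also depends on probe timeouts, indirect probes and other settings, so treat the numbers as rough. The parameter values are illustrative only, not recommendations.

```python
import math


def suspicion_timeout(suspicion_mult, n_nodes, probe_interval_s):
    """Assumed base suspicion timeout, in seconds.

    The node-count scaling never drops below 1, so for a 2-node cluster
    the result is simply suspicion_mult * probe_interval_s.
    """
    node_scale = max(1.0, math.log10(max(1.0, float(n_nodes))))
    return suspicion_mult * node_scale * probe_interval_s


def rough_detection_time(suspicion_mult, n_nodes, probe_interval_s):
    """Crude estimate: up to one probe interval to notice the missed
    probe, plus the suspicion timeout before the node is declared failed."""
    return probe_interval_s + suspicion_timeout(
        suspicion_mult, n_nodes, probe_interval_s
    )


if __name__ == "__main__":
    # Illustrative settings: a short interval reacts quickly but is more
    # likely to flag transient hiccups; a long one is the opposite.
    for label, mult, interval in [("aggressive", 4, 1.0), ("tolerant", 6, 5.0)]:
        estimate = rough_detection_time(mult, n_nodes=2, probe_interval_s=interval)
        print(f"{label}: roughly {estimate:.0f}s to declare the peer failed")
```

Shorter intervals and smaller multipliers give faster failover at the cost of more false positives on a flaky link; the right values would have to be found by testing on the actual network.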

As you've mentioned that you're implementing an active/passive system, I wonder if Consul might be a better fit? It has primitives for implementing leader election, if that's useful for you: https://developer.hashicorp.com/consul/tutorials/developer-configuration/application-leader-elections

Hope that's helpful!

--

Dan Upton (he/him)

Senior Engineer, Consul Core Platform

Simple Stuff

Nov 18, 2022, 8:08:23 AM

to ser...@googlegroups.com

Hi Dan. Thanks for taking the time to write such a complete answer. It is clearly helpful.

I'm aware that using Serf in a 2-node configuration is a bit unusual.
You're right, we are indeed planning to use Serf's event-handler system to automate failover, and also for multi-cluster scenarios in the future.
Good to know that Serf is used in Consul for scenarios with high-latency networks.
We are concerned with scenarios where connectivity may be lost and recovered quite often, much more often than on a traditional WAN. Are you aware of other users targeting such deployments with Serf and Consul?
I will have a look at Consul's "application leader election with sessions" documentation. Thanks for the link. I guess this solution is meant for configurations with at least 3 nodes (as with other solutions based on Raft).

Just in case: are you aware of other users who successfully use Serf in a 2-node configuration?

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.

GitHub Issues: https://github.com/hashicorp/serf/issues

Dan Upton

Nov 22, 2022, 11:02:11 AM

to ser...@googlegroups.com

No problem, glad it was helpful!

I'm not personally aware of Serf users with 2-node clusters or highly unstable networks, sorry about that! Maybe others can advise though. I'd definitely be interested to hear more about your system, and how the different clusters will interact and handle connectivity failures if you're happy to share details. 🙂

You're right that a failure-tolerant deployment of Consul (or any consensus-based system) would require at least 3 nodes, which I guess might be too much overhead for your use case?
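The arithmetic behind the 3-node minimum can be sketched in a few lines. This assumes the usual majority-quorum rule used by Raft-style consensus systems; the function names are just for illustration.

```python
def quorum(n_servers):
    """Smallest majority of n_servers."""
    return n_servers // 2 + 1


def tolerated_failures(n_servers):
    """How many servers can be lost while a majority still remains."""
    return n_servers - quorum(n_servers)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} servers: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 2 servers the quorum is 2, so losing either one halts the cluster; 3 is the smallest size that survives a single failure.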

Simple Stuff

Nov 23, 2022, 4:30:34 AM

to Serf

Thanks for your feedback.

Unfortunately, it's not that easy to share more details at this stage. But I'll try to come back with feedback (or new questions) if it might be valuable for the community.
