I just started testing Serf, hoping to get it running on about 250 clients. I have a couple of questions; if they have been discussed before, my apologies.
During my initial testing, I noticed that 'left' nodes are never contacted again. For example, suppose I have a designated node that I use as a head node (i.e., the node whose IP address the other nodes use to join the cluster). If, for some reason, the 'serf' process on that head node is killed, the other nodes show it as 'left', and the Serf agents on the other nodes do not reconnect to it (as the documentation states).
Even after that node is up and running again, I could not get it to reappear in the cluster without restarting the 'serf' processes on the other nodes.
How do others resolve this issue? Or is there some other way to get a node with 'left' status to report back to the cluster, ideally without restarting the processes on all the other nodes?
A node's 'serf' process could be killed for many reasons, but IMO it should be able to reconnect, and the other nodes in the cluster should be able to see it again.