On Mar 20, 2015 2:08 PM, "Joe Stubbs" <jst...@tacc.utexas.edu> wrote:
> Very cool, Luca. Can you share anything more about your deployment? We would be interested in the network topology details (data centers, firewalls, etc) and any performance metrics you may have regarding bandwidth consumption, convergence times, and so on, if you are measuring them. Also, is this pure serf or are you deploying Consul?
I only tested the software for a few days, and the cluster was mostly
idle (it was being set up for acceptance tests), so take this info
with the consideration it deserves...
We were running serf on a flat 10GE network (all the nodes were in the
same network segment), so no routing and no firewalls.
If I remember correctly, the daemon consumed about 5 minutes of CPU
time per day on each node.
As for bandwidth, I have no measurements; I think there is a formula
to estimate it (and I don't think it represents an issue, considering
serf's message size and the fact that we are running on 10GE).
We are planning to use it to deliver user messages from our compute
nodes to our service nodes. I did some preliminary (and rough) tests
and noticed that message delivery is basically instantaneous: most of
the nodes get the message within a second or two, and some of that
delay might have been caused by the central rsyslogd receiving all
the messages from all the nodes.
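That second-or-two figure is consistent with what you'd expect from epidemic gossip: each round, every informed node tells a few peers, so the informed set grows geometrically and full propagation takes roughly log(N) rounds. A toy sketch, using the same assumed 200 ms interval and fanout of 3 as above:

```python
import math

def rounds_to_reach(n_nodes, fanout=3):
    """Approximate gossip rounds for a message to reach all n_nodes.
    Each round, every informed node tells `fanout` peers, so the
    informed set grows by a factor of (fanout + 1) per round."""
    return math.ceil(math.log(n_nodes, fanout + 1))

GOSSIP_INTERVAL_S = 0.2  # assumed 200 ms gossip interval

for n in (100, 1000, 10000):
    r = rounds_to_reach(n)
    print(f"{n:>5} nodes: ~{r} rounds, ~{r * GOSSIP_INTERVAL_S:.1f} s")
```

Under these assumptions even ten thousand nodes converge in under two seconds, which matches what I observed.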
The critical question is how it is going to perform under load, when
the CPU load on most of our compute nodes will be consistently at
100%. This will introduce high latency in processing serf messages on
all nodes, and it _might_ cause node flapping (nodes going in and out
of the cluster).
ATTN: these are all my speculations; I have no real numbers for this.
Since we are not planning to actively use cluster membership (who is
joining and who is leaving), this should not be a problem for us; in
fact, messages will be delivered anyway thanks to the characteristics
of the gossip protocol (even if some nodes flap).
I hope that helps,