flapping issue in EC2

Paul Lorenz

May 27, 2015, 10:48:10 PM
to consu...@googlegroups.com
Hi all,
  We're doing some performance testing in AWS using Consul, and we're seeing a flapping issue, manifested as below. I know that in general this is due to network reachability issues; however, the instances are running in the same security group, and all instances in the security group have all ports, TCP and UDP, open to other members of the security group.

The problem only seems to start after we start testing, which is quite CPU and network intensive. We've set GOMAXPROCS to nprocs (with a minimum of 2) on all nodes, to try and ensure that consul is not CPU starved. 

Any thoughts on what could be causing the flapping or any diagnostics we could run?

Thank you,
Paul


May 28 02:02:28 ip-172-31-10-203 consul[2006]: consul: adding server i-52001c9b (Addr: 172.31.3.112:8300) (DC: us-west-2)
May 28 02:02:35 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-3ba03ccc has failed, no acks received
May 28 02:02:45 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:02:45 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:02:48 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:03:07 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149
May 28 02:03:15 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149
May 28 02:03:41 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:03:41 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:04:03 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-27a03cd0 172.31.36.148
May 28 02:04:03 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:04:04 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:04:16 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-27a03cd0 172.31.36.148
May 28 02:04:44 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:04:49 ip-172-31-10-203 consul[2006]: memberlist: Marking i-f9a5390e as failed, suspect timeout reached
May 28 02:04:49 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f9a5390e 172.31.39.131
May 28 02:04:50 ip-172-31-10-203 consul[2006]: memberlist: Marking i-38a03ccf as failed, suspect timeout reached
May 28 02:04:50 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:04:50 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:04:51 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f9a5390e 172.31.39.131
May 28 02:04:51 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c85cc03f as failed, suspect timeout reached
May 28 02:04:51 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:04:54 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:05:02 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:05:03 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:05:03 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:05:04 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149
May 28 02:05:14 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-cb5cc03c 172.31.47.111
May 28 02:05:15 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149
May 28 02:05:16 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:05:16 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:05:17 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:05:19 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-3ba03ccc has failed, no acks received
May 28 02:05:21 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-f9a5390e has failed, no acks received
May 28 02:05:24 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-cb5cc03c 172.31.47.111
May 28 02:05:25 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:05:25 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f8a5390f 172.31.39.130
May 28 02:05:26 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:05:26 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f8a5390f 172.31.39.130
May 28 02:05:29 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:05:29 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:05:32 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:05:34 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149
May 28 02:05:37 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-3ba03ccc has failed, no acks received
May 28 02:05:44 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-52001c9b 172.31.3.112
May 28 02:05:44 ip-172-31-10-203 consul[2006]: consul: removing server i-52001c9b (Addr: 172.31.3.112:8300) (DC: us-west-2)
May 28 02:05:45 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-52001c9b 172.31.3.112
May 28 02:05:45 ip-172-31-10-203 consul[2006]: consul: adding server i-52001c9b (Addr: 172.31.3.112:8300) (DC: us-west-2)
May 28 02:05:46 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149
May 28 02:05:47 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:05:47 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:05:50 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c85cc03f as failed, suspect timeout reached
May 28 02:05:50 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:06:01 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:06:06 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-ac011d65)
May 28 02:06:21 ip-172-31-10-203 consul[2006]: memberlist: Marking i-38a03ccf as failed, suspect timeout reached
May 28 02:06:21 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:06:21 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:06:25 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:06:41 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-2bab37dc 172.31.46.127
May 28 02:06:41 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-2bab37dc 172.31.46.127
May 28 02:06:42 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c85cc03f as failed, suspect timeout reached
May 28 02:06:42 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:06:42 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:06:45 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-f8a5390f)
May 28 02:06:48 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c95cc03e as failed, suspect timeout reached
May 28 02:06:48 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c95cc03e 172.31.47.109
May 28 02:06:48 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c95cc03e 172.31.47.109
May 28 02:07:25 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:07:34 ip-172-31-10-203 consul[2006]: memberlist: Marking i-f8a5390f as failed, suspect timeout reached
May 28 02:07:34 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f8a5390f 172.31.39.130
May 28 02:07:34 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:07:38 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c95cc03e 172.31.47.109
May 28 02:07:38 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c95cc03e 172.31.47.109
May 28 02:07:39 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f8a5390f 172.31.39.130
May 28 02:07:43 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-f9a5390e has failed, no acks received
May 28 02:08:00 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:08:00 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:08:04 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:08:08 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:08:12 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-27a03cd0 has failed, no acks received
May 28 02:08:14 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-c95cc03e has failed, no acks received
May 28 02:08:16 ip-172-31-10-203 consul[2006]: memberlist: Marking i-3ba03ccc as failed, suspect timeout reached
May 28 02:08:16 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151
May 28 02:08:21 ip-172-31-10-203 consul[2006]: memberlist: Marking i-f9a5390e as failed, suspect timeout reached
May 28 02:08:21 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f9a5390e 172.31.39.131
May 28 02:08:23 ip-172-31-10-203 consul[2006]: memberlist: Refuting a dead message (from: i-39a03cce)
May 28 02:08:25 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:08:34 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151
May 28 02:08:36 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f9a5390e 172.31.39.131
May 28 02:08:39 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:08:55 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:09:03 ip-172-31-10-203 consul[2006]: memberlist: Marking i-39a03cce as failed, suspect timeout reached
May 28 02:09:03 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149
May 28 02:09:04 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149
May 28 02:09:05 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:09:06 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:09:14 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-c85cc03f has failed, no acks received
May 28 02:09:23 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:09:23 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:09:26 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f8a5390f 172.31.39.130
May 28 02:09:27 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f8a5390f 172.31.39.130
May 28 02:09:34 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-c85cc03f)
May 28 02:09:48 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-39a03cce has failed, no acks received
May 28 02:09:50 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-2bab37dc)
May 28 02:10:03 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-27a03cd0)
May 28 02:10:22 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-2bab37dc 172.31.46.127
May 28 02:10:22 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-2bab37dc 172.31.46.127
May 28 02:10:23 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f9a5390e 172.31.39.131
May 28 02:10:24 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f9a5390e 172.31.39.131
May 28 02:10:26 ip-172-31-10-203 consul[2006]: memberlist: Refuting a suspect message (from: i-f8a5390f)
May 28 02:10:30 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-de53cf29 172.31.38.237
May 28 02:10:32 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-df53cf28 172.31.38.236
May 28 02:10:33 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-f8a5390f has failed, no acks received
May 28 02:10:41 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149
May 28 02:10:43 ip-172-31-10-203 consul[2006]: memberlist: Suspect i-38a03ccf has failed, no acks received
May 28 02:10:46 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149
May 28 02:10:49 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-8151cd76 172.31.38.235
May 28 02:10:53 ip-172-31-10-203 consul[2006]: memberlist: Marking i-38a03ccf as failed, suspect timeout reached
May 28 02:10:53 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:10:55 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-8051cd77 172.31.38.234
May 28 02:11:08 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:11:09 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f9a5390e 172.31.39.131
May 28 02:11:24 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f9a5390e 172.31.39.131
May 28 02:11:27 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-52001c9b 172.31.3.112
May 28 02:11:27 ip-172-31-10-203 consul[2006]: consul: removing server i-52001c9b (Addr: 172.31.3.112:8300) (DC: us-west-2)
May 28 02:11:27 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-52001c9b 172.31.3.112
May 28 02:11:27 ip-172-31-10-203 consul[2006]: consul: adding server i-52001c9b (Addr: 172.31.3.112:8300) (DC: us-west-2)
May 28 02:11:41 ip-172-31-10-203 consul[2006]: memberlist: Marking i-f9a5390e as failed, suspect timeout reached
May 28 02:11:41 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f9a5390e 172.31.39.131
May 28 02:11:41 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-f9a5390e 172.31.39.131
May 28 02:11:43 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c85cc03f as failed, suspect timeout reached
May 28 02:11:43 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:11:55 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:12:01 ip-172-31-10-203 consul[2006]: memberlist: Marking i-38a03ccf as failed, suspect timeout reached
May 28 02:12:01 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:12:01 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:12:24 ip-172-31-10-203 consul[2006]: memberlist: Marking i-c85cc03f as failed, suspect timeout reached
May 28 02:12:24 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-c85cc03f 172.31.47.108
May 28 02:12:27 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-c85cc03f 172.31.47.108
May 28 02:12:36 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
May 28 02:12:36 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-38a03ccf 172.31.36.150
May 28 02:12:54 ip-172-31-10-203 consul[2006]: memberlist: Marking i-8151cd76 as failed, suspect timeout reached
May 28 02:12:54 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-8151cd76 172.31.38.235
May 28 02:12:54 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-8151cd76 172.31.38.235
May 28 02:12:58 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-f8a5390f 172.31.39.130
May 28 02:13:00 ip-172-31-10-203 consul[2006]: memberlist: Marking i-38a03ccf as failed, suspect timeout reached
May 28 02:13:00 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-38a03ccf 172.31.36.150
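
As one possible diagnostic for a log like the one above, here is a minimal sketch that counts how often each node goes from EventMemberFailed back to EventMemberJoin. The function name `count_flaps` and the regular expression are assumptions made for illustration; they only match the serf lines in the format shown in this post and are not part of Consul.

```python
import re
from collections import Counter

# Matches serf membership events in the syslog format shown above, e.g.
# "... serf: EventMemberFailed: i-3ba03ccc 172.31.36.151"
EVENT_RE = re.compile(r"serf: (EventMemberFailed|EventMemberJoin): (\S+) (\S+)")


def count_flaps(lines):
    """Count failed -> rejoined cycles per node name."""
    failed = set()      # nodes currently seen as failed
    flaps = Counter()   # node name -> number of completed flaps
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # memberlist/consul lines are ignored
        event, node, _addr = match.groups()
        if event == "EventMemberFailed":
            failed.add(node)
        elif node in failed:
            # A join that follows a failure completes one flap.
            failed.discard(node)
            flaps[node] += 1
    return flaps


# A few lines copied from the log above.
SAMPLE = [
    "May 28 02:02:45 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151",
    "May 28 02:02:48 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151",
    "May 28 02:03:07 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-39a03cce 172.31.36.149",
    "May 28 02:03:15 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-39a03cce 172.31.36.149",
    "May 28 02:03:41 ip-172-31-10-203 consul[2006]: serf: EventMemberFailed: i-3ba03ccc 172.31.36.151",
    "May 28 02:03:41 ip-172-31-10-203 consul[2006]: serf: EventMemberJoin: i-3ba03ccc 172.31.36.151",
]

if __name__ == "__main__":
    for node, n in count_flaps(SAMPLE).most_common():
        print(node, n)
```

Running this over the full log from several agents may show whether a handful of nodes account for most of the flaps or whether the failures are spread evenly, which could help separate a per-host problem from a cluster-wide one.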

Igor Cicimov

May 28, 2015, 4:31:39 AM
to consu...@googlegroups.com
Same here, and that is without load. I wonder if that Serf timeout can be tuned; maybe increasing it would help?

dan phrawzty

May 29, 2015, 12:23:25 PM
to consu...@googlegroups.com
We're seeing exactly the same issue in EC2. :(

We thought that it might be a problem with the security group, so we opened up 830[0-2]/tcp and 830[1-2]/udp on all of our nodes internally - this did not help.

Anybody else seeing this behaviour - or, better yet, come up with an answer?


--
dan.

Armon Dadgar

May 29, 2015, 7:59:33 PM
to consu...@googlegroups.com, dan phrawzty
Are the machines running Consul either completely saturating the network or CPU?

Under high load conditions, the Consul agents will not process the failure detection messages
in a timely manner, causing them to be marked as failed. (Arbitrarily slow is hard to distinguish from failed).
If that is the case, you can experiment with giving a very high priority nice value to Consul to ensure
timely processing of messages. Otherwise, hopefully with 0.6 we will expose more tuning values of
the lower level systems like the Serf failure detector to allow you to increase the various timeout intervals.

Best Regards,
Armon Dadgar
--
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Helmer

May 29, 2015, 8:15:22 PM
to Armon Dadgar, consu...@googlegroups.com, dan phrawzty
In our (Mozilla's) case, the load is not particularly high. Clients
seem to disconnect rather quickly in some cases, not in others - we're
still trying to diagnose exactly what the problem is, before we pin
all the blame on Consul.

We have infra in different AZs within the same region, so maybe it's
just network latency between certain machines?

If this does turn out to be the case, being able to tune the Serf
failure detector to be a bit more lenient would be great.

Robert Helmer

May 30, 2015, 12:18:50 AM
to Robert Helmer, Armon Dadgar, consu...@googlegroups.com, dan phrawzty
After digging into this a bit more, I found we were not allowing UDP
egress from any nodes.

I applied this to our terraform configs and it seems much healthier now:
https://github.com/mozilla/socorro-infra/pull/145

Still testing, and I haven't applied it to all of our apps yet, but
everything I've applied that to now stays connected. I started with
the server cluster and haven't seen those disconnect from each other
yet, before it was happening almost immediately.
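
For anyone wanting to check UDP reachability between two nodes before touching security group rules, a minimal sketch follows. It assumes you can run a small echo responder on one node and a probe on the other; `udp_echo_server` and `udp_probe` are hypothetical helpers written for this example, not Consul or Serf tooling, and the port you test should be one covered by the same rules as the Serf port (or the Serf port itself while the agent is stopped).

```python
import socket
import threading


def udp_echo_server(sock):
    """Echo every datagram back to its sender until the socket is closed."""
    while True:
        try:
            data, addr = sock.recvfrom(1024)
        except OSError:
            return  # socket was closed
        sock.sendto(data, addr)


def udp_probe(host, port, timeout=1.0, payload=b"ping"):
    """Return True if a datagram sent to host:port is echoed back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(1024)
        except OSError:
            # Covers timeouts as well as errors such as "port unreachable".
            return False
        return data == payload


if __name__ == "__main__":
    # Loopback demonstration; on real nodes the server and the probe
    # would run on different machines.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(target=udp_echo_server, args=(server,), daemon=True).start()
    print(udp_probe("127.0.0.1", server.getsockname()[1]))
```

A probe that fails in one direction but succeeds in the other would be consistent with a missing egress or ingress rule like the one described above, although packet loss under load can produce the same symptom.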

Paul Lorenz

Jun 3, 2015, 12:54:43 PM
to Armon Dadgar, consu...@googlegroups.com, dan phrawzty
Hi Armon,
  Thank you for the advice. I tried nicing consul to -20; however, it did not help. We are saturating CPU (load avg of 4-5 on a 2 VCPU machine) and have heavy network use, though I'm not sure whether we are saturating the network or not. We are mostly able to work around the issue by taking consul failures as advisory, not definitive, but we're looking forward to 0.6.0 to see if tweaking serf timeouts helps.

Thank you,
Paul

Armon Dadgar

Jun 3, 2015, 3:22:09 PM
to Paul Lorenz, consu...@googlegroups.com, dan phrawzty
Hey Paul,

You may want to try setting the scheduling priority to realtime (SCHED_FIFO) with chrt for the Consul process.
Given what you have said, the machines are being starved of CPU time, and this is likely causing issues.
Be sure that all threads have their priority changed, not just the main thread.

Best Regards,
Armon Dadgar
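
A minimal sketch of the "all threads, not just the main thread" point, assuming Linux and root privileges. The helpers `thread_ids` and `set_fifo_all_threads` and the priority value 50 are assumptions for illustration; the sketch walks /proc/<pid>/task and applies SCHED_FIFO to each thread ID it finds.

```python
import os


def thread_ids(pid, proc_root="/proc"):
    """List the thread IDs of a process from <proc_root>/<pid>/task (Linux)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir))


def set_fifo_all_threads(pid, priority=50):
    """Switch every existing thread of pid to SCHED_FIFO at the given priority.

    Requires root (or CAP_SYS_NICE). Only threads that exist at the time of
    the call are changed.
    """
    param = os.sched_param(priority)
    for tid in thread_ids(pid):
        # On Linux, sched_setscheduler acts on a single thread ID.
        os.sched_setscheduler(tid, os.SCHED_FIFO, param)


if __name__ == "__main__":
    # Harmless demonstration: list this interpreter's own threads.
    print(thread_ids(os.getpid()))
```

This is roughly what `chrt --all-tasks --fifo --pid 50 <pid>` does from the shell. Realtime priority on a CPU-saturated box can starve other processes, so it is probably worth trying on a test node first.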

DJ Enriquez

Aug 21, 2015, 11:51:23 AM
to Consul
Hi all,

Sorry to bump this topic, but have there been any solutions to this flapping problem? We have a very similar setup in EC2, except everything is running in Docker. Have there been any new recommendations for config changes to help with the flapping?

Thanks,
DJ Enriquez

Joshua Garnett

Aug 21, 2015, 11:55:19 AM
to consu...@googlegroups.com
I had issues when my security group wasn't set up properly. Here is a snippet from my terraform config:

  # Internal consul ports
  ingress {
      from_port = 8301
      to_port = 8301
      protocol = "tcp"
      cidr_blocks = ["${var.vpc_cidr}"]
  }

  ingress {
      from_port = 8301
      to_port = 8301
      protocol = "udp"
      cidr_blocks = ["${var.vpc_cidr}"]
  }

I believe I originally only opened up UDP. Opening up TCP as well resolved the issues.

--Josh

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode

James Phillips

Aug 21, 2015, 1:02:05 PM
to consu...@googlegroups.com
We also found a couple folks that were running into a Xen network driver bug that was disrupting TCP communications between nodes in the cluster:


If you see any "xen_netfront: xennet: skb rides the rocket: 19 slots" log messages from the kernel you'll definitely want to fix that up to make things more stable.

-- James
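
A minimal sketch for checking a saved kernel log for the message quoted above. The helper names `rocket_lines` and `scan_file` are made up for this example, and it assumes the kernel log has been written to a file first (for instance by redirecting `dmesg` output).

```python
MARKER = "skb rides the rocket"


def rocket_lines(lines):
    """Return the kernel log lines that contain the xennet warning."""
    return [line.rstrip("\n") for line in lines
            if "xennet" in line and MARKER in line]


def scan_file(path):
    """Scan a saved kernel log file and return the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return rocket_lines(handle)
```

An empty result only means the message was not present in that file; the kernel ring buffer may have rotated, so checking persistent logs as well is likely worthwhile.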

DJ Enriquez

Aug 28, 2015, 5:37:55 PM
to Consul
Hi Josh,

We have 8300-8302/tcp & udp, 8400/tcp and 8500/tcp open.

I don't believe this issue is necessarily "critical", but I do know that the flapping causes consul-template to rerun over and over again. This means our NGINX config is reloaded over and over again whenever a node flaps.

James,

No xen_netfront issues that we can see. Thanks for the heads up though.

James Phillips

Aug 31, 2015, 6:27:26 PM
to consu...@googlegroups.com
Hi DJ - it might be good to open a GitHub issue on Consul so we can work through your problem in more detail over there.

DJ Enriquez

Sep 1, 2015, 2:25:13 PM
to Consul
Thanks, James.
