Large number of ARP probes sent over LAN

Grégoire Seux

Nov 26, 2015, 2:22:14 PM
to consu...@googlegroups.com
Hello,

I recently enabled Consul in a datacenter with a medium-sized
LAN (around 1,300 nodes) and noticed a large number of ARP requests
being sent (1,900 requests/sec).
Here is a screenshot:
https://snapshot.raintank.io/dashboard/snapshot/GiinHMNMvny3IT7RRdMRZ3Bf8wxpvoAp

As far as I understand, Consul mainly communicates over UDP for its
gossip. This leads to a very large ARP cache (it is correctly sized
on my servers, though) but also to a lot of stale entries in that
cache.

The reason is that a Consul agent does not interact frequently with
each of the other agents, so its ARP cache entries expire.
The default expiration is a random value of 30 sec +/- 50% (i.e.
between 15 and 45 sec), so it is normal that entries expire.
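To make the +/- 50% concrete: as I read the Linux neighbour code, each entry's reachable time is drawn at random between half and one and a half times base_reachable_time_ms. A minimal sketch, assuming that rule; `reachable_window` and `current_base` are hypothetical helper names, not standard tools:

```shell
#!/bin/sh
# Print the range (in ms) from which the kernel is assumed to draw an
# entry's reachable time: base/2 .. 3*base/2, i.e. base +/- 50%.
# reachable_window is a hypothetical helper name.
reachable_window() {
    base_ms="$1"
    echo "$((base_ms / 2)) $((base_ms * 3 / 2))"
}

# Read the live Linux default if the /proc file is readable,
# otherwise fall back to the usual 30000 ms default.
current_base() {
    f=/proc/sys/net/ipv4/neigh/default/base_reachable_time_ms
    if [ -r "$f" ]; then cat "$f"; else echo 30000; fi
}
```

With the default, `reachable_window "$(current_base)"` should print `15000 45000`, which matches the 15 to 45 sec window described above.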

A solution would be to increase the cache expiration time
(net.ipv4.neigh.[interface].base_reachable_time_ms on Linux, netsh int
ipv4 set interface [interface] basereachable on Windows), the maximum
value (on Windows) being 1 hour.
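On Linux the setting can be changed per interface by writing the corresponding /proc/sys file, which is what `sysctl -w net.ipv4.neigh.<iface>.base_reachable_time_ms=<ms>` does. A minimal sketch; the `NEIGH_ROOT` override and both function names are hypothetical, added only so the functions can be tried against a scratch directory (on a real host, writing requires root):

```shell
#!/bin/sh
# Root of the per-interface neighbour settings; overridable (hypothetical
# knob) so the functions can be exercised without touching /proc.
NEIGH_ROOT="${NEIGH_ROOT:-/proc/sys/net/ipv4/neigh}"

# Set base_reachable_time_ms for one interface (hypothetical helper).
set_base_reachable() {
    iface="$1"; ms="$2"
    f="$NEIGH_ROOT/$iface/base_reachable_time_ms"
    if [ ! -e "$f" ]; then
        echo "no such setting: $f" >&2
        return 1
    fi
    echo "$ms" > "$f"
}

# Print "<interface> <base_reachable_time_ms>" for every neighbour table.
show_base_reachable() {
    for d in "$NEIGH_ROOT"/*/; do
        name="$(basename "$d")"
        printf '%s %s\n' "$name" "$(cat "$d/base_reachable_time_ms")"
    done
}
```

As root, `set_base_reachable eth0 3600000` would raise eth0 to 1 hour. A runtime change is lost on reboot; to persist it, the equivalent line `net.ipv4.neigh.eth0.base_reachable_time_ms = 3600000` can go in /etc/sysctl.conf.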

But since UDP is a connectionless protocol, using an ARP cache entry
cannot extend its lifetime as it would with TCP (or even ICMP).
Whatever base_reachable_time you use, the entries will go stale
at some point (the kernel considers that it has no confirmation of
their validity) and trigger an ARP probe.
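One way to watch this happen on a host is to count neighbour-cache entries by state (REACHABLE, STALE, DELAY, PROBE, FAILED, ...) from `ip neigh show`. A minimal sketch, assuming the state is the last field of each line (which is how iproute2 usually prints it); `count_neigh_states` is a hypothetical helper name:

```shell
#!/bin/sh
# Summarize neighbour-cache entries by state. Reads `ip neigh show`
# output on stdin; assumes the state is the last field of each line.
# count_neigh_states is a hypothetical helper name.
count_neigh_states() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage: `ip -4 neigh show | count_neigh_states`. Sampling it periodically should show whether a large share of entries sits in STALE between gossip rounds.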

I increased the value anyway to see the effect, and the rate decreased
to ~1,500 requests/sec (which is better, but far from the near-zero
rate I'd like).
Screenshot: https://snapshot.raintank.io/dashboard/snapshot/lq3hx83m36C0l6o0Pru7g8MuVqNrfadG.

As a side note, I fixed the advertise address, which Consul was picking
at random (some nodes had chosen an address on an interface without a
gateway, connected to the load balancers); this brought the rate down
to 1,200 requests/sec.
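One way to pin the address instead of letting the agent choose is to take the source address the kernel uses for the default route and pass it explicitly. A minimal sketch, assuming `ip route get` output where the address follows the `src` token; `route_src_addr` is a hypothetical helper name, and 8.8.8.8 is just an arbitrary off-LAN destination (`ip route get` only does a route lookup, it sends no packet):

```shell
#!/bin/sh
# Print the address after "src" in `ip route get <dst>` output (stdin),
# i.e. the source address the kernel would use to reach <dst>.
# route_src_addr is a hypothetical helper name.
route_src_addr() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}
```

Usage: `addr="$(ip route get 8.8.8.8 | route_src_addr)"`, then start the agent with `consul agent -advertise="$addr" ...` (plus your usual flags). Choosing the default-route interface avoids advertising an address that sits on a gateway-less network such as the load-balancer one mentioned above.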


Of course this is probably the price of running a large layer-2
network, but I am interested to know whether other users have
encountered such issues and how they solved them.

--
Gregoire

Darron Froese

Nov 26, 2015, 3:35:06 PM
to consu...@googlegroups.com
I think Digital Ocean had some sort of problem like this as well:

http://youtu.be/LUgE-sM5L4A

I can't remember how they solved it - but the video should have some clues.

Hope that helps.
--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.

GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/CAKvXUfhsFukFH6d6b%2BFp6n3fe0mdEjL4EQoJogz0AUb34DKK9w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Grégoire Seux

Nov 27, 2015, 4:37:39 AM
to consu...@googlegroups.com
Thanks for your answer, Darron.

I did see this video a few weeks ago, and it helped us
characterize the issue.
DO's solution seems to be: tune the ARP cache and use a recent kernel.

I did increase the cache size (as part of the same change in which I
increased the base_reachable_time setting), even though the cache size
was already under gc_thresh2 on all servers.
Updating the kernel will be hard, since I am still on CentOS 6.
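Since gc_thresh2 came up: the Linux neighbour table has three garbage-collection thresholds, gc_thresh1/2/3 (kernel defaults 128, 512 and 1024), and on a ~1,300-node flat LAN a host that talks to most peers could exceed the default gc_thresh3. A minimal sketch that compares the current entry count with those thresholds; `gc_headroom` and the `NEIGH_ROOT` override are hypothetical, added so it can be tried against a scratch directory:

```shell
#!/bin/sh
# Where the thresholds are read from (hypothetical override knob).
NEIGH_ROOT="${NEIGH_ROOT:-/proc/sys/net/ipv4/neigh}"

# Count neighbour entries from stdin (one per line, e.g. `ip -4 neigh show`)
# and print "<threshold> <limit> <entries> above|below" for each threshold.
# gc_headroom is a hypothetical helper name.
gc_headroom() {
    entries="$(grep -c .)"
    for t in gc_thresh1 gc_thresh2 gc_thresh3; do
        limit="$(cat "$NEIGH_ROOT/default/$t")"
        if [ "$entries" -gt "$limit" ]; then state=above; else state=below; fi
        echo "$t $limit $entries $state"
    done
}
```

Usage: `ip -4 neigh show | gc_headroom`. Being above gc_thresh3 is the case where the kernel can refuse new entries, so that is the line to watch first.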

--
Grégoire

Andrey Larionov

Dec 4, 2015, 6:54:35 AM
to Consul
Hello, Grégoire 

Can you share your experience? We are planning to deploy Consul on a large cluster and want to be prepared. Could you tell us which tools you used to monitor ARP requests and the ARP cache status?

This information could save our ops team a lot of time.

Thanks.

Grégoire Seux

Dec 8, 2015, 6:23:54 AM
to consu...@googlegroups.com
Hello Andrey,

our experience is summarized in the first mail of this thread. You are
likely to be affected if your network is a flat layer-2 network (with
more than 1,024 nodes in the LAN). The first actions are to tune the
relevant sysctls on Linux and some network parameters on Windows. We
have not made progress since then, but the issue is now mitigated.

Our network team did not provide ARP counts from their switches, so we
had to improvise a crappy tcpdump pipeline sending points to Graphite:
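> # Every 60 s, count ARP requests (arp[6:2] = 1 is ARP opcode 1) per requesting ("tell") address and push the counts to Graphite
> while true; do sudo timeout 60 tcpdump -l -n 'arp[6:2] = 1' | grep tell | sed -re 's/^.*tell (.*),.*$/\1/' | sort | uniq -c | sed 's/\./_/g' | awk -v date="$(date +%s)" '{print "debug.arp_received.by_"$2" "$1" "date}' | nc -v graphite-relay.storage.criteo.prod 3341; done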


We kept it running during our investigations. Our network engineers are
now working on sustainable graphing of ARP requests.
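The parsing half of that one-liner can be pulled into a function and checked offline against captured tcpdump text, without root or a live capture. A minimal sketch, assuming `tcpdump -n` prints requests as `... ARP, Request who-has A tell B, length N`; `arp_to_graphite` is a hypothetical name, and the metric prefix is the one from the one-liner:

```shell
#!/bin/sh
# Turn `tcpdump -n` ARP request lines (stdin) into Graphite plaintext
# lines "<metric> <count> <timestamp>", one per requesting ("tell") address.
# arp_to_graphite is a hypothetical helper name.
arp_to_graphite() {
    ts="$1"
    grep ' tell ' |
        sed -E 's/^.* tell ([^,]*),.*$/\1/' |
        sort | uniq -c |
        awk -v ts="$ts" '{ gsub(/\./, "_", $2); print "debug.arp_received.by_" $2 " " $1 " " ts }'
}
```

Live usage would look like `sudo timeout 60 tcpdump -l -n 'arp[6:2] = 1' | arp_to_graphite "$(date +%s)" | nc graphite-host 2003`, where `graphite-host` is a placeholder and 2003 is Carbon's default plaintext port.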

Hope that helps a bit

--
Gregoire

Andrey Larionov

Dec 8, 2015, 1:01:51 PM
to consu...@googlegroups.com, Grégoire Seux
Thanks a lot. This one-liner will definitely help. Hopefully in the future we will be able to gather this data from the devices, but for now tcpdump is much better than nothing at all.
-- 
Andrey Larionov

Tonghao Zhang

Mar 15, 2018, 8:39:19 AM
to Consul


On Friday, November 27, 2015 at 3:22:14 AM UTC+8, Grégoire Seux wrote:
> [...]
> But since udp is non-connected protocol, using the arp cache entry
> cannot extend its lifetime as it would do on tcp (or even icmp).
Hi, I don't understand why UDP and TCP affect the ARP entry lifetime differently.
Can you explain it? I can't find the difference in the Linux kernel.

Grégoire Seux

Mar 16, 2018, 4:23:53 AM
to consu...@googlegroups.com

Hello Tonghao,

from what I remember, in any protocol where a request can be matched with a response (TCP with its ACKs, ICMP echo with its reply, ...), the kernel takes the response as confirmation that the ARP entry is still valid.
Using that information, it can extend the lifetime of the entry in the cache.

For UDP there is no such mechanism, so the kernel relies solely on ARP requests to refresh the cache entries.
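To see this per neighbour, `ip -s neigh show` prints a `used A/B/C` statistic; my understanding (an assumption worth checking against your iproute2 version) is that B is the number of seconds since the entry was last confirmed. A minimal sketch that extracts it; `confirm_age` is a hypothetical helper name:

```shell
#!/bin/sh
# From `ip -s neigh show` output (stdin), print "<address> <seconds>" where
# <seconds> is the middle value of the "used A/B/C" statistic, assumed here
# to be the time since the entry was last confirmed.
# confirm_age is a hypothetical helper name.
confirm_age() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "used") {
                split($(i + 1), t, "/")
                print $1, t[2]
            }
    }'
}
```

Usage: `ip -s neigh show dev eth0 | confirm_age`. If the explanation above holds, a peer reached only over UDP gossip should show a confirmed age that keeps growing until the entry is reprobed, while a peer with an active TCP connection should keep getting reset close to zero.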

From the arp(7) man page:

> When there is no positive feedback for an existing mapping after some time (see the /proc interfaces below), a neighbor cache entry is considered stale. Positive feedback can be gotten from a higher layer; for example from a successful TCP ACK

and later:

> There is no way to signal positive feedback from user space. This means connection-oriented protocols implemented in user space will generate excessive ARP traffic, because ndisc will regularly reprobe the MAC address. The same problem applies for some kernel protocols (e.g., NFS over UDP).


Looking at the Linux kernel code now, it seems that even UDP packets can carry the MSG_CONFIRM flag, which would extend the lifetime of the ARP cache entry.

Let me know if you find more information.

-- 
Grégoire