Global ulimit settings for the CoreOS host itself


Megyesi Dániel

Oct 13, 2015, 10:04:34 AM
to CoreOS User
Hi guys,

We're facing a network timeout issue in production when there is a lot of incoming traffic to our web servers, which run on a cluster of CoreOS machines.

I suspect it may be a problem with the number of open file descriptors (ulimit).
I have already raised this limit for the Docker daemon itself:

- path: /etc/systemd/system/docker.service.d/increase-ulimit.conf
  owner: core:core
  permissions: 0644
  content: |
    [Service]
    LimitNOFILE=20480

and it seems to work fine for incoming connections from the outside world to the Apache containers.

However, the Apache containers connect to a pool of Tomcats on the same network (those run on bare-metal machines, not in Docker or on CoreOS). These connections between Apache and the non-dockerized Tomcats time out (once every 30-60 seconds) when we have a lot of concurrent requests.

The same configuration on non-dockerized Apache doesn't have these network timeout issues, so I'm suspecting it's either something with Docker's NAT performance or something related to the networking of CoreOS.


Right now, the only thing I can suspect is the number of open file descriptors on the CoreOS host itself: not for the Docker daemon, but for a service(?) that handles networking in CoreOS. The default ulimit is 1024, and we might be hitting this limit, which would cause the issues. Do you think this can be the problem?

My questions:
1. Is there a debug option to see in syslog whether we hit a ulimit setting and it causes connections to be dropped?

2. Is it possible that raising the ulimit only for the Docker daemon is not enough?
- If yes, what else should I tweak? The systemd-networkd service unit file, or is it possible to define it globally on CoreOS (for the root user)? On other Linux distributions, it would be /etc/security/limits.conf or something like that, but how do I do it on CoreOS?


Please feel free to let me know if you have any other suggestions on the possible problem with the network connections. Right now I can only think of a ulimit-related issue.

Thank you very much for your support in advance!


Best regards,

Daniel

anton....@coreos.com

Oct 13, 2015, 11:58:27 AM
to CoreOS User
Dear Daniel,

Try tuning the "net.nf_conntrack_max" sysctl value. Docker containers use iptables NAT, so the conntrack table is probably overflowing.
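A minimal cloud-config sketch of that suggestion, in the same style as the Docker drop-in above. The file name and the value 262144 are assumptions for illustration only, and this also assumes the conntrack module is already loaded when the sysctl.d files are applied at boot; otherwise the setting may not take effect.

```yaml
write_files:
  # Hypothetical file name; any *.conf file under /etc/sysctl.d is read at boot.
  - path: /etc/sysctl.d/90-conntrack.conf
    permissions: 0644
    content: |
      # Assumed example value, not a recommendation; size it to your traffic.
      net.nf_conntrack_max = 262144
```

When the table does overflow, the kernel normally logs "nf_conntrack: table full, dropping packet", so checking dmesg for that message is one way to confirm the diagnosis before changing anything.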

Let me know if that doesn't help.

Regards,
Anton