This morning users were having problems connecting to the BeeGFS filesystem.
Looking at the beegfs-mgmtd.log on the mgmtd server, I found the following error:
(0) Jan18 09:25:59 StreamLis [StreamLis] >> Trying to continue after connection accept error: Error during socket accept(): Too many open files
I have already increased the open-file limit to 1000000, but this seems only to delay the problem.
[root@bgfs001 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1030239
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1030239
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
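For completeness, this is how I raised the limit for the daemon itself; a sketch of my setup, assuming beegfs-mgmtd runs as a systemd service (it does on my EL7 box, and limits from /etc/security/limits.conf do not apply to systemd-started services; the drop-in file name is arbitrary):
[root@bgfs001 ~]# mkdir -p /etc/systemd/system/beegfs-mgmtd.service.d
[root@bgfs001 ~]# cat > /etc/systemd/system/beegfs-mgmtd.service.d/limits.conf <<'EOF'
[Service]
# raise the per-process fd limit for the mgmtd daemon
LimitNOFILE=1000000
EOF
[root@bgfs001 ~]# systemctl daemon-reload && systemctl restart beegfs-mgmtd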
A workaround is to kill and restart the mgmtd, but I would like to know what the root cause is and how to prevent this from happening again.
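To check whether this is really a descriptor leak rather than a limit that is simply too low, the count and type of open fds on the mgmtd process can be watched over time (plain /proc and lsof, nothing BeeGFS-specific; pidof assumes a single beegfs-mgmtd process on the box):
[root@bgfs001 ~]# ls /proc/$(pidof beegfs-mgmtd)/fd | wc -l
[root@bgfs001 ~]# lsof -n -p $(pidof beegfs-mgmtd) | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn
If the count climbs steadily and the breakdown is dominated by TCP/sock entries, that would point at client connections not being closed.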
I am running BeeGFS 7.1.1 on Oracle Linux 7.5 (a mostly RHEL-compatible clone) with kernel 4.1.12-103.9.4.el7uek.x86_64.
Any help would be appreciated.
Thanks,
Jim Burton
--
James Burton
OS and Storage Architect
Advanced Computing Infrastructure
Clemson University Computing and Information Technology
340 Computer Court
Anderson, SC 29625