Gearman Too many open files


Alexey

Oct 30, 2015, 6:50:45 AM
to Gearman
Hi All,

I asked the same question here, but there is no reply yet. I googled it and found a topic in this group, but it gave no real explanation of why this happens. I can provide the output of lsof or strace, or explain what our app is doing. We run a Node.js application with 10 node workers connected to gearman, waiting for jobs that take a screenshot with slimer or phantomjs, save the file, convert/resize it, upload it to S3, and then remove it. I believe we properly close all gearman and mongodb (we use it as our storage) connections. But the log is still growing.

What we noticed: there is a flood of these 

  ERROR 2015-10-29 13:05:37.000000 [  main ] accept(Too many open files) -> libgearman-server/gearmand.cc:788

records once every hour.

The log file can grow to about 70 GB in a day.

I'd appreciate any help in finding the cause of this.

Thanks in advance!

Conrad Jones

Oct 30, 2015, 6:54:03 AM
to gea...@googlegroups.com

I vaguely remember raising a limit on our workers. I can't remember the details, as I am not near them and they are built from a template, but it was some kind of ulimit option.

--
You received this message because you are subscribed to the Google Groups "Gearman" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gearman+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexey

Oct 30, 2015, 6:55:30 AM
to Gearman
Hello Conrad,

could you explain in more detail, please? What limit and how do I increase it? 

Thanks.

On Friday, 30 October 2015 11:54:03 UTC+1, Conrad Jones wrote:

the noob

Oct 30, 2015, 7:01:24 AM
to gea...@googlegroups.com
Off the top of my head, this is about file descriptors; you need to increase that limit.

Conrad Jones

Oct 30, 2015, 7:02:43 AM
to gea...@googlegroups.com

I'm not in a position to google it for you. Just google "too many open files" and "ulimit".

Alexey

Oct 30, 2015, 7:11:23 AM
to Gearman
Thanks a lot for your help. I asked because I did not understand the word "limit" (not "ulimit") in your initial message, and I didn't follow your English at first. My bad, I'm not a native speaker.

What I'm asking now is the reason for too many open files. I can increase the setting, but will it help if we then hit the limit in a week instead of in a day? I believe one should fix the cause, not tune OS settings just to push the problem into the near future.


On Friday, 30 October 2015 12:02:43 UTC+1, Conrad Jones wrote:

Conrad Jones

Oct 30, 2015, 7:21:51 AM
to gea...@googlegroups.com

Gearman itself uses quite a few; you may be just a few over the limit. I would raise it and see what happens. I raised ours and the problem didn't reoccur, so you may not be leaking open file handles; you may just need more to do what you are doing concurrently. It's quite common with other applications too.
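
Before raising anything, it can help to confirm which limit the running gearmand process actually has, since a limit set in a login shell does not necessarily apply to a daemon started by an init system. The following is a minimal Python sketch, assuming a Linux host where /proc/<pid>/limits is readable. The function name and the sample text are illustrative and not taken from this thread.

```python
import re

def max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces: name, soft, hard, units
            soft, hard = re.split(r"\s{2,}", line.strip())[1:3]
            return soft, hard
    return None

# Illustrative sample in the layout the Linux kernel prints (values are made up)
sample = (
    "Limit                     Soft Limit           Hard Limit           Units     \n"
    "Max open files            1024                 4096                 files     \n"
)
print(max_open_files(sample))
```

For a live check, read the text from /proc/<pid>/limits of the gearmand process instead of the sample. The values are returned as strings because the kernel may print "unlimited".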

Brian Moon

Oct 30, 2015, 9:12:07 AM
to gea...@googlegroups.com
On 10/30/15 11:50 , Alexey wrote:
> |ERROR 2015-10-29 13:05:37.000000 [ main ] accept(Too many open files) ->
> libgearman-server/gearmand.cc:788|

This usually happens due to workers and/or clients disconnecting and
reconnecting. If this happens too quickly, gearmand can use all of its
available file descriptors (an open connection uses an fd). I am not
going to go into how TCP works, but when one side closes a connection,
the other side does not always close it immediately. Also, some client
and worker libraries are bad about not closing the connection properly.
So, gearmand has to wait for the connection to time out since it was not
closed well.

It is very unlikely that you need to up your fd limit. That is just a
symptom of misbehaving clients and workers.
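
The point about one side closing while the other still holds the connection can be seen with plain sockets. This is a small Python sketch of general TCP socket behavior on loopback, not gearmand's code: after the client goes away, the server-side descriptor stays allocated until the server itself closes it. The function name is made up for illustration.

```python
import socket

def peer_close_demo():
    """Show that a peer closing its end does not release our descriptor."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.settimeout(5)                # avoid hanging if something goes wrong

    client.close()                    # the "worker" goes away
    data = conn.recv(1024)            # an empty result means the peer closed
    fd_still_open = conn.fileno() >= 0

    conn.close()                      # only now is the server-side fd released
    fd_after_close = conn.fileno()    # Python reports -1 for a closed socket
    listener.close()
    return data, fd_still_open, fd_after_close

print(peer_close_demo())
```

If a client never sends a clean close at all (for example, it crashes or is cut off), the server does not even get the empty read and has to rely on a timeout, which matches the description above.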

--

Brian.
--------
http://brian.moonspot.net/

Alexey

Oct 30, 2015, 9:19:14 AM
to Gearman
Thanks Brian,

I also suspected something like that as I saw earlier too many connections between gearmand and worker clients in ESTABLISHED (not even TIME_WAIT) state. Here is, for example, just a really small snippet of `netstat -apn | grep ESTABLISHED` output:

0731/node
tcp        0      0 127.0.0.1:44998         127.0.0.1:4730          ESTABLISHED                           1449/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:34910         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:35136         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:37304         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:33123         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:55355         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:36462         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:54638         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:56845         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:35630         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:56210         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:38554         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:33052         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:39993         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:60178         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:44999         127.0.0.1:4730          ESTABLISHED                           1450/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:58376         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:38145         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:36341         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:55801         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:60882         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:45008         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:56186         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:58517         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:35300         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:60920         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:54439         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:36071         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:38568         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:58689         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:55719         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:34351         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:60850         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:57198         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:55893         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:33230         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:59466         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:36577         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:37470         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:36306         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:60191         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:57617         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:39705         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:55149         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:44999         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:33581         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:4730          127.0.0.1:34453         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:37988         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:59967         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:54755         127.0.0.1:4730          ESTABLISHED                           30731/node
tcp        0      0 127.0.0.1:55355         127.0.0.1:4730          ESTABLISHED    



On Friday, 30 October 2015 14:12:07 UTC+1, brianlmoon wrote:

Alexey

Oct 30, 2015, 9:20:09 AM
to Gearman
Do you think there is any option other than replacing the https://github.com/andris9/node-gearman client with something else, assuming we close the gearman connection in the application and end jobs in the workers properly?


On Friday, 30 October 2015 14:19:14 UTC+1, Alexey wrote:

Brian Moon

Oct 30, 2015, 10:23:16 AM
to gea...@googlegroups.com
On 10/30/15 14:19 , Alexey wrote:
> Thanks Brian,
>
> I also suspected something like that as I saw earlier too many
> connections between gearmand and worker clients in ESTABLISHED (not even
> TIME_WAIT) state. Here is, for example, just a really small snippet of
> `netstat -apn | grep ESTABLISHED` output:

netstat is not accurate for reasons I don't know.

These commands are better:

# cat /proc/`cat /var/run/gearmand/gearmand.pid`/status | fgrep FDSize
# lsof -p `cat /var/run/gearmand/gearmand.pid` | fgrep TCP | wc -l

Of course, replace that path with the path to your gearmand pid file.
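
As a cheaper alternative to forking lsof, the open descriptors of a process can be counted directly from /proc. This is a minimal Python sketch, assuming Linux; the function names are hypothetical. As far as I know, FDSize in /proc/<pid>/status reports the number of allocated descriptor-table slots rather than the number of descriptors currently open, so the directory listing is the more direct measure.

```python
import os

def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd, one per open descriptor (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def fd_usage(pid, soft_limit):
    """Return the fraction of the given soft limit currently in use."""
    return count_open_fds(pid) / soft_limit

# Demonstrated on the current process; for gearmand, pass the pid read
# from its pid file instead (this needs root or the same user).
print(count_open_fds(os.getpid()))
```

Calling `fd_usage(pid, limit)` in a loop with a short sleep and logging when it crosses, say, 0.8 is one way to catch the spike while it happens.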

That won't help though with these events most likely. You have to catch
this when it happens. It does not take long (milliseconds) for this to
happen. By the time you log in to the server, it is gone.

Here is the script we use to monitor this.

https://gist.github.com/brianlmoon/1e03b89958492b5ef49b

If you run it with -d, it will dump to the screen. This does fork lsof.
I need to update it to read /proc/net/tcp instead. But, it's low impact
and should be able to run with -t 1 to refresh every second. Put it in a
screen and let it run. When this happens again, go look at the screen
session and scroll back to where it happened.
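
Reading /proc/net/tcp instead of forking lsof could look roughly like the Python sketch below. It is not the script from the gist, only an illustration of the file format: the function name, the sample rows, and the default port 4730 (gearmand's port in the netstat output earlier in this thread) are assumptions. The file lists all IPv4 TCP sockets in the network namespace, so filtering by local port is used here to pick out gearmand's side; IPv6 sockets are in /proc/net/tcp6.

```python
from collections import Counter

# TCP state codes used in /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_states(proc_net_tcp_text, local_port=4730):
    """Count TCP states of sockets whose local port is `local_port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip header row
        fields = line.split()
        if len(fields) < 4:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)   # field is hex "ADDR:PORT"
        if port == local_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts

# Made-up sample rows; 127A is 4730 in hexadecimal
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:127A 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0100007F:127A 0100007F:AFC6 01 00000000:00000000 00:00000000 00000000     0        0 1002
   2: 0100007F:127A 0100007F:885E 08 00000000:00000000 00:00000000 00000000     0        0 1003
   3: 0100007F:AFC6 0100007F:127A 01 00000000:00000000 00:00000000 00000000  1000        0 1004
"""
print(count_states(sample))
```

A large CLOSE_WAIT count on the server's port would suggest peers that have closed while the server still holds the descriptor.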

Alexey

Nov 2, 2015, 3:49:17 AM
to Gearman
Hi Brian,
I installed lua and daemontools and tried to launch that script (as root):

root@ip-10-0-1-94:~# ./gearman_statsd.lua
./gearman_statsd.lua:110: attempt to index a nil value

On Friday, 30 October 2015 15:23:16 UTC+1, brianlmoon wrote:

Sisavang SAYAVONG

Jan 18, 2017, 4:50:42 AM
to Gearman
Hello,

We see the same behavior with gearman as our job scheduler. Sometimes gearmand crashes, and the reason is that it reaches the maximum number of open files (22000 in our case).
I found that by monitoring the number of FDs of the process. I don't think increasing the maximum number of FDs will resolve this problem. Maybe there is a root cause?

Earlier I created a post on the nagios forum (because we use gearman with Nagios). There is more information on this post : https://support.nagios.com/forum/viewtopic.php?f=7&t=41375&p=208334

Sorry for bothering you and sorry for my English.

Sisavang

Алексей Пастухов

Jan 18, 2017, 5:32:17 AM
to Gearman
Hi Sisavang.
You are using mod-gearman. There is another issue on GitHub that describes misbehaviour of gearmand with mod-gearman: https://github.com/gearman/gearmand/issues/7
Maybe it's a mod-gearman issue. See brianlmoon's message from 30 October 2015, 14:12:07 UTC+1.
I'll open an issue in the mod-gearman GitHub repo: https://github.com/sni/mod_gearman/issues

Regards,
Alexei
  

Sisavang SAYAVONG

Jan 19, 2017, 6:08:06 AM
to Gearman
Hello,

Thank you for your reply. It's not a bad idea to catch which worker floods the gearmand.
I'm going to write a little script to get the list of open sockets when gearmand reaches the limit.
Then I will check the log of the mod-gearman that floods.

Thank you again

Sisavang

Brian Moon

Jan 19, 2017, 1:48:13 PM
to gea...@googlegroups.com
On 1/19/17 5:08 , Sisavang SAYAVONG wrote:
> Hello,
>
> Thank you for your reply. It's not a bad idea to catch which worker
> floods the gearmand.
> I'm going to write a little script to get the list of open sockets when
> gearmand reaches the limit.
> Then I will check the log of the mod-gearman that floods.

This is the script I used when diagnosing our fd limit issues.

https://gist.github.com/anonymous/1babbf86b8d3714d986f3b850b74009c

Алексей Пастухов

Jan 23, 2017, 4:22:27 AM
to Gearman
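If you use the newest gearmand release with mod-gearman, that could be to blame for the issue.
See:
Mod Gearman supported dependencies: https://labs.consol.de/nagios/mod-gearman/index.html#_supported_dependencies
"Too many open files" in combination with mod-gearman: https://github.com/sni/mod_gearman/issues/105#issuecomment-273863998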

Sisavang SAYAVONG

Jan 25, 2017, 3:55:48 AM
to Gearman
Thank you everybody.
We set up a little script to find the mod-gearman that opens a lot of connections.
Now we wait for the next gearmand crash.

Sisa

Sisavang SAYAVONG

Feb 28, 2017, 3:18:14 AM
to Gearman
Hello,

The output of `rpm -qa | grep -i gearman`:
mod_gearman2-2.1.1-1.el6.x86_64
gearmand-0.33-2.x86_64
gearmand-server-0.33-2.x86_64
gearmand-devel-0.33-2.x86_64


gearmand has crashed twice, and my script caught the list of open files during the crash. gearmand reached the limit of open files (22000). I expected to find the worker that opens a lot of connections, but I didn't.
In the output, we have a lot of "can't identify protocol" entries, at least 19000...

Maybe I should run netstat while gearmand crashes?

Thank you,
Sisa

Clint Byrum

Feb 28, 2017, 3:26:35 AM
to gearman
Excerpts from Sisavang SAYAVONG's message of 2017-02-28 00:18:14 -0800:
> Hello,
>
> The output of rpm -qa | grep -i gearman :
> rpm -qa | grep -i gearman
> mod_gearman2-2.1.1-1.el6.x86_64
> gearmand-0.33-2.x86_64
> gearmand-server-0.33-2.x86_64
> gearmand-devel-0.33-2.x86_64
>

That release is now 5 years old:

https://launchpad.net/gearmand/1.0/0.33

It's possible a bug has been fixed that would make it close dead
connections faster.

If nothing else, maybe try 1.0.6, which is 4 years old, but has a fair
amount of bug fixes:

https://launchpad.net/gearmand/1.0/1.0.6

1.1.x are a bit more aggressive, and are available on github:

https://github.com/gearman/gearmand/releases

>
> gearmand has crashed twice and my script catched the list of open files
> while crashing. the gearmand reached the limit of open files (22000) and I
> thought I will find out the worker that opens a lot of connections but I
> didn't.
> In the output, we have a lot of "can't identify protocol", at least 19000
> ...
>
> Maybe I should exec a netstat while gearmand crashes ?
>

netstat -pan should give you the most information to track down the
connections. You'll need that to run as root.
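
To turn a captured `netstat -pan` dump into something that points at an offender, the lines can be grouped by the PID/program column. This is a rough Python sketch, assuming the column layout shown in the netstat excerpt earlier in this thread. The function name is hypothetical, and program names containing spaces would need more careful parsing.

```python
from collections import Counter

def connections_by_program(netstat_text, port=4730):
    """Count ESTABLISHED connections touching `port`, grouped by PID/program."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        if fields[3].endswith(f":{port}") or fields[4].endswith(f":{port}"):
            counts[fields[6]] += 1
    return counts

# Three lines taken from the netstat output earlier in this thread
sample = """\
tcp        0      0 127.0.0.1:44998         127.0.0.1:4730          ESTABLISHED                           1449/node
tcp        0      0 127.0.0.1:4730          127.0.0.1:34910         ESTABLISHED                           810/gearmand
tcp        0      0 127.0.0.1:35136         127.0.0.1:4730          ESTABLISHED                           30731/node
"""
for program, n in connections_by_program(sample).most_common():
    print(n, program)
```

In the excerpt posted earlier, most of the client-side lines appear to belong to a single process (30731/node), which is the kind of imbalance this grouping is meant to surface.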

> Thank you,
> Sisa
>
> On Monday, 23 January 2017 10:22:27 UTC+1, Алексей Пастухов wrote:
> >
> > If you use a newest gearmand release with mod-gearman, it could be blamed
> > for the issue.
> > See
> > Mod Gearman supported dependencies
> > <https://labs.consol.de/nagios/mod-gearman/index.html#_supported_dependencies>
> > "Too many open files" in combination with mod-gearman
> > <https://github.com/sni/mod_gearman/issues/105#issuecomment-273863998>
> >
> > On Thursday, 19 January 2017 12:08:06 UTC+1, Sisavang SAYAVONG wrote:
> >>
> >> Hello,
> >>
> >> Thank you for your reply. It's not a bad idea to catch which worker
> >> floods the gearmand.
> >> I'm going to write a little script to get the list of open sockets when
> >> gearmand reaches the limit.
> >> Then I will check the log of the mod-gearman that floods.
> >>
> >> Thank you again
> >>
> >> Sisavang
> >>
> >> On Wednesday, 18 January 2017 11:32:17 UTC+1, Алексей Пастухов wrote:
> >>>
> >>> Hi Sisavang.
> >>> You are using mod-gearman. There is an an other issue describes
> >>> misbehaviour of gearmand with mod-gearman on github
> >>> <https://github.com/gearman/gearmand/issues/7>.
> >>> Maybe it's a mod-gearman issue. See the brianlmoon's message from 30
> >>> October 2015 14:12:07 UTC+1, brianlmoon.
> >>> I'll open an issue in mod-gearman github repo
> >>> <https://github.com/sni/mod_gearman/issues>.

Faustino Olpindo

Feb 28, 2017, 7:01:57 PM
to gea...@googlegroups.com
Hi,

Please do increase ulimit for the user which runs your scripts.

Thanks and regards


Алексей Пастухов

Apr 27, 2017, 3:06:34 AM
to Gearman
Dear all,
the latest release of gearmand logs "Too many open files" as well.

I spent some time testing it extensively and described it in the GitHub issue:

From my point of view it's a bad idea to log the error at all. But I couldn't find any implementation bug that forces gearmand to hold any connection without demand.

Sisavang SAYAVONG

May 30, 2017, 5:37:10 AM
to Gearman
Hi all,

Today, our gearmand is running well. I posted what we did to fix our issue on the Nagios forum: https://support.nagios.com/forum/viewtopic.php?f=7&t=41375&p=222354#p222354

Thank you for your help

Sisa

On Friday, 30 October 2015 11:50:45 UTC+1, Alexey wrote:

Алексей Пастухов

May 30, 2017, 5:43:27 AM
to Gearman
Thank you for sharing your experience.

What version of gearmand do you use?

Cheers,
Alexei

Sisavang SAYAVONG

Jun 6, 2017, 11:25:30 AM
to Gearman
Hello,

The current version we use :
/usr/sbin/gearmand --version

Regards,
Sisa