Nagios 4 + Gearman hanging on CentOS 6.8

rffe...@nebrasil.com.br

Feb 13, 2017, 8:25:38 AM
to mod_gearman
Hello everyone,

Good morning.

I've been trying to solve this issue for a long time now, but troubleshooting it is rather complicated since I'm fairly new to this setup. Here in our company we have Nagios XI (Core 4.2.4) offloading its checks to a Gearman server, which workers at remote sites reach through port forwarding. Each worker processes a single hostgroup queue in its own local network.

The problem starts after around 3 to 4 hours of processing: "Jobs Waiting" begins to pile up in the check_results queue, and this value keeps growing by the minute without ever going down. Meanwhile, the Nagios XI services and hosts stop being processed completely and indefinitely until we restart the gearmand and nagios services. We tried installing the latest gearmand server version from the Nagios XI guide at https://assets.nagios.com/downloads/nagiosxi/docs/Integrating_Mod_Gearman_with_Nagios_XI.pdf as well as from the Consol Labs repositories, but nothing seems to change this behaviour.
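To make the backlog easier to track over time, here is a minimal sketch in Python that reads gearmand's text admin "status" reply (one tab-separated line per queue: name, total jobs, running jobs, available workers, ended by a single "."). The host, the helper names, and the idea that waiting jobs equal total minus running are assumptions made for illustration, not something taken from our installation.

```python
import socket
from typing import Dict, NamedTuple


class QueueStatus(NamedTuple):
    total: int      # jobs queued, including the ones currently running
    running: int    # jobs currently being executed
    workers: int    # workers registered for this queue

    @property
    def waiting(self) -> int:
        # Assumption: jobs waiting = total jobs minus jobs already running.
        return self.total - self.running


def parse_status(text: str) -> Dict[str, QueueStatus]:
    """Parse the reply to gearmand's admin 'status' command."""
    queues: Dict[str, QueueStatus] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":   # a lone "." ends the reply
            continue
        parts = line.split("\t")
        if len(parts) != 4:           # skip anything that is not a queue line
            continue
        name, total, running, workers = parts
        queues[name] = QueueStatus(int(total), int(running), int(workers))
    return queues


def fetch_status(host: str = "127.0.0.1", port: int = 4730) -> str:
    """Send 'status' to gearmand and return the raw reply (host is hypothetical)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"status\n")
        buf = b""
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name, q in sorted(parse_status(fetch_status()).items()):
        print(f"{name}: waiting={q.waiting} running={q.running} workers={q.workers}")
```

Running this every minute and logging the check_results line should show when the waiting count starts climbing and whether any worker is still registered for that queue at that moment.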

While the jobs are stuck, "netstat -anp | grep 4730" shows a large number of CLOSE_WAIT connections for each of the workers. Our setup consists of around 1050 services, of which about 800 on average are handled by a total of 15 workers. Could you please shed some light on what's going on?
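In case it helps to quantify the CLOSE_WAIT build-up, this is a rough Python sketch that counts those connections per remote address from "netstat -anp" output. It assumes the usual Linux column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program), and the function name is made up for illustration. Since the workers come in through port forwarding, the remote address may turn out to be the forwarding host rather than the worker itself.

```python
import subprocess
from collections import Counter


def count_close_wait(netstat_output: str, port: int = 4730) -> Counter:
    """Count CLOSE_WAIT connections on the given local port, per remote host."""
    counts: Counter = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local foreign state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT" or not local.endswith(suffix):
            continue
        remote_host = foreign.rsplit(":", 1)[0]   # drop the remote port
        counts[remote_host] += 1
    return counts


if __name__ == "__main__":
    output = subprocess.run(
        ["netstat", "-anp"], capture_output=True, text=True, check=True
    ).stdout
    for host, n in count_close_wait(output).most_common():
        print(f"{host}: {n} CLOSE_WAIT")
```

A count that only ever grows for the same remote address would suggest that the remote side has closed its end while the local process never closes its own socket.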

Thank you very much for your attention and time!

Best regards,
Ramiro Fróes Ferrão