workers not picking up work

Andrew Fong

Sep 11, 2012, 12:59:20 PM
to mod_g...@googlegroups.com
I am running Nagios 3.2.3, mod_gearman_worker 1.3.8 and gearmand v0.38.

I am not running any Nagios checks through mod_gearman yet; so far I am only using a Nagios check to test that things are working.

The workers never seem to pick up their health-check jobs: gearman_top shows 1 job waiting and 1 worker available.

It does not happen every time: sometimes, after I restart everything, the system runs as expected for a few hours.

I am running mod_gearman_worker like this:

/usr/sbin/mod_gearman_worker --identifier=mon4-1 --server=mon1:4730 --services --hosts --keyfile=/srv/keys/gearman_key --max-jobs=50 --min-worker=50 --max-worker=50 --idle-timeout=0 --p1_file=/usr/lib/nagios3/p1.pl --debug=2

gearmand is running just as described, with -j 10 (--job-retries) and -t 10 (--threads).

Nothing special there. 

Does anyone have any idea why this would be failing?

Andrew Fong

Sep 11, 2012, 1:01:07 PM
to mod_g...@googlegroups.com
Here is the output of gearman_top after check_gearman runs:

2012-09-11 17:00:14  -  localhost:4730   -  v0.37

 Queue Name        | Worker Available | Jobs Waiting | Jobs Running
--------------------------------------------------------------------
 host              |             400  |           0  |           0
 service           |             400  |           0  |           0
 worker_mon4-1     |               1  |           1  |           0
 worker_mon4-2     |               1  |           1  |           0
 worker_mon4-3     |               1  |           0  |           0
 worker_mon4-4     |               1  |           0  |           0
 worker_mon5-1     |               1  |           0  |           0
 worker_mon5-2     |               1  |           0  |           0
 worker_mon5-3     |               1  |           0  |           0
 worker_mon5-4     |               1  |           0  |           0
--------------------------------------------------------------------
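One way to spot this symptom without watching gearman_top is to ask gearmand directly. The sketch below assumes gearmand's text admin protocol on port 4730, where the `status` command returns one tab-separated line per function (name, total jobs, running jobs, capable workers) and ends with a line containing a single `.`. The helper names `fetch_status`, `parse_status` and `stuck_queues` are hypothetical, and computing "waiting" as total minus running is an assumption about how gearman_top derives its column.

```python
import socket
from typing import Dict, List, Tuple

def fetch_status(host: str, port: int = 4730, timeout: float = 5.0) -> str:
    """Send gearmand's admin "status" command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        buf = b""
        # The reply is terminated by a line containing a single "."
        while not (buf == b".\n" or buf.endswith(b"\n.\n")):
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            buf += data
    return buf.decode()

def parse_status(reply: str) -> Dict[str, Tuple[int, int, int]]:
    """Map queue name -> (waiting, running, capable_workers)."""
    queues = {}
    for line in reply.splitlines():
        if line == ".":
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        # Assumption: "total" includes running jobs, so waiting = total - running.
        queues[name] = (int(total) - int(running), int(running), int(workers))
    return queues

def stuck_queues(queues: Dict[str, Tuple[int, int, int]]) -> List[str]:
    """Queues with waiting jobs while fewer jobs run than there are capable workers."""
    return sorted(
        name
        for name, (waiting, running, workers) in queues.items()
        if waiting > 0 and running < workers
    )
```

If the live state matched the table above, `stuck_queues(parse_status(fetch_status("mon1")))` would be expected to list worker_mon4-1 and worker_mon4-2. A single snapshot can still catch a job just before a worker takes it, so treat one hit as a hint rather than proof.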

Jean Prat

Sep 12, 2012, 2:20:38 AM
to mod_g...@googlegroups.com
Hi,

Check your mod_gearman logs on the NEB module and the workers.
I'm not sure that libgearman 0.38 is compatible with mod_gearman.
Try libgearman 0.25.

Brian Aker

Sep 14, 2012, 1:09:41 AM
to mod_g...@googlegroups.com
Hi,

On Sep 11, 2012, at 9:59 AM, Andrew Fong <and...@mazeline.com> wrote:

> The workers seem to never pickup their health check work with gearman_top showing 1 job pending and 1 worker available.

Can this be boiled down to a test that we can add to Gearman?

Thanks,
-Brian

Andrew Fong

Sep 14, 2012, 11:03:04 AM
to mod_g...@googlegroups.com
I was just using the Nagios check with the versions of Gearman and mod_gearman listed in my first email. I couldn't reproduce it reliably 100% of the time, but on most restarts the workers would wedge. :/
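Because the wedge only shows up on some restarts, a single snapshot can also catch a job in the brief moment before a worker takes it. A minimal sketch for a reproduction harness, assuming you already have a way to turn gearmand's status into a `{queue: (waiting, running, workers)}` mapping (`fetch` below is a hypothetical callable you would supply), is to report a queue only if it stays stuck over several consecutive polls. `wedged_queues`, `poll`, and the 3-sample / 10-second defaults are illustrative choices, not anything taken from mod_gearman or gearmand.

```python
import time
from typing import Callable, Dict, List, Tuple

# queue name -> (waiting, running, capable_workers), e.g. one parsed "status" reply
Snapshot = Dict[str, Tuple[int, int, int]]

def is_stuck(waiting: int, running: int, workers: int) -> bool:
    """Jobs are waiting although not every capable worker is busy."""
    return waiting > 0 and running < workers

def wedged_queues(history: List[Snapshot], min_consecutive: int = 3) -> List[str]:
    """Queues that looked stuck in each of the last `min_consecutive` snapshots."""
    if min_consecutive < 1 or len(history) < min_consecutive:
        return []
    recent = history[-min_consecutive:]
    # Only consider queues present in every recent snapshot.
    common = set(recent[0]).intersection(*recent[1:])
    return sorted(
        name for name in common
        if all(is_stuck(*snap[name]) for snap in recent)
    )

def poll(fetch: Callable[[], Snapshot], samples: int = 3, interval: float = 10.0) -> List[str]:
    """Take `samples` snapshots `interval` seconds apart and report persistent wedges."""
    history = []
    for i in range(samples):
        history.append(fetch())
        if i < samples - 1:
            time.sleep(interval)
    return wedged_queues(history, samples)
```

Requiring persistence trades a little detection latency for far fewer false alarms, which matters when the goal is a regression test that fails only on a real wedge.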

Sven Nierlein

Sep 15, 2012, 3:45:25 AM
to mod_g...@googlegroups.com
On 9/14/12 7:09, Brian Aker wrote:
> Can this be boiled down to a test that we can add to Gearman?


Hi Brian,

I will run some tests with today's Gearman release and see if I can reproduce that.

Sven

Brian Aker

Sep 15, 2012, 9:15:46 PM
to mod_g...@googlegroups.com
Hi!

On Sep 15, 2012, at 12:45 AM, Sven Nierlein <sv...@nierlein.de> wrote:

> i will make some tests with todays gearman release and see if i can reproduce that.

Awesome, that would be great.

If mod_gearman has a "make test" target, I could look at how to incorporate it into Gearman's regression system.

Cheers,
-Brian

Sven Nierlein

Sep 16, 2012, 5:58:46 PM
to mod_g...@googlegroups.com
On 9/16/12 3:15, Brian Aker wrote:
> If mod_gearman has a "make test" I could look at seeing how to incorporate that into gearman's regression system.

Hi Brian,

Mod-Gearman has a "make test" target. One test fails with the latest libgearman release, but that is only about an exit code or log file entries.

Sven

Erick Mendes

Mar 28, 2014, 4:43:33 PM3/28/14
to mod_g...@googlegroups.com
Hi people... I know this is an old thread, but I could use some help from you guys...

I've got an OMD/Nagios/gearmand box, and it seems to be working fine with its own workers, or so I think... gearman_top shows 5 workers available on host, eventhandler and service, but 0 jobs waiting and 0 jobs running.
It's also showing my other box (worker only), called gearman-01. On this box I'm running mod_gearman_worker with 5 threads, but gearman_top shows only 1 available worker, again with 0 jobs waiting and 0 jobs running:

2014-03-28 17:35:26  -  10.18.0.49:4730  -  v0.33

 Queue Name           | Worker Available | Jobs Waiting | Jobs Running
--------------------------------------------------------------------
 eventhandler         |               0  |           0  |           0
 host                 |               0  |           0  |           0
 service              |               0  |           0  |           0
 worker_gearman-01    |               1  |           0  |           0
--------------------------------------------------------------------
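To see which queues each connection actually registered, gearmand's text admin protocol also has a `workers` command; the raw reply can be captured by typing `workers` into a telnet session on port 4730. Each line has the form `FD IP-ADDRESS CLIENT-ID : FUNCTION ...` and the list ends with a single `.`. The sketch below, with hypothetical helper names, groups the registered functions by IP so you can check whether the gearman-01 connections registered `host` and `service` or only `worker_gearman-01`.

```python
from typing import Dict, List, NamedTuple, Set

class WorkerConn(NamedTuple):
    fd: int
    ip: str
    client_id: str
    functions: List[str]

def parse_workers(reply: str) -> List[WorkerConn]:
    """Parse the reply of gearmand's admin "workers" command."""
    conns = []
    for line in reply.splitlines():
        if line.strip() == ".":
            break
        tokens = line.split()
        if not tokens:
            continue
        # A standalone ":" separates "FD IP CLIENT-ID" from the function list.
        sep = tokens.index(":")
        conns.append(WorkerConn(int(tokens[0]), tokens[1], tokens[2], tokens[sep + 1:]))
    return conns

def functions_by_ip(conns: List[WorkerConn]) -> Dict[str, Set[str]]:
    """Union of all functions registered by connections from each IP."""
    result: Dict[str, Set[str]] = {}
    for conn in conns:
        result.setdefault(conn.ip, set()).update(conn.functions)
    return result
```

For a given worker IP, `{"host", "service"} - functions_by_ip(parse_workers(reply))[ip]` lists the general queues that worker has not registered.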


The server box and the gearman-01 box can talk to each other over TCP port 4730 without problems; I even tested it with telnet.
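A successful telnet connection only proves that the TCP port is open. A slightly stronger check confirms that it is really gearmand answering on 4730 (the port in the gearman_top header). This sketch assumes gearmand replies to the admin `version` command with a single line, typically `OK` followed by the version; `gearmand_version` is a hypothetical helper name.

```python
import socket

def gearmand_version(host: str, port: int = 4730, timeout: float = 5.0) -> str:
    """Send gearmand's admin "version" command and return the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"version\n")
        buf = b""
        # Read until the end of the first line (or until the server closes).
        while b"\n" not in buf:
            data = sock.recv(1024)
            if not data:
                break
            buf += data
    return buf.decode().split("\n", 1)[0].strip()
```

Run from gearman-01, `gearmand_version("10.18.0.49")` would be expected to return a line with the same version gearman_top reports (v0.33). A timeout or an unexpected reply points at the network path or at something other than gearmand listening on that port.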
On the gearman-01 box, the worker is running like this:

nagios   30618  0.0  0.1 135604  3128 ?        S    17:36   0:00 /usr/local/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid

The worker .conf file points to the server box's IP and port, and the keyfile is the same on both, so I'm losing some hair here because I can't figure out why these two aren't working together...

check_gearman shows me this:

For the server:

 /usr/bin/check_gearman -H localhost

check_gearman CRITICAL - Queue worker_nagios-omd-a has 1 job without any worker. |'eventhandler_waiting'=0;10;100;0 'eventhandler_running'=0 'eventhandler_worker'=5;25;50;0 'host_waiting'=0;10;100;0 'host_running'=0 'host_worker'=5;25;50;0 'service_waiting'=0;10;100;0 'service_running'=0 'service_worker'=5;25;50;0 'worker_gearman-01_waiting'=0;10;100;0 'worker_gearman-01_running'=0 'worker_gearman-01_worker'=1;25;50;0 'worker_nagios-omd-a_waiting'=1;10;100;0 'worker_nagios-omd-a_running'=0 'worker_nagios-omd-a_worker'=0;25;50;0

For gearman-01:

/opt/omd/versions/1.10/lib/nagios/plugins/check_gearman -H 10.18.0.49 -q worker_`hostname` -t 10 -s check

check_gearman OK - gearman-01 has 5 worker and is working on 0 jobs. Version: 1.4.14|worker=5;;;5;200 jobs=496c

Note that this job count, 496c, seems to grow with every check_gearman execution I did...
(Yes... I have already run it 496 times... :/ )
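Both check_gearman outputs carry standard Nagios performance data after the `|`: space-separated `label=value[UOM];warn;crit;min;max` items, with labels optionally single-quoted. In that format the `c` unit marks a continuous counter, so `jobs=496c` reads as a running total rather than a current load, which fits it growing with each check. The sketch below, with hypothetical function names, parses those items and re-derives the server's CRITICAL condition: a `<queue>_waiting` value above zero while `<queue>_worker` is 0.

```python
import re
from typing import Dict, List, Tuple

# One perfdata item: label (optionally single-quoted) "=" value;warn;crit;min;max
_ITEM = re.compile(r"('[^']*'|[^=\s]+)=(\S+)")
# Leading number plus optional unit of measurement, e.g. "496c" -> ("496", "c")
_VALUE = re.compile(r"^([-+]?[0-9]*\.?[0-9]+)([a-zA-Z%]*)$")

def parse_perfdata(plugin_output: str) -> Dict[str, Tuple[float, str]]:
    """Return {label: (value, unit)} from the part after '|' of a plugin output line."""
    _, _, perf = plugin_output.partition("|")
    result = {}
    for label, rest in _ITEM.findall(perf):
        match = _VALUE.match(rest.split(";", 1)[0])
        if match:
            result[label.strip("'")] = (float(match.group(1)), match.group(2))
    return result

def queues_without_workers(perf: Dict[str, Tuple[float, str]]) -> List[str]:
    """Queues with waiting jobs whose matching *_worker value is 0."""
    stuck = []
    for label, (value, _unit) in perf.items():
        if label.endswith("_waiting"):
            queue = label[: -len("_waiting")]
            workers = perf.get(queue + "_worker")
            if value > 0 and workers is not None and workers[0] == 0:
                stuck.append(queue)
    return sorted(stuck)
```

Fed the server's CRITICAL line, `queues_without_workers(parse_perfdata(line))` would be expected to return `["worker_nagios-omd-a"]`, matching the plugin's own message.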

Any ideas? Help?


