nagios not checking anything


Cory Coager

Jan 7, 2011, 8:43:45 AM
to mod_gearman
This is a new setup, so it has never worked. I have gearman running
and the worker node is talking to the server; I can see this with
queue_top.pl. Upon starting nagios, the server goes through all the
host and service checks once and never again! Then I see entries in
the nagios log about host and service checks being stale, so the server
is forcing the check:

[1294407569] Warning: The check of host 'example01' looks like it was
orphaned (results never came back). I'm scheduling an immediate check
of the host...

Why is this happening?

Sven Nierlein

Jan 7, 2011, 8:53:59 AM
to mod_g...@googlegroups.com

queue_top.pl is the old one. Use gearman_top instead. Do you see any worker connected to your gearmand?
Are the queues filling up?

Cory Coager

Jan 7, 2011, 9:06:16 AM
to mod_gearman
> queue_top.pl is the old one. Use gearman_top instead. Do you see any worker connected to your gearmand?
> Are the queues filling up?

OK, checking gearman_top. I see 1 worker available each for host and
service, and 2 for check_results. If I kill the worker process, the
number of jobs waiting climbs from 0 to 318 for both the service and
host queues. If I restart the worker, the jobs waiting go back to 0.

Sven Nierlein

Jan 7, 2011, 9:23:16 AM
to mod_g...@googlegroups.com

2 for check_results is one too many. Nagios usually starts only a single worker for the results queue.
It seems there is a second results worker that takes away all the results, which then leads to the
"results never came back" messages.
Is there an orphaned nagios process running or something like that?
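
For anyone who wants to script the check described above: a minimal sketch that reads the same per-queue numbers gearman_top displays by sending the plain-text `status` command to gearmand's admin interface. The host, port 4730, and the helper names (`parse_status`, `fetch_status`, `extra_result_workers`) are assumptions for illustration, not part of mod_gearman.

```python
import socket


def parse_status(text):
    """Parse gearmand 'status' output into {queue: (total, running, workers)}.

    Each line is: name<TAB>total<TAB>running<TAB>available_workers,
    and the listing ends with a line containing a single dot.
    """
    queues = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":
            continue
        name, total, running, workers = line.split("\t")
        queues[name] = (int(total), int(running), int(workers))
    return queues


def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Send 'status' to gearmand and return the raw text reply.

    host and port are assumptions; adjust them to your gearmand.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not data.endswith(b".\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")


def extra_result_workers(queues, expected=1):
    """Return how many check_results workers exceed the expected count."""
    _total, _running, workers = queues.get("check_results", (0, 0, 0))
    return max(0, workers - expected)


# Usage against a live gearmand (not run here):
#   queues = parse_status(fetch_status())
#   print(extra_result_workers(queues))
```

"Jobs waiting" in gearman_top corresponds to total minus running, so a queue that stays at 318 waiting with 0 workers, as in the earlier post, would show up here as `(318, 0, 0)`.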

Cory Coager

Jan 7, 2011, 9:26:40 AM
to mod_gearman
> 2 for check_results is 1 too much. Nagios usually only starts a single worker for the results queue.
> Seems like there is a results worker which takes away all the results, which then leads to the
> "results never came back" never came back messages.
> Is there an orphaned nagios process running or something like that?
Actually that was my fault. I started the worker node with the
check_results queue as well. Sorry this is my first experience with
gearman and I couldn't find a lot of information on how to set this up.

Cory Coager

Jan 7, 2011, 9:41:53 AM
to mod_gearman
OK, right now there is 1 worker available for check_results, host and
service. There are 0 jobs waiting and 0 jobs running in the queue for
each. However, I am still getting orphaned messages in the nagios
logs.

Sven Nierlein

Jan 7, 2011, 10:08:53 AM
to mod_g...@googlegroups.com

Could you try to stop Nagios and move the retention.dat away so you have a clean start?
check_gearman is able to monitor the check counter for the worker. Another idea is to
increase the loglevel of the worker. You should then see the jobs coming in.

Regarding the documentation: Help is always welcome :-)
It's not easy for me to guess what a new user expects or what pitfalls they can avoid.

Sven

Cory Coager

Jan 7, 2011, 10:37:15 AM
to mod_gearman
> Could you try to stop Nagios and move the retention.dat away so you have a clean start.
> check_gearman is able to monitor the check counter for the worker. Another idea is to
> increase the loglevel from the worker. You should then see the jobs coming in.
I deleted the retention.dat and started over. Right now all the host
and service checks say pending, so there is definitely a problem. One
more thing to mention: if I run the gearman client in the foreground,
it spits out complete garbage when it receives jobs. Does this mean
anything to you? If not, what's next? Should I turn on the logging
for the gearman server?

Sven Nierlein

Jan 7, 2011, 11:14:59 AM
to mod_g...@googlegroups.com
On 1/7/11 16:37, Cory Coager wrote:
> more thing to mention, if I run the gearman client in the foreground
> it spits out complete garbage when it receives jobs. Does this mean
> anything to you? If not, whats next? Should I turn on the logging
> for the gearman server?


This usually means a problem with encryption. There should be messages
like "discarding invalid job" or something like that. Make sure the encryption
settings of the worker match the ones of the server: password, encryption enabled, etc.
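
A rough way to compare the two sides, as a sketch only: it assumes both config files use plain `name=value` lines with `#` comment lines, and that the relevant option names are `encryption`, `key` and `keyfile`. Check those names against your mod_gearman version. The helper names are made up for this example.

```python
def parse_settings(text):
    """Parse 'name=value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def encryption_mismatches(server, worker,
                          names=("encryption", "key", "keyfile")):
    """Return {option: (server_value, worker_value)} for options that differ.

    The option names are assumptions; an option missing on both sides
    counts as matching.
    """
    return {
        name: (server.get(name), worker.get(name))
        for name in names
        if server.get(name) != worker.get(name)
    }


# Usage with real files (paths are hypothetical):
#   with open("/path/to/server.conf") as fh:
#       server = parse_settings(fh.read())
#   with open("/path/to/worker.conf") as fh:
#       worker = parse_settings(fh.read())
#   print(encryption_mismatches(server, worker))
```

An empty result only says the listed options agree. It does not prove that the process handling the jobs is actually reading that config file.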

Cory Coager

Jan 7, 2011, 11:39:53 AM
to mod_gearman
> This usually means a problem with encryption. There should be messages
> like "discarding invalid job" or something like that. Make sure your encryption
> settings from the worker match the ones from the server. Password, Encryption enabled etc...

I tried with and without encryption on both sides, same result. The
client spits out garbage and no checks are processed in nagios.

Cory Coager

Jan 7, 2011, 11:51:52 AM
to mod_gearman
I set debug=3 in mod_gearman.conf and restarted nagios. Unfortunately
I don't see any logs being generated.

Sven Nierlein

Jan 7, 2011, 2:03:06 PM
to mod_g...@googlegroups.com
On 1/7/11 17:51, Cory Coager wrote:
> I set debug=3 in mod_gearman.conf and restarted nagios. Unfortunately
> I don't see any logs being generated.

Have you set the logfile option?

Cory Coager

Jan 7, 2011, 2:06:05 PM
to mod_gearman
> Have you set the logfile option?

Yes, logfile option is set in the config and the user has write access
to the directory.

Sven Nierlein

Jan 7, 2011, 2:10:33 PM
to mod_g...@googlegroups.com

Ah, I mixed that up. mod_gearman logs to the nagios.log. But don't do that (at least not in production), because Nagios is not thread safe and will segfault with versions <= 3.2.3. Hopefully the next Nagios release is better; they are working on that problem. The logfile option is only for the worker.

Cory Coager

Jan 7, 2011, 2:15:36 PM
to mod_gearman
> Ah i mixed that up. mod_gearman logs to the nagios.log. But don't do that (at least not in production), because Nagios is not thread safe and will segfault with a version <= 3.2.3. Hopefully the next nagios is better. They are working on that problem. The logfile option is only for the worker.

If I set debug=1 I see messages in the nagios.log:

[1294427513] mod_gearman: received job for queue service: example01 -
Check Ping

Cory Coager

Jan 7, 2011, 2:18:00 PM
to mod_gearman
> > Ah i mixed that up. mod_gearman logs to the nagios.log. But don't do that (at least not in production), because Nagios is not thread safe and will segfault with a version <= 3.2.3. Hopefully the next nagios is better. They are working on that problem. The logfile option is only for the worker.

I don't see any logging options for the client/worker.

Sven Nierlein

Jan 7, 2011, 2:23:42 PM
to mod_g...@googlegroups.com
On 1/7/11 20:18, Cory Coager wrote:
>>> Ah i mixed that up. mod_gearman logs to the nagios.log. But don't do that (at least not in production), because Nagios is not thread safe and will segfault with a version <= 3.2.3. Hopefully the next nagios is better. They are working on that problem. The logfile option is only for the worker.
>
> I don't see any logging options for the client/worker.

How do you start your worker? The logfile option is in the same config file as the encryption settings for your worker.

Cory Coager

Jan 7, 2011, 2:33:36 PM
to mod_gearman
> How do you start your worker? Its in the same config file as the encryption settings for your worker.

I'm using the gearman client.

Sven Nierlein

Jan 7, 2011, 2:39:07 PM
to mod_g...@googlegroups.com
On 1/7/11 20:33, Cory Coager wrote:
>> How do you start your worker? Its in the same config file as the encryption settings for your worker.
>
> I'm using the gearman client.

The gearman client provided by the gearman package itself? That's probably just a sample client. You have to start the mod_gearman_worker provided by mod_gearman. This would explain everything...

Cory Coager

Jan 7, 2011, 2:41:00 PM
to mod_gearman
> The gearman client provided by the gearman package itself? Thats probably just a sample client. You have to start the mod_gearman_worker provided by mod_gearman. This would explain everything...

Yeah, I see it in the installation section now. Not sure how I missed
that. Silly me...

Sven Nierlein

Jan 7, 2011, 2:46:33 PM
to mod_g...@googlegroups.com

I think a small step by step instruction would be really helpful sometimes...

Cory Coager

Jan 7, 2011, 3:22:24 PM
to mod_gearman
> I think a small step by step instruction would be really helpful sometimes...

Looks like things are working now. Thank you for your help!