nagmq initial configuration for a simple test


Andrés More

Jun 4, 2013, 9:17:27 AM
to na...@googlegroups.com
Hi, 

I've been trying to set up a simple Nagios <-> server pair, without any luck.

I can see that Nagios events reach the plugins on the server, but the responses are not making it back to the Nagios dashboard.

Jun  4 13:04:10 ip-nagios mqbroker: Received message from frontend for device 0

Jun  4 13:06:51 ip-server mqbroker: Received message from backend for device 1
Jun  4 13:06:51 ip-server mqbroker: Received message from backend for device 1
Jun  4 13:06:51 ip-server mqexec: Received job from upstream: host_check_initiate /usr/local/nagios/libexec/check_ping -H 10.0.102.45 -w 3000.0,80% -c 5000.0,100% -p 5
Jun  4 13:06:51 ip-server mqexec: Kicked off 13346 for ip-10-0-102-45 (none)
Jun  4 13:06:55 ip-server mqexec: Sending result for ip-10-0-102-45 (none): PING OK - Packet loss = 0%, RTA = 0.02 ms|rta=0.024000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0#012 0
Jun  4 13:06:55 ip-server mqexec: Child 13346 ended with 0. Sending "PING OK - Packet loss = 0%, RTA = 0.02 ms|rta=0.024000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0#012" upstream
Jun  4 13:06:55 ip-server mqbroker: Received message from frontend for device 2

These are the respective device configurations; everything else is just like the provided configuration examples.

Nagios
"devices": [
    [
        { "backend": { "type": "push", "bind": "tcp://*:5558", "noblock": true },
          "frontend": { "type": "sub",
                        "connect": "ipc:///var/nagios/nagmqevents.sock",
                        "subscribe": [ "service_check_initiate", "host_check_initiate" ] } }
    ]
]

Server
"devices": [
    [
        { "backend": { "type": "push", "bind": "tcp://*:5558", "noblock": true },
          "frontend": { "type": "sub",
                        "connect": "ipc:///var/nagios/nagmqevents.sock",
                        "subscribe": [ "service_check_initiate", "host_check_initiate" ] } },

        { "backend": { "type": "pull", "connect": "tcp://ip-nagios:5558" },
          "frontend": { "type": "push", "bind": "ipc:///var/nagios/mqexecjobs.sock" } },

        { "backend": { "type": "push", "connect": "tcp://ip-nagios:5556" },
          "frontend": { "type": "pull", "bind": "ipc:///var/nagios/mqexecresults.sock" } }
    ]
]
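
To make it easier to see which endpoint each device binds or connects to, here is a minimal Python sketch that flattens the server-side "devices" list into one row per socket. It is not part of NagMQ; the helper name list_endpoints is made up for illustration, and the fragment is wrapped in {} only so it parses as standalone JSON. The idea that the "device N" numbers in the mqbroker log correspond to positions in this list is an assumption, not something the thread confirms.

```python
import json

# Server-side device list copied from the post above, wrapped in {} so that
# the fragment parses as standalone JSON.
SERVER_CONFIG = json.loads("""
{ "devices": [ [
  { "backend":  { "type": "push", "bind": "tcp://*:5558", "noblock": true },
    "frontend": { "type": "sub", "connect": "ipc:///var/nagios/nagmqevents.sock",
                  "subscribe": [ "service_check_initiate", "host_check_initiate" ] } },
  { "backend":  { "type": "pull", "connect": "tcp://ip-nagios:5558" },
    "frontend": { "type": "push", "bind": "ipc:///var/nagios/mqexecjobs.sock" } },
  { "backend":  { "type": "push", "connect": "tcp://ip-nagios:5556" },
    "frontend": { "type": "pull", "bind": "ipc:///var/nagios/mqexecresults.sock" } }
] ] }
""")


def list_endpoints(config):
    """Return (device_index, side, socket_type, action, address) tuples.

    Hypothetical helper for eyeballing the topology; it only reads the JSON
    and knows nothing about how mqbroker interprets it.
    """
    rows = []
    for group in config["devices"]:
        for index, device in enumerate(group):
            for side in ("frontend", "backend"):
                sock = device.get(side, {})
                for action in ("bind", "connect"):
                    if action in sock:
                        rows.append((index, side, sock["type"], action, sock[action]))
    return rows


for row in list_endpoints(SERVER_CONFIG):
    print(row)
```

Every "connect" address in the output needs something bound at the other end. Of the two TCP endpoints the server connects to, only port 5558 has a matching "bind" in the Nagios-side device list shown above; whatever listens on port 5556 is configured somewhere not shown in this post.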

Just in case, I removed all the tcpfilteraccept filtering described in Readme.md, but it wasn't working with the filters in place either.
I'm using Nagios installed from source as described in the official install guide, and I cloned the latest nagmq repository from GitHub.

Some extra comments:
The configuration guide is a little sparse. Can you please elaborate on a simple scenario (like mine :))? I think most users will first attempt to build something similar.
The Wiki says "enable": true is required, but according to the Readme.md file it seems to work without it.
I think the --help options are not working in mqbroker and mqexec, so something is wrong in the implementation. Am I right?

I hope this note helps, thanks!

Jonathan Reams

Jun 4, 2013, 1:52:16 PM
to Andrés More, na...@googlegroups.com
Hi Andres,

As I understand it, you're trying to set up distributed monitoring between a few Nagios hosts, and you compiled Nagios and NagMQ from scratch? Your configuration for mqbroker looks okay, and based on the log output you included, it looks like messages are coming out of the event side of NagMQ and making it to the executors, but there may be a bug in getting the check results back in.

There was a bug introduced in Nagios 3.4.3 that could be causing your problem. I've included a patch for Nagios that should fix it. I had already submitted this upstream to Nagios core, but it looks like it got removed when they accidentally merged their 3.x and 4.x branches a few months back. I'll follow up with them on it.

Also, can you tell me the configure command you used to build NagMQ? You may have to specify the path to Nagios's include directory, since the ABI changed between 3.4 and 3.5 (NagMQ comes with 3.4 header files).

Thanks for catching mqexec/mqbroker's "-h" flag not working; I just pushed a fix for it to trunk. Let me know if this helps!

JBR




checkinit.patch

Andrés More

Jun 6, 2013, 1:48:09 PM
to na...@googlegroups.com, Andrés More
Thanks for the prompt response.

I've patched my Nagios deployment and also updated both NagMQs to the latest and greatest after your fixes. -h now works OK.
But the Nagios results were the same as before. Note that the patch did not apply cleanly, although it did succeed.

$ patch -p1 < ~/checkinit.patch
patching file base/checks.c
Hunk #3 succeeded at 2978 with fuzz 2 (offset -61 lines).

I think the patch may still be missing something: I can now see in the Nagios log that the check result is received, but somehow the timeout is not disabled.
As a result, the Nagios GUI is highlighting the service system as a flapping host.

Jun  6 14:32:07 ip-nagios nagios: HOST ALERT: ip-service;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.03 ms
Jun  6 14:32:27 ip-nagios nagios: HOST ALERT: ip-service;DOWN;SOFT;1;(Host Check Timed Out)
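
The pattern in these two lines (a real result, then a timeout shortly afterwards for the same host) can be picked out of a longer syslog with a short Python sketch. This is only an illustration: the helper names are made up, the regular expression assumes the exact syslog layout shown above, and times are compared as seconds since midnight, so a log that crosses midnight is not handled.

```python
import re

# Matches the syslog lines quoted above; the layout is assumed from those
# two samples, not taken from any Nagios specification.
ALERT = re.compile(
    r"(\d\d):(\d\d):(\d\d) \S+ nagios: HOST ALERT: "
    r"([^;]+);([^;]+);([^;]+);(\d+);(.*)"
)


def parse_alert(line):
    """Split one HOST ALERT syslog line into its fields, or return None."""
    match = ALERT.search(line)
    if match is None:
        return None
    hours, minutes, seconds, host, state, state_type, attempt, output = match.groups()
    return {
        "time": int(hours) * 3600 + int(minutes) * 60 + int(seconds),
        "host": host,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


def timeouts_after_result(lines):
    """Return (host, gap_seconds) for each timeout that follows a real result."""
    last_result = {}  # host -> time of the most recent non-timeout alert
    found = []
    for line in lines:
        alert = parse_alert(line)
        if alert is None:
            continue
        if alert["output"] == "(Host Check Timed Out)":
            if alert["host"] in last_result:
                gap = alert["time"] - last_result.pop(alert["host"])
                found.append((alert["host"], gap))
        else:
            last_result[alert["host"]] = alert["time"]
    return found


LOG = [
    "Jun  6 14:32:07 ip-nagios nagios: HOST ALERT: ip-service;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.03 ms",
    "Jun  6 14:32:27 ip-nagios nagios: HOST ALERT: ip-service;DOWN;SOFT;1;(Host Check Timed Out)",
]
print(timeouts_after_result(LOG))  # [('ip-service', 20)]
```

For the two lines above, the timeout arrives 20 seconds after the PING OK result.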

Event Start Time    | Event End Time      | Event Duration | Event/State Type | Event/State Information
06-03-2013 17:15:56 | 06-03-2013 17:26:36 | 0d 0h 10m 40s  | HOST DOWN (HARD) | (Host Check Timed Out)
06-03-2013 17:26:36 | 06-03-2013 17:38:46 | 0d 0h 12m 10s  | HOST UP (HARD)   | PING OK - Packet loss = 0%, RTA = 0.03 ms
06-03-2013 17:38:46 | 06-03-2013 17:38:56 | 0d 0h 0m 10s   | HOST DOWN (HARD) | (Host Check Timed Out)
06-03-2013 17:38:56 | 06-03-2013 17:40:16 | 0d 0h 1m 20s   | HOST UP (HARD)   | PING OK - Packet loss = 0%, RTA = 0.03 ms
06-03-2013 17:40:16 | 06-03-2013 17:40:26 | 0d 0h 0m 10s   | HOST DOWN (HARD) | (Host Check Timed Out)


I've also tried changing the Nagios delay and timeout settings, without any luck.

Regarding your question on build steps, I'm using a plain './configure' without arguments on Ubuntu Server 12.04.2 LTS.

Thanks again.