Still problem with Nagios

Don Bavol

May 12, 2017, 11:22:09 AM
to openATTIC Users
I wiped my config and did a fresh install of openATTIC 2.0.20 on CentOS 7.3. Everything was fine until I ran oaconfig install, which failed with:

SystemError: "systemctl" "reload-or-restart" "nagios" failed: Job for nagios.service failed because start of the service was attempted too often. See "systemctl status nagios.service" and "journalctl -xe" for details.

Getting the nagios.service status:

[root@GV1-MGMT-SAN01 ~]# systemctl status nagios.service
nagios.service - Nagios Network Monitoring
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2017-05-12 10:16:05 CDT; 4s ago
  Process: 5521 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 5477 ExecStart=/usr/sbin/nagios -d /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 5475 ExecStartPre=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 5478 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nagios.service

May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys nagios[5478]: wproc: Registry request: name=Core Worker 5513;pid=5513
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys nagios[5478]: Successfully launched command file worker with pid 5515
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys nagios[5478]: Caught SIGTERM, shutting down...
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys systemd[1]: Stopping Nagios Network Monitoring...
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys nagios[5515]: Caught SIGTERM, shutting down...
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys nagios[5478]: Successfully shutdown... (PID=5478)
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys systemd[1]: start request repeated too quickly for nagios.service
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys systemd[1]: Failed to start Nagios Network Monitoring.
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys systemd[1]: Unit nagios.service entered failed state.
May 12 10:16:05 GV1-MGMT-SAN01.corporate.intellys systemd[1]: nagios.service failed.

What do you need from me to help troubleshoot this for the next release? I want to help as much as I can. I plan on installing this on 4 other devices and need it running cleanly.

Thanks,
Don

Don Bavol

May 12, 2017, 11:43:27 AM
to openATTIC Users
I ran "systemctl reset-failed nagios.service" and "systemctl start nagios.service", then rebooted. Running "oaconfig install" afterwards showed the same error. I then ran the same two systemctl commands again, waited 30 minutes, and ran "oaconfig install" once more; this time the process completed successfully.

I am not sure whether waiting 30 minutes has anything to do with it, but it seemed to work last time as well.
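
As a rough sketch, the sequence described above could be scripted like this. The commands are the ones quoted in this post; the function names and the injectable `run` parameter are illustrative assumptions, and the 30-minute wait is left out because it is unclear whether it matters.

```python
import subprocess

SERVICE = "nagios.service"


def recovery_commands(service=SERVICE):
    """Return the commands from the post, in the order they were run."""
    return [
        ["systemctl", "reset-failed", service],  # clear the start-limit failure
        ["systemctl", "start", service],         # bring Nagios back up
        ["oaconfig", "install"],                 # retry the installation
    ]


def run_recovery(run=subprocess.run, service=SERVICE):
    """Run each command in turn; check=True raises on the first failure."""
    for cmd in recovery_commands(service):
        run(cmd, check=True)
```

Calling `run_recovery()` as root would execute the three commands; passing a different `run` callable allows a dry run without touching the system.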

Christopher Diekkamp

May 12, 2017, 11:47:02 AM
to openatt...@googlegroups.com

Hi Don,

I had a similar problem with a test installation on Ubuntu Xenial while setting up my Salt states for deployment.
In my case, the problem was that the nagios service wasn't running, so oaconfig's attempt to restart it failed.
Starting nagios before running oaconfig install worked for me.

Cheers
Christopher


    

Don Bavol

May 12, 2017, 11:56:11 AM
to openATTIC Users, e...@otto-schneider.de
However, I verified that nagios was running (using systemctl status nagios.service) before running oaconfig install, and it still failed. That is why I am posting here.

Lenz Grimmer

May 15, 2017, 3:50:55 AM
to openatt...@googlegroups.com
Hi Don,

thanks for reporting this and sorry for the trouble.

On 05/12/2017 05:56 PM, Don Bavol wrote:

> However, I verified nagios was started by running systemctl status
> nagios.service before running oaconfig install and it still failed. That
> is why I am posting this up here.

This is already tracked here: https://tracker.openattic.org/browse/OP-2042

During installation, "oaconfig install" discovers all devices and
creates a Nagios configuration entry for each of them. The current
implementation reloads the Nagios service for each entry as soon as it
is created, which makes systemd think the service is "flapping" and
puts it into the "failed" state.

According to the comments in that bug report, we might actually have a
fix for that already, but it has not been incorporated into the code
base so far as it needs further testing.
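
To illustrate why per-entry reloads trip the limit, here is a simplified model in Python. It is not systemd's actual code and not necessarily the fix discussed in OP-2042; the class and function names are hypothetical, the time per entry is made up, and the limits assume systemd's defaults of 5 starts within 10 seconds.

```python
from collections import deque


class StartLimiter:
    """Simplified stand-in for systemd's start rate limiting."""

    def __init__(self, burst=5, interval=10.0):
        self.burst = burst        # starts allowed per window (assumed default)
        self.interval = interval  # window length in seconds (assumed default)
        self.starts = deque()     # timestamps of recently accepted starts
        self.failed = False

    def start(self, now):
        """Record a start attempt at time `now`; False once the limit is hit."""
        if self.failed:
            return False
        # Forget starts that have fallen out of the window.
        while self.starts and now - self.starts[0] >= self.interval:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            self.failed = True    # analogous to "Result: start-limit"
            return False
        self.starts.append(now)
        return True


def configure(devices, limiter, reload_per_entry, seconds_per_entry=0.1):
    """Create one entry per device, restarting per entry or once at the end."""
    now = 0.0
    for _ in devices:
        now += seconds_per_entry  # pretend writing an entry takes a moment
        if reload_per_entry and not limiter.start(now):
            return False          # service ended up in the "failed" state
    if not reload_per_entry:
        return limiter.start(now)  # single restart after all entries exist
    return True


devices = [f"disk{i}" for i in range(8)]
print(configure(devices, StartLimiter(), reload_per_entry=True))   # False
print(configure(devices, StartLimiter(), reload_per_entry=False))  # True
```

With eight devices, the sixth restart inside the same window is rejected, whereas a single restart after all entries are written stays well under the limit.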

Lenz

--
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
