Nagios checks


Peter Berghold

May 27, 2010, 2:37:09 PM
to puppet...@googlegroups.com

Has anybody out there written a custom check for Nagios to determine if puppetd and/or puppetmasterd is running? I am considering writing one if not.

Michael DeHaan

May 27, 2010, 3:19:13 PM
to puppet...@googlegroups.com

Richard Crowley

May 27, 2010, 5:12:15 PM
to puppet...@googlegroups.com
On Thu, May 27, 2010 at 12:19 PM, Michael DeHaan <mic...@puppetlabs.com> wrote:
> On Thu, May 27, 2010 at 2:37 PM, Peter Berghold <salty....@gmail.com> wrote:
>> Has anybody out there written a custom check for Nagios to determine if
>> puppetd and/or puppetmasterd is running? I am considering writing one if
>> not.

I did a minimal check like this, on the assumption that both apache2
and puppetmasterd have to be functioning for the response status to be 400.

nagios_command { "check_https_port_status":
    command_line => "/usr/lib/nagios/plugins/check_http --ssl -H '\$HOSTADDRESS\$' -I '\$HOSTADDRESS\$' -p '\$ARG1\$' -e '\$ARG2\$'",
}

nagios_service { "puppet-https-8140":
    service_description => "HTTPS 8140",
    hostgroup_name      => "puppetmaster",
    check_command       => "check_https_port_status!8140!400",
}
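
For anyone who would rather not depend on check_http, the same idea can be sketched as a small standalone plugin. This is only an illustration of the approach above, not what I run: the names classify and fetch_status are made up for the example, certificate verification is turned off on the assumption that the puppetmaster uses its own CA, and the exit codes follow the usual Nagios convention (0 OK, 2 CRITICAL).

```python
import http.client
import ssl

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status, expected):
    """Map an observed HTTP status (or None) to a Nagios code and message."""
    if status is None:
        return CRITICAL, "CRITICAL - no HTTP response"
    if status == expected:
        return OK, "OK - got expected status %d" % status
    return CRITICAL, "CRITICAL - got status %d, expected %d" % (status, expected)


def fetch_status(host, port, timeout=10):
    """Return the HTTP status of GET / over SSL, or None if the request fails."""
    ctx = ssl.create_default_context()
    # Assumption: the server certificate is signed by a private CA, so it is
    # not verified here. Tighten this if the CA certificate is available.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=ctx)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

A wrapper would call classify(fetch_status(host, 8140), 400), print the message, and exit with the returned code.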

Joe McDonagh

May 28, 2010, 12:06:17 PM
to puppet...@googlegroups.com
I use the one that came with the source download, however it requires a
rubygem that IMO it probably shouldn't use.

--
Joe McDonagh
Operations Engineer
AIM: YoosingYoonickz
IRC: joe-mac on freenode
"When the going gets weird, the weird turn pro."

Todd Zullinger

Jun 1, 2010, 9:51:39 PM
to puppet...@googlegroups.com

FWIW, I've got an overengineered check_puppet and puppetstatus tool
at: http://tmz.fedorapeople.org/scripts/puppetstatus/

I have found it necessary to disable puppet for a short time to work
on something and not have puppet helpfully undo my work more than a
few times. While it's easy to use puppetd --disable to prevent puppet
from running, it's also easy to forget to re-enable it. Or worse, in
a place with multiple SAs, it's easy for someone else to come along
and notice puppetd seems to be 'stuck' and 'helpfully' clear out the
lock file.

Using 'sudo puppetstatus -d "Testing some foo"' creates the lock file
as puppetd --disable would, but adds the text given and the username
of the person disabling puppet. That then shows up in nagios and if
puppet remains disabled for longer than check_puppet would normally
consider a critical amount of time, it remains a warning if there is a
reason in the lockfile. That also lets other SAs know puppet is down
intentionally so they don't have to bug me or worry about 'fixing' it.
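
Roughly, the decision for a disabled host looks like the sketch below. This is a simplified illustration and not the code from the script itself: the function name disabled_status and its arguments are made up for the example, and treating a short, unexplained disable as a warning is an assumption.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def disabled_status(reason, disabled_seconds, critical_after):
    """Decide the Nagios state for a host where puppet is disabled.

    reason           -- text found in the lock file ("" or None if empty)
    disabled_seconds -- how long puppet has been disabled
    critical_after   -- age at which a disabled puppet is normally critical
    """
    reason = (reason or "").strip()
    if reason:
        # A stated reason keeps the check at warning, however long it lasts.
        return WARNING, "WARNING - puppet disabled: %s" % reason
    if disabled_seconds >= critical_after:
        return CRITICAL, "CRITICAL - puppet disabled for %d s, no reason given" % disabled_seconds
    # Assumption: a short disable without a reason is only a warning.
    return WARNING, "WARNING - puppet disabled for %d s" % disabled_seconds
```

So a lock file containing "Testing some foo" stays a warning after several hours, while an empty lock file of the same age goes critical.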

(The checks in the script to chide folks running it as root are more
of a goof, to gently prod admins in the habit of doing everything as
root to stop that. :)

(Oh, and this is in Python -- sorry to any Ruby lovers who might take
offense. I'll try to turn a blind eye to gems and vendor/ dirs if you
don't complain too much about my Python usage.)

--
Todd OpenPGP -> KeyID: 0xBEAF0CE3 | URL: www.pobox.com/~tmz/pgp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All of us could take a lesson from the weather. It pays no attention
to criticism.

Disconnect

Jun 2, 2010, 12:20:31 PM
to puppet-users
I do a very simple check:
        # warn if state.yaml is 125 minutes old (7500 s), critical at 150 minutes (9000 s)
        nagios::service { "puppet_running":
            check_command         => "check_puppet_age!7500!9000!2048!2047",
            notifications_enabled => 0,
        }
    }

check_puppet_age is: command_line => '/usr/lib/nagios/plugins/check_file_age -w $ARG1$ -c $ARG2$ -W $ARG3$ -C $ARG4$ -f /var/lib/puppet/state/state.yaml';

It just checks the age of state.yaml. It misses some problems, but for the most part it fires as expected when a host is disabled, a manifest is broken, etc.
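
If check_file_age is not installed, the age part of the check is easy to approximate. The sketch below is only an illustration under a few assumptions: the function names age_status and check_file_age are made up for the example, the thresholds are in seconds as with -w and -c (7500 s is 125 minutes, 9000 s is 150 minutes), and the size thresholds passed with -W and -C are left out.

```python
import os
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def age_status(age_seconds, warn, crit):
    """Compare a file age in seconds against warning and critical thresholds."""
    if age_seconds > crit:
        return CRITICAL, "CRITICAL - file is %d seconds old" % age_seconds
    if age_seconds > warn:
        return WARNING, "WARNING - file is %d seconds old" % age_seconds
    return OK, "OK - file is %d seconds old" % age_seconds


def check_file_age(path, warn, crit, now=None):
    """Return (code, message) based on the modification time of path."""
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # A missing or unreadable state file is treated as critical here.
        return CRITICAL, "CRITICAL - cannot stat %s" % path
    if now is None:
        now = time.time()
    return age_status(int(now - mtime), warn, crit)
```

Calling check_file_age("/var/lib/puppet/state/state.yaml", 7500, 9000) would then give the same warning and critical windows as the service above.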