Nagios Network Monitor - Installation and configuration

Nirmal Pathak

Mar 22, 2007, 6:36:06 AM

to VG...@googlegroups.com, linuxc...@googlegroups.com

Introduction

If you manage a network of any size, you want to be notified of problems before your customers or your bosses find out, but you don't want to be tied to a console checking for the availability of hosts and services. This is where Nagios shines. If you put in the time it takes to install and customize Nagios for your environment, you'll be rewarded with a superb monitoring and notification solution that happens to be free. In this PET, I will guide you through the installation and configuration of Nagios, and I will provide examples of customizations you can add using plugins you can write yourself.

Gather up our packages

I will use Red Hat Enterprise Linux AS 4.0 in these examples, but they can be adapted for any Linux distribution. The following packages are required for the HTTPD services that will drive Nagios's web interface:

Apache

httpd

httpd-suexec

apr-util

Optional (for secure sockets layer, HTTPS interface)

mod_ssl

If you selected the default package set during installation, these are already installed. If you opted not to make Apache available during the Red Hat install, you can grab the packages from RHN using up2date or by downloading them manually.

The following package provides Nagios's basic functionality; really, it is just the Nagios framework. Nagios's checks are accomplished entirely through plugins, which are available in a separate package. From here on, I will suggest getting prebuilt packages from Dag Wieers's collection, and occasionally modules from CPAN. To make it easier on yourself, add Dag's repositories if you use YUM.

Nagios

nagios-2.2-1.el4.rf.i386.rpm http://dag.wieers.com/packages/nagios/

The following are needed for Nagios to actually perform checks

Nagios Plugins

nagios-plugins-1.4.1-1.2.el4.rf.i386.rpm http://dag.wieers.com/packages/nagios-plugins/

fping-2.4-1.b2.2.el4.rf.i386.rpm http://dag.wieers.com/packages/fping/

perl-Crypt-DES-2.03-3.2.el4.rf.i386.rpm http://dag.wieers.com/packages/perl-Crypt-DES/

perl-Net-SNMP-5.0.1-1.2.el4.rf.noarch.rpm http://dag.wieers.com/packages/perl-Net-SNMP/

perl-IO-Socket-INET6-2.51-1.2.el4.rf.noarch.rpm http://dag.wieers.com/packages/perl-IO-Socket-INET6/

Digest-HMAC-1.01.tar.gz http://search.cpan.org/~gaas/Digest-HMAC-1.01/lib/Digest/HMAC.pm

Digest-SHA1-2.11.tar.gz http://search.cpan.org/~gaas/Digest-SHA1-2.11/SHA1.pm

Install Necessary Packages

We can begin installation of the packages by first installing Nagios:

rpm -ivh nagios-2.2-1.el4.rf.i386.rpm

Now we begin satisfying nagios-plugins dependencies:

rpm -ivh fping-2.4-1.b2.2.el4.rf.i386.rpm
rpm -ivh perl-Crypt-DES-2.03-3.2.el4.rf.i386.rpm

mkdir /tmp/perltmp
cp *gz /tmp/perltmp
cd /tmp/perltmp
find . -name "*gz" -exec tar xvzf {} \;

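cd Digest-SHA1-2.11
perl Makefile.PL
make test
make install

cd ../Digest-HMAC-1.01
perl Makefile.PL
make test
make install

# Socket6-0.19 is not in the package list above; this step assumes its
# CPAN tarball was downloaded alongside the other two.
cd ../Socket6-0.19
perl Makefile.PL
make test
make install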

These next two Dag Perl packages expect SHA1, HMAC and Socket6 to be available as RPMs. Since we installed those modules from source instead, we have to tell rpm not to check dependencies.

rpm -ivh --nodeps perl-Net-SNMP-5.0.1-1.2.el4.rf.noarch.rpm
rpm -ivh --nodeps perl-IO-Socket-INET6-2.51-1.2.el4.rf.noarch.rpm
rpm -ivh nagios-plugins-1.4.1-1.2.el4.rf.i386.rpm

Begin Configuration

Nagios has two methods for arranging its configuration files. One way relies on a single file where you specify hosts, groups, services etc. The other allows you to split these files up by purpose for ease of administration. The single file method can become unwieldy as you add machines and services to monitor. Here, we'll assume the multiple definition file method.

Configure The Nagios Service

Let's become familiar with the file locations that the Dag provided packages use as defaults:

Main Nagios Configs

/etc/nagios

Plugins and CGIs

/usr/lib/nagios

Nagios Web Files

/usr/share/nagios

Here, we see the example config files in /etc/nagios:

[radar@test2 ~]$ ls -lh /etc/nagios
total 160K
-rw-rw-r--  1 root root  30K Apr  8 08:28 bigger.cfg
-rw-rw-r--  1 root root 9.4K Apr  8 08:28 cgi.cfg
-rw-rw-r--  1 root root 4.8K Apr  8 08:28 checkcommands.cfg
-rw-r--r--  1 root root  16K Aug  5  2005 command-plugins.cfg
-rw-rw-r--  1 root root  14K Apr  8 08:28 minimal.cfg
-rw-rw-r--  1 root root 4.2K Apr  8 08:28 misccommands.cfg
-rw-rw-r--  1 root root  30K Apr  8 08:28 nagios.cfg
-rw-rw----  1 root root 1.3K Apr  8 08:28 resource.cfg

The first file we're interested in is nagios.cfg, the main config file. This file specifies, among other things, the object config (definition) files. Those are what we are most interested in at this point. We want to open /etc/nagios/nagios.cfg in an editor and comment out the line that contains minimal.cfg. Then we'll uncomment the lines containing the object config files that we'll need to create, and populate with our definitions. Let's go ahead and do that, then.

# You can split other types of object definitions across several
# config files if you wish (as done here), or keep them all in a
# single config file.

#cfg_file=/etc/nagios/minimal.cfg

Here, I have commented out minimal.cfg

cfg_file=/etc/nagios/contactgroups.cfg
cfg_file=/etc/nagios/contacts.cfg
#cfg_file=/etc/nagios/dependencies.cfg
#cfg_file=/etc/nagios/escalations.cfg
cfg_file=/etc/nagios/hostgroups.cfg
cfg_file=/etc/nagios/hosts.cfg

cfg_file=/etc/nagios/services.cfg
cfg_file=/etc/nagios/timeperiods.cfg

And here I have uncommented the object config files we will work with first, to get basic functionality. We will now create these and populate them with some hosts, services, groups, etc.
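Nagios will refuse to start if an uncommented cfg_file entry points at a file that does not exist yet, so it helps to list what is still missing before going further. A small sketch, assuming the /etc/nagios layout used in this article; missing_cfg_files is a made-up helper name.

```shell
#!/bin/bash
# Print every active (uncommented) cfg_file= path in a Nagios main config
# that does not exist on disk yet. (Hypothetical helper for illustration.)
missing_cfg_files() {
    local main_cfg=$1 path
    grep '^cfg_file=' "$main_cfg" | cut -d= -f2- | while read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}

# Usage (path as used in this article):
#   missing_cfg_files /etc/nagios/nagios.cfg
```

Each path it prints is a file we still have to create in the steps that follow.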

While we're at it we want to enable service commands in the CGIs, and enable flap detection:

Still in nagios.cfg, change:

check_external_commands=0

to:

check_external_commands=1

and change:

enable_flap_detection=0

to:

enable_flap_detection=1
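If you would rather script the two changes than make them in an editor, something like the following should work. It is only a sketch: enable_nagios_options is a made-up name, and it assumes both settings already appear uncommented in the file, as they do in the sample nagios.cfg.

```shell
#!/bin/bash
# Set check_external_commands and enable_flap_detection to 1 in the given
# Nagios main config file, keeping a .bak copy of the original.
# (Hypothetical helper; assumes both options are present and uncommented.)
enable_nagios_options() {
    local main_cfg=$1
    sed -i.bak \
        -e 's/^check_external_commands=.*/check_external_commands=1/' \
        -e 's/^enable_flap_detection=.*/enable_flap_detection=1/' \
        "$main_cfg"
}

# Usage (run as root; path as used in this article):
#   enable_nagios_options /etc/nagios/nagios.cfg
```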

Open minimal.cfg, copy the timeperiod definition, paste it into a new file called timeperiods.cfg, and save it.

define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }
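Copying definitions out of minimal.cfg by hand gets tedious, so here is a rough awk-based sketch that pulls out every block of one object type. The name extract_definitions is made up, and the function only copies the define blocks themselves, not the comments around them.

```shell
#!/bin/bash
# Print every "define <type>{ ... }" block of the given object type
# from a Nagios object config file. (Hypothetical helper for illustration.)
extract_definitions() {
    local type=$1 file=$2
    awk -v type="$type" '
        # A block starts at "define <type>{" or "define <type> {"
        $1 == "define" && ($2 == (type "{") || $2 == type) { inside = 1 }
        inside { print }
        # A block ends at a line that starts with the closing brace
        inside && /^[ \t]*}/ { inside = 0; print "" }
    ' "$file"
}

# Usage (paths as used in this article):
#   extract_definitions timeperiod /etc/nagios/minimal.cfg > /etc/nagios/timeperiods.cfg
```

Because the type is matched exactly, asking for host does not also pull in the hostgroup blocks.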

Do the same for the contact definition and contact group definition. For hosts, copy the generic-host definition, along with the localhost definition and paste into hosts.cfg.

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


# Since this is a simple configuration file, we only monitor one host - the
# local host (this machine).

define host{
        use                     generic-host            ; Name of host template to use
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups  admins
        }

define host{
        use                     generic-host            ; Name of host template to use
        host_name               testbox
        alias                   Testbox
        address                 192.168.0.4
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups  admins
        }

I have added a networked host to check. Copy the hostgroup definition from minimal.cfg and paste into the new hostgroups.cfg.

define hostgroup{
        hostgroup_name  test
        alias           Test Servers
        members         localhost,testbox
        }

I added our testbox to this group. We will need to copy the services definitions from minimal.cfg and paste them all into the new services.cfg file. Now we verify our work using nagios:

[radar@test2 nagios]$ sudo nagios -v /etc/nagios/nagios.cfg

Nagios 2.2
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-07-2006
License: GPL

Reading configuration data...

Running pre-flight check on configuration data...

Checking services...
        Checked 5 services.

Checking hosts...
Warning: Host 'testbox' has no services associated with it!
        Checked 2 hosts.
Checking host groups...
        Checked 1 host groups.
Checking service groups...
        Checked 0 service groups.

Checking contacts...
        Checked 1 contacts.
Checking contact groups...
        Checked 1 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...

        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 22 commands.

Checking time periods...
        Checked 1 time periods.
Checking extended host info definitions...
        Checked 0 extended host info definitions.
Checking extended service info definitions...
        Checked 0 extended service info definitions.

Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...


Total Warnings: 1
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
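When the pre-flight check is run from a script, the two totals at the end are the part worth looking at. A minimal sketch that extracts them; preflight_summary is a made-up name, and it relies only on the "Total Warnings:" and "Total Errors:" lines shown in the output above.

```shell
#!/bin/bash
# Read "nagios -v" output on stdin and print "<warnings> <errors>".
# Fails (exit 1) if either total is missing from the input.
# (Hypothetical helper for illustration.)
preflight_summary() {
    awk -F': *' '
        /^Total Warnings:/ { warnings = $2; seen++ }
        /^Total Errors:/   { errors = $2; seen++ }
        END { if (seen < 2) exit 1; print warnings + 0, errors + 0 }
    '
}

# Usage: restart only when the pre-flight check reports zero errors.
#   summary=$(nagios -v /etc/nagios/nagios.cfg | preflight_summary) &&
#       [ "${summary#* }" = "0" ] && /sbin/service nagios restart
```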

If we had made a mistake, nagios would do its best to hint toward the problem. So all looks good for us to have a basic functioning setup. I will address the warning about no services set up for the testbox in a bit. We will now set up apache for authentication.

Configure HTTPD authentication and CGI accesses

Look at /etc/httpd/conf.d/nagios.conf to see how authentication files are set:

AuthName "Nagios Access"
AuthType Basic
AuthUserFile /etc/nagios/htpasswd.users
Require valid-user

So we need to add nagiosadmin, who's defined as a contact, in htpasswd.users:

sudo /usr/bin/htpasswd -c /etc/nagios/htpasswd.users nagiosadmin

Make sure this file is readable by the apache user, if not already:

sudo chmod 644 /etc/nagios/htpasswd.users

Now edit cgi.cfg, uncommenting the lines containing allowed actions for the nagiosadmin user.
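As a rough guide, the relevant part of cgi.cfg might end up looking like the fragment below. The directive names are standard cgi.cfg options, but the exact set of lines and their commented-out defaults in the Dag package are an assumption here, so compare against your own file rather than pasting this in.

```
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
```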

Configure Nagios and Apache Services for Start

[radar@test2 ~]$ sudo /sbin/chkconfig --level 35 httpd on
[radar@test2 ~]$ sudo /sbin/chkconfig --level 35 nagios on

Unfortunately, before we proceed, we have to disable SELinux. There is no policy (that I know of) that allows Nagios to function with SELinux-enabled Apache. If anyone knows the solution, please see the contact info at the end of this PET, and discuss. The easiest way to disable SELinux is to go to Applications, System Settings, Security Level and select the SELinux tab. Uncheck "Enabled (Modification Requires Reboot)", then click OK and reboot.
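On a machine without a desktop, the same change can be made by editing the SELINUX= line in /etc/selinux/config. The sketch below wraps that edit in a function; set_selinux_mode is a made-up name, and the file path is the usual Red Hat location rather than something verified on this particular install.

```shell
#!/bin/bash
# Rewrite the SELINUX= line of an SELinux config file to the given mode
# (enforcing, permissive or disabled), keeping a .bak copy.
# (Hypothetical helper for illustration.)
set_selinux_mode() {
    local config=$1 mode=$2
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    sed -i.bak "s/^SELINUX=.*/SELINUX=$mode/" "$config"
}

# Usage (run as root, then reboot):
#   set_selinux_mode /etc/selinux/config disabled
```

Running setenforce 0 as root switches a running system to permissive mode until the next reboot, which can be a quicker way to confirm that SELinux is what is blocking the CGIs.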

When the machine is up, we can point the browser to https://machine/nagios. We'll see right away in the control panel that there's an issue with the total processes check. By looking at /etc/nagios/services.cfg for check_local_procs we see the check definition:

check_local_procs!250!400

So let's look at our checkcommands.cfg file to see how that's defined:

$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

Right away, we see there's a mismatch. The default service definition supplies only 2 arguments (delimited by the '!'), yet the command definition is looking for 3. Let's see what that -s is for:

cd /usr/lib/nagios/plugins
./check_procs -h | less

The help tells us that the -s is optional:

Optional Filters:
-s, --state=STATUSFLAGS

So we'll remove that from the command definition for now. Before:

define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
        }

After:

define command{
        command_name    check_local_procs
        command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$
        }

We've removed the optional ps status flag.
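To see why the original definition failed, it helps to mimic how the '!'-separated fields of a check_command fill in $ARG1$, $ARG2$ and so on. The following is purely an illustration written for this article, not how Nagios is implemented internally; expand_args is a made-up name.

```shell
#!/bin/bash
# Illustration only: fill $ARG1$..$ARG4$ in a command_line from the
# "!"-separated fields of a check_command. Missing arguments become empty.
expand_args() {
    local check_command=$1 command_line=$2
    local -a fields
    local i
    IFS='!' read -r -a fields <<< "$check_command"
    for i in 1 2 3 4; do
        command_line=${command_line//"\$ARG$i\$"/${fields[$i]}}
    done
    printf '%s\n' "$command_line"
}

# The original definition: two arguments supplied, three expected.
expand_args 'check_local_procs!250!400' \
    '$USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$'
```

The expanded line ends in a dangling -s with nothing after it, which is the mismatch we just removed.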

Restart nagios:

[radar@test2 plugins]$ sudo /sbin/service nagios restart
Running configuration check...done
Stopping network monitor: nagios
Waiting for nagios to exit . done.
Starting network monitor: nagios

Now all is green! We have basic Nagios functionality and can start adding our customizations.

Adding Services To Nagios

Remember that when we verified Nagios's configuration, we got a warning about our testbox host not having any services associated with it. Besides the obvious, this means that Nagios will not do any host alive checks against it. Nagios tries to spread out its checks efficiently and will normally only check a host's alive state when a service is failing. Once we establish a service for testbox, Nagios will count the host as alive as long as that service succeeds. You can set up a service just to ping the box, but we'll set up a custom command using one of the provided plugins.
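For reference, a ping-only service for testbox might look like the sketch below. It assumes the generic-service template and the check_ping command from the sample configuration are available, and the thresholds are the ones the sample PING service uses; adjust them for your own network.

```
define service{
        use                             generic-service         ; Name of service template to use
        host_name                       testbox
        service_description             PING
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           960
        notification_period             24x7
        check_command                   check_ping!100.0,20%!500.0,60%
        }
```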

Using a Supplied Plugin

I have started apache on our testbox, and will use the check_http plugin to define a command, and then from that, define a service to run against testbox. We can test the plugin directly so we know what to expect:

/usr/lib/nagios/plugins/check_http -h

Gives us the usage

[radar@test2 www]$ /usr/lib/nagios/plugins/check_http -H testbox -u /error/noindex.html
HTTP OK HTTP/1.1 200 OK - 4177 bytes in 0.007 seconds |time=0.006624s;;;0.000000 size=4177B;;;0

Gives us the default new install page. We can use that to set up a service to test whether apache is up on testbox. Create a new config file in /etc/nagios called custom_cmds.cfg and place the following in it:

define command{
        command_name    check_apache
        command_line    $USER1$/check_http -H $ARG1$ -S -u $ARG2$
        }

Now open services.cfg in an editor and define a service to use this command definition:

define service{
       use                             generic-service         ; Name of service template to use
       host_name                       testbox
       service_description             Check Apache
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              4
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  admins
       notification_options            w,u,c,r
       notification_interval           960
       notification_period             24x7
       check_command                   check_apache!testbox!/error/noindex.html
       }

We have to tell nagios that this new command file exists by adding its path to nagios.cfg:

cfg_file=/etc/nagios/custom_cmds.cfg

I added that under the existing command definition entries. Now we can use this file to add custom command definitions. We need to verify that we didn't make any mistakes:

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Good. We can restart nagios:

sudo /sbin/service nagios restart

We see that the new service is there, but it's pending. We can force it by rescheduling the next check and accepting the default time, which is immediate. We now can see that the service is working.

Pretty easy, but we may also want to write our own plugin and make a service check from that. Let's emulate the functionality of the check_http plugin, for illustration purposes, using available tools and wrap it up in a bash script.

Create Custom Plugin

To use this example, curl needs to be installed. It is by default on RHEL.

Nagios expects plugins to return a code telling what the status of the check is. The following details what the codes are:

0 = OK
1 = WARNING
2 = CRITICAL
3 = UNKNOWN

The warning and critical exit codes are ideal for setting thresholds, such as CPU usage and load averages. But since our service is either on or off, we can use critical, ok, and unknown (for bad parameters passed).
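When testing a plugin by hand, it is easy to forget that Nagios looks at the exit code rather than the text. The sketch below runs any plugin command line and prints the status name next to its output; status_name and run_plugin are made-up helper names, not part of Nagios.

```shell
#!/bin/bash
# Map a plugin exit code to the status names listed above.
# Anything outside 0-3 is reported as out of range.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT-OF-RANGE" ;;
    esac
}

# Run any plugin command line, print its status name and output,
# and pass the plugin's exit code back to the caller.
run_plugin() {
    local output code
    output=$("$@")
    code=$?
    echo "[$(status_name "$code")] $output"
    return "$code"
}

# Usage (plugin path as used in this article):
#   run_plugin /usr/lib/nagios/plugins/check_http -H testbox -u /error/noindex.html
```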

This script takes arguments and passes them to the curl command. We'll use it to get similar functionality as the check_http plugin.

#!/bin/bash
#
# testweb.sh
#
# Nagios plugin: fetch the HTTP headers of a host with curl and report
# OK (0), CRITICAL (2) or UNKNOWN (3, bad parameters).
#

BADCALL="Wrong combination of parameters: $*"

printuse ()
{
cat <<End-of-usage

Usage:   ./testweb.sh -h [hostname] [-H|-S]
         ./testweb.sh -h [hostname] [-H|-S] -p [port]

Example: ./testweb.sh -h www.redhat.com -S
         ./testweb.sh -h 192.168.0.10 -H -p 7778

End-of-usage
}

# Rudimentary check for proper number and combination of parameters:
# exactly 3 or 5 arguments, -h first, -H or -S third, numeric port after -p.

if { [ "$#" -ne 3 ] && [ "$#" -ne 5 ]; } || [ "$1" != "-h" ] || \
   { [ "$3" != "-H" ] && [ "$3" != "-S" ]; }
then
    echo "$BADCALL"
    printuse
    exit 3
elif [ "$#" -eq 5 ] && { [ "$4" != "-p" ] || ! echo "$5" | grep -Eq '^[0-9]+$'; }
then
    echo "$BADCALL"
    printuse
    exit 3
fi

# Set the URL prefix based on parameter 3

if [ "$3" == "-S" ]
then
    PRE=https://
else
    PRE=http://
fi

# Build URL

HOST="$2"
if [ "$#" -eq 5 ]
then
    PORT=":$5"
    URL="$PRE$HOST$PORT"
else
    URL="$PRE$HOST"
fi

# Fetch headers only (-I), silently (-s), without certificate checks (-k)

HEADERS="/tmp/$HOST.header.txt"
curl -k -s -I -w "%{size_header} bytes in %{time_total} seconds\n\n" "$URL" >"$HEADERS"
RC=$?

case "$RC" in
    "0")
    STAT=`grep seconds "$HEADERS"`
    # HTTP header lines end in CR; strip it so the output stays on one line
    SRV=`grep Server "$HEADERS" | awk '{print $2}' | tr -d '\r'`
    rm -f "$HEADERS"
    echo "OK - $SRV => $STAT"
    exit 0
    ;;
    "7")
    MSG=`cat "$HEADERS"`
    rm -f "$HEADERS"
    echo "CRITICAL - Failed to connect => $MSG"
    exit 2
    ;;
    *)
    # Any other curl failure (DNS, timeout, SSL...) must not fall through as OK
    rm -f "$HEADERS"
    echo "CRITICAL - curl exited with code $RC for $URL"
    exit 2
    ;;
esac

And we save this in /usr/lib/nagios/plugins as testweb.sh and make it executable:

chmod 755 /usr/lib/nagios/plugins/testweb.sh

Let's see how to use the plugin:

[radar@test2 nagios]$ /usr/lib/nagios/plugins/testweb.sh -h testbox -S
OK - Apache/2.0.52 => 199 bytes in 0.354 seconds

[radar@test2 nagios]$ /usr/lib/nagios/plugins/testweb.sh -h testbox -H
OK - Apache/2.0.52 => 199 bytes in 0.008 seconds

SSL seems considerably slower, as can be expected.

We can use this now to define a new service. Let's edit /etc/nagios/custom_cmds.cfg and add a command.

define command{
       command_name    check_apache_also
       command_line    $USER1$/testweb.sh -h $ARG1$ -S
       }

Now we edit services.cfg and define the service:

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       testbox
        service_description             Check Apache Also

        is_volatile                     0
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1

        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           960
        notification_period             24x7
        check_command                   check_apache_also!testbox

        }

And we verify our changes with nagios:

[radar@test2 nagios]$ sudo nagios -v /etc/nagios/nagios.cfg

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Now restart nagios:

[radar@test2 nagios]$ sudo /sbin/service nagios restart

The service will show pending, so force its schedule as before. And we see it works!
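Since we enabled check_external_commands earlier, a forced check can also be requested from the shell by writing a SCHEDULE_FORCED_SVC_CHECK line to the Nagios command file. The sketch below only formats that line; forced_check_command is a made-up name, and the command file path in the usage comment is an assumption, so use the command_file value from your own nagios.cfg.

```shell
#!/bin/bash
# Format an external command asking Nagios to check a service immediately.
# Arguments: host name, service description, optional epoch timestamp.
# (Hypothetical helper for illustration.)
forced_check_command() {
    local host=$1 service=$2 now=${3:-$(date +%s)}
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$now" "$host" "$service" "$now"
}

# Usage: write the line to the command_file named in nagios.cfg.
# The path below is an assumption; check your own nagios.cfg.
#   forced_check_command testbox "Check Apache Also" > /var/log/nagios/rw/nagios.cmd
```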

Conclusion

It took a little configuration, but it's quite easy to have a functioning Nagios install, with reliable checks. There is quite a bit more to nagios, all of which you'll want to get working. Things like service groups, notifications, dependencies and escalations will further refine the way Nagios works for you. Nagios is well documented - you can view the help files right from within a working install, or go over to Nagios's project site.

--
Nirmal D Pathak.
+91-9898173175.
