Adding more monitoring details...

Sean Alderman

Sep 3, 2013, 4:39:16 PM

to example42-pu...@googlegroups.com

Al,
I'm sorry to bug you; I know you're fresh off of PuppetConf and have a load on your todo list. I have two topics I'm hoping you can shed some light on...

A) Could you recommend a tool that works within the confines of your monitoring modules and can provide RRD/trending of the data collected by the monitors (I'm using nagios)? I've been trying to look at pnp4nagios, but it's not playing as well as I had hoped. This is a situation where we have two separate packages, and one (pnp4nagios) requires modifications to the other's (nagios) config. Can I use an exported resource to tell nagios to use a pnp4nagios config template? Otherwise we'd include the pnp4nagios class (which might provide a nagios.cfg template), then source the custom template in the nagios class, which just seems like it would be confusing, as opposed to your Site Class concept for storing site-specific files.

B) Do you have any thoughts on how to deal with the need for more fine-grained monitors for Nagios? I have spent some time adding a few files manually, and while this works, it's clearly not optimal, especially for service checks for hosts that E42 monitor/nagios/nrpe manages. Consider monitors for things like:

DHCPD where we want to monitor the process and port, but we also might want to monitor the DHCP scope usage.
Apache where we want to monitor idle workers, SSL certificate status, process, port, etc.
Tomcat where we want to monitor a host of JMX attributes beyond process and port.

I still haven't solved my Nagios Contact management issue yet, but I'm getting hammered by not having monitors on servers that have higher level monitoring needs that nagios can support.

Kind regards,

Sean

Alessandro Franceschi

Sep 3, 2013, 5:19:39 PM

to example42-pu...@googlegroups.com

Hi Sean

On Tuesday, September 3, 2013 10:39:16 PM UTC+2, Sean Alderman wrote:

A) Could you recommend a tool that works within the confines of your monitoring modules and can provide RRD/trending of the data collected by the monitors (I'm using nagios)? I've been trying to look at pnp4nagios, but it's not playing as well as I had hoped. This is a situation where we have two separate packages, and one (pnp4nagios) requires modifications to the other's (nagios) config. Can I use an exported resource to tell nagios to use a pnp4nagios config template? Otherwise we'd include the pnp4nagios class (which might provide a nagios.cfg template), then source the custom template in the nagios class, which just seems like it would be confusing, as opposed to your Site Class concept for storing site-specific files.

For trending I've used the Munin module, which works well.
It doesn't need to be set as a monitor tool; you just have to include it in your monitored nodes and set the IP of the Munin server.
On the Munin server, set the parameter $munin_server_local to true.

The module automatically runs a daily cron job (you can disable it) to autoconfigure the plugins. It should work out by itself what there is to monitor; in some cases you might need to configure individual plugins with credentials and other params (munin::params).
Munin's main drawback is that it doesn't scale well, but the newest versions should do better (with features such as graphing on demand).

B) Do you have any thoughts on how to deal with the need for more fine-grained monitors for Nagios? I have spent some time adding a few files manually, and while this works, it's clearly not optimal, especially for service checks for hosts that E42 monitor/nagios/nrpe manages. Consider monitors for things like:
DHCPD where we want to monitor the process and port, but we also might want to monitor the DHCP scope usage.
Apache where we want to monitor idle workers, SSL certificate status, process, port, etc.
Tomcat where we want to monitor a host of JMX attributes beyond process and port.

On the relevant role classes (or in any class included by your nodes) you can place custom defines like monitor::plugin, which may use any kind of Nagios plugin and can optionally configure it both on the central Nagios server and locally for puppi checks (if you have $monitor_tool = [ 'puppi' , 'nagios' ]).
If you use only Nagios and are not interested in the puppi checks, you can use the Nagios define directly: nagios::service.

Note that you might also need to configure NRPE, so it might make sense to use monitor::plugin in any case, as it can configure the relevant NRPE entries.
Take a look at the options and the code of https://github.com/example42/puppet-monitor/blob/master/manifests/plugin.pp for some details.

I still haven't solved my Nagios Contact management issue yet, but I'm getting hammered by not having monitors on servers that have higher level monitoring needs that nagios can support.

Hope this information helps.
About contacts management, is the problem somewhere in the Nagios module (missing or buggy features)?
Note that in any case you can add custom Nagios configurations by placing the files you want, without using a dedicated define.

Good luck

al

Sean Alderman

Sep 4, 2013, 11:30:35 AM

to example42-pu...@googlegroups.com

Al,
Thanks for your speedy response! I have more comments inline below...

On Tuesday, September 3, 2013 5:19:39 PM UTC-4, Alessandro Franceschi wrote:

For trending I've used the Munin module, which works well.
It doesn't need to be set as a monitor tool; you just have to include it in your monitored nodes and set the IP of the Munin server.
On the Munin server, set the parameter $munin_server_local to true.
The module automatically runs a daily cron job (you can disable it) to autoconfigure the plugins. It should work out by itself what there is to monitor; in some cases you might need to configure individual plugins with credentials and other params (munin::params).
Munin's main drawback is that it doesn't scale well, but the newest versions should do better (with features such as graphing on demand).

I guess I was spoiled at my last place of employment with BigBrother and later Xymon, in terms of graphs. [1] The ~2 hour, ~2 day, ~2 week, ~2 month, ~2 year roll-up graphs were great for studying and understanding the behaviour of servers. Like Nagios, it's config-file based, but unlike Nagios the trending is built in. That said, I am not a fan of Xymon other than the trending feature. You mention that munin runs daily; I'm wondering if it can build graphs regularly at a more granular interval than days and weeks. I'm not sure if the feature is gone in Xymon or if we had a custom hack, but I don't see the ~2 hour graph on their page.

Both pnp4nagios and nagiosgraph (which I just discovered today) integrate with the host and service templates to add the action_url entry for graphs on mouseover. They also add to /etc/nagios/objects/command.cfg and /etc/nagios/nagios.cfg to enable data collection. It would seem that some of this could be done with custom baseservices, host, and nagios cfg templates, but I would have to add a service_template parameter that can be overridden from the top scope, like you do with host_template, for any other service not in the base.

[1] http://www.xymon.com/xymon-cgi/showgraph.sh?host=blixen.hswn.dk&service=la&graph_width=864&graph_height=180&disp=blixen.hswn.dk&nostale&color=green&graph_start=1378042247&graph_end=1378301447&action=menu

On the relevant role classes (or in any class included by your nodes) you can place custom defines like monitor::plugin, which may use any kind of Nagios plugin and can optionally configure it both on the central Nagios server and locally for puppi checks (if you have $monitor_tool = [ 'puppi' , 'nagios' ]).
If you use only Nagios and are not interested in the puppi checks, you can use the Nagios define directly: nagios::service.
Note that you might also need to configure NRPE, so it might make sense to use monitor::plugin in any case, as it can configure the relevant NRPE entries.
Take a look at the options and the code of https://github.com/example42/puppet-monitor/blob/master/manifests/plugin.pp for some details.

I was thinking that might be where to go... I have a wrapper module to manage monitoring that simply includes nrpe and sets allowed_hosts. Perhaps I should subclass that for various custom monitors. I am curious, though: since nagios::plugin includes nrpe, would that cause a failure on duplicate class definition when the agent runs?

Hope this information helps.
About contacts management, is the problem somewhere in the Nagios module (missing or buggy features)?
Note that in any case you can add custom Nagios configurations by placing the files you want, without using a dedicated define.

No, not a problem with the module. At the moment the only servers in Foreman/Puppet/Nagios are mine, so I haven't needed to deal with multiple contacts or how to apply them to hosts/services. It's one of many todos to resolve as I roll all of this stuff out.

Thanks again for your thoughts.

Alessandro Franceschi

Sep 4, 2013, 12:00:05 PM

to example42-pu...@googlegroups.com

On Wednesday, September 4, 2013 17:30:35 UTC+2, Sean Alderman wrote:

I guess I was spoiled at my last place of employment with BigBrother and later Xymon, in terms of graphs. [1] The ~2 hour, ~2 day, ~2 week, ~2 month, ~2 year roll-up graphs were great for studying and understanding the behaviour of servers. Like Nagios, it's config-file based, but unlike Nagios the trending is built in. That said, I am not a fan of Xymon other than the trending feature. You mention that munin runs daily; I'm wondering if it can build graphs regularly at a more granular interval than days and weeks. I'm not sure if the feature is gone in Xymon or if we had a custom hack, but I don't see the ~2 hour graph on their page.

Oops, I didn't express myself well about Munin.
By default it collects data every 5 minutes and produces the typical hourly/daily/weekly/monthly RRD graphs, so your "resolution" on the gathered metrics is 5 minutes (1 day would be a bit too much ;-)
This is a sample of what you can expect from Munin: http://demo.munin-monitoring.org/

What happens every day is the autodiscovery of the plugins (via a cron job placed by the module), so that if you add a service, it is monitored automatically starting from the next day.

Note that there are fancier or quicker tools around, like Collectd or Graphite, but I don't currently have modules for them.

Both pnp4nagios and nagiosgraph (which I just discovered today) integrate with the host and service templates to add the action_url entry for graphs on mouseover. They also add to /etc/nagios/objects/command.cfg and /etc/nagios/nagios.cfg to enable data collection. It would seem that some of this could be done with custom baseservices, host, and nagios cfg templates, but I would have to add a service_template parameter that can be overridden from the top scope, like you do with host_template, for any other service not in the base.

Add a feature request on GitHub.
If we did it for hosts we can do it for services, provided it can be common to all the services of a host; otherwise a totally different approach has to be thought of.

I was thinking that might be where to go... I have a wrapper module to manage monitoring that simply includes nrpe and sets allowed_hosts. Perhaps I should subclass that for various custom monitors. I am curious, though: since nagios::plugin includes nrpe, would that cause a failure on duplicate class definition when the agent runs?

You can have many different "include nrpe" statements in your catalog; the nrpe class is still included only once, so there are no duplicate resource errors.
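
A minimal sketch of that behaviour, for illustration only. The nrpe class below is a stand-in so the manifest compiles on its own, and the site_monitoring class names are hypothetical, not part of any Example42 module:

```puppet
# Stand-in for the real nrpe class (assumption: the real one comes from
# the nrpe module; this stub only exists to make the sketch self-contained).
class nrpe {
  notify { 'nrpe class evaluated': }
}

# Hypothetical wrapper class, similar to the one described above.
class site_monitoring {
  include nrpe
}

# Hypothetical subclass for a custom check that also needs NRPE.
class site_monitoring::dhcp {
  include nrpe
  include site_monitoring
}

# nrpe is requested three times in total, but it is added to the
# catalog only once.
include site_monitoring
include site_monitoring::dhcp
```

This holds for the include function. A resource-like declaration (class { 'nrpe': ... }) may appear only once per class, so mixing several of those for the same class does fail with a duplicate declaration error.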

Cheers,

al

Sean Alderman

Sep 6, 2013, 3:33:11 PM

to example42-pu...@googlegroups.com

Ok, for munin, the demo looked like it didn't provide a very short term granularity. Anyway, munin might be a nice option, but what I don't want at the moment is "one more place to go" for data to manage systems. Other tools in the mix are Splunk, Foreman, and now Nagios. The main Nagios server needs to be the one stop shop for system/service availability and notifications (as soon as I wrap up the contacts config :) )

In the meantime, what I've done is used your standard42 template to create a pnp4nagios module, and modified your nagios module to integrate with it. It's a total KLUDGE at the moment, and doesn't accomplish the end result elegantly like I'd like, but it works. Feel free to take a look at the two git repos...

I was hoping to be able to make a conditional using $enablepnp to reset various template params from the default to the pnp4nagios-provided version, but I just learned that params are not allowed to be changed within the same scope once they're defined. As a result, I exposed all the templates to be overridden by the top scope. In my case, that means Foreman params, which I can live with.

I'm not sure how to accomplish what I was looking to do. I want the module to say, "If enablepnp = true, then use this other set of params and resources," but I'm not sure I can do this via myclass or a set subclass.

Alessandro Franceschi

Sep 7, 2013, 8:12:27 AM

to example42-pu...@googlegroups.com

On Friday, September 6, 2013 9:33:11 PM UTC+2, Sean Alderman wrote:

Ok, for munin, the demo looked like it didn't provide a very short term granularity. Anyway, munin might be a nice option, but what I don't want at the moment is "one more place to go" for data to manage systems. Other tools in the mix are Splunk, Foreman, and now Nagios. The main Nagios server needs to be the one stop shop for system/service availability and notifications (as soon as I wrap up the contacts config :) )

In the meantime, what I've done is used your standard42 template to create a pnp4nagios module, and modified your nagios module to integrate with it. It's a total KLUDGE at the moment, and doesn't accomplish the end result elegantly like I'd like, but it works. Feel free to take a look at the two git repos...
https://github.com/salderma/puppet-pnp4nagios
https://github.com/salderma/puppet-nagios
I was hoping to be able to make a conditional using $enablepnp to reset various template params from the default to the pnp4nagios-provided version, but I just learned that params are not allowed to be changed within the same scope once they're defined. As a result, I exposed all the templates to be overridden by the top scope. In my case, that means Foreman params, which I can live with.
I'm not sure how to accomplish what I was looking to do. I want the module to say, "If enablepnp = true, then use this other set of params and resources," but I'm not sure I can do this via myclass or a set subclass.

Typically you introduce a new variable whose value is either that of the similarly named parameter or something different.
Such variables generally have a real_, managed_ or true_ prefix.

So you can do things like this (instead of a selector you can use a case or an if):

    $real_param = $enablepnp ? {
      true  => ',,,,',
      false => $param,
    }

If, in one module, you need to refer to a parameter defined in another module, you must use its fully qualified name, i.e. $nagios::enablepnp

Note that it is generally not a good idea for a module to refer to variables defined in another one, because it creates an interdependency that conflicts with the pattern of keeping modules independent and autonomous.

This is, anyway, already done (maybe too many times) in some Example42 modules, including the nagios one.
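
A self-contained sketch of the real_ variable pattern described above. The class name nagios_example, its parameters and both template paths are hypothetical, chosen only for illustration; they are not taken from the nagios or pnp4nagios modules:

```puppet
# Hypothetical class: names, parameters and template paths are assumptions.
class nagios_example (
  $enablepnp = false,
  $template  = 'nagios_example/nagios.cfg.erb'
) {

  # $template cannot be reassigned in this scope, so a new variable is
  # derived from it and used everywhere else in the class.
  $real_template = $enablepnp ? {
    true    => 'pnp4nagios_example/nagios.cfg.erb',
    default => $template,
  }

  # A real module would pass $real_template to a file resource;
  # notify is used here only so the sketch has no external dependencies.
  notify { "nagios.cfg template in use: ${real_template}": }
}

class { 'nagios_example':
  enablepnp => true,
}
```

One thing to watch: the selector matches the boolean true. If the value arrives as the string 'true', which can happen with parameters set from an external node classifier, it falls through to the default case unless it is converted to a boolean first.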

Sean Alderman

Sep 10, 2013, 10:12:55 AM

to example42-pu...@googlegroups.com

Al,

Thanks for the heads up on this. The struggle I had with setting this up was that I started off trying to use only nrpe::plugin; then, looking at the note below, I switched to monitor::plugin. But the problem was that if the monitor was a custom one (not a standard plugin), both are required. :)

On Tuesday, September 3, 2013 5:19:39 PM UTC-4, Alessandro Franceschi wrote:
