Prometheus meets Nagios/Icinga


martialblog

Sep 20, 2023, 6:51:35 AM
to Prometheus Users
Hi,

I just wanted to spread the word that my colleagues and I have released a small tool that helps integrate Prometheus into monitoring tools like Nagios/Icinga.

It's a Nagios-style monitoring plugin that talks to the Prometheus API and translates the response into OK/WARNING/CRITICAL semantics. Everything is packed into a Go binary and released under the GPL-2.0 license.

https://github.com/NETWAYS/check_prometheus

Current features are:
  • health, Checks the health or readiness status of the Prometheus server
  • alert, Checks the status of one or more Prometheus alerts
  • query, Checks the status of a PromQL query
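To make the OK/WARNING/CRITICAL mapping concrete, here is a minimal Python sketch (not the plugin's actual Go code) of how an alert-style check could turn a parsed response from the Prometheus /api/v1/alerts endpoint into a plugin exit code. The function name alerts_state and the choice to treat pending alerts as WARNING and firing alerts as CRITICAL are assumptions made for illustration.

```python
# Standard monitoring-plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def alerts_state(response: dict) -> int:
    """Map a parsed /api/v1/alerts response to a plugin exit code.

    Assumed mapping (illustrative only): firing -> CRITICAL, pending -> WARNING.
    """
    # Anything other than a successful API answer cannot be judged.
    if response.get("status") != "success":
        return UNKNOWN

    alerts = response.get("data", {}).get("alerts", [])
    states = {alert.get("state") for alert in alerts}

    # Report the most severe state that is present.
    if "firing" in states:
        return CRITICAL
    if "pending" in states:
        return WARNING
    return OK
```

A real plugin would also print a one-line status message and exit with the returned code.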
Feedback is most welcome!

Regards
Markus

Brian Candler

Sep 20, 2023, 7:43:13 AM
to Prometheus Users
Cool. How does this compare with https://github.com/claranet/nagitheus ?

martialblog

Sep 20, 2023, 8:40:49 AM
to Prometheus Users
From what I can tell nagitheus as well as https://github.com/prometheus/nagios_plugins can only be used for PromQL checks.

We wanted a tool that can also do other things, like a simple health check or alert checks.

I also hope that we can extend the CLI in the future if other features are required, hence the subcommand pattern.

Brian Candler

Sep 20, 2023, 1:04:43 PM
to Prometheus Users
Thanks.

It wasn't clear to me how the -c (critical) and -w (warning) thresholds work. I had to dig through the source, which led me to a dependency: https://github.com/NETWAYS/go-check#thresholds

There, the README shows an example "~:3" but not what it actually means. In the source (which presumably ends up in godoc) I found:

// Defining a threshold for any numeric value
//
// Format: [@]start:end
//
// Threshold  Generate an alert if x...
// 10         < 0 or > 10, (outside the range of {0 .. 10})
// 10:        < 10, (outside {10 .. ∞})
// ~:10       > 10, (outside the range of {-∞ .. 10})
// 10:20      < 10 or > 20, (outside the range of {10 .. 20})
// @10:20     ≥ 10 and ≤ 20, (inside the range of {10 .. 20})
//
// Reference: https://www.monitoring-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
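The table above can be sketched in a few lines of Python. This is an illustrative reimplementation of the range format as described in that comment, not the go-check source; Threshold and parse_threshold are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    lower: float
    upper: float
    inside: bool  # True when the spec started with "@"

    def alerts(self, x: float) -> bool:
        """Return True if x should generate an alert."""
        in_range = self.lower <= x <= self.upper
        # Plain ranges alert outside the range, "@" ranges alert inside it.
        return in_range if self.inside else not in_range


def parse_threshold(spec: str) -> Threshold:
    """Parse a '[@]start:end' threshold specification."""
    inside = spec.startswith("@")
    body = spec[1:] if inside else spec
    if not body:
        raise ValueError("empty threshold")

    if ":" in body:
        start, end = body.split(":", 1)
    else:
        # A bare number N means the range 0..N.
        start, end = "0", body

    if start == "~":
        lower = -math.inf          # "~" stands for negative infinity
    elif start == "":
        lower = 0.0                # an omitted start defaults to 0
    else:
        lower = float(start)
    upper = float(end) if end else math.inf  # an omitted end means infinity

    if lower > upper:
        raise ValueError(f"start is greater than end in {spec!r}")
    return Threshold(lower, upper, inside)


# "~:3" from the README: alert only when the value is greater than 3.
print(parse_threshold("~:3").alerts(5))   # True
print(parse_threshold("~:3").alerts(-7))  # False
```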

So my main feedback is, a direct documentation link from check_prometheus to THRESHOLDFORMAT would be very helpful :-)

(I guess this is standard for Nagios, though. I know check_snmp works this way.)

Conall O'Brien

Sep 20, 2023, 2:04:41 PM
to Brian Candler, Prometheus Users
It's been a long time since I used Nagios, but Nagios warning vs. critical thresholds are akin to using one Prometheus metric in multiple alerts, with a different alerting threshold and severity label for each alert definition.
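As a rough illustration of that analogy, the Python sketch below checks one value against two thresholds and reports the more severe state, much as two alert rules on the same metric would carry different severity labels. The names and the simple "higher is worse" comparison are assumptions; the exit codes 0, 1 and 2 are the standard plugin return codes.

```python
# Standard monitoring-plugin exit codes and their labels.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def evaluate(value: float, warn_above: float, crit_above: float) -> int:
    """Return the exit code for a value, checking the critical limit first."""
    if value > crit_above:
        return 2
    if value > warn_above:
        return 1
    return 0


# Hypothetical example: disk usage of 93% with -w 80% and -c 90%.
usage = 0.93
code = evaluate(usage, warn_above=0.80, crit_above=0.90)
print(f"{STATES[code]} - usage is {usage:.0%}")
```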
 

martialblog

Sep 21, 2023, 6:56:58 AM
to Prometheus Users
See, that is exactly the kind of thing one fails to mention, because when you work with Nagios/Icinga all day it feels like basic knowledge. In German we call it "betriebsblind" (blind to what has become routine).

I'll update some READMEs to make things clearer.

Thanks