Prometheus Alert on Switch Port Saturation


Brian Gates

Apr 19, 2024, 9:51:20 AM
to Prometheus Users
Hey all, I could use some assistance creating a Prometheus alert.

It seems simple, but after some trial and error it's more advanced than I thought.


We are looking to create alerts for "Link Saturation".


We have dashboards that show us the current bandwidth of the ports, and the queries look like this:
(
  sum by (snmp_target, ifDescr) (irate(ifHCOutOctets{job_snmp=~"integrations/snmp", snmp_target=~"$Switches", ifDescr=~"$ifName"}[$__interval])) +
  sum by (snmp_target, ifDescr) (irate(ifHCInOctets{job_snmp=~"integrations/snmp", snmp_target=~"$Switches", ifDescr=~"$ifName"}[$__interval]))
) * 8

And we use this query in a table to show each port's link speed:
last_over_time(ifHighSpeed{snmp_target="$Switches"}[$__interval])

So that raises the question: how can I create an alert that basically says:

If the value of (
  sum by (snmp_target, ifDescr) (irate(ifHCOutOctets{job_snmp=~"integrations/snmp", snmp_target=~"$Switches", ifDescr=~"$ifName"}[$__interval])) +
  sum by (snmp_target, ifDescr) (irate(ifHCInOctets{job_snmp=~"integrations/snmp", snmp_target=~"$Switches", ifDescr=~"$ifName"}[$__interval]))
) * 8 is greater than last_over_time(ifHighSpeed{snmp_target="$Switches"}[$__interval]), fire an alert.

Ben Kochie

Apr 19, 2024, 10:05:32 AM
to Brian Gates, Prometheus Users
Two things,

Full-duplex means you shouldn't add up `ifHCOutOctets` and `ifHCInOctets`. What you probably want is two alerts, one for "Port In Saturation" and one for "Port Out Saturation".

Because ifHighSpeed is in megabits/sec while the octet counters give bytes/sec, you'll have to do a bit more math: multiply the byte rate by 8 and divide by 1,000,000 before comparing. If you want to deal with half-duplex ports, you'll probably need to do some additional scraping of EtherLike-MIB and even a bit more math. It gets a bit complicated.

But the trivial alert would be something like this:

- alert: PortOutSaturation
  expr: >
    (
      sum by (snmp_target,ifIndex,ifAlias,ifDescr,ifName) (rate(ifHCOutOctets[5m]))
      * 8
      / 1000000
    )
    /
    avg by (snmp_target,ifIndex,ifAlias,ifDescr,ifName) (avg_over_time(ifHighSpeed[5m]))
    * 100
    > 90

This would alert when the port is over 90% of the rated link speed.
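
For the inbound direction, a matching rule could look like the sketch below. It is the same expression with `ifHCInOctets` swapped in; the `for` duration and the annotation text are illustrative assumptions, not something you need to keep.

```yaml
# Sketch only: inbound counterpart of PortOutSaturation.
# The `for` duration and the annotation wording are assumptions; tune them.
- alert: PortInSaturation
  # bytes/sec * 8 = bits/sec; / 1000000 = megabits/sec, the unit of ifHighSpeed.
  expr: >
    (
      sum by (snmp_target,ifIndex,ifAlias,ifDescr,ifName) (rate(ifHCInOctets[5m]))
      * 8
      / 1000000
    )
    /
    avg by (snmp_target,ifIndex,ifAlias,ifDescr,ifName) (avg_over_time(ifHighSpeed[5m]))
    * 100
    > 90
  # Require the condition to hold for a while before firing, to ignore short bursts.
  for: 5m
  annotations:
    summary: "Inbound traffic on {{ $labels.snmp_target }} {{ $labels.ifDescr }} is above 90% of link speed"
```

Both sides of the division aggregate by the same label list, so the series match one-to-one per port.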


Brian Gates

Apr 19, 2024, 10:09:56 AM
to Prometheus Users
That makes sense. I am using the bytes/sec(IEC) unit to get a correct display of my bandwidth values. For reference, I am polling Meraki switches via SNMP.

I'm going to give the above alert a try and see if it works as expected.

Ben Kochie

Apr 19, 2024, 10:14:54 AM
to Brian Gates, Prometheus Users
You can always try the query in a graph, playing with the saturation threshold.

If the query returns data, there would be an alert; if it returns no data, no alert.

Brian Gates

Apr 19, 2024, 10:17:20 AM
to Prometheus Users
When I put it in Explore and set the threshold to 10%, it returned some data, so I think this might just work after all :) I'm going to make some fake network traffic and see if I can't get it to fire off.
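
As an alternative to generating real traffic, the rule could probably also be exercised with a `promtool test rules` unit test. The sketch below makes several assumptions: the PortOutSaturation rule from this thread is saved inside a rule group in a file called `port_saturation.rules.yml` (a hypothetical name), it has no `for`, extra labels, or annotations, and the label values are made up.

```yaml
# Sketch only: run with `promtool test rules port_saturation_test.yml`.
# The file names and all label values here are hypothetical.
rule_files:
  - port_saturation.rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Counter grows by 7,500,000,000 bytes per minute = 125,000,000 bytes/sec,
      # which is 1000 megabits/sec, i.e. 100% of the link speed below.
      - series: 'ifHCOutOctets{snmp_target="sw1",ifIndex="1",ifAlias="uplink",ifDescr="Port 1",ifName="1"}'
        values: '0+7500000000x10'
      # Constant link speed of 1000 megabits/sec.
      - series: 'ifHighSpeed{snmp_target="sw1",ifIndex="1",ifAlias="uplink",ifDescr="Port 1",ifName="1"}'
        values: '1000+0x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: PortOutSaturation
        exp_alerts:
          # The alert carries the labels kept by the `by (...)` clause.
          - exp_labels:
              snmp_target: sw1
              ifIndex: "1"
              ifAlias: uplink
              ifDescr: Port 1
              ifName: "1"
```

If the rule file adds a `for` clause, labels, or annotations, the expected alert in the test would need to be adjusted to match.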