snmp_exporter tricky alert query

George B

May 5, 2021, 4:04:02 PM
to Prometheus Users
We need to generate an alert, using Prometheus snmp_exporter metrics, when fewer than 80% of the nodes on our active BIG-IP F5 load balancer are up. I think we have the percentage of up hosts, but I am not sure how to ensure that we only alert on the active F5 load balancer node. In snmp_exporter, each F5 node appears as a distinct value of the instance label.

Here are the two metrics in question.
host up metric: ltmPoolMemberMonitorState = 4
f5 node active metric: sysCmFailoverStatusId = 4

The query below counts the distinct ltmPoolMemberNodeName values containing "prod" that are up, and divides that by the total number of distinct ltmPoolMemberNodeName values containing "prod". We then appended the OR operator to return 0 when all hosts are down (i.e. no ltmPoolMemberMonitorState series equals 4), because in that case the numerator is empty and the division returns nothing. See below:

count(count by (ltmPoolMemberNodeName) (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"} == 4)) / count(count by (ltmPoolMemberNodeName) (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"})) OR on() vector(0)

Now we need to ensure that the calculation is derived only from the metrics of the active F5 node instance (i.e. the instance for which sysCmFailoverStatusId is equal to 4). I tried with (instance) and on (instance) to keep the metrics on the same F5 node instance label, but haven't had any luck. Any recommendations would be greatly appreciated.
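
To make the goal concrete, here is an untested sketch of the shape such a query might take. It assumes that ltmPoolMemberMonitorState and sysCmFailoverStatusId carry an identical instance label value for the same F5 unit, which may not hold in every snmp_exporter setup. It uses the "and on (instance)" set operator to keep only pool-member series whose instance also reports sysCmFailoverStatusId == 4, before any aggregation drops the instance label:

```promql
# Sketch only: assumes both metrics share the same instance label value per F5 unit.
  count(
    count by (ltmPoolMemberNodeName) (
      # "up" prod members, restricted to the active F5 instance
      (ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"} == 4)
      and on (instance)
      (sysCmFailoverStatusId == 4)
    )
  )
/
  count(
    count by (ltmPoolMemberNodeName) (
      # all prod members, restricted to the active F5 instance
      ltmPoolMemberMonitorState{ltmPoolMemberNodeName=~".*prod.*"}
      and on (instance)
      (sysCmFailoverStatusId == 4)
    )
  )
or on () vector(0)
```

Two caveats apply to this sketch. If no instance reports sysCmFailoverStatusId == 4, both counts are empty and the result falls through to 0. And because "or" binds more loosely than comparison operators, the whole expression would need to be wrapped in parentheses before appending "< 0.8" for the alert condition.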