Excluding particular network interfaces from monitoring for some servers.

Yagyansh S. Kumar

Apr 20, 2020, 3:45:45 AM
to Prometheus Users
Hi. I have configured an alert to notify me whenever any of my network interfaces goes down. On some servers we have intentionally brought certain interfaces down, and I want to exclude those interfaces on those particular servers from the alert.
What is the best way to do this? I know I can silence the alert, but that wouldn't be a good solution because the interfaces will stay down indefinitely.

My configured alert looks like this:
  - alert: Network-Interface-Down
    expr: node_network_up{device!~"lo.*"} * on(instance) group_left(nodename) node_uname_info * on(instance,device) group_left(address) node_network_info == 0
    for: 5m
    labels:
      severity: "CRITICAL"
    annotations:
      summary: "Network interface down - {{ $labels.device }} on {{  $labels.nodename }}"

I want to disable the alert for eth1 on Host1 and for eth5 and eth8 on Host2.

Julius Volz

Apr 20, 2020, 5:03:26 AM
to Yagyansh S. Kumar, Prometheus Users
You have a couple of options:

A) If those interfaces are never relevant for anything, use the Node Exporter's "--collector.netdev.device-blacklist" flag on the respective machines to not even emit metrics for the intentionally disabled interfaces.

B) Produce time series (the sample value doesn't matter) that tell Prometheus which interfaces on which host are supposed to be down, and join them into your alerting expression with an "unless" ("<your expression> unless on(instance, device) node_network_device_disabled") to remove alerts for those. Those time series can be produced via static recording rules (see e.g. the last code box in https://www.robustperception.io/using-time-series-as-alert-thresholds), or they could come from an exporter, maybe even from the textfile module on each machine's Node Exporter, if you have the info there.

C) Hardcode the selection of those excepted devices directly into your alerting expression, with multiple "unless"-es. Probably a bit ugly :)
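
As a rough sketch of options B and C for the case in the original question (eth1 on Host1, eth5 and eth8 on Host2): the rules file below assumes the two hosts are scraped as "host1:9100" and "host2:9100". Those instance values, the group names and the "-Hardcoded" alert name are placeholders, and node_network_device_disabled is simply the metric name used in option B above. The first group emits one static series per intentionally-down interface; the second shows what the hardcoded variant might look like.

```yaml
groups:
  # Option B: static series marking interfaces that are expected to be down.
  # The instance values are assumptions; they must equal the instance label
  # on node_network_up exactly, or the "unless" join will not match.
  - name: network-interface-exceptions   # hypothetical group name
    rules:
      - record: node_network_device_disabled
        expr: vector(1)                   # the sample value is irrelevant
        labels:
          instance: "host1:9100"
          device: "eth1"
      - record: node_network_device_disabled
        expr: vector(1)
        labels:
          instance: "host2:9100"
          device: "eth5"
      - record: node_network_device_disabled
        expr: vector(1)
        labels:
          instance: "host2:9100"
          device: "eth8"

  # Option C: the exceptions are written directly into the alert expression.
  # Each "unless" drops results whose (instance, device) pair also appears
  # on its right-hand side.
  - name: network-interface-alerts-hardcoded   # hypothetical group name
    rules:
      - alert: Network-Interface-Down-Hardcoded
        expr: >-
          node_network_up{device!~"lo.*"}
          * on(instance) group_left(nodename) node_uname_info
          * on(instance,device) group_left(address) node_network_info
          == 0
          unless on(instance, device) node_network_up{instance="host1:9100", device="eth1"}
          unless on(instance, device) node_network_up{instance="host2:9100", device=~"eth5|eth8"}
        for: 5m
        labels:
          severity: "CRITICAL"
        annotations:
          summary: "Network interface down - {{ $labels.device }} on {{ $labels.nodename }}"
```

With option B, the alert itself only needs "unless on(instance, device) node_network_device_disabled" appended to the existing expression. Since "unless" binds more loosely than "==", it applies to the result of the "== 0" comparison, so no extra parentheses should be needed.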

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4f018fd8-ae22-4dbe-bbcd-2fcdb4299895%40googlegroups.com.

Yagyansh S. Kumar

Apr 20, 2020, 5:31:14 AM
to Prometheus Users
Thanks for the multiple suggestions.

Creating a recording rule makes sense, and I actually tried creating one for this, but for some reason it is not working.
Could you help me a bit with the recording rule?
Thanks.

Brian Candler

Apr 20, 2020, 6:04:41 AM
to Prometheus Users
Please think carefully about what you've just written.  What you've said in effect is:

"My configuration isn't working.  Please help me.  But I'm not going to show you my configuration.  I'm also not going to tell you what unexpected behaviour I'm seeing. I want you to guess what the problem might be."

This is not an effective use of our time, so don't be surprised if nobody helps you.  There's really good advice about how to ask questions effectively here:

If you are not prepared to share your configs, and you are only looking for sample working ones to guide you, then the link already posted has them:

Yagyansh S. Kumar

Apr 20, 2020, 11:13:03 AM
to Prometheus Users
Sorry, I didn't notice that I had forgotten to share the configuration of the recording rule. I thought I had pasted it.
Anyway, you could have pointed that out nicely too.

Ben Kochie

Apr 20, 2020, 11:19:48 AM
to Yagyansh S. Kumar, Prometheus Users
A recording rule is going to be tricky to do because the instance label needs to match.

I usually recommend doing this by Julius's other suggestion, the node_exporter textfile collector. This makes it easy to integrate the "ignore this interface" metric into your configuration management on the node.
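
A minimal sketch of how that could look, assuming the Node Exporter on each machine is started with --collector.textfile.directory pointing at a directory that your configuration management writes to. The file name and the metric name node_network_device_disabled (borrowed from Julius's option B) are illustrative, not something the thread prescribes. Because the metric is scraped from the same target as node_network_up, Prometheus attaches the matching instance label itself, so the file only needs the device label.

```yaml
# Hypothetical textfile on Host2, e.g. <textfile directory>/disabled_interfaces.prom:
#
#   node_network_device_disabled{device="eth5"} 1
#   node_network_device_disabled{device="eth8"} 1
#
# Alerting rule: the original expression, minus any (instance, device) pair
# for which the node reports a node_network_device_disabled series.
groups:
  - name: network-interface-alerts   # hypothetical group name
    rules:
      - alert: Network-Interface-Down
        expr: >-
          node_network_up{device!~"lo.*"}
          * on(instance) group_left(nodename) node_uname_info
          * on(instance,device) group_left(address) node_network_info
          == 0
          unless on(instance, device) node_network_device_disabled
        for: 5m
        labels:
          severity: "CRITICAL"
        annotations:
          summary: "Network interface down - {{ $labels.device }} on {{ $labels.nodename }}"
```

The interface metrics themselves are still collected in this setup; only the alert is suppressed for the listed devices.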

Yagyansh S. Kumar

Apr 20, 2020, 11:34:55 AM
to Prometheus Users
Thanks for the suggestion, Ben.

Good to see there are still some people who know how to talk to people on a forum. :)

Julius Volz

Apr 20, 2020, 1:58:57 PM
to Ben Kochie, Yagyansh S. Kumar, Prometheus Users
On Mon, Apr 20, 2020 at 5:19 PM Ben Kochie <sup...@gmail.com> wrote:
> A recording rule is going to be tricky to do because the instance label needs to match.
>
> I usually recommend doing this by Julius's other suggestion, the node_exporter textfile collector. This makes it easy to integrate the "ignore this interface" metric into your configuration management on the node.

Then again, if you're already doing per-machine configuration modifications, you may as well just set the blacklist flag on the Node Exporter I guess... if it's a relatively permanent thing per machine.
 
Ben Kochie

Apr 20, 2020, 2:54:14 PM
to Julius Volz, Yagyansh S. Kumar, Prometheus Users
It depends on the use case. You may want to ignore an interface for alerting but still include it for traffic metrics.