Using Regex in the Annotations of Alert.


Yagyansh S. Kumar

Mar 15, 2020, 3:12:20 AM
to Prometheus Users
Hi. I want to add the dashboard link in the alert of that particular service. That dashboard takes the server IP and hostname as input. From the instance label, I want to remove the port number and pass it as input to the dashboard.
Configured Alert:
  - alert: OutOfDiskSpace-Crit
    expr: ((node_filesystem_size_bytes{fstype!~"nfs.*"} - node_filesystem_avail_bytes{fstype!~"nfs.*"}) / node_filesystem_size_bytes{fstype!~"nfs.*"} * 100 > 90) * on (instance) group_left(nodename) node_uname_info
    for: 5m
    labels:
      severity: "CRITICAL"
    annotations:
      summary: "High Disk Usage on *{{ $labels.instance }}* - *{{ $labels.nodename }}* on the filesystem *{{ $labels.mountpoint }}*"
      description: "Current Usage of {{ $labels.mountpoint }} = *{{ $value | humanize }}%*"
      dashboard: "DashbordLink?orgId=1&var-node=&var-name={{ $labels.nodename }}&fullscreen&panelId=174"

In the var-name variable, I am passing the hostname directly using $labels.nodename. I want to pass the IP (without the port) in the var-node variable.

Thanks!

Brian Candler

Mar 15, 2020, 5:02:03 AM
to Prometheus Users
On Sunday, 15 March 2020 08:12:20 UTC+1, Yagyansh S. Kumar wrote:
> Hi. I want to add the dashboard link in the alert of that particular service. That dashboard takes the server IP and hostname as input. From the instance label, I want to remove the port number and pass it as input to the dashboard.

I strongly recommend you do the opposite: make sure the instance label doesn't contain the port number in the first place.

This is described in outline here:

The general idea is: copy the __address__ label explicitly to 'instance', and then add :9100 (or whatever) to the __address__ label.  For example, the instance label is "foo.example.com" and the scraped address is "foo.example.com:9100"
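
A minimal sketch of that relabeling, assuming a static target list and an exporter on port 9100. The job name and target are placeholders, not taken from a real setup:

```yaml
scrape_configs:
  - job_name: node              # placeholder job name
    static_configs:
      - targets:
          - foo.example.com     # listed without a port
    relabel_configs:
      # Copy the bare target into the instance label first...
      - source_labels: [__address__]
        target_label: instance
      # ...then append the exporter port to the address that is actually scraped.
      - source_labels: [__address__]
        target_label: __address__
        replacement: "${1}:9100"
```

With this in place, alerts and dashboards see instance="foo.example.com" while Prometheus still scrapes foo.example.com:9100.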

You can take this further, and have the instance label be a short/friendly name, while the address is either a FQDN or an IP address.  For a working config which does this see https://github.com/candlerb/netbox-prometheus#targets

Yagyansh S. Kumar

Mar 15, 2020, 5:07:24 AM
to Prometheus Users
Thanks for the quick response. I appreciate your advice and I know that instance label shouldn't contain the port number and that should be the ideal way forward, but now my setup is huge. I'll have to change things over all my dashboards and I am monitoring close to 3000 servers. Eventually, I am going to change the instance label altogether but currently I need a short term solution to remove the port from alert annotations. Is it at all possible? 

Christian Hoffmann

Mar 15, 2020, 7:56:47 AM
to Yagyansh S. Kumar, Prometheus Users
Hi,

On 3/15/20 10:07 AM, Yagyansh S. Kumar wrote:
> Thanks for the quick response. I appreciate your advice and I know that
> instance label shouldn't contain the port number and that should be the
> ideal way forward, but now my setup is huge. I'll have to change things
> over all my dashboards and I am monitoring close to 3000 servers.
> Eventually, I am going to change the instance label altogether but
> currently I need a short term solution to remove the port from alert
> annotations. Is it at all possible?

Some ideas come to mind. I would call them all workarounds, as they can
become rather ugly maintenance-wise.

1) Have another meta metric which provides an instance="a:9100" to
"instance_ip" mapping. This could be generated via a recording rule in
combination with label_replace(), e.g.:

- record: instance_to_ip_mapping
  expr: count by(instance, instance_ip) (label_replace(up, "instance_ip", "$1", "instance", "(.*):\\d+"))

https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace

You could then mix in this metric into the relevant alerts.
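
As a rough, untested sketch, mixing the mapping into the alert from the first message might look like the rule below. It assumes the instance_to_ip_mapping recording rule from idea 1 exists and that each instance appears under exactly one job, so the mapping's value is 1 and the multiplication leaves the usage percentage unchanged:

```yaml
- alert: OutOfDiskSpace-Crit
  # The second join copies instance_ip onto the alert, the same way
  # the first join already copies nodename from node_uname_info.
  expr: |
    (
      (node_filesystem_size_bytes{fstype!~"nfs.*"} - node_filesystem_avail_bytes{fstype!~"nfs.*"})
        / node_filesystem_size_bytes{fstype!~"nfs.*"} * 100 > 90
    )
    * on (instance) group_left(nodename) node_uname_info
    * on (instance) group_left(instance_ip) instance_to_ip_mapping
  for: 5m
  labels:
    severity: "CRITICAL"
  annotations:
    dashboard: "DashbordLink?orgId=1&var-node={{ $labels.instance_ip }}&var-name={{ $labels.nodename }}&fullscreen&panelId=174"
```

If the same instance value is scraped by several jobs, the count would be greater than 1 and would scale the alert value, so the mapping rule would need adjusting in that case.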


2) You could attach such an instance_ip label to some or all of your
metrics using relabeling. This will likely cause a larger overhead though.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
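
An untested sketch of idea 2 using metric_relabel_configs, assuming the instance label always has the form "<ip>:<port>". The job name is a placeholder:

```yaml
scrape_configs:
  - job_name: node              # placeholder job name
    metric_relabel_configs:
      # Capture everything before the trailing ":<port>" of instance
      # and store it in a new instance_ip label on every scraped series.
      - source_labels: [instance]
        regex: '(.*):\d+'
        target_label: instance_ip
        replacement: '$1'
```

Series whose instance label does not match the regex are left untouched, so targets without a port simply get no instance_ip label.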


3) You could run an additional query in your alerts, e.g.

dashboard: >-
  DashbordLink?orgId=1&var-node={{ printf `label_replace(up{instance='%s'}, 'instance_ip', '$1', 'instance', '(.*):\\d+')` .Labels.instance | query | first | label "instance_ip" }}&var-name={{ $labels.nodename }}&fullscreen&panelId=174


4) You could try to offload this mapping to Grafana, i.e. make your
dashboard work with just a name selected and let Grafana do the mapping.
Not sure, if/how this is possible, might also depend on your dashboard
and the variables.


All in all, I would still follow Brian's suggestion of getting the
instance= label right in the first place. It is possible -- we went
through the same process at some time. ;)

As always, all untested examples -- it's likely I've got some
syntax/quoting/escaping wrong there. If you end up using one of those
examples, please report back with the final solution, so that others can
also benefit from this. :)


Kind regards,
Christian

Yagyansh S. Kumar

Mar 15, 2020, 10:40:40 AM
to Prometheus Users
Thanks a lot, Christian. Will try them out and report back.
Also, according to you will the Step 3 add any significant overhead? I mean will it cause any kind of slowness?

Christian Hoffmann

Mar 15, 2020, 10:48:08 AM
to Yagyansh S. Kumar, Prometheus Users
On 3/15/20 3:40 PM, Yagyansh S. Kumar wrote:
> Thanks a lot, Christian. Will try them out and report back.
> Also, according to you will the Step 3 add any significant overhead? I
> mean will it cause any kind of slowness?

I don't think it would cause slowness per-se as the cardinality will be
in the same scale as your number of systems, but the additional data
will have to be stored. So it will at least need some additional
space/RAM per metric.

I'm not that much into the TSDB internals to be able to give exact
numbers. This is more a gut feeling that you should try the other
options first. ;)

Kind regards,
Christian

Yagyansh S. Kumar

Mar 15, 2020, 10:50:36 AM
to Prometheus Users
Okay, cool. Thanks for the heads up.