How to define metric type as a variable


Agarwal ,Naveen

Mar 23, 2023, 8:57:16 PM
to promethe...@googlegroups.com
Hi:

Our Prometheus database contains more than 5k unique metrics. Over time, we have defined alerting rules to detect deviations.

However, as the number of metrics grows, it is becoming difficult to keep extending the alerting rules.

We are generally interested in increases or decreases in metric values compared to a previous time interval. With this in mind, is it possible to write a query that does not specify a metric name and instead iterates over all metric names available in the database?


e.g. #metrics(5min) /#metrics(30 mins) > 50
where all unique metric names are picked from the database.

Thanks, 
Naveen

Brian Candler

Mar 24, 2023, 3:35:30 AM
to Prometheus Users
No, because binary operators like division match series on their labels and ignore the metric name; they are designed to work between different metrics (same set of labels, different metric name), e.g.

    node_filesystem_avail_bytes / node_filesystem_size_bytes

You can, however, generate your alerting rules programmatically: make a script that writes out a rules file, then hits the reload endpoint.
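A minimal sketch of such a script, using only the Python standard library. The alert expression, the group name, the "Deviation_" alert prefix, and the 30m / 1.5 defaults are illustrative assumptions, not something from this thread; adapt them to whatever "deviation" means for your metrics. The reload call assumes Prometheus was started with --web.enable-lifecycle.

```python
import json
import urllib.request


def render_rules(metric_names, offset="30m", ratio=1.5):
    """Return the text of a Prometheus rules file with one alert per metric."""
    lines = ["groups:", "  - name: generated_deviation_rules", "    rules:"]
    for name in sorted(metric_names):
        # Illustrative expression (an assumption): total value now
        # compared with the same total `offset` ago.
        expr = f"sum({name}) / sum({name} offset {offset}) > {ratio}"
        # json.dumps yields a double-quoted string, which is also valid YAML.
        lines.append(f"      - alert: {json.dumps('Deviation_' + name)}")
        lines.append(f"        expr: {json.dumps(expr)}")
    return "\n".join(lines) + "\n"


def list_metric_names(base_url):
    """Fetch every metric name via the label-values HTTP API."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]


def write_and_reload(base_url, rules_path):
    """Regenerate the rules file, then ask Prometheus to reload its config."""
    with open(rules_path, "w") as f:
        f.write(render_rules(list_metric_names(base_url)))
    # POST /-/reload is only served when --web.enable-lifecycle is set.
    req = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    urllib.request.urlopen(req).close()
```

Something like write_and_reload("http://localhost:9090", "/etc/prometheus/generated.rules.yml") could then run from cron; both the URL and the path are placeholders, and the path has to be covered by rule_files in your prometheus.yml.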

Agarwal ,Naveen

Mar 24, 2023, 4:41:12 AM
to Brian Candler, Prometheus Users
Thanks Brian. Insightful. 


Brian Candler

Mar 30, 2023, 8:45:26 AM
to Prometheus Users
> #metrics(5min) /#metrics(30 mins) > 50

Thinks: if you're only interested in the *number* of timeseries for each metric name, then you can do

    count by (__name__) ({__name__=~".+"})

(warning: potentially expensive query if you have many timeseries). Then you could move the metric name into a label (the trailing "* 1" makes PromQL drop __name__ from the result, leaving only the new label):

    label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1

At that point, you have something you could alert on. Example: find metrics which have more than 1% more timeseries than they did 30 minutes ago:

    (label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1)
      /
    (label_replace(count by (__name__) ({__name__=~".+"} offset 30m), "metric", "$1", "__name__", "(.+)") * 1)
      > 1.01

This won't detect *completely new* metrics which appear, but you could have a separate rule for these, e.g. (untested):

    (label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1)
      unless
    (label_replace(count by (__name__) ({__name__=~".+"} offset 30m), "metric", "$1", "__name__", "(.+)") * 1)

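Or to detect *every* new timeseries, including new timeseries for existing metrics (note that "unless" ignores the metric name when matching, so a new series is missed if a different metric had the same labels 30 minutes ago):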

    {__name__=~".+"} unless {__name__=~".+"} offset 30m
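If you would rather do the comparison outside Prometheus, here is a rough sketch of the same idea in Python: run the count query through the HTTP API at two points in time, then compare the two snapshots. The function names and the 1.01 default are my own illustrative choices, and evaluating the query at an earlier timestamp stands in for "offset 30m".

```python
import json
import urllib.parse
import urllib.request

COUNT_QUERY = 'count by (__name__) ({__name__=~".+"})'


def parse_counts(response):
    """Turn an instant-query response into {metric name: series count}."""
    return {
        sample["metric"]["__name__"]: float(sample["value"][1])
        for sample in response["data"]["result"]
    }


def fetch_counts(base_url, at=None):
    """Run COUNT_QUERY via /api/v1/query, optionally at a Unix timestamp."""
    params = {"query": COUNT_QUERY}
    if at is not None:
        params["time"] = str(at)
    url = f"{base_url}/api/v1/query?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return parse_counts(json.load(resp))


def compare(now, before, ratio=1.01):
    """Return (grown, new): names above the ratio, and names absent before."""
    grown = sorted(
        name for name, count in now.items()
        if name in before and count / before[name] > ratio
    )
    new = sorted(name for name in now if name not in before)
    return grown, new
```

For example, compare(fetch_counts(url), fetch_counts(url, at=time.time() - 1800)) would give the metrics whose series count grew by more than 1% and the metric names that did not exist 30 minutes earlier (with url pointing at your Prometheus server and time imported). The same warning about query cost applies.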