Compare metrics with different labels

Robson Jose

Apr 19, 2024, 11:31:23 AM
to Prometheus Users
Good afternoon. I would like to know whether it is possible to write this query. It should return the applications that have a value of 0 in the first query and a value greater than one in the second:

(
  sum by (consumergroup, topic) (delta(kafka_consumergroup_current_offset{}[5m])/5) ==bool 0
)
and (
  sum by (topic) (delta(kafka_consumergroup_current_offset{}[5m])/5) >bool 1
)

Brian Candler

Apr 19, 2024, 1:28:29 PM
to Prometheus Users
Can you give examples of the metrics in question, and what conditions you're trying to check for?

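Looking at your specific PromQL query: firstly, in my experience it's very unusual in Prometheus queries to use ==bool or >bool, and in this specific case it definitely seems to be wrong. With the bool modifier, a comparison returns 0 or 1 for every element instead of filtering out the elements that fail the comparison.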

Secondly, you won't be able to join the LH and RH sides of your expression with "and" unless either they have exactly the same label sets, or you modify your condition using "and on (...)" or "and ignoring (...)".

"and" is a vector intersection operator, where the result vector includes a value if the labels match, and the value is taken from the LHS, and that means it doesn't combine the values like you might be used to in other programming languages. For example,

vector(0) and vector(1)  => value is 0
vector(1) and vector(0)  => value is 1
vector(42) and vector(99)  => value is 42

This is as described in the documentation:

vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.
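
To make the matching rule concrete, here is a toy Python model of "and" and "and on (...)". It is only a sketch for illustration, not how Prometheus implements the operator. An instant vector is modelled as a dict from a label set to a value, and the helper names `labels` and `vector_and` are made up for this example.

```python
def labels(**kv):
    """Build a hashable label set, e.g. labels(topic="t1")."""
    return frozenset(kv.items())


def vector_and(lhs, rhs, on=None):
    """Toy model of PromQL `lhs and rhs` / `lhs and on (...) rhs`.

    Keeps each LHS element whose labels (restricted to `on`, if given)
    match some RHS element. Values always come from the LHS.
    """
    def key(label_set):
        if on is None:
            return label_set
        return frozenset((k, v) for k, v in label_set if k in on)

    rhs_keys = {key(label_set) for label_set in rhs}
    return {ls: v for ls, v in lhs.items() if key(ls) in rhs_keys}


# vector(N) has an empty label set, so the two sides always match
# and the LHS value wins.
assert vector_and({labels(): 42}, {labels(): 99}) == {labels(): 42}

# Different label sets: a plain `and` matches nothing...
per_group = {labels(consumergroup="g1", topic="t1"): 0}
per_topic = {labels(topic="t1"): 5}
assert vector_and(per_group, per_topic) == {}

# ...but `and on (topic)` matches on the shared label.
assert vector_and(per_group, per_topic, on={"topic"}) == per_group
```

The last two assertions mirror the situation in the original query: one side is grouped by (consumergroup, topic), the other only by (topic), so nothing matches until the comparison is restricted to the topic label.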

PromQL alerts on the presence of values, and in PromQL you need to think in terms of "what (labelled) values are present or absent in this vector", using the "and/unless" operators to suppress elements in the result vector, and the "or" operator to add additional elements to the result vector.

Maybe these explanations help:

Brian Candler

Apr 19, 2024, 1:30:21 PM
to Prometheus Users

Brian Candler

Apr 19, 2024, 2:36:44 PM
to Prometheus Users
Maybe what you're trying to do is:

sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) == 0
unless on (topic) sum by (topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) < 1

That is: alert on any combination of (consumergroup,topic) where the 5-minute rate of consumption is zero, unless the rate for that topic across all consumers is less than 1 per minute.

As far as I can tell, kafka_consumergroup_current_offset is a counter, and therefore you should use either rate() or increase().  The only difference is that rate(foo[5m]) gives the increase per second, while increase(foo[5m]) gives the increase per 5 minutes.

Hence:
rate(kafka_consumergroup_current_offset[5m]) * 60
increase(kafka_consumergroup_current_offset[5m]) / 5
should both be the same, giving the per-minute increase.
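
As a quick sanity check of that arithmetic, here is a small Python sketch. The two offsets are made-up numbers, and the sketch ignores the extrapolation Prometheus applies at the edges of the range window.

```python
WINDOW_SECONDS = 5 * 60  # the [5m] range

# Hypothetical counter samples at the start and end of the window.
offset_start = 1_000
offset_end = 1_600

increase = offset_end - offset_start  # what increase(foo[5m]) approximates
rate = increase / WINDOW_SECONDS      # what rate(foo[5m]) approximates, per second

per_minute_from_rate = rate * 60
per_minute_from_increase = increase / 5

print(per_minute_from_rate, per_minute_from_increase)  # 120.0 120.0
```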

Robson Jose

Apr 30, 2024, 8:51:50 AM
to Prometheus Users

Hello, thanks for responding. In my case:

I want that if the consumption of messages in the topic in the last 5 minutes is 0 and the production of messages is greater than 1 in the topic, then the consumer group is treated as not consuming messages. The query should return which groups and topics these are.

Brian Candler

Apr 30, 2024, 11:14:23 AM
to Prometheus Users
Without seeing examples of the exact metrics you are receiving, it's hard to be sure what the right query is.

> I want that if the consumption of messages in the topic in the last 5 minutes is 0 and the production of messages is greater than 1 in the topic

Then you'll want metrics for both the consumption (consumer group offset) and the production (e.g. partition log-end offset or consumer group lag).

Robson Jose

Apr 30, 2024, 11:23:15 AM
to Prometheus Users
Like this:

  sum by (consumergroup, topic) (delta(kafka_consumergroup_current_offset{}[5m])/5)

{consumergroup="consumer-shop", topic="SHOP-EVENTS"}  1535.25
{consumergroup="$Default", topic="TOPIC-NOTIFICATION"}  1.5
{consumergroup="$Default", topic="TOPIC-NOTIFICATION-CHAT"}  0.25
{consumergroup="consumer-email", topic="TOPIC-NOTIFICATION-EMAIL"}  0
{consumergroup="$Default", topic="TOPIC-NOTIFICATION-TESTE"}  1.25
{consumergroup="$Default", topic="TOPIC-NOTIFICATION-SMS"}  0
{consumergroup="$Default", topic="TOPIC-NOTIFICATION-WHATSAPP"}  0
{consumergroup="consumer-user-event", topic="TOPIC-USER-EVENTS"}  0

Brian Candler

Apr 30, 2024, 11:28:29 AM
to Prometheus Users
You're showing aggregates, not the raw metrics.

Robson Jose

Apr 30, 2024, 1:30:24 PM
to Prometheus Users
Like this?

kafka_consumergroup_current_offset{consumergroup="consumer-events", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-EVENTS"}  292350417
kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION"}  30027218
kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION-CHAT"}  3493310
kafka_consumergroup_current_offset{consumergroup="consumer-email", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION-EMAIL"}  82381171
kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION-PUSH"}  31267495
kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION-SMS"}  366
kafka_consumergroup_current_offset{consumergroup="$Default", env="prod", instance="kafka-exporter.monitor:9308", job="kafka-exporter", partition="0", topic="TOPIC-NOTIFICATION-WHATSAPP"}

Brian Candler

Apr 30, 2024, 4:57:10 PM
to Prometheus Users
I see no metric there that tells you whether messages are being produced, only whether they're being consumed.

Without that, I'm not sure you can do any better than this:

sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) == 0
unless on (topic) sum by (topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) < 1

The first part:
sum by (consumergroup, topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) == 0
will give you an alert for each (consumergroup,topic) combination which has not consumed anything in the last 5 minutes.

The second part:
unless on (topic) sum by (topic) (rate(kafka_consumergroup_current_offset[5m]) * 60) < 1
will suppress the alert if the total consumption for that topic, summed across all its consumer groups, is less than 1 message per minute. But this won't be useful unless each topic has at least 2 consumer groups, so that if one is consuming it can alert on the other.

Given the examples you show, it looks like you only have one consumer group per topic.  Therefore, I think you need to find a metric which explicitly gives the publisher offset for each topic/partition.
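
To see why the single-consumer-group case matters, here is a toy Python model of the query above. It is a sketch only: the per-minute figures are the aggregated values posted earlier in the thread, while the function name `stalled_alerts` and the extra group "hypothetical-group" are made up for illustration.

```python
def stalled_alerts(per_group):
    """Toy model of:
    (per-group rate == 0) unless on (topic) (per-topic rate < 1)
    `per_group` maps (consumergroup, topic) to a per-minute rate.
    """
    # sum by (topic) (...)
    per_topic = {}
    for (_group, topic), value in per_group.items():
        per_topic[topic] = per_topic.get(topic, 0) + value

    # Right-hand side: topics with less than 1 message per minute overall.
    quiet_topics = {t for t, v in per_topic.items() if v < 1}

    # Left-hand side (== 0), minus elements whose topic is on the RHS.
    return sorted(
        key for key, value in per_group.items()
        if value == 0 and key[1] not in quiet_topics
    )


per_group = {
    ("consumer-shop", "SHOP-EVENTS"): 1535.25,
    ("$Default", "TOPIC-NOTIFICATION"): 1.5,
    ("$Default", "TOPIC-NOTIFICATION-CHAT"): 0.25,
    ("consumer-email", "TOPIC-NOTIFICATION-EMAIL"): 0,
    ("$Default", "TOPIC-NOTIFICATION-TESTE"): 1.25,
    ("$Default", "TOPIC-NOTIFICATION-SMS"): 0,
    ("$Default", "TOPIC-NOTIFICATION-WHATSAPP"): 0,
    ("consumer-user-event", "TOPIC-USER-EVENTS"): 0,
}

# One consumer group per topic: every stalled group is also a quiet
# topic, so every alert is suppressed.
print(stalled_alerts(per_group))  # []

# Add a second (hypothetical) group that is consuming from one topic.
per_group[("hypothetical-group", "TOPIC-NOTIFICATION-SMS")] = 3.0
print(stalled_alerts(per_group))  # [('$Default', 'TOPIC-NOTIFICATION-SMS')]
```

With the data as posted, the result is empty, which is why a separate metric for the produced offset is needed to detect a stalled consumer on a topic with only one consumer group.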