PromQL: Addition with NaN


Gunther Klein

Dec 2, 2019, 8:43:09 AM
to Prometheus Users
Hi there,

I have a PromQL query which adds up the sums of two different metrics like this:

sum(increase(pushes_android{stage="dev"}[10m])) + sum(increase(pushes_ios{stage="dev"}[10m]))

This works fine if both metrics have values defined. However, if either of the sums results in NaN, the overall result is also NaN, rather than a sum where NaN is treated as 0 (e.g. 15 + NaN = 15).
Any ideas how I can achieve that?

Regards,
Gunther

Brian Brazil

Dec 2, 2019, 9:12:29 AM
to Gunther Klein, Prometheus Users
The problem here is that you have a NaN, which shouldn't be possible in the first place when using that query on counters. Do you know where the NaN is coming from?

Brian 



--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/5c1e8ca4-7b0c-43b5-9718-e0550cd2f2a9%40googlegroups.com.

Gunther Klein

Dec 2, 2019, 11:03:04 AM
to Prometheus Users
Hmm, that's a good point. The NaN seems to occur only when a counter has not been incremented yet, which in our case happens quite often in the morning on our dev stages (we stop servers in the evening and start them in the morning) or after new server deployments. The project we are using is Aerogear UPS (mobile pushes), which uses the Prometheus simple client for metric exporting: https://github.com/aerogear/aerogear-unifiedpush-server/blob/master/service/src/main/java/org/jboss/aerogear/unifiedpush/service/metrics/PrometheusExporter.java

After a fresh server restart the push metrics are not exported:

When I trigger the first Android pushes it returns:
# curl -s http://localhost:8080/ag-push/rest/prometheus/metrics | grep aerogear
aerogear_ups_push_requests_android 2.0
aerogear_ups_push_requests_total 2.0

, which seems correct. However, the other metrics are not initialized at all until they are incremented for the very first time. I could not reproduce this behaviour locally (I suppose because I have Micrometer on the classpath, which may initialize counter values by default?).
According to https://github.com/prometheus/client_golang/issues/190 and your comment there, it seems that prometheus-simple-client-java is expected to behave that way. Any ideas how I can work around these issues from the outside, e.g. with metric relabeling?

Gunther



Brian Brazil

Dec 2, 2019, 11:07:36 AM
to Gunther Klein, Prometheus Users
On Mon, 2 Dec 2019 at 16:03, 'Gunther Klein' via Prometheus Users <promethe...@googlegroups.com> wrote:
However, the other metrics are not initialized at all until they are incremented for the very first time.

They produce nothing then, not NaN.
 
I could not reproduce this behaviour locally (I suppose because I have Micrometer on the classpath, which may initialize counter values by default?).
According to https://github.com/prometheus/client_golang/issues/190 and your comment there, it seems that prometheus-simple-client-java is expected to behave that way. Any ideas how I can work around these issues from the outside, e.g. with metric relabeling?

If a client library can produce a NaN for a directly instrumented counter, that is a bug.

Brian
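
For readers hitting the same symptom: if the cause is a series that does not exist yet rather than an actual NaN sample, `+` has nothing to match on that side and the whole expression returns no data. A minimal sketch of one common workaround, reusing the metric names from the original question, is to fall back to `vector(0)` on each side:

```promql
# Each sum() has an empty label set, so it can be paired with vector(0),
# which `or` supplies only when the left-hand side returns no series.
(sum(increase(pushes_android{stage="dev"}[10m])) or vector(0))
+
(sum(increase(pushes_ios{stage="dev"}[10m])) or vector(0))
```

This only covers absent series; a genuine NaN sample would still propagate through `+`.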
 





Aliaksandr Valialkin

Dec 2, 2019, 5:57:46 PM
to Gunther Klein, Prometheus Users
Hi Gunther,

Binary operators (+, -, /, *, etc.) in Prometheus return NaN if at least one operand is NaN, while `sum()` skips NaNs. So you can rewrite the query into something like:

sum(increase({__name__=~"pushes_(android|ios)", stage="dev"}[10m]))

and it should work as you expect, i.e. ignore NaNs when calculating the aggregate sum.




--
Best Regards,

Aliaksandr

Brian Brazil

Dec 2, 2019, 6:21:05 PM
to Aliaksandr Valialkin, Gunther Klein, Prometheus Users
On Mon, 2 Dec 2019 at 22:57, Aliaksandr Valialkin <val...@gmail.com> wrote:
Hi Gunther,

Binary operators (+, -, /, *, etc.) in Prometheus return NaN if at least one operand is NaN,

This is standard floating point behaviour, which we preserve.
 
while `sum()` skips NaNs.


If you've an example where PromQL is doing otherwise, please let us know.

sum(increase({__name__=~"pushes_(android|ios)", stage="dev"}[10m])) 

and it should work as you expect, i.e. ignore NaNs when calculating the aggregate sum.

This query will likely error out due to duplicate series in the result of increase. And even if that weren't the case, it'd still produce a NaN here.

Brian
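
The NaN behaviour discussed above can be checked in the expression browser with literal values. This is a sketch that uses PromQL's `NaN` literal and a made-up `foo="bar"` label purely to keep the two samples distinct; neither comes from the original metrics. The two expressions are separate queries to run one at a time:

```promql
# Binary operator: one NaN operand should make the result NaN.
vector(15) + vector(NaN)

# Aggregation: sum() adds the samples as floats, so a NaN sample
# should make the aggregate NaN as well rather than being skipped.
sum(vector(NaN) or label_replace(vector(15), "foo", "bar", "", ""))
```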
 



Aliaksandr Valialkin

Dec 3, 2019, 9:04:08 AM
to Brian Brazil, Gunther Klein, Prometheus Users
On Tue, Dec 3, 2019 at 1:21 AM Brian Brazil <brian....@robustperception.io> wrote:
If you've an example where PromQL is doing otherwise, please let us know.

Compare the query_range results for the following queries:

sum(
    minute(vector(time())) > 30
) +
sum(label_replace(
    minute(vector(time())) < 40,
"foo", "bar", "", ""))

vs

sum(
    minute(vector(time())) > 30 or
    label_replace(minute(vector(time())) < 40, "foo", "bar", "", "")
)



sum(increase({__name__=~"pushes_(android|ios)", stage="dev"}[10m])) 

and it should work as you expect, i.e. ignore NaNs when calculating the aggregate sum.

This query will likely error out due to duplicate series in the result of increase. And even if that weren't the case, it'd still produce a NaN here.

Oops - the query returns "vector cannot contain metrics with the same labelset" error, because `increase` removes metric names :(

--
Best Regards,

Aliaksandr
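
One way around the duplicate-labelset error, sketched here with a hypothetical `platform` label that is not part of the original metrics, is to tag each `increase()` result before combining them with `or`, following the same `label_replace` pattern as the comparison queries above:

```promql
# increase() drops the metric name, so a label is added to keep the
# Android and iOS series distinct before they are merged and summed.
sum(
    label_replace(increase(pushes_android{stage="dev"}[10m]), "platform", "android", "", "")
  or
    label_replace(increase(pushes_ios{stage="dev"}[10m]), "platform", "ios", "", "")
)
```

If neither metric exists yet the sum is still empty, so an `or vector(0)` fallback may be needed on top.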

Brian Brazil

Dec 3, 2019, 9:21:32 AM
to Aliaksandr Valialkin, Gunther Klein, Prometheus Users
On Tue, 3 Dec 2019 at 14:04, Aliaksandr Valialkin <val...@gmail.com> wrote:



Compare the query_range results for the following queries:

sum(
    minute(vector(time())) > 30
) +
sum(label_replace(
    minute(vector(time())) < 40,
"foo", "bar", "", ""))

vs

 
sum(
    minute(vector(time())) > 30 or
    label_replace(minute(vector(time())) < 40, "foo", "bar", "", "")
)

These expressions inside the sum() can only produce samples with real numbers, so I'm not seeing anything to do with NaNs. The difference in results here is standard vector matching.

Brian



