number of occurrences in a day


BHARATH KUMAR

Mar 22, 2023, 1:19:01 AM
to Prometheus Users
Hello All,

I have a Prometheus metric whose value is always 0, 1 or 2. Could you tell me how to count the number of 1's that occurred in the last 24 hours?

I tried count_over_time, but I am getting errors. I tried sum_over_time, but it does not work for a few test cases.

Any lead?

I really appreciate any help you can provide.

Thanks & regards,
Bharath Kumar

Brian Candler

Mar 22, 2023, 4:06:23 AM
to Prometheus Users
It's not easy to do exactly.

To get a rough answer, you can do a subquery:  (foo == 1)[24h:1m] will resample the timeseries at 1 minute intervals, then you can wrap that with count_over_time, giving:
    count_over_time((foo == 1)[24h:1m])

But if you weren't scraping at exactly 1 minute intervals, the count may not be accurate. Also if there are missed samples, the value of foo at time T will look back for the previous value (up to 5 minutes by default), which means in that situation some samples may be double-counted (in effect, assuming the metric value remained constant over that time, when you don't actually know what value it had).
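
To make the double-counting concrete, here is a small Python simulation. It is only a simplified model of the lookback behaviour described above (it ignores staleness markers and step alignment), and the timestamps and values are made up for illustration:

```python
LOOKBACK_SECONDS = 300  # the default 5-minute lookback mentioned above


def value_at(samples, t, lookback=LOOKBACK_SECONDS):
    """Return the newest sample value at or before t within the lookback
    window, or None if there is no such sample (simplified model)."""
    latest = None
    for ts, value in samples:  # samples are sorted by timestamp
        if t - lookback < ts <= t:
            latest = value
    return latest


def resampled_count(samples, start, end, step, target):
    """Count the evaluation steps whose looked-up value equals target."""
    count = 0
    t = start
    while t <= end:
        if value_at(samples, t) == target:
            count += 1
        t += step
    return count


# Hypothetical data: scrapes every 60 s, but those at t=120 and t=180 were missed.
raw = [(0, 0), (60, 1), (240, 0), (300, 0)]

raw_ones = sum(1 for _, v in raw if v == 1)
resampled_ones = resampled_count(raw, start=0, end=300, step=60, target=1)
print(raw_ones, resampled_ones)  # prints "1 3"
```

The single stored 1 at t=60 is picked up again at t=120 and t=180, so the resampled count is 3 although only one sample with value 1 exists.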

The only way I know to get an exact answer is to send the range vector query "foo[24h]" to the *instant* query endpoint, then filter and count the samples client-side.  A range vector like that gives the raw values with their raw timestamps as stored in the TSDB.
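
A rough sketch of that client-side approach in Python, using only the standard library. The function names and the sample payload are hypothetical and written for illustration; the payload follows the "matrix" shape that the instant query endpoint (/api/v1/query) returns for a range vector:

```python
import json
import urllib.parse
import urllib.request


def fetch_range_vector(base_url, selector, window="24h"):
    """Send selector[window] to the instant query endpoint; return parsed JSON."""
    query = urllib.parse.urlencode({"query": f"{selector}[{window}]"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}") as resp:
        return json.load(resp)


def count_samples_equal(response, target):
    """Count raw samples equal to target, separately for each series."""
    counts = {}
    for series in response["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        counts[labels] = sum(
            1 for _ts, value in series["values"] if float(value) == target
        )
    return counts


# Hand-written payload standing in for a real response (values are strings).
sample_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "foo", "instance": "a"},
                "values": [[0, "0"], [60, "1"], [120, "2"], [180, "1"]],
            }
        ],
    },
}

counts = count_samples_equal(sample_response, 1)
print(counts)
```

Against a live server this would be called as count_samples_equal(fetch_range_vector("http://localhost:9090", "foo"), 1), where the address is an assumption about where Prometheus is listening.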

For this use case it would be nice if Prometheus were to allow certain operators to work directly on range vectors, so you could write
    foo[24h] == 1
But that would add quite a lot of complexity into the semantics of the query language, which already has to consider argument combinations for (scalar, scalar), (scalar, instant vector) and (instant vector, instant vector).

Brian Candler

Mar 22, 2023, 4:45:35 AM
to Prometheus Users
I also note that you're only sampling periodically whether something is in state 0, 1 or 2: you don't really know what happened in between those samples, so you're never going to get a truly accurate value for how long it was in each state. For example, it could flip from 1 to 2 and back to 1 between scrapes.

If you want *really* accurate answers for how long your application has been in state 0, 1 or 2 then you need to reinstrument it with metrics which accumulate time in each state:

application_state_seconds_total{state="0"} xxx
application_state_seconds_total{state="1"} xxx
application_state_seconds_total{state="2"} xxx

Every time your application changes state, you add the number of seconds it was in that state to the total.  In case it stays in the same state for a long time you also do this periodically, e.g. if the application has remained in the same state for 1 second, then you add 1 second to the metric and subtract 1 second from the time you're accumulating.

This is basically how many of the disk I/O metrics work.

Then:
    increase(application_state_seconds_total{state="1"}[24h])
will give you a *very* accurate estimate of how long the application was in that state, even if it switches many times within the same second.
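
The accumulation described above could be sketched like this in Python. The class name StateTimer and its methods are hypothetical, and a real exporter would expose seconds_total through a Prometheus client library as a counter labelled by state rather than as a plain dict:

```python
import time


class StateTimer:
    """Accumulate the seconds spent in each state (illustrative sketch)."""

    def __init__(self, initial_state, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._state = initial_state
        self._since = clock()        # when uncredited time started accruing
        self.seconds_total = {}      # state -> accumulated seconds

    def flush(self):
        """Credit the time spent in the current state so far.

        Call this periodically so that a long-lived state still advances."""
        now = self._clock()
        elapsed = now - self._since
        self.seconds_total[self._state] = (
            self.seconds_total.get(self._state, 0.0) + elapsed
        )
        self._since = now            # the credited time is no longer pending

    def set_state(self, new_state):
        """Credit the outgoing state, then switch to new_state."""
        self.flush()
        self._state = new_state
```

Because time is credited on every state change and on every periodic flush, the totals do not depend on when Prometheus happens to scrape.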

BHARATH KUMAR

Mar 22, 2023, 11:34:06 PM
to Prometheus Users
(foo==1)[24h:1m]. I tried this query. But I am getting an error message " bad_data: invalid expression type "range vector" for range query, must be Scalar or instant Vector ".

foo[24h] == 1. Error message: " bad_data: 1:1: parse error: binary expression must contain only scalar and instant vector types "
I am scraping at 1-minute intervals. 

Brian: I also note that you're only sampling periodically whether something is in state 0, 1, or 2: you don't really know what happened in between those samples, so you're never going to get a truly accurate value for how long it was in each state. For example, it could flip from 1 to 2 and back to 1 between scrapes.

Me: That's fine, even if it flips from 1 to 2 and back to 1 between scrapes. I only care about the scraped samples. For example, over the last 24 hours I will have 1440 samples, and I want to find the number of 1's among them.

Brian Candler

Mar 23, 2023, 3:55:10 AM
to Prometheus Users
(foo==1)[24h:1m]
creates a range vector, which you then need to process further. If you run a function over it to count the values in the range you will get an instant vector, which as I showed in my original post makes this:
count_over_time((foo==1)[24h:1m])
That is valid (I've tested it with 'up' instead of 'foo'), and that is the solution I propose.

foo[24h] == 1
is not currently valid Prometheus syntax, as I already said. There are some use cases where it would be nice *if* Prometheus implemented this, but it doesn't.