I have read here
https://www.robustperception.io/what-range-should-i-use-with-rate that I should stick to a single range for my rate() queries - especially in recording rules - and use avg_over_time() when I want to aggregate the data further, for example in Grafana. The ranges I most commonly come across in the wild are also 1m and 5m. But what I don't fully understand is:
which factors are decisive when choosing either a 1m or a 5m range as the base?
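To make this concrete, the setup I have in mind looks roughly like the sketch below; the metric, job, and rule names are just placeholders, not from any real config:

```yaml
groups:
  - name: example_rate_rules
    rules:
      # one single, fixed base range for rate() in the recording rule
      # (1m here, but it could just as well be 5m - that's exactly my question)
      - record: job:http_requests:rate1m
        expr: sum by (job) (rate(http_requests_total[1m]))
```

Then in a Grafana panel I would aggregate the recorded series further instead of re-deriving the rate with a different range, e.g.:

```
avg_over_time(job:http_requests:rate1m[10m])
```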
Looking at GitLab, they have recording rules covering multiple ranges, and it's the same in their panels: sometimes data with 5m ranges, sometimes 1m. I was not able to find an applicable pattern.
My environment: at any time, a few hundred small targets are online, producing anywhere from a few hundred thousand up to single-digit millions of time series. Usage is fairly low - nowhere near hundreds of requests per second, or even per minute - meaning basically all my application-specific counters are slow-moving.
Considering all this: would you go for a 1m or a 5m range in my recording rules? I definitely cannot use ad-hoc queries everywhere, because dealing with histogram data in particular becomes very demanding once a query exceeds a certain number of time series.
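For the histogram case specifically, what I would pre-record is something along these lines (again, metric and rule names are made up), so that the quantile query only has to touch the pre-aggregated series:

```yaml
groups:
  - name: example_histogram_rules
    rules:
      # pre-aggregate the per-bucket rates, keeping the le label,
      # so histogram_quantile() doesn't have to scan every raw bucket series
      - record: job:http_request_duration_seconds_bucket:rate5m
        expr: sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
```

and in the panel something like histogram_quantile(0.95, job:http_request_duration_seconds_bucket:rate5m).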
Advantages of going with 1m:
- more "spikey" graphs
- closest to the raw data
Disadvantages of going with 1m:
- too spikey, especially with slow-moving time series: often I end up with a lot of tall spikes close to each other (see the small comparison after this list). Is this good or bad?
- users might interpret the graph as "exact" which is not the case.
Advantages of going with 5m:
- smooths out noise at lower time ranges
- looks better, at least with slow-moving metrics like those in my environment
Disadvantages of going with 5m:
- further from the raw data; short-lived spikes get averaged away and changes show up more slowly
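To illustrate what I mean by "too spikey" above: with a counter that only increments, say, once every two minutes (metric name made up), the two ranges behave quite differently:

```
rate(slow_jobs_completed_total[1m])  # alternates between 0 and ~1/60 per second
rate(slow_jobs_completed_total[5m])  # fairly steady at around ~1/120 per second
```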
I would be happy to know your (probably way more pragmatic) approach to this.
Cheers, T.S.