Impact of rule group iteration misses


Julian Maicher

Nov 30, 2020, 9:34:26 AM
to Prometheus Users
Hi,

we have a set of high-cardinality metrics and are currently designing recording rules, primarily to improve dashboard performance.
At a certain threshold, we observe group evaluation times exceeding the evaluation interval, which leads to iteration misses [1].
In these cases, we can also see that the next iteration starts at the end of the last evaluation plus the interval. So the iteration is not really skipped but rather delayed (the schedule lags).

What is the impact of this? Do we need to worry about iteration misses?

To be more concrete, here is one of our rule groups:

groups:
- name: http_server_requests_seconds_bucket
  rules:
  - record: app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m
    expr: sum by(app, method, uri, status, le) (rate(http_server_requests_seconds_bucket[1m]))
  - record: app_le:http_server_requests_seconds_bucket:rate1m
    expr: sum by(app, le) (app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m)

The scrape interval is set to 15s, the evaluation interval to 30s.
With ~3 million time series [2], we see evaluation times of ~1m.

[1] We use "prometheus_rule_group_iterations_missed_total" to monitor iteration misses
[2] We have a little test tool to simulate load on prometheus before rolling this out. We're trying to find limits of a single prometheus instance before scaling horizontally (federation) or reaching for e.g., Thanos, Cortex.
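For reference, [1] can also drive an alert; a sketch of such a rule (group name, window, and thresholds are illustrative, not from our setup):

```yaml
groups:
- name: prometheus-meta
  rules:
  - alert: RuleGroupIterationsMissed
    # The counter increases each time a group evaluation is skipped/delayed.
    expr: increase(prometheus_rule_group_iterations_missed_total[1h]) > 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: 'Rule group {{ $labels.rule_group }} missed evaluations in the last hour'
```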

Stuart Clark

Nov 30, 2020, 9:50:07 AM
to Julian Maicher, Prometheus Users
Are you saying there are 3 million time series for the http_server_requests_seconds_bucket metric, or in total for the server?

Looking at your query, the uri label looks very problematic. If that is the URI called by an external Internet user, it has effectively infinite cardinality, as users can just make up paths. That could completely break your server. If you do want some indication of the page requested, I'd suggest some processing of the raw value: remove as much as you can (query parameters, final piece of the path?) and ideally match against an allow list (with "other" for anything that is rejected).
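One Prometheus-native way to enforce such an allow list is metric relabeling at scrape time. A sketch (the job, target, and endpoint patterns here are hypothetical, not from the original setup):

```yaml
scrape_configs:
  - job_name: app
    static_configs:
      - targets: ['app:8080']   # hypothetical target
    metric_relabel_configs:
      # Mark series whose uri is on the allow list.
      - source_labels: [uri]
        regex: '/users/\{id\}|/orders|/health'   # hypothetical allow list
        target_label: __tmp_uri_allowed
        replacement: 'yes'
      # Any series without the mark gets uri="other".
      - source_labels: [__tmp_uri_allowed]
        regex: ''
        target_label: uri
        replacement: 'other'
      # Drop the temporary label before ingestion.
      - regex: '__tmp_uri_allowed'
        action: labeldrop
```

Doing the normalization in the application itself (route templates) is cheaper, but relabeling gives you a server-side backstop against surprise values.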
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Julian Maicher

Nov 30, 2020, 10:27:10 AM
to Prometheus Users
Thanks for the reply.
 
Are you saying there are 3 million time series for the http_server_requests_seconds_bucket metric, or in total for the server?
 
Yes, 3 million time series for the http_server_requests_seconds_bucket metric. It's still a test scenario, but not too far away.

Looking at your query, the uri label looks very problematic. If that is the URI called by an external Internet user, it has effectively infinite cardinality, as users can just make up paths. That could completely break your server. If you do want some indication of the page requested, I'd suggest some processing of the raw value: remove as much as you can (query parameters, final piece of the path?) and ideally match against an allow list (with "other" for anything that is rejected).

We are aware of the cardinality problem and use uri templates, e.g. /users/{id}, and status groups (although that doesn't make a big difference here).

We plan to have a consistent golden-signal/RED dashboard for HTTP workloads in a service-oriented architecture, ideally with per-endpoint drill-down for debugging/SLOs (not billing, just signals).
Think in the range of 50-100 services: a few with many endpoints, most with just a few.

At some point, a single, vertically-scaled Prometheus instance per environment won't be enough. We are currently trying to find out when.
With the current test scenario (3 million time series, 30d retention, running for 20d so far), we "only" see iteration misses in the rule group evaluation. Otherwise, the instance handles it pretty well.

Ben Kochie

Dec 1, 2020, 3:02:44 AM
to Julian Maicher, Prometheus Users
Now that Prometheus supports isolation, an evaluation taking longer than the interval shouldn't affect the results of the queries. But it will impact your memory use and performance.

One possible solution: you could shard your rule evaluation by one of your labels, for example by app. I've done this for a couple of our services that have similar naming/cardinality issues.

groups:
- name: http_server_requests_seconds_bucket{app="foo"}
  rules:
  - record: app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, method, uri, status, le) (
        rate(http_server_requests_seconds_bucket{app="foo"}[1m])
      )
  - record: app_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, le) (
        app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m{app="foo"}
      )
- name: http_server_requests_seconds_bucket{app="bar"}
  rules:
  - record: app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, method, uri, status, le) (
        rate(http_server_requests_seconds_bucket{app="bar"}[1m])
      )
  - record: app_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, le) (
        app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m{app="bar"}
      )

Note: the name of a rule group is just an identifier and has no impact on the rule evaluation. It just needs to be unique per group.
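Since the per-app groups are identical up to the app matcher, they can be generated from a template rather than maintained by hand. A minimal sketch in Python (the app list is hypothetical; stdlib only, no YAML library needed since the output is a fixed template):

```python
# Sketch: generate one sharded rule group per app so each group
# evaluates independently and stays under the evaluation interval.
APPS = ["foo", "bar"]  # hypothetical app list

# Doubled braces {{ }} render as literal YAML/PromQL braces.
GROUP_TEMPLATE = """\
- name: http_server_requests_seconds_bucket{{app="{app}"}}
  rules:
  - record: app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, method, uri, status, le) (
        rate(http_server_requests_seconds_bucket{{app="{app}"}}[1m])
      )
  - record: app_le:http_server_requests_seconds_bucket:rate1m
    expr: |
      sum by (app, le) (
        app_method_uri_status_le:http_server_requests_seconds_bucket:rate1m{{app="{app}"}}
      )
"""

def render_groups(apps):
    """Render a complete rules file with one group per app."""
    return "groups:\n" + "\n".join(GROUP_TEMPLATE.format(app=app) for app in apps)

if __name__ == "__main__":
    print(render_groups(APPS))
```

Regenerating the file whenever the app list changes keeps the shards from drifting apart.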
