concurrent processing of recording rules

vvuth...@ebay.com

Jul 10, 2018, 7:05:30 PM
to Prometheus Users
Hi,

I have a question about concurrent processing of recording rules. If I put all recording rules in a single group, do they get processed sequentially? Do I get better parallelism by splitting my rules into different rule groups?

What I am observing is that the average of prometheus_rule_group_duration_seconds is 50s and its 99th percentile spikes to 400s for my setup (I have a lot of recording rules). But the 99th percentile of prometheus_rule_evaluation_duration_seconds is around 15s most of the time.

Unless some of the rules within a group are processed sequentially, I don't see why group duration 99th percentile would be so large compared to individual rule duration 99th percentile.

Can someone please shed some light on how recording rules are processed, and on the best way to achieve maximum parallel execution of the rules?

Thanks & Best Regards
Viswa

Brian Brazil

Jul 11, 2018, 2:36:41 AM
to vvuth...@ebay.com, Prometheus Users
On 11 July 2018 at 00:05, <vvuth...@ebay.com> wrote:
Hi,

I have a question about concurrent processing of recording rules. If I put all recording rules in a single group, do they get processed sequentially?

Yes
 
Do I get better parallelism by splitting my rules into different rule groups?

Yes, though you should generally keep rules for one job in one group so that the timestamps will line up nicely.
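
To make the effect of splitting concrete, here is a minimal sketch in Python. It assumes only what is stated above: rules inside one group are evaluated one after another, and separate groups can be evaluated in parallel. The per-rule timings and group names are hypothetical, and the model ignores contention for CPU, memory and disk, so treat the result as a best-case bound rather than a prediction.

```python
from typing import Dict, List


def group_duration(rule_durations: List[float]) -> float:
    """Rules in one group run sequentially, so their durations add up."""
    return sum(rule_durations)


def wall_clock_duration(groups: Dict[str, List[float]]) -> float:
    """Groups run concurrently, so the slowest group bounds the total time."""
    return max(group_duration(durations) for durations in groups.values())


# Hypothetical per-rule evaluation times, in seconds.
single_group = {"all": [15.0, 10.0, 12.0, 13.0]}
split_by_job = {"job_a": [15.0, 10.0], "job_b": [12.0, 13.0]}

print(wall_clock_duration(single_group))  # 50.0
print(wall_clock_duration(split_by_job))  # 25.0
```

Under this model no single rule needs to be slow for a group to take a long time: the group duration is the sum of its rules, which is consistent with a group-level percentile far above the per-rule one.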
 

What I am observing is that the average of prometheus_rule_group_duration_seconds is 50s and its 99th percentile spikes to 400s for my setup (I have a lot of recording rules). But the 99th percentile of prometheus_rule_evaluation_duration_seconds is around 15s most of the time.

50s is an expensive set of rules; you should see whether you can trim them down a bit.

Brian

Alin Sînpălean

Jul 11, 2018, 3:31:12 AM
to Prometheus Users
BTW, you can check the Prometheus rules status page -- http://host:9090/rules -- for stats on how long each rule last took to evaluate. It jumps around a bit on successive evals, but it's very useful for identifying which particular rules are worth optimizing.
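
If you would rather script this than read the page, the sketch below ranks rules by their last evaluation time. It assumes a Prometheus version that serves `/api/v1/rules` and includes a per-rule `evaluationTime` field (in seconds) in the response; older versions may not, so check your own server's output first. The helper names `fetch_rules` and `slowest_rules` and the sample payload are made up for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_rules(base_url: str) -> Dict[str, Any]:
    """Fetch the rules payload; base_url is something like http://host:9090."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules") as resp:
        return json.load(resp)


def slowest_rules(payload: Dict[str, Any], top: int = 5) -> List[Tuple[float, str, str]]:
    """Return (evaluation_time_seconds, group_name, rule_name), slowest first."""
    rows = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # evaluationTime is assumed to be present; default to 0.0 if not.
            seconds = float(rule.get("evaluationTime", 0.0))
            rows.append((seconds, group.get("name", ""), rule.get("name", "")))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


# Hypothetical payload in the assumed response shape.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {"name": "job_a", "rules": [
                {"name": "job:a:rate5m", "evaluationTime": 0.2},
                {"name": "job:a:sum", "evaluationTime": 14.8},
            ]},
            {"name": "job_b", "rules": [
                {"name": "job:b:rate5m", "evaluationTime": 1.5},
            ]},
        ]
    },
}

for seconds, group_name, rule_name in slowest_rules(sample, top=2):
    print(f"{seconds:8.3f}s  {group_name}  {rule_name}")
```

Against a live server you would call `slowest_rules(fetch_rules("http://host:9090"))`. Since the timings jump around between evaluations, it is worth sampling a few times before deciding which rules to optimize.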

There are also exported metrics regarding rule eval latency (look for metrics starting with prometheus_rule_) but they're either per group or aggregated across all rules.

Cheers,
Alin.