Help with naming recording rule levels when aggregating multiple metrics

Juan Bran

Sep 16, 2020, 4:49:27 PM

to Prometheus Users

Hi,

I've read through the Prometheus Recording Rules document, and I'm not sure what the accepted naming convention is in some cases. Does it make sense to have an empty "level", so that a metric reads `:metric:operations`? And if I'm aggregating across multiple levels, what are some standard practices for keeping the level meaningful without it growing out of control?

For example, my recording rule is summing multiple gauges from a legacy application to form a more useful metric. I then roll that up into pop-level metrics, and I'd rather not use "pop_city_region_platform" for the level. The same holds for the initial aggregation. How do you guys balance making the aggregation level informative without it being unwieldy?

- record: server_pop_city_region_platform_service:proxy_err_500
  expr: |
    sum by (server, pop, city, region, platform, service)
    (
      Error_Gauge1 +
      Error_Gauge2 +
      ...
      Error_Gauge3
    )

- record: pop_city_region_platform:proxy_err_500:sum_rate5m
  expr: sum by (pop, city, region, platform) (proxy_err_500)

Thanks,

-Juan

Brian Brazil

Sep 17, 2020, 2:10:43 AM

to Juan Bran, Prometheus Users

On Wed, 16 Sep 2020 at 21:49, Juan Bran <jua...@gmail.com> wrote:

Hi,

I've read through the Prometheus Recording Rules document, and I'm not sure what the accepted naming convention is in some cases. Does it make sense to have an empty "level", so that a metric reads `:metric:operations`?

No, at the least there should always be a job label that you can use.

If I'm aggregating across multiple levels, what are some standard practices for keeping the level meaningful without it growing out of control?

For example, my recording rule is summing multiple gauges from a legacy application to form a more useful metric. I then roll that up into pop-level metrics, and I'd rather not use "pop_city_region_platform" for the level. The same holds for the initial aggregation. How do you guys balance making the aggregation level informative without it being unwieldy?

You can sometimes cheat a little with target labels, but not with instrumentation labels. If there are many relevant labels, the real question is whether you have too many labels in the first place and should look at cutting them down.

- record: server_pop_city_region_platform_service:proxy_err_500

This is missing an operation; sum is a good choice if nothing else seems appropriate.

  expr: |
    sum by (server, pop, city, region, platform, service)
    (
      Error_Gauge1 +
      Error_Gauge2 +
      ...
      Error_Gauge3
    )

- record: pop_city_region_platform:proxy_err_500:sum_rate5m
  expr: sum by (pop, city, region, platform) (proxy_err_500)

This rule doesn't make sense, which is the benefit of this naming scheme - it helps you spot weird stuff. The labels look right, however the operation can't be rate5m as the source metric isn't a rate5m. In addition the sum_ can be excluded here, as sum is the default aggregation and thus implicit. You wouldn't do sum_sum_ after all.
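For readers following along, here is one way the two rules might look with these comments applied. It is a sketch based on one reading of the reply, not a rule file posted in the thread. The group name `proxy_errors` is hypothetical. The second rule is assumed to aggregate the output of the first, since no rule in the post records a series named plain `proxy_err_500`. The elided gauges stay elided.

```yaml
groups:
  - name: proxy_errors  # hypothetical group name, not from the thread
    rules:
      # The operation suffix ":sum" is added because the original name had none.
      - record: server_pop_city_region_platform_service:proxy_err_500:sum
        expr: |
          sum by (server, pop, city, region, platform, service) (
            Error_Gauge1 +
            Error_Gauge2 +
            # ... remaining gauges elided in the original post
            Error_Gauge3
          )

      # Assumption: this rolls up the rule above. The operation stays ":sum"
      # rather than "sum_rate5m", because the source is a sum of gauges, not a
      # 5m rate, and a sum of a sum is still written as just "sum".
      - record: pop_city_region_platform:proxy_err_500:sum
        expr: |
          sum by (pop, city, region, platform) (
            server_pop_city_region_platform_service:proxy_err_500:sum
          )
```

The level prefixes are left as long as in the original post. Shortening them depends on which labels can be dropped, which the thread leaves open.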

Brian

Thanks,
-Juan

Brian Brazil

www.robustperception.io
