Federation of aggregated metrics


hannesst...@gmail.com

Oct 31, 2023, 12:54:05 PM
to Prometheus Users
Is it possible to include an aggregation formula in a federation query?
This would avoid creating and storing aggregated metrics on the lower-level Prometheus, and also remove the delay between aggregation and federation.

Or, can a recording rule fetch metrics directly from another Prometheus server?
This would avoid the delay between federation and execution of the recording rule.
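For reference, a federation request selects series through `match[]` URL parameters on the `/federate` endpoint, and each `match[]` value has to be an instant vector selector rather than a full PromQL expression. A minimal sketch of how such a request URL is put together, assuming a hypothetical host name and a hypothetical `partition:` prefix on the recording-rule names:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical host and rule-name prefix, for illustration only.
selectors = ['{__name__=~"partition:.*"}']
url = federate_url("http://prometheus-partition-a:9090", selectors)
print(url)

# Round trip: decoding the query string gives back the original selector.
decoded = parse_qs(urlsplit(url).query)["match[]"]
print(decoded)
```

Since only selectors are accepted, an aggregation such as `sum(...)` cannot be placed in the request itself; the aggregated series has to exist on the lower-level server before it can be matched.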

We've got hundreds of pods, delivering millions of metrics. We plan to partition our pods and deploy one Prometheus per partition. A top-level Prometheus will then offer globally aggregated metrics.

How should we set this up? My current assumption is to use aggregation recording rules within each partition, then again at the top level to get the global aggregation. This seems both complicated and wasteful, and it also introduces delays, since recording rules and federation cannot be synchronized with each other.
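To make the two-level idea concrete, here is a small sketch in plain Python (not PromQL) of aggregating per partition and then again globally. The partition names and per-pod values are made up. It also shows why a global average should be derived from federated sums and counts rather than from partition averages:

```python
# Hypothetical per-pod values, grouped by partition.
partitions = {
    "a": [120, 80, 100],
    "b": [50, 250],
}

# Level 1: what a sum/count recording rule would produce in each partition.
partition_sum = {name: sum(values) for name, values in partitions.items()}
partition_count = {name: len(values) for name, values in partitions.items()}

# Level 2: the top level aggregates the partition-level aggregates.
global_sum = sum(partition_sum.values())
global_count = sum(partition_count.values())
global_avg = global_sum / global_count

# Averaging the partition averages is off when partitions differ in size.
avg_of_avgs = sum(
    partition_sum[name] / partition_count[name] for name in partitions
) / len(partitions)

print(global_sum, global_avg, avg_of_avgs)
```

Sums and counts compose across levels, so those are the series worth recording and federating; ratios and averages are better computed once, at the top.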

To minimize delay, each partition-level Prometheus needs to aggregate as often as possible to offer "fresh" metrics to federation requests. The top-level Prometheus then also needs to federate these aggregated metrics as often as possible to offer "fresh" values at the global level. Doing things as often as possible "just in case" seems wasteful, which is why I am asking if this is the right approach for us.
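A rough way to put numbers on that delay, under a simplified model that is my assumption rather than anything Prometheus guarantees: each periodic stage (scrape, partition rule evaluation, federation scrape, top-level rule evaluation) runs unsynchronized, so each can add up to one full interval. Evaluation time and network latency are ignored, and the interval values are hypothetical:

```python
def worst_case_staleness(scrape_s, partition_rule_s, federate_s, top_rule_s):
    """Upper bound (seconds) on the age of a global value in the simple model.

    Assumes every stage is periodic and unsynchronized, so each one can
    contribute up to one full interval of extra delay.
    """
    return scrape_s + partition_rule_s + federate_s + top_rule_s


# Compare running every stage at the same interval, for a few intervals.
for interval in (60, 15, 5):
    total = worst_case_staleness(interval, interval, interval, interval)
    print(f"interval={interval}s -> worst case about {total}s")
```

In this model the worst case grows linearly with the number of stages, which is why shortening every interval "just in case" is the only lever available when the stages cannot be chained.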

hannesst...@gmail.com

Nov 16, 2023, 10:51:12 AM
to Prometheus Users
I take the silence as a "no, it is not possible to connect recording rules with federation requests (or vice versa)" ...?

Bryan Boreham

Nov 22, 2023, 7:01:51 AM
to Prometheus Users
Federation is a bit of a neglected feature. The Thanos project is rather more popular as a way to aggregate data from multiple Prometheus servers.

(Other projects also exist which let you centralise metrics storage.)

Bryan

hannesst...@gmail.com

Nov 22, 2023, 7:04:26 AM
to Prometheus Users
Ok, thanks for confirming my suspicion 👍

Ben Kochie

Nov 22, 2023, 5:45:57 PM
to Bryan Boreham, Prometheus Users
I'd say less neglected and more obsoleted: Remote Write and Thanos Sidecar are the more functional, modern replacements for the original Federation method.


Bryan Boreham

Nov 23, 2023, 4:24:19 AM
to Prometheus Users
Not going to argue with you, but if that is the official position we should put it in the docs.

Bryan

Jseb Tarot

Nov 23, 2023, 11:09:44 AM
to Prometheus Users

The Thanos sidecar and Prometheus federation are not the same thing!

Federation lets you aggregate metrics into another Prometheus instance, which is useful when, for example, you have 'non-standard' blocks. As a reminder, in that case you have no choice but to go through a Prometheus with fixed, non-compacted 2-hour blocks for compatibility with the sidecar. At least, that is what the documentation said. In practice, the sidecar does push metrics to object storage such as S3 or Azure even if the blocks are compacted; there is an option for that: '--shipper.upload-compacted'. However, it doesn't tag the data, at least not to my knowledge!

So no, federation is not obsolete; it's quite practical. With it, you can tag the data coming from multiple Prometheus servers, adding regional tags or other information that makes multi-site queries easy. I haven't seen a way for the Thanos sidecar to add tags.

In essence, what needs to be understood in my opinion is:

prometheus1 (with exporter) ---+
prometheus2 (with exporter) ---+
prometheus3 (with exporter) ---+---> prometheus (federate) ---> thanos-sidecar ---> S3 storage
prometheus4 (with exporter) ---+     (with different tags for prometheus1,2,3,4)

Prometheus 1-4 store no more than 15 days of metrics (the default behavior), and the same goes for prometheus-federate. All metrics older than 15 days are kept in object storage such as S3, Azure, or any other of your choice!

Indeed, if we could tag from the sidecar, it would allow us to bypass Prometheus federation. But for now, I admit I don't know if it's feasible.
