I have multiple datacenters, and I plan to run a single Prometheus server in each one, scraping a few thousand apps in that datacenter. I also want a federation layer that provides a global view, aggregating all the app instances of each datacenter.
The problem I’ve run into is that federation appears to expect a rule explicitly defined for every single metric name. This is highly impractical for my use case: app teams may introduce new metric names at any time via a Prometheus client library, and they expect to see their data aggregated globally. As far as I can tell, this means I would need to add a new aggregation rule on the datacenter-level Prometheus servers every time an app team introduces a new metric name.
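To illustrate, this is roughly the kind of rule file I would have to keep extending on each datacenter-level Prometheus. It is only a sketch: the group name and the metric names (`http_requests_total`, `queue_depth`) are hypothetical placeholders, not real metrics from my apps.

```yaml
# Sketch of a datacenter-level rule file (Prometheus 2.x format).
# All names below are hypothetical examples.
groups:
  - name: dc_aggregation
    rules:
      # One rule per metric name: a counter-style metric summed per job.
      - record: job:http_requests_total:sum
        expr: sum by (job) (http_requests_total)
      # A second metric needs its own rule: a gauge averaged per job.
      - record: job:queue_depth:avg
        expr: avg by (job) (queue_depth)
      # ...and so on, one new entry for every metric name a team adds.
```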
Thus, I’m hoping there’s a way to configure this setup so that every potential metric name is aggregated with a single rule or a handful of rules. Ideally, I’d love to do something like take the sum of all counter metrics and the avg of all gauge metrics… but I’m fairly certain Prometheus doesn’t store metric types, so this isn’t possible.
The next best solution, I think, may be to have a single rule on each DC-level Prometheus that applies to absolutely every single metric and produces a SUM-aggregation metric, and then a second rule on each DC-level Prometheus that applies to absolutely every single metric and produces an AVG-aggregation metric. These aggregation metrics would then of course be collected by the federation. Is this possible? How would I configure this in the datacenter Prometheus rule file?
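For reference, the global side I have in mind would look roughly like the following scrape config. This is a sketch under assumptions: the target hostnames are hypothetical, and the `match[]` selector assumes the aggregated series are named with a `job:` prefix.

```yaml
# Sketch of the global Prometheus scrape config for federation.
# Hostnames are hypothetical; the selector assumes a "job:" name prefix.
scrape_configs:
  - job_name: federate
    honor_labels: true        # keep the labels set by the DC-level servers
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # pull only the pre-aggregated series
    static_configs:
      - targets:
          - dc1-prometheus.example.com:9090
          - dc2-prometheus.example.com:9090
```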
Or is there a fundamentally different approach I should be taking to give app teams a global view of their data? I don’t think simply using multiple datasources in Grafana is a valid solution, because you can’t perform global aggregations across multiple datacenters (e.g. a team wouldn’t be able to see the total number of requests they’ve had globally).