On Wed, Oct 14, 2020 at 10:12 AM Brian Brazil <brian....@robustperception.io> wrote:
>
> On Wed, 14 Oct 2020 at 02:32, Peter Bourgon <pe...@bourgon.org> wrote:
>>
>> The fastly-exporter [0] converts statistics scraped from the Fastly
>> real-time stats API to metrics that can be scraped by Prometheus. When
>> configured with many services that receive a lot of global traffic,
>> the metric cardinality can grow quite large. And I'll soon add a new
>> data source that will potentially ~double metric cardinality again.
>>
>> The exporter can already be configured to export only e.g. specific
>> metrics. But users are suggesting it would be good to be able to
>> manipulate the exported metrics in a more powerful way. For example,
>> they'd like a way to define regions as sets-of-datacenters (a label
>> applied to all metrics) and then "collapse" (merge) the metrics so
>> that the datacenter labels are erased and the region label applied.
>> Or, to define feature groups as sets-of-metric-names, and then turn
>> entire feature groups on or off. Importantly, this should all be done
>> within the exporter itself, not via Prometheus relabeling rules,
>> because we're explicitly trying to reduce load on the Prometheus
>> servers scraping the exporter.
>>
>> I can come up with a little DSL or config format that could get the
>> job done. But is there some prior art here that's been successful?
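To make the "collapse" and feature-group ideas concrete, here is a minimal sketch, written in Python for brevity, that operates on an in-memory list of samples. It assumes hypothetical config shapes (REGIONS maps region names to datacenter codes, FEATURE_GROUPS maps group names to metric names), made-up metric names and datacenter codes, and a "datacenter" label on every sample. None of these names come from fastly-exporter itself.

```python
from collections import defaultdict

# Hypothetical user config: region name -> set of datacenter codes.
REGIONS = {
    "europe": {"AMS", "FRA", "LHR"},
    "us-east": {"IAD", "JFK"},
}

# Hypothetical user config: feature group -> set of metric names.
# The metric names are illustrative, not the exporter's real ones.
FEATURE_GROUPS = {
    "requests": {"requests_total"},
    "bytes": {"body_bytes_total", "header_bytes_total"},
}


def datacenter_to_region(regions):
    """Invert region -> datacenters into datacenter -> region."""
    lookup = {}
    for region, datacenters in regions.items():
        for dc in datacenters:
            lookup[dc] = region
    return lookup


def collapse(samples, regions, enabled_groups, feature_groups):
    """Filter samples by feature group, then merge datacenters into regions.

    samples is an iterable of (metric_name, labels, value) tuples.
    Values are summed, which is only meaningful for counters (and for
    histogram buckets that keep their "le" label), not for arbitrary gauges.
    Datacenters that belong to no region are grouped under "other".
    """
    allowed = set()
    for group in enabled_groups:
        allowed |= feature_groups.get(group, set())
    dc_region = datacenter_to_region(regions)

    merged = defaultdict(float)
    for name, labels, value in samples:
        if name not in allowed:
            continue  # feature group switched off: drop the series entirely
        out = {k: v for k, v in labels.items() if k != "datacenter"}
        out["region"] = dc_region.get(labels.get("datacenter"), "other")
        # Label sets are keyed as sorted tuples so equal sets merge.
        merged[(name, tuple(sorted(out.items())))] += value
    return dict(merged)


samples = [
    ("requests_total", {"service": "s1", "datacenter": "AMS"}, 10.0),
    ("requests_total", {"service": "s1", "datacenter": "FRA"}, 5.0),
    ("requests_total", {"service": "s1", "datacenter": "IAD"}, 7.0),
    ("body_bytes_total", {"service": "s1", "datacenter": "AMS"}, 512.0),
]
result = collapse(samples, REGIONS, ["requests"], FEATURE_GROUPS)
for (name, labels), value in sorted(result.items()):
    print(name, dict(labels), value)
# prints:
# requests_total {'region': 'europe', 'service': 's1'} 15.0
# requests_total {'region': 'us-east', 'service': 's1'} 7.0
```

Summing keeps counters meaningful after the merge, but it would give misleading results for gauges. A real config format may therefore need per-metric aggregation rules, which is part of what makes a general DSL harder than it first looks.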
>
> I'm not aware of anything generic in this space.
As I thought. Thanks for the confirmation.
> Disabling specific expensive collectors via flags isn't unusual; both the mysqld and node exporters support it, for example.
In my case, it's not that any given collector or set of metrics is in
itself expensive, but that all of the metrics and common label
dimensions are duplicated per service, and power users can have many,
many services. I already have affordances to select metric names and
services in different ways, including "sharding" services over
different exporter instances. But even that is not enough for some
users.
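Hash-based sharding of services across exporter instances could look roughly like the following minimal sketch. The "n/m" spec format and the parse_shard and in_shard helpers are hypothetical and used here for illustration only. They are not necessarily how fastly-exporter implements its sharding.

```python
import hashlib


def parse_shard(spec):
    """Parse a hypothetical "n/m" shard spec into (n, m) with 1 <= n <= m."""
    n_str, m_str = spec.split("/")
    n, m = int(n_str), int(m_str)
    if not 1 <= n <= m:
        raise ValueError(f"invalid shard spec {spec!r}")
    return n, m


def in_shard(service_id, n, m):
    """Report whether exporter instance n (of m) should export service_id.

    A stable hash (SHA-256) is used instead of the built-in hash(), which is
    salted per process and would assign a service to different shards on
    different instances.
    """
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m == n - 1


n, m = parse_shard("2/3")
services = ["svc-a", "svc-b", "svc-c", "svc-d", "svc-e"]
mine = [s for s in services if in_shard(s, n, m)]
print(f"shard {n}/{m} exports: {mine}")
```

Sharding spreads exporter and scrape work across instances, but it leaves the total number of series that Prometheus ingests unchanged. That fits the point above that sharding alone is not enough for some users.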
> There's also a handful of things out there where you exclude labels that some metrics would usually have, e.g. removing the datacenter label. Dynamically adding in a region label is a bit odd though, as new instrumentation labels are a breaking change for downstream consumers. If that's a useful label, it should be there in the first place, not only something that appears when you're trying to reduce cardinality. I'd suggest sticking with simple flags over creating yet another language that users have to learn.
To be clear, the notion of regions was just a speculative example of
how users might want to define and re-group metrics, it's not a
well-defined domain concept.