Ways to mitigate high-cardinality label-value pairs

Dinesh N

Jun 4, 2020, 3:01:02 PM
to promethe...@googlegroups.com
Hi Team,

We are figuring out ways to optimise high-cardinality labels; any suggestions are welcome.

Regards
Dinesh

Murali Krishna Kanagala

Jun 4, 2020, 3:08:03 PM
to Dinesh N, Prometheus Users
This should be taken care of by the exporter you are collecting metrics from. If you are writing your own exporter, then validate which labels take unique values all the time. For example, if you are collecting nginx request metrics, then passing the request UUID as a label makes the metric highly cardinal.
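If such a metric has already made it into your scrape targets and you cannot change the exporter right away, one stop-gap is to drop it at scrape time. A minimal sketch, assuming a hypothetical nginx exporter job and metric name (all names below are placeholders, not from this thread):

    scrape_configs:
      - job_name: nginx                        # hypothetical job name
        static_configs:
          - targets: ['nginx-exporter:9113']   # placeholder target
        metric_relabel_configs:
          # Drop the offending metric entirely until the exporter stops
          # attaching a per-request uuid label (metric name is made up).
          - source_labels: [__name__]
            regex: nginx_http_request_duration_seconds.*
            action: drop

Dropping only the uuid label instead would leave several samples in one scrape with identical label sets, which Prometheus rejects as duplicates, so fixing the exporter (or dropping the metric) is the safer route here.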


Dinesh N

Jun 4, 2020, 4:16:05 PM
to Murali Krishna Kanagala, Prometheus Users
Thanks Murali for the quick response

But how do we analyse it, and can we use the options below?

1) metric_relabel_configs
2) recording rules

Aliaksandr Valialkin

Jun 5, 2020, 7:36:10 AM
to Dinesh N, Murali Krishna Kanagala, Prometheus Users
High-cardinality labels may be monitored at the /api/v1/status/tsdb page. See https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats for details. Note that this page became available starting from Prometheus v2.14.

Also take a look at the https://github.com/open-fresh/bomb-squad project, which detects high-cardinality labels and automatically adds relabeling rules in order to reduce the cardinality.



--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

Dinesh N

Jun 5, 2020, 7:55:06 AM
to Aliaksandr Valialkin, Murali Krishna Kanagala, Prometheus Users
Hi Aliaksandr,

Thanks for the valuable insights.

I shall take a look at bomb-squad. In the meanwhile, do you foresee any generic optimizations, using metric_relabel_configs or sample_limit, that can help to reduce the cardinality?

Time series -

Currently we have close to 8 million time series per block, and a block is compacted every 2 hours.

Prometheus config -

RAM - 120 GB
CPU - 32 cores
Storage - 1 TB

Problem statement -

Once the RSS memory spikes above 110 GB, Prometheus crashes, which makes our system very unstable. We can't increase the resources any further, as we are already operating with the highest configuration.


Any directions/approaches/mechanisms are highly appreciated.

Thanks in anticipation
Dinesh





Aliaksandr Valialkin

Jun 5, 2020, 8:39:42 AM
to Dinesh N, Murali Krishna Kanagala, Prometheus Users
On Fri, Jun 5, 2020 at 2:55 PM Dinesh N <dineshnithy...@gmail.com> wrote:
Hi Aliaksandr,

Thanks for the valuable insights.

I shall take a look at bomb-squad. In the meanwhile, do you foresee any generic optimizations, using metric_relabel_configs or sample_limit, that can help to reduce the cardinality?

`sample_limit` won't help here, since it limits the number of samples that can be scraped from a single target. It doesn't limit the number of unique label=value pairs.
The generic solution is to identify the labels with the biggest number of unique values via the `/api/v1/status/tsdb` page and then remove these labels via `metric_relabel_configs` using `action: labeldrop`.
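A minimal sketch of such a rule, with a made-up job name and label name (take the real label name from the tsdb status page):

    scrape_configs:
      - job_name: my_app               # hypothetical job name
        static_configs:
          - targets: ['app:8080']      # placeholder target
        metric_relabel_configs:
          # Remove the high-cardinality label found on /api/v1/status/tsdb.
          # Caveat: if series within one scrape differ only by this label,
          # dropping it makes them collide; then drop the whole metric or
          # fix the exporter instead.
          - action: labeldrop
            regex: request_id          # hypothetical label name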

 

Time series -

Currently we have close to 8 million time series per block, and a block is compacted every 2 hours.

Prometheus config -

RAM - 120 GB
CPU - 32 cores
Storage - 1 TB

Problem statement -

Once the RSS memory spikes above 110 GB, Prometheus crashes, which makes our system very unstable. We can't increase the resources any further, as we are already operating with the highest configuration.


Any directions/approaches/mechanisms are highly appreciated.

Try increasing scrape_interval for all the metrics. This should reduce RAM usage for Prometheus.
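Recently scraped samples are held in Prometheus' in-memory head block, so scraping half as often roughly halves the number of in-memory samples (the per-series label and index overhead stays the same). A minimal sketch of the global setting, with an illustrative value:

    global:
      scrape_interval: 60s       # illustrative; e.g. up from 15s or 30s
      evaluation_interval: 60s   # optional: keep rule evaluation in step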

Another option is to try VictoriaMetrics - it should use less RAM compared to Prometheus for this workload.

Dinesh N

Jun 5, 2020, 8:45:02 AM
to Aliaksandr Valialkin, Murali Krishna Kanagala, Prometheus Users
Hi Aliaksandr,

We will potentially consider VictoriaMetrics in the future. In the meantime, how does increasing the scrape interval reduce RAM usage? Any other resource-level optimizations would also be of great help.

Regards
Dinesh