We may be running into a bug with metric relabeling, but I wanted to post here in case anyone can spot what my team and I are doing wrong with this relabeling.
Background: We have a metric named "order_total" with a very high-cardinality label, "store_number", that is slowing down our queries and causing massive resource usage. We need to drop this label during/before ingestion.
We have tried two approaches (and several variations of each): (1) dropping the label with the labeldrop action, which everything I'm seeing online and in the Prometheus documentation says should work by simply adding the config below to the scrape job, and (2) relabeling store_number to a single value for all series ("0" in our case), just to remove the high-cardinality aspect. A sketch of where these sit in the full scrape job is included after the two snippets.
1: (green line in screenshot)
metric_relabel_configs:
  - action: labeldrop
    regex: store_number
2: (blue line in screenshot, {store_number="0"})

metric_relabel_configs:
  - source_labels: [store_number]
    separator: ;
    regex: (.*)
    target_label: store_number
    replacement: "0"
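For reference, this is roughly where we're putting either snippet in the scrape config; the job name and target below are placeholders, not our real values:

scrape_configs:
  - job_name: "orders"                  # placeholder job name
    static_configs:
      - targets: ["orders-app:8080"]    # placeholder target
    metric_relabel_configs:             # applied to scraped samples, before ingestion
      - action: labeldrop               # option 1; the option 2 replace rule goes in this same list
        regex: store_number

We only run one of the two rules at a time, and the config passes validation either way.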
But as you can see, the numbers aren't anywhere near what we are expecting. This is during a simulated load test that sends orders at a constant rate, which is why the counts increase steadily. I would expect both the blue and green lines to be far higher than any individual store_number series, since the per-store counts should all be merged together once the label is gone. If this isn't a bug, does anyone have a clue what we are doing wrong? Any replies are greatly appreciated.