How to optimize high-cardinality labels in Prometheus

Dinesh N

May 18, 2020, 4:55:01 PM

to Prometheus Users

Hi Team,

I have been using a Thanos/Prometheus stack and am running into high-cardinality issues. When high-cardinality queries are fired, CPU climbs to ~80% and then drops, the queries fail with an "Http superfluous" exception, and then the Prometheus instance goes down.
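To see why a single metric can become "high cardinality", note that in the worst case every combination of label values becomes its own time series, so the series count is bounded by the product of the number of distinct values per label. The sketch below assumes a hypothetical metric with the label counts shown; the labels and numbers are illustrative, not taken from this setup.

```python
from math import prod


def series_upper_bound(label_value_counts: dict[str, int]) -> int:
    """Worst-case series count for one metric name: every label-value combination exists."""
    return prod(label_value_counts.values())


# Hypothetical label cardinalities for one metric (illustrative only).
labels = {"instance": 50, "path": 200, "status": 10, "user_id": 1000}

print(series_upper_bound(labels))  # 100000000
# Dropping the per-user label shrinks the bound by a factor of 1000.
print(series_upper_bound({k: v for k, v in labels.items() if k != "user_id"}))  # 100000
```

A label with unbounded values (user IDs, request IDs) multiplies everything else, which is usually the first thing to look for.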

We are trying the following things:

1) We are running only 2 Prometheus instances behind the Thanos Querier and need guidance on where we can scale further to handle huge queries.

2) Could a front-end cache, such as the Cortex cache, help with high-cardinality queries?

3) We are looking for optimal Linux parameters, such as hugepages, that would help with high-cardinality issues.

RCA so far: I observed that CPU was climbing to ~80% and the Prometheus server was going down. I also see a lot of memory held in cache memory.

Even with the above options, we are not sure whether we are looking in the right direction, so your pair of eyes and any pointers would be greatly appreciated here, Brian.

Thanks and Regards

Dinesh

Dinesh Nithyanandam

May 19, 2020, 10:29:26 AM

to Prometheus Users

Can someone please help here?

Stuart Clark

May 19, 2020, 12:35:08 PM

to Dinesh Nithyanandam, Prometheus Users

If you are running large queries that touch a lot of time series, you will need lots of memory and CPU.

Ideally you would minimise such queries, or use pre-aggregated metrics (created with recording rules) to simplify what is being requested.
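The effect of pre-aggregation can be sketched by simulating what an aggregation like `sum by (job)` does to the series count. This assumes a hypothetical raw metric labelled by `job`, `instance` and `path`; a recording rule that stores the aggregated result (for example a hypothetical rule with `record: job:http_requests:rate5m` and `expr: sum by (job) (rate(http_requests_total[5m]))`) lets dashboards read the few aggregated series instead of every raw one.

```python
def aggregate_by(series: list[dict[str, str]], keep: list[str]) -> set[tuple[str, ...]]:
    """Simulate `sum by (keep...)`: series collapse to the distinct values of the kept labels.

    A label missing from a series is treated as empty, as in Prometheus.
    """
    return {tuple(s.get(k, "") for k in keep) for s in series}


# Hypothetical raw series: 20 instances x 50 paths for a single job.
raw = [
    {"job": "api", "instance": f"host{i}", "path": f"/p{j}"}
    for i in range(20)
    for j in range(50)
]

print(len(raw))                                   # 1000 raw series
print(len(aggregate_by(raw, ["job"])))            # 1 series after sum by (job)
print(len(aggregate_by(raw, ["job", "path"])))    # 50 series after sum by (job, path)
```

The recording rule pays the aggregation cost once per evaluation interval, so queries over the recorded series stay cheap no matter how many raw series sit underneath.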

I'd suggest looking at what you are trying to achieve. Are you querying over a long period? Could you aggregate?
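A rough way to see why long ranges hurt is to count the raw samples a query has to read: the number of series times the samples per series (range divided by scrape interval). The sketch below is an order-of-magnitude estimate only; it assumes a fixed 15s scrape interval and ignores compression, chunk layout and step re-evaluation in range queries.

```python
def samples_touched(series: int, range_seconds: int, scrape_interval_seconds: int) -> int:
    """Approximate raw samples read: series x (range / scrape interval)."""
    return series * (range_seconds // scrape_interval_seconds)


DAY = 24 * 60 * 60

# Hypothetical: 100,000 raw series vs 100 pre-aggregated series, both over 30 days.
print(samples_touched(100_000, 30 * DAY, 15))  # 17280000000
print(samples_touched(100, 30 * DAY, 15))      # 17280000
```

Cutting either factor, the number of series (by aggregating) or the time range, reduces the work proportionally, which adding more Prometheus replicas does not.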

Scaling horizontally will quite likely not help if you are just asking Prometheus to process huge amounts of data.
