remove a label filter for all PromQL queries


Johny

Apr 5, 2023, 12:42:33 PM
to Prometheus Users
We're facing a performance issue in Prometheus caused by a label that has a constant value across all (thousands of) time series. Filtering on this label in a query causes a large amount of metadata to be loaded into memory, overwhelming the Prometheus backends. Without this filter, the queries run reasonably well. We are planning to exclude this label at ingestion time in the future, but for now we need a workaround.

my_series{global_label="constant-value",  l1="..", l2=".."}

Is there a mechanism to automatically exclude global_label from queries in the configuration (the remote_read subsection, or elsewhere)?

thanks,
Johny



Brian Candler

Apr 5, 2023, 1:12:11 PM
to Prometheus Users
Adding a constant label to every timeseries should have almost zero impact on memory usage.

Can you clarify what you're saying, and how you've come to your diagnosis? What version of prometheus are you running? When you say "backends" in the plural, how have you set this up?

At one point you seem to be saying it's something to do with ingestion, but then you seem to be saying it's something to do with queries ("Without this filter, the queries run reasonably well"). Can you give specific examples of filters which show the difference in behaviour?

Again: the queries
  my_series{global_label="constant-value",  l1="..", l2=".."}
  my_series{l1="..", l2=".."}
should perform almost identically, as they will select the same subset of timeseries.

Brian Candler

Apr 5, 2023, 1:13:42 PM
to Prometheus Users
Also: how many timeseries are you working with, in terms of the "my_series" that you are querying, and globally on the whole system?

Johny

Apr 5, 2023, 1:50:02 PM
to Prometheus Users
The number of time series per metric for a few selected metrics is close to 2 million today. For scalability, we shard the data across a few Prometheus instances and use remote read from a front-end Prometheus to fetch data from the storage units.

The series are fetched from the time series blocks by taking an intersection of series IDs (postings) across all label filters in the query: first, the index postings are scanned for each label matcher; then the resulting sets are intersected (an implicit AND). From my understanding, a low-cardinality label that is present in every series causes a large portion of the index to be loaded into memory during that first step. We've also observed memory spikes during query processing when the system receives a steady stream of queries. Without this filter, memory usage is lower and the query returns much faster.
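
To make that concrete, here's a rough sketch of the two-step lookup I'm describing (illustrative only, not actual Prometheus code; the index layout is heavily simplified):

  package sketch

  // index is a simplified inverted index: label name -> label value -> sorted series IDs.
  type index map[string]map[string][]uint64

  // selectSeries fetches the posting list for each equality matcher (step 1)
  // and intersects them (step 2, the implicit AND). A matcher on a label that
  // is present in every series pulls in a posting list covering the whole index.
  func selectSeries(idx index, matchers map[string]string) []uint64 {
      var result []uint64
      first := true
      for name, value := range matchers {
          postings := idx[name][value] // step 1: all series IDs carrying name=value
          if first {
              result = append(result, postings...)
              first = false
              continue
          }
          result = intersect(result, postings) // step 2: implicit AND
      }
      return result
  }

  // intersect merges two sorted ID lists, keeping only IDs present in both.
  func intersect(a, b []uint64) []uint64 {
      var out []uint64
      for i, j := 0, 0; i < len(a) && j < len(b); {
          switch {
          case a[i] == b[j]:
              out = append(out, a[i])
              i++
              j++
          case a[i] < b[j]:
              i++
          default:
              j++
          }
      }
      return out
  }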

  
So I believe that if we exclude the constant label at ingestion, we won't have this problem in the long term. In the meantime, stripping this filter somewhere in the front end would help mitigate it.

Brian Candler

Apr 5, 2023, 3:50:08 PM
to Prometheus Users
I wonder if the filtering algorithm is really as simplistic as the Timescale blog implies ("for every label/value pair, first find *every* possible series which matches; then take the intersection of the results")?  I don't know, I'll leave others to answer that.  If it had some internal stats so that it could start with the labels which match the fewest number of series, I'd expect it to do that; and the TSDB stats in the web interface suggest that it does.
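
(Those stats are also exposed over the HTTP API if you want to check a backend directly; something like this, assuming the default port 9090:

  curl -s http://localhost:9090/api/v1/status/tsdb

reports, among other things, series counts per label/value pair and memory usage per label name for the head block.)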

I ask again: what version(s) of Prometheus are you running?

Are you experiencing this with all prometheus components, i.e. a prometheus front-end talking to prometheus back-ends with remote_read?

I think the ideal thing would be to narrow this down to a reproducible test case: either a particular pattern of remote_read queries which is performing badly at the backend, or a particular query sent to the front-end which is being sent to the backend in a suboptimal way (e.g. not including all possible label filters at once).

You said "for now we need a workaround".  Is it not sufficient simply to remove {global_label="constant-value"} from your queries? After all, you're already thinking about removing this label at ingestion time, and if you do that, you won't be able to filter on it anyway.

Johny

Apr 5, 2023, 4:50:39 PM
to Prometheus Users
Prometheus version is 2.39.1

There are many users, and some legacy clients, which adds friction to changing queries across the board.
During ingestion, we can make use of relabeling to drop labels automatically.
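
Roughly along these lines in the scrape configuration (just a sketch; it assumes the label is literally named global_label, and if the data reaches the backends via remote_write the equivalent rule would go under write_relabel_configs):

  metric_relabel_configs:
    - action: labeldrop
      regex: global_label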

I am fairly certain this is the root cause of the performance degradation in the system, as we're able to reproduce the problem in a load test: simulating queries with and without the label filter in question, the latter performs much better with no memory problems.

Johny

Apr 5, 2023, 4:51:16 PM
to Prometheus Users
Also, all the problems are in the DBs (the backend Prometheus instances), not the front end.

Brian Candler

Apr 6, 2023, 12:03:38 PM
to Prometheus Users
On Wednesday, 5 April 2023 at 21:50:39 UTC+1 Johny wrote:
During ingestion, we can make use of relabeling to drop labels automatically.

Sure. But doesn't that imply that you will have to modify all queries *not* to filter on the (now missing) label? In which case, why not just modify the queries now?

Ben Kochie

Apr 6, 2023, 12:19:26 PM
to Brian Candler, Prometheus Users
Sorry, I had to catch up on this thread.

The description is correct: the label inverted index works as described. It's one of the downsides of an inverted index that allows arbitrary combinations of labels to be filtered on.

Each metric has an internal identifier, and the label index points to all metrics that contain that label/value pair. This includes the __name__ index.

Right now, this inverted index is not sharded. IMO it would be useful and a good performance improvement to shard the index by metric name, since you almost always have the __name__ value as the first entrypoint into doing a lookup.
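
Structurally the difference would be something like this (purely illustrative Go types, not the actual TSDB data structures):

  // Today (simplified): one global inverted index over all series,
  //   label name -> label value -> sorted series IDs
  type postings map[string]map[string][]uint64

  // Sharded by metric name: each metric gets its own, much smaller index,
  // and a query only has to touch the shard for its __name__ value.
  type shardedPostings map[string]postings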


Brian Candler

Apr 6, 2023, 1:10:18 PM
to Prometheus Users
Many thanks for the clarification.

Setting aside sharding: what if the label selection for timeseries were to start by processing the label/value pair which returned the lowest number of timeseries, and then work up progressively to those which match larger numbers?

I'm thinking about the sort of optimisation that a SQL database does when it has indexes on foo and bar, and you SELECT ... WHERE foo=X and bar=Y.  If it knows from index stats that there are many fewer rows with foo=X than bar=Y, then it will start with the rows matching foo and then check bar for those rows against the other index (or vice versa, of course).

Fundamentally this depends on whether you could determine, cheaply, at least an order-of-magnitude estimate of how many items there are in the inverted index for a given label/value pair.  It's also complicated by having !=, =~ and !~ for label matchers (but I would be inclined to treat those as "likely to match many series" and therefore do those after = matching).
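
In code terms the idea is just an ordering step in front of the intersection; a minimal sketch, where the estimate function is a hypothetical stand-in for whatever cheap per-label statistics the index could provide:

  package sketch

  import "sort"

  // matcher is a simplified equality matcher (name="value").
  type matcher struct{ name, value string }

  // orderBySelectivity sorts matchers so the one expected to match the fewest
  // series is resolved first; every later intersection then works on an
  // already-small set. estimate would come from cheap per-label index stats,
  // and !=, =~, !~ matchers could simply be pushed to the end.
  func orderBySelectivity(ms []matcher, estimate func(matcher) int) {
      sort.Slice(ms, func(i, j int) bool {
          return estimate(ms[i]) < estimate(ms[j])
      })
  }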