Doing regex and negative matches against label values


Julius Volz

Mar 21, 2014, 7:49:20 AM
to prometheus-developers
Several Prometheus users have been interested in being able to regex-match label values in queries.

For example:

  foo_operations_total{operation=~"create|update",result="success"}
  host_cpu_usage{host=~"foobar.*"}

Similarly, there has been interest in "not-equals" comparisons:

  http_requests_total{status_code!="503"}
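To make the four operators concrete, here is a minimal Python sketch (not Prometheus code) that parses a brace-enclosed selector list like the ones above into (name, operator, value) triples and checks them against one series' labels. The helpers parse_selectors, value_matches and series_matches are hypothetical names. The sketch assumes values contain no escaped quotes, that a regex must match the whole label value, and that a series lacking a label never matches a selector on that label; the thread does not settle these details.

```python
import re

# Hypothetical grammar for one item: name, operator, double-quoted value (no escapes).
# "!=", "=~" and "!~" are tried before "=" so the two-character operators win.
_ITEM = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*(!=|=~|!~|=)\s*"([^"]*)"\s*,?')


def parse_selectors(text):
    """Parse '{a="x",b=~"y"}' into [("a", "=", "x"), ("b", "=~", "y")]."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("selector list must be enclosed in braces")
    body = text[1:-1].strip()
    selectors, pos = [], 0
    while pos < len(body):
        m = _ITEM.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse selector at {body[pos:]!r}")
        selectors.append(m.groups())
        pos = m.end()
    return selectors


def value_matches(op, pattern, value):
    """Apply one of the four proposed operators to a single label value."""
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    # Assumption: regexes are anchored, i.e. they must match the entire value.
    matched = re.fullmatch(pattern, value) is not None
    return matched if op == "=~" else not matched


def series_matches(labels, selectors):
    """True if a series' label dict satisfies every selector."""
    # Assumption: a series without the label never matches a selector on it.
    return all(
        name in labels and value_matches(op, pattern, labels[name])
        for name, op, pattern in selectors
    )


sel = parse_selectors('{operation=~"create|update",result="success"}')
print(series_matches({"operation": "update", "result": "success"}, sel))  # True
print(series_matches({"operation": "delete", "result": "success"}, sel))  # False
```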

I'm working on implementing this now. Roughly, it could work as follows:

- "!=", "=~", and "!~" will become valid tokens in label selector lists, alongside the existing "="

- AST VectorSelector and MatrixSelector nodes will carry a set of generic "selectors" instead of the current clientmodel.LabelSet. These selectors may be of different types: equals, not-equals, regex-match, regex-negative-match.

- These selectors will be used when gathering the needed fingerprints during the query analysis stage. They are passed into a new TieredStorage method analogous to GetFingerprintsForLabelSet, except that it also understands the new selector types.

- For the storage to look up fingerprints efficiently for the new selectors, we'll need a new index. My suggestion is to introduce an index that stores all the known label values for each label name:

  - LabelName -> LabelValue1, LabelValue2, ...

- For equals-comparisons, the storage can still use the existing LabelPair-to-Fingerprints index to get the fastest lookups possible.

- For the new comparison ops, it would work as follows:

  - Look up all values for the label in question in the new labelname->labelvalues index:

    - not-equals: look up values for the label name, keep only those not equal to the given value
    - regex-match: look up values for the label name, keep only those that match the regex
    - regex-negative-match: look up values for the label name, keep only those that don't match the regex

  - After selecting a set of matching label values, do a LabelPair->Fingerprints lookup for each of them (i.e. for label=value1, label=value2, label=value3, etc.) and take the union of the results.

- Finally, intersect the fingerprint sets of all selectors to arrive at the final fingerprint list and return that to the query layer. From here on, it's business as usual.
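The lookup path described in the list above can be sketched with two in-memory dictionaries standing in for the storage indexes. This is a hypothetical Python illustration, not TieredStorage: ToyIndex and its methods are made-up names, fingerprints are plain integers, and regexes are assumed to match whole label values. Equality selectors go straight to the LabelPair-to-Fingerprints index; the other three operators first filter the label's values from the proposed LabelName-to-LabelValues index, then union the fingerprints of the surviving pairs; the per-selector sets are intersected at the end.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory stand-in for the two storage indexes."""

    def __init__(self):
        self.pair_to_fps = defaultdict(set)     # (name, value) -> fingerprints (existing)
        self.name_to_values = defaultdict(set)  # name -> values (proposed new index)

    def add_series(self, fingerprint, labels):
        for name, value in labels.items():
            self.pair_to_fps[(name, value)].add(fingerprint)
            self.name_to_values[name].add(value)

    def fingerprints_for_selector(self, name, op, pattern):
        if op == "=":
            # Equality keeps using the LabelPair -> Fingerprints index directly.
            return set(self.pair_to_fps.get((name, pattern), ()))
        values = self.name_to_values.get(name, set())
        if op == "!=":
            keep = {v for v in values if v != pattern}
        elif op in ("=~", "!~"):
            rx = re.compile(pattern)
            want_match = op == "=~"
            # Assumption: fullmatch, i.e. the regex must cover the whole value.
            keep = {v for v in values if (rx.fullmatch(v) is not None) == want_match}
        else:
            raise ValueError(f"unknown operator {op!r}")
        fps = set()
        for v in keep:  # one LabelPair lookup per surviving value, unioned
            fps |= self.pair_to_fps[(name, v)]
        return fps

    def fingerprints_for_selectors(self, selectors):
        result = None
        for name, op, pattern in selectors:
            fps = self.fingerprints_for_selector(name, op, pattern)
            result = fps if result is None else result & fps
            if not result:
                break  # empty intersection: remaining selectors cannot add anything
        return result if result is not None else set()


idx = ToyIndex()
idx.add_series(1, {"operation": "create", "result": "success"})
idx.add_series(2, {"operation": "update", "result": "success"})
idx.add_series(3, {"operation": "delete", "result": "success"})
print(sorted(idx.fingerprints_for_selectors(
    [("operation", "=~", "create|update"), ("result", "=", "success")])))  # [1, 2]
```

One consequence of resolving != and !~ through the label-values index is that a series which lacks the label entirely never appears in their result, since it is not listed under any value of that label.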

Does this sound ok? If anyone has any better indexing ideas, let me know. This is just the first approach that came to my mind for this purpose.

Cheers,
Julius

Tobias Schmidt

Mar 21, 2014, 8:00:52 AM
to Julius Volz, prometheus-developers
To better understand the impact of that change, can you also briefly outline the steps currently happening during lookups?

I'm excited to get these operations :)


--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Julius Volz

Mar 21, 2014, 10:14:05 AM
to Tobias Schmidt, prometheus-developers
On Fri, Mar 21, 2014 at 1:00 PM, Tobias Schmidt <tob...@gmail.com> wrote:
> To better understand the impact of that change, can you also briefly outline the steps currently happening during lookups?

I'm not sure how much detail to go into (I could give you a code walk), but what happens now is just a simpler version of what I described: we look up all label=value pairs in a LabelPair->Fingerprints index (a fingerprint is a unique timeseries ID, so we can look up all timeseries that have a given label=value pair). We do that for every label pair that you mention in your query, gathering the fingerprints for each. In the end, we take the intersection of the different groups of fingerprints to arrive at the final list of timeseries to include in the query.
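For comparison, the current equality-only path can be sketched in a few lines. This is again a hypothetical Python illustration: the index is a plain dict from (name, value) pairs to fingerprint sets, and fingerprints_for_label_set is a made-up name, not the actual storage method.

```python
def fingerprints_for_label_set(pair_index, label_set):
    """Look up each label=value pair and intersect the resulting fingerprint sets."""
    result = None
    for pair in label_set.items():
        fps = pair_index.get(pair, set())
        # Copy on first use so the index's own sets are never mutated.
        result = set(fps) if result is None else result & fps
    return result if result is not None else set()


pair_index = {
    ("job", "api"): {1, 2, 3},
    ("instance", "a"): {1, 4},
}
print(fingerprints_for_label_set(pair_index, {"job": "api", "instance": "a"}))  # {1}
```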