Several Prometheus users have asked for the ability to regex-match label values in queries.
For example:
foo_operations_total{operation=~"create|update",result="success"}
host_cpu_usage{host=~"foobar.*"}
Similarly, there has been interest in "not-equals" comparisons:
http_requests_total{status_code!="503"}
I'm working on implementing this now. Roughly, it could work as follows:
- "!=", "=~", and "!~" will become valid parsing tokens in label selector lists
- AST VectorSelector and MatrixSelector nodes will carry a set of generic "selectors" instead of the current clientmodel.LabelSet. These selectors may be of different types: equals, not-equals, regex-match, regex-negative-match.
- These selectors will be used when gathering the needed fingerprints during the query analysis stage. They are passed into a new TieredStorage method analogous to GetFingerprintsForLabelSet, except that this method will also understand the various types of new selectors.
- For the storage to be able to look up fingerprints for the new matchers efficiently, we'll need a new index. My suggestion is to introduce an index that stores all the known label values for each label name:
  - LabelName -> LabelValue1, LabelValue2, ...
- For equals-comparisons, the storage can still use the existing LabelPair-to-Fingerprints index to get the fastest lookups possible.
- For the new comparison ops, it would work as follows:
  - Look up all values for the label in question in the new labelname->labelvalues index, then filter them by selector type:
    - not-equals: keep only the values which are not equal
    - regex-match: keep only the values which match the regex
    - regex-negative-match: keep only the values which don't match the regex
  - After selecting the set of matching label values, do a LabelPair->Fingerprints lookup for each of them (i.e. for label=value1, label=value2, label=value3, etc.) and take the union of the results.
- In the end, intersect the fingerprints of all selectors to arrive at the final fingerprint list and return that to the query layer. From here on: business as usual.
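To make the lookup steps above concrete, here is a rough, self-contained Go sketch. It is only an illustration under assumptions, not the actual implementation: the names Matcher, MatchType, Index, Add and FingerprintsFor are hypothetical, both indexes are plain in-memory maps rather than the real storage, and regexes are assumed to be anchored so that they must match the whole label value.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// MatchType enumerates the four selector kinds from the proposal.
type MatchType int

const (
	Equal MatchType = iota
	NotEqual
	RegexMatch
	RegexNoMatch
)

// Matcher is a hypothetical stand-in for one label selector.
type Matcher struct {
	Type        MatchType
	Name, Value string
}

type Fingerprint uint64

type labelPair struct{ Name, Value string }

// Index holds in-memory stand-ins for the two indexes discussed above.
type Index struct {
	labelValues map[string][]string         // proposed: label name -> known values
	pairToFPs   map[labelPair][]Fingerprint // existing: label pair -> fingerprints
}

func NewIndex() *Index {
	return &Index{map[string][]string{}, map[labelPair][]Fingerprint{}}
}

// Add registers one series under both indexes.
func (ix *Index) Add(fp Fingerprint, labels map[string]string) {
	for name, value := range labels {
		pair := labelPair{name, value}
		if _, seen := ix.pairToFPs[pair]; !seen {
			ix.labelValues[name] = append(ix.labelValues[name], value)
		}
		ix.pairToFPs[pair] = append(ix.pairToFPs[pair], fp)
	}
}

// matchingValues returns the label values that satisfy a single matcher.
func (ix *Index) matchingValues(m Matcher) ([]string, error) {
	if m.Type == Equal {
		return []string{m.Value}, nil // fast path: no label-values scan needed
	}
	var re *regexp.Regexp
	if m.Type == RegexMatch || m.Type == RegexNoMatch {
		var err error
		// Assumption: the regex must match the entire label value.
		if re, err = regexp.Compile("^(?:" + m.Value + ")$"); err != nil {
			return nil, err
		}
	}
	var keep []string
	for _, v := range ix.labelValues[m.Name] {
		var ok bool
		switch m.Type {
		case NotEqual:
			ok = v != m.Value
		case RegexMatch:
			ok = re.MatchString(v)
		case RegexNoMatch:
			ok = !re.MatchString(v)
		}
		if ok {
			keep = append(keep, v)
		}
	}
	return keep, nil
}

// FingerprintsFor unions the fingerprints per matcher, then intersects
// across matchers, and returns the result in ascending order.
func (ix *Index) FingerprintsFor(matchers []Matcher) ([]Fingerprint, error) {
	var result map[Fingerprint]bool // nil until the first matcher is processed
	for _, m := range matchers {
		values, err := ix.matchingValues(m)
		if err != nil {
			return nil, err
		}
		next := map[Fingerprint]bool{}
		for _, v := range values {
			for _, fp := range ix.pairToFPs[labelPair{m.Name, v}] {
				if result == nil || result[fp] {
					next[fp] = true
				}
			}
		}
		result = next
	}
	out := make([]Fingerprint, 0, len(result))
	for fp := range result {
		out = append(out, fp)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out, nil
}

func main() {
	ix := NewIndex()
	ix.Add(1, map[string]string{"operation": "create", "result": "success"})
	ix.Add(2, map[string]string{"operation": "delete", "result": "success"})
	ix.Add(3, map[string]string{"operation": "update", "result": "success"})
	ix.Add(4, map[string]string{"operation": "update", "result": "failure"})
	// Mirrors {operation=~"create|update",result="success"}.
	fps, err := ix.FingerprintsFor([]Matcher{
		{RegexMatch, "operation", "create|update"},
		{Equal, "result", "success"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(fps) // [1 3]
}
```

One thing the sketch makes visible: since the not-equals and regex-negative-match paths start from the known values of a label name, they only find series that carry that label at all. Whether series without the label should also match is something the real implementation would have to decide.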
Does this sound OK? If anyone has better indexing ideas, let me know. This is just the first approach that came to mind.
Cheers,
Julius