[ANN] pint - Prometheus rule linter

l.mi...@gmail.com

Apr 26, 2021, 8:24:12 AM
to Prometheus Users
Hi,

https://github.com/cloudflare/pint is a small tool we use at Cloudflare to better manage our ever-growing collection of recording and alerting rules.
The main motivation was to help with pull requests that add or edit rule files, where we would often need to check:
* how many time series a new recording rule would add
* how many times a new alert would trigger based on historical metrics
* whether all time series used in a rule are present in our Prometheus instances (we have a non-trivial topology)
And that's on top of the simple conventions we have. For example, each alert should have a set of well-known labels and annotations, like severity or a link to a Grafana dashboard and a runbook. But even those conventions, while simple in themselves, only apply to "production" alerts, not to "test" alerts that are present in the config but not yet paging anyone.
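
To make the convention check concrete, here is a minimal Python sketch of the idea. It is not how pint implements it: the required names (`severity`, `dashboard`, `runbook`), the `check_alert` helper and the `is_production` flag are all assumptions made up for illustration. Only the rule fields (`alert`, `labels`, `annotations`) come from the standard Prometheus rule format.

```python
# Hypothetical convention: names below are illustrative, not pint's configuration.
REQUIRED_LABELS = {"severity"}
REQUIRED_ANNOTATIONS = {"dashboard", "runbook"}


def check_alert(rule, is_production=True):
    """Return a list of convention problems for one alerting rule (a dict)."""
    # "Test" alerts that do not page anyone are exempt from the conventions.
    if not is_production:
        return []
    problems = []
    labels = set(rule.get("labels", {}))
    annotations = set(rule.get("annotations", {}))
    for key in sorted(REQUIRED_LABELS - labels):
        problems.append(f"{rule['alert']}: missing label {key}")
    for key in sorted(REQUIRED_ANNOTATIONS - annotations):
        problems.append(f"{rule['alert']}: missing annotation {key}")
    return problems


rule = {
    "alert": "HighLatency",
    "expr": "job:latency_seconds:p99 > 1",
    "labels": {"severity": "page"},
    "annotations": {"runbook": "https://example.com/runbook"},
}
print(check_alert(rule))
```

Passing `is_production=False` skips the check entirely, which mirrors the split between "production" and "test" alerts described above.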

While the code is fairly fresh, it's been used internally for a while with good results, so I hope it will be useful to others.

Julius Volz

Apr 26, 2021, 1:59:37 PM
to l.mi...@gmail.com, Prometheus Users
Oh, interesting!

I was always thinking of building something along those lines, but purely live-linting rules loaded into a Prometheus server against the actual data that server has (which you are also partially doing already).

It was going to output warnings:

- ...for any referenced metric name that isn't currently known to the Prometheus server
- ...for any label name on a metric name that isn't known
- ...for any common query mistakes, like rate() on a gauge, deriv() on counters, aggregating away the "le" label, etc.

...and potentially give an idea about which rules load how many time series in their current state.

Any of those could generate false positives, so it could output warnings at most, but it could still be very helpful.

It seems like your tool already does most of that and more, but the common query gotchas one might be useful at some point too :)
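
As a rough illustration of the "common query mistakes" idea, here is a naive Python sketch. It assumes metric types are already available as a plain dict (for example, collected from the server's metadata), and it uses a regular expression instead of a real PromQL parser, so it only catches the simplest `func(metric...)` forms. The `lint_query` name and the warning texts are made up for this example.

```python
import re

# Captures the function name and the metric name directly inside the call,
# e.g. rate(http_requests_total{job="api"}[5m]).
CALL_RE = re.compile(r"\b(rate|irate|increase|deriv)\s*\(\s*([a-zA-Z_:][a-zA-Z0-9_:]*)")

COUNTER_FUNCS = {"rate", "irate", "increase"}  # meant for counters
GAUGE_FUNCS = {"deriv"}                        # meant for gauges


def lint_query(expr, metric_types):
    """Return warnings for expr, given a {metric_name: type} mapping."""
    warnings = []
    for func, metric in CALL_RE.findall(expr):
        mtype = metric_types.get(metric)
        if mtype is None:
            warnings.append(f"unknown metric {metric}")
        elif func in COUNTER_FUNCS and mtype == "gauge":
            warnings.append(f"{func}() on gauge {metric}")
        elif func in GAUGE_FUNCS and mtype == "counter":
            warnings.append(f"{func}() on counter {metric}")
    return warnings


types = {"node_memory_free_bytes": "gauge", "http_requests_total": "counter"}
print(lint_query("rate(node_memory_free_bytes[5m])", types))
print(lint_query("deriv(http_requests_total[5m])", types))
```

Since any of these checks can be wrong, the function only returns warning strings and never rejects a query.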

--
Julius Volz
PromLabs - promlabs.com

l.mi...@gmail.com

Apr 26, 2021, 2:54:10 PM
to Prometheus Users
We currently use it to aid with reviewing PRs, so I took the approach of fewer insights but also fewer false positives. Especially since (for example) label checks on alert rules are supposed to help new employees write correct rules without having to run them past other people.

The next biggish feature I plan is to turn pint into an exporter: run it as a sidecar alongside each Prometheus we run and report any missing series (used in alerts but not present in Prometheus) via metrics, so we can alert if (for example) we upgrade node-exporter to a version that renamed a bunch of metrics we rely on, and we stop getting some alerts (a pet peeve of mine).
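
The "missing series" check can be sketched roughly as follows. This is only an illustration of the idea, not pint's code: it assumes the list of metric names used by alerts has already been extracted, and the helper names (`prometheus_query`, `find_missing`) are made up. The only real interface used is the Prometheus HTTP API endpoint `/api/v1/query`, which returns an empty result vector when no series match.

```python
import json
import urllib.parse
import urllib.request


def prometheus_query(base_url, expr):
    """Run an instant query against a Prometheus server and return the parsed JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def find_missing(metric_names, run_query):
    """Return the metric names for which the server currently has no series.

    run_query is any callable taking a PromQL string and returning the parsed
    /api/v1/query response, which keeps the check easy to test offline.
    """
    missing = []
    for name in metric_names:
        body = run_query(f"count({name})")
        if body.get("status") != "success":
            raise RuntimeError(f"query for {name} failed: {body}")
        if not body["data"]["result"]:
            missing.append(name)
    return missing


# Offline stand-in for a server that only knows the "up" metric.
def fake_query(expr):
    result = [{"metric": {}, "value": [0, "3"]}] if expr == "count(up)" else []
    return {"status": "success", "data": {"resultType": "vector", "result": result}}


print(find_missing(["up", "node_cpu_seconds_total"], fake_query))
```

Against a real server, `run_query` could be `lambda q: prometheus_query("http://localhost:9090", q)`, where the address is just a placeholder.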

In general, Prometheus configuration and workflow are fairly lax: empty query results are either a bug or not, depending on deeper context, and so on. We want it to be stricter, so we have more confidence that it all works together. Pint aims to give us that, plus some other feedback, like raising an early warning when someone adds a recording rule that would generate a ton of new series, eating memory as a result.

Evelyn Pereira Souza

Apr 27, 2021, 10:53:46 AM
to promethe...@googlegroups.com
Hi

Thanks. How do I install it? Is there a brew formula? There is nothing
about it in README.md, and there are also no releases.

kind regards
Evelyn

l.mi...@gmail.com

Apr 29, 2021, 9:30:21 AM
to Prometheus Users
I've added some extra docs on that.

l.mi...@gmail.com

Mar 2, 2022, 10:18:11 AM
to Prometheus Users
You might find it useful that https://github.com/cloudflare/pint recently got a "watch" mode, where it runs as a daemon and continuously runs its checks.
This is still under development, but the goal is to have a way of alerting when alerts get broken, for example because someone renames the metrics they rely on and the queries can no longer return anything.