Certificate expiration remaining days - exporter for StackDriver

Ricardo Katz

Feb 19, 2021, 5:24:37 PM
to kubernetes-wg-k8s-infra
Hey folks,

After our last meeting, where I presented the different approaches I studied for a certificate expiration alerter, I kept thinking about how to make the best approach (in my opinion!) work and what I was missing.

tl;dr: this exporter was mostly designed to scrape and work with metrics discovered through Kubernetes, but the good news is that a 'not well documented' flag allows us to scrape and pass on any metric from a Prometheus server. (Note to self: open a PR to document this, and write about how to use it.)

Just to clarify my position: I agree with Justin that having a home-grown generic scraper that works with any cloud provider would be AMAZING and really fun to implement as a medium- to long-term goal!! But we needed a short-term solution, and this was bothering me. Now we can develop the generic one at a much calmer pace :)

So, OK. After figuring this out, I arrived at a simple out-of-the-box deployment that can work in our cluster, and "documented" the deployment here. Basically:

- Prometheus:
* Scrape cert-manager metrics and rename the 'namespace' label to 'k8s_ns' (explained further below)
* Apply a recording rule that turns the metric "certmanager_certificate_expiration_timestamp_seconds" into something alertable: it calculates the remaining seconds and divides by 86400 to get the remaining days
* Write the logs to an emptyDir, because we don't need to keep this history once the Stackdriver sidecar has sent those metrics to Stackdriver
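The arithmetic behind the recording rule can be sketched in a few lines of Python. This is only an illustration of the calculation, not the actual rule from the deployment; the function name and the sample timestamps are made up for the example.

```python
SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def remaining_days(expiration_timestamp_seconds: float, now_seconds: float) -> float:
    """Days left until expiry; negative once the certificate has expired.

    expiration_timestamp_seconds stands in for the value of
    certmanager_certificate_expiration_timestamp_seconds (a Unix timestamp).
    """
    remaining_seconds = expiration_timestamp_seconds - now_seconds
    return remaining_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    now = 1_613_755_477.0                 # arbitrary "current time" for the example
    expires = now + 30 * SECONDS_PER_DAY  # hypothetical certificate expiring in 30 days
    print(remaining_days(expires, now))
```

The result is what ends up in 'certificate_expire_remaining_days', so an alert can compare it directly against a number of days.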

- Sidecar container from the GCP team:
* Read the WAL (write-ahead log) from Prometheus
* Configure the Stackdriver exporter to use a specific GCP project (well, we can use an ENV variable here)
* Configure the Stackdriver exporter to insert GENERIC METADATA (and here is the trick: we're not exporting k8s metrics, but generic metrics).
** Because we need to somehow 'hardcode' the namespace label, it conflicts with the 'namespace' label exported by cert-manager. This is why we need a different label to represent the certificate's namespace
* Include only 'certificate_expire_remaining_days' so we don't get a huge bill because of certificate metrics :)
* Apply a specific Stackdriver configuration that adds metadata to those metrics, because Prometheus recording rules do not create metrics with metadata (see issue here)
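To make the label rename and the metric filter concrete, here is a small Python sketch of the two transformations applied to in-memory samples. It is not how Prometheus or the sidecar implement them (in the deployment the rename happens at scrape time and the filter in the sidecar); the `Sample` shape, the function names, and the `name` label in the example are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory shape of a sample: (metric name, labels, value).
Sample = Tuple[str, Dict[str, str], float]

KEEP_METRIC = "certificate_expire_remaining_days"


def rename_namespace_label(labels: Dict[str, str]) -> Dict[str, str]:
    """Move 'namespace' to 'k8s_ns' so it cannot clash with the hardcoded namespace."""
    renamed = dict(labels)  # copy, so the caller's dict is left untouched
    if "namespace" in renamed:
        renamed["k8s_ns"] = renamed.pop("namespace")
    return renamed


def select_for_export(samples: Iterable[Sample]) -> List[Sample]:
    """Keep only the remaining-days metric and relabel it."""
    return [
        (name, rename_namespace_label(labels), value)
        for name, labels, value in samples
        if name == KEEP_METRIC
    ]


if __name__ == "__main__":
    scraped = [
        ("certificate_expire_remaining_days", {"namespace": "prow", "name": "example-tls"}, 42.0),
        ("certmanager_certificate_ready_status", {"namespace": "prow", "name": "example-tls"}, 1.0),
    ]
    for sample in select_for_export(scraped):
        print(sample)
```

Only the first sample survives the filter, and it comes out with 'k8s_ns' instead of 'namespace'.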

Everything here is deployed in kube-system because I'm no GCP expert and don't know how to make this thing call Stackdriver with proper IAM permissions, but I'm sure you will know how to deal with that :P And yeah, we still need to add some things to the deployment, like dropping capabilities, not running as root, etc. :)

And that's it. After this we can simply configure Stackdriver to alert us whenever some certificate_expire_remaining_days falls below a threshold of days, and send this alert to Slack using Stackdriver's own alerting (no need for webhooks, etc.).
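The alert condition itself is just a comparison against a threshold. A minimal Python sketch of that check, assuming a hypothetical mapping from certificate name to remaining days (the real condition would be configured in Stackdriver, not written in code):

```python
from typing import Dict, List


def expiring_soon(remaining_days_by_cert: Dict[str, float], threshold_days: float) -> List[str]:
    """Return the certificates whose remaining days are below the threshold, sorted by name."""
    return sorted(
        cert
        for cert, days in remaining_days_by_cert.items()
        if days < threshold_days
    )


if __name__ == "__main__":
    # Hypothetical values of certificate_expire_remaining_days per certificate.
    current = {"example-a": 45.2, "example-b": 6.9, "example-c": 13.0}
    print(expiring_soon(current, threshold_days=14))
```

With a 14-day threshold, this flags 'example-b' and 'example-c' and leaves 'example-a' alone.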

A final consideration: currently the cert-manager exporter does not carry labels or annotations, so we cannot properly export who owns a certificate. IMO this is fine, as long as we receive the alert and can find the owner on the certificate itself to know who to poke :)

Have a nice weekend, you all!

