"Correct" usage of the Pushgateway for short-running daily jobs

David Leonhartsberger

Aug 22, 2021, 11:15:01 AM

to Prometheus Users

I have read the documentation about Prometheus and the Pushgateway, but I still don't understand how I am supposed to "properly" set up the Prometheus stack to get the metrics of my jobs.

Use case:
We have a couple of jobs which run every day for about 30-90 seconds.

So the following instances are run every day:

- JobA for customerX

- JobA for customerY

- JobB for customerX

- JobC for customerZ

So for JobA there are 2 "instances" running every day; each instance needs to collect some data for one of our customers.

It seems the Pushgateway is the way to go here, as the jobs are not bound to any machine (a k8s CronJob scheduled on some node in the pool) and each run takes about 1 minute on average.

Two things still confuse me a lot:

1. What "job-name" and "instance-id" (if any) should I use when pushing the metrics?

2. When and how should I delete the metrics from the Pushgateway?

About 1)
For the "job-name" I was going to use "JobA", "JobB", etc.

For "instance-id" I was thinking about using "customerX-$timestamp"

About 2)

I am really clueless here.

I guess if I don't use an "instance-id", I can delete all metrics by "job-name" once I don't need the job anymore, but this could be years in the future.

If I use an "instance-id", the only way to delete the metrics (without keeping track of which instance-ids were used) is to delete all metrics via the admin API. So it seems I either need to keep state or delete everything, and neither sounds right.

I would really appreciate some help here.

Br David

Tristan Colgate

Aug 22, 2021, 12:44:42 PM

to David Leonhartsberger, Prometheus Users

Remember that the goal is to have time series in Prometheus, so you shouldn't think in terms of individual instances/runs. Each run pushes some info for its job to Prometheus via the Pushgateway, and Prometheus watches those values change over time. You don't need to delete them unless you no longer need data for one of the jobs/customers.
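Here is a concrete version of the "delete only when a job/customer is retired" case. The Pushgateway HTTP API removes a single group when you send DELETE to the same path you push to. The following is a minimal sketch that uses only the JDK's `java.net.http`. The base URL and the `customerx` instance value are illustrative assumptions, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a Pushgateway DELETE request for one group.
// The base URL and label values used with it are hypothetical examples.
class PushgatewayDelete {

    // DELETE /metrics/job/<job>/instance/<instance> removes all metrics stored
    // under exactly that grouping key; other groups are left untouched.
    static HttpRequest deleteGroup(String baseUrl, String job, String instance) {
        // Assumes job and instance are URL-safe (letters, digits, '-', '_');
        // other characters would need escaping.
        URI uri = URI.create(baseUrl + "/metrics/job/" + job + "/instance/" + instance);
        return HttpRequest.newBuilder(uri).DELETE().build();
    }
}
```

To send the request, you would call `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`. Run it once, when a job/customer combination is retired, rather than after every run.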

Stuart Clark

Aug 22, 2021, 2:02:50 PM

to David Leonhartsberger, Prometheus Users

It sounds like you just need a label for the customer (X, Y or Z) and the job (A, B or C). You don't want anything about timestamps in the labels.

David Leonhartsberger

Aug 22, 2021, 3:36:27 PM

to Prometheus Users

First of all, thanks a lot for the answers.

I think I mixed up some names when explaining my use case.

I asked which "job-name" and "instance-id" to use when pushing the metrics. I wasn't talking about the labels attached to a metric; I was talking about the "group" to use when pushing the metrics.

Example:

HTTP API:
http://pushgateway.example.org:9091/metrics/job/jobA/instance/customerx-2021-01-01

Java API:
String job = "jobA";
String instance = "customerX-2021-01-01";

PushGateway pg = ...;
pg.pushAdd(registry, job, instance);

Here I am not sure whether I even need to specify an "instance" or whether it's enough to just use "job". Is it a good idea to also specify an "instance", and if so, what should I use here?
I am also not sure whether I really need to delete the metrics from the Pushgateway after some time. In my case I only push a handful of metrics every day (so it's very little data).

Alerting:

Ideally I want to create alerts that tell me which instance of the job has failed, so that I know for which date I need to re-run the job.

Tristan Colgate

Aug 22, 2021, 3:57:16 PM

to David Leonhartsberger, Prometheus Users

There's not really a distinction. Instance, in the normal sense, is not usually used with the Pushgateway. Think of it as one remote, labeled time series whose value you can update. Only one process should update a given remote time series, and Prometheus periodically scrapes it.

Stuart Clark

Aug 22, 2021, 4:09:58 PM

to David Leonhartsberger, Prometheus Users

You don't want to include any sort of timestamp otherwise you won't be replacing the previous set of results with the new ones.
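A toy model shows why a timestamp in the grouping key defeats replacement. This is not the Pushgateway's implementation, only an assumed simplification: each distinct URL path is one group, and a PUT to an existing path overwrites that group. The paths and payload strings are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model, NOT the real Pushgateway: one entry per grouping-key path,
// holding the payload from the most recent PUT to that path.
class GroupingKeyModel {
    private final Map<String, String> groups = new LinkedHashMap<>();

    // A PUT to an existing path replaces that group's previous payload.
    void put(String path, String payload) {
        groups.put(path, payload);
    }

    int groupCount() {
        return groups.size();
    }

    String payload(String path) {
        return groups.get(path);
    }

    // Simulates 'days' daily runs of jobA for customerx, with or without
    // a date in the instance part of the path.
    static GroupingKeyModel simulate(boolean dateInInstance, int days) {
        GroupingKeyModel pg = new GroupingKeyModel();
        for (int day = 1; day <= days; day++) {
            String instance = dateInInstance
                    ? String.format(Locale.ROOT, "customerx-2021-08-%02d", day)
                    : "customerx";
            pg.put("/metrics/job/jobA/instance/" + instance, "result of run " + day);
        }
        return pg;
    }
}
```

With the date in the path, every run leaves another group behind. The Pushgateway keeps exposing all of them until they are deleted, which is exactly the bookkeeping problem from the original question. With a stable path, yesterday's values are simply overwritten.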

David Leonhartsberger

Aug 22, 2021, 4:58:30 PM

to Prometheus Users

OK, I understand that you would not include a timestamp.

But would you push the metrics using an instance like this?
http://pushgateway.example.org:9091/metrics/job/jobA/instance/customerx
or without an instance like this?
http://pushgateway.example.org:9091/metrics/job/jobA

Unfortunately, I don't understand when and why I would include an instance when pushing to the Pushgateway.

Stuart Clark

Aug 23, 2021, 2:02:49 AM

to David Leonhartsberger, Prometheus Users

Yes, I would. The path determines what gets replaced each time you push, and it should cover everything that a particular variant of a job (here, one job/customer combination) produces. So at any one time you will have a set of metrics for the most recent run of each one (with nothing missing and no duplications).
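Here is a sketch of a stable path per job/customer with no date, pushed by hand with the JDK's HTTP client. The base URL and the metric name `jobA_records_processed` are hypothetical. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a push of one job/customer run's metrics.
class PushgatewayPush {

    // One grouping key per job/customer combination, with no date in it,
    // so each daily run replaces the previous run's metrics.
    static String groupPath(String job, String customer) {
        // Assumes URL-safe label values; other characters need escaping.
        return "/metrics/job/" + job + "/instance/" + customer;
    }

    // PUT replaces every metric in the group; POST would replace only the
    // metrics whose names appear in the new payload.
    static HttpRequest push(String baseUrl, String job, String customer, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + groupPath(job, customer)))
                .header("Content-Type", "text/plain; version=0.0.4")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

With the Java client from David's message, the same grouping key can presumably be passed as a map. For example, `pg.push(registry, "jobA", Map.of("instance", "customerx"))` uses PUT, and `pg.pushAdd(...)` uses POST. This assumes the simpleclient_pushgateway `PushGateway` class.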

In terms of the metrics themselves, I'd recommend ensuring that a push occurs immediately before the job finishes, even in the case of some sort of error. I'd also look at adding metrics whose value is a timestamp or a boolean / status code, so you can detect (and graph and alert on) whether a job fails, hasn't run for a while, or is taking too long.
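The advice above (always push before exiting, and include a success flag and timestamps) might be sketched as follows. The metric names, the injectable clock, and the `push` callback that stands in for the actual HTTP request are all assumptions for illustration. The success timestamp is included only after a successful run. The payload is therefore meant to be sent with POST (`pushAdd`), which replaces only the metrics named in it, so a failed run keeps the previous success timestamp.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Sketch: run a job and always hand a final set of metrics to 'push',
// even if the job throws. Metric names are hypothetical; 'push' stands in
// for the HTTP request and 'clock' returns Unix time in seconds.
class JobStatusMetrics {

    static void runAndPush(String job, Runnable work,
                           LongSupplier clock, Consumer<String> push) {
        long start = clock.getAsLong();
        boolean ok = false;
        try {
            work.run();
            ok = true;
        } finally {
            // Executed on success and on failure; any exception from 'work'
            // still propagates to the caller after the push.
            long end = clock.getAsLong();
            StringBuilder body = new StringBuilder();
            body.append(job).append("_last_run_timestamp_seconds ").append(end).append('\n');
            body.append(job).append("_last_run_success ").append(ok ? 1 : 0).append('\n');
            body.append(job).append("_last_run_duration_seconds ").append(end - start).append('\n');
            if (ok) {
                // Only reported on success, so a POST (pushAdd) after a failed
                // run leaves the previous success timestamp in place.
                body.append(job).append("_last_success_timestamp_seconds ").append(end).append('\n');
            }
            push.accept(body.toString());
        }
    }
}
```

Assume the Prometheus scrape config for the Pushgateway uses `honor_labels: true`. An example alert such as `time() - jobA_last_success_timestamp_seconds > 36 * 3600` would then fire separately for each `instance` (that is, per customer). The stored value shows the last run that succeeded, which tells you from which date to re-run.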
