
yet another pushgateway question


rohit ahuja

Jan 28, 2025, 12:50:00 PM
to Prometheus Developers
Hello,
I have a Spring Batch (Java) application with ~70 different batch jobs. Job durations vary from 10 seconds to 5 minutes, and each batch job runs in a separate JVM. Spring Batch provides metrics via Micrometer, and I have plugged in Prometheus as the Micrometer registry. My application pushes its metrics to the Pushgateway.

Question 1 - What should my deletion policy be for removing stale metrics from the Pushgateway?
Should I delete them after all batches for the day are complete? The batches run for 2-3 hours.

Question 2 - I want to define an email alert that fires if any batch job fails. Spring Batch does provide a metric for this, but it is not working for me. So I defined my own Counter, which shows up in Prometheus like this: app_job_status_total{status="FAILED"} 1. The problem is that it always has the same value. Using increase() or rate() does not help either, because the value of the metric, once set, does not change over the evaluation interval. Please advise.
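To illustrate why increase() stays flat here, a minimal Java simulation under stated assumptions: every failed run is a fresh JVM whose counter starts at 0, is incremented once, and pushes the value 1, which replaces whatever the Pushgateway held before. The class and method names are hypothetical, and simpleIncrease() is only a simplified stand-in for PromQL's increase() (it handles counter resets but does no extrapolation).

```java
import java.util.ArrayList;
import java.util.List;

final class PushedCounterDemo {

    // Simplified Prometheus-style increase(): sums deltas between consecutive
    // samples and treats any drop as a counter reset. The real increase()
    // also extrapolates to the edges of the range; that is omitted here.
    static double simpleIncrease(List<Double> samples) {
        double total = 0.0;
        for (int i = 1; i < samples.size(); i++) {
            double prev = samples.get(i - 1);
            double cur = samples.get(i);
            total += (cur >= prev) ? cur - prev : cur;
        }
        return total;
    }

    // Assumption: each failed run is a separate JVM. Its counter starts at 0,
    // is incremented once, and that value (1) is pushed, overwriting the value
    // stored in the Pushgateway. Prometheus scrapes scrapesPerRun times per run.
    static List<Double> scrapesAfterFailedRuns(int failedRuns, int scrapesPerRun) {
        List<Double> samples = new ArrayList<>();
        for (int run = 0; run < failedRuns; run++) {
            double counterInThisJvm = 0.0;              // fresh JVM, fresh counter
            counterInThisJvm += 1.0;                    // this run failed once
            double heldByPushgateway = counterInThisJvm; // push replaces stored value
            for (int s = 0; s < scrapesPerRun; s++) {
                samples.add(heldByPushgateway);
            }
        }
        return samples;
    }
}
```

Because every push carries the same value 1, the scraped series never moves, so the simplified increase is 0 no matter how many runs fail.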

Bjoern Rabenstein

Jan 28, 2025, 1:24:28 PM
to rohit ahuja, Prometheus Developers
On 28.01.25 09:29, rohit ahuja wrote:
>
> Question 1 - What should my deletion policy be for removing stale metrics
> from the Pushgateway?
> Should I delete them after all batches for the day are complete? The
> batches run for 2-3 hours.

Ideally never. I would set up your batches so that each day's runs
produce the same set of metrics, pushed under the same grouping keys.
Then you have a fixed set of metrics that lives on the PGW forever and
is overwritten each day.

> Question 2 - I want to define an email alert that fires if any batch job
> fails. Spring Batch does provide a metric for this, but it is not working
> for me. So I defined my own Counter, which shows up in Prometheus like this:
> app_job_status_total{status="FAILED"} 1. The problem is that it always has
> the same value. Using increase() or rate() does not help either, because
> the value of the metric, once set, does not change over the evaluation
> interval. Please advise.

The Pushgateway is not a distributed counter. If you have separate
metrics for each of your daily batch jobs, you could simply have a
gauge that is 0 or 1 depending on success or failure, and have an
alert watching all of them.
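A sketch of the success/failure gauge, assuming a hypothetical metric name batch_job_last_run_failed and that each batch job pushes it into its own group (for example with a PUT to /metrics/job/<job name>). The body is in the Prometheus text exposition format.

```java
final class LastRunStatus {

    // Hypothetical metric name; the job label comes from the push grouping key.
    static final String METRIC = "batch_job_last_run_failed";

    // HELP and TYPE comments followed by a single sample: 1 if the run
    // failed, 0 if it succeeded. The body must end with a newline.
    static String exposition(boolean failed) {
        return "# HELP " + METRIC + " 1 if the last run of this batch job failed, 0 otherwise.\n"
                + "# TYPE " + METRIC + " gauge\n"
                + METRIC + " " + (failed ? 1 : 0) + "\n";
    }
}
```

An alerting rule could then fire on an expression such as batch_job_last_run_failed == 1, with one alert per job label. This assumes the Pushgateway is scraped with honor_labels: true, as the Pushgateway documentation recommends, so the pushed job label is kept.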

If you cannot avoid the "distributed counter" use case, you could try
a statsd setup and funnel the statsd metrics into Prometheus via the
statsd exporter. Or you could try out the prom-aggregation-gateway,
https://github.com/zapier/prom-aggregation-gateway
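For the statsd route, a minimal sketch of sending one failure increment as a StatsD counter line over UDP, assuming the statsd exporter listens on its default UDP port 9125. The metric name and class are hypothetical. Because the exporter adds up every increment it receives, failures reported by separate short-lived JVMs accumulate into a single counter, which is what makes increase() meaningful again.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class StatsdFailureCounter {

    // StatsD counter line format: "<name>:<delta>|c".
    static String counterLine(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    // Fire-and-forget UDP send; StatsD does not acknowledge packets.
    static void send(String host, int port, String line) throws Exception {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }
}
```

For example, send("statsd-exporter", 9125, counterLine("batch.job.failed", 1)) at the end of a failed run. How the StatsD name is translated into a Prometheus metric name depends on the exporter's mapping configuration. Micrometer also has a StatsD registry, which may be a more natural fit for a Spring Batch application than hand-written UDP.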

See also https://github.com/prometheus/pushgateway?tab=readme-ov-file#non-goals

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in