On 28.01.25 09:29, rohit ahuja wrote:
>
> Question 1 - What should my deletion policy be for deleting stale metrics
> from the Pushgateway?
> Should it be after my batches are complete for the day? The batches run for
> 2-3 hours.
Ideally never. I would set up your batches so that each day's run
pushes the same set of metrics. Then you have a fixed set of
metrics that lives on the PGW forever and is overwritten each day.
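
A minimal sketch of what "the same set of metrics, overwritten each day"
could look like, using only the Python standard library. The address, the
job name and the `batch` grouping label are assumptions for illustration.
The key point is that the grouping key in the URL carries no date or run
ID, so every run targets the same group, and an HTTP PUT replaces the
metrics of that group.

```python
import urllib.request
from urllib.parse import quote

# Assumed address; 9091 is the Pushgateway's default port.
PUSHGATEWAY = "http://localhost:9091"


def push_url(job, grouping):
    """Build the push URL for a fixed grouping key (job plus extra labels)."""
    parts = [f"{PUSHGATEWAY}/metrics/job/{quote(job, safe='')}"]
    for name, value in grouping.items():
        if "/" in value:
            # Values containing "/" need the base64 URL form, not handled here.
            raise ValueError("label values containing '/' are not supported")
        parts.append(f"{quote(name, safe='')}/{quote(value, safe='')}")
    return "/".join(parts)


def push(job, grouping, body):
    """PUT the text-format body; this replaces all metrics in the group."""
    req = urllib.request.Request(
        push_url(job, grouping),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling `push("nightly_import", {"batch": "orders"}, body)` every day hits
the same URL, so nothing stale accumulates and no deletion is needed.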
> Question 2 - I want to define an email alert if any batch fails.
> Although Spring Batch provides this metric, it is not working, so I
> defined a Counter that is available to me in Prometheus like this
> --> app_job_status_total{status="FAILED"} 1. The problem is that it always
> gives me the same value. Using the functions increase() or rate() does not
> help either, as the value of the metric, once set, does not change over the
> evaluated interval. Please advise.
The Pushgateway is not a distributed counter. If you have separate
metrics for each of your daily batch jobs, you could just have a gauge
that is 0 or 1 depending on success or failure, and have an alert
watching all of those.
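
As a rough sketch of the gauge idea, the following renders the push body
in the Prometheus text exposition format. The metric name
`batch_job_failed` and the convention "1 means failed" are assumptions,
not something Spring Batch provides. With such a gauge, an alert
expression along the lines of `batch_job_failed == 1` would fire for any
batch whose last run failed.

```python
def render_batch_status(succeeded):
    """Return a text-format body with one gauge: 0 on success, 1 on failure."""
    value = 0 if succeeded else 1
    lines = [
        # Hypothetical metric name chosen for this example.
        "# HELP batch_job_failed Whether the last batch run failed (1) or not (0).",
        "# TYPE batch_job_failed gauge",
        f"batch_job_failed {value}",
    ]
    # The body must end with a newline.
    return "\n".join(lines) + "\n"
```

Each batch would push this body under its own grouping key at the end of
its run, so the gauge always reflects the most recent outcome.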
If you cannot avoid the "distributed counter" use case, you could try
a statsd setup and funnel the statsd metrics into Prometheus via the
statsd exporter. Or you could try out the prom-aggregation-gateway,
https://github.com/zapier/prom-aggregation-gateway
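
For the statsd route, a batch would emit a counter increment over UDP and
the statsd exporter would do the summing. A minimal sketch, assuming the
exporter listens for statsd traffic on localhost port 9125 and using the
hypothetical metric name `app.job.failed`:

```python
import socket

# Assumed statsd exporter address; adjust to your setup.
STATSD_ADDR = ("localhost", 9125)


def statsd_counter_line(name, value=1):
    """Format a statsd counter increment, e.g. b'app.job.failed:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def increment(name, value=1, addr=STATSD_ADDR):
    """Send one counter increment as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_counter_line(name, value), addr)
```

Calling `increment("app.job.failed")` from each failing batch yields a
counter on the exporter side that actually grows, so increase() and rate()
have something to work with.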
See also
https://github.com/prometheus/pushgateway?tab=readme-ov-file#non-goals
--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in