Hello,
I am new to Prometheus and have a question about how to use the Pushgateway properly.
Basically, we have different kinds of batch jobs that get scheduled by our application. The scheduler framework runs each batch job as a separate process on different worker hosts; each one does a bunch of things and exits within a minute or so. This seems like a good use case for the Pushgateway. However, I am not able to set things up properly to get correct metric values.
I want to measure how many times batch jobs run per minute. Inside each batch job I use a Counter with the batch job name as a label, which gets incremented at the end of the job. I use a new (non-default) registry for the Counter and call "pushadd_to_gateway" from the Python Prometheus client to push the metrics to the gateway at the end. However:
* It seems that the Pushgateway always overwrites the metric every time I push the ephemeral Counter value from the short-lived batch job process.
* Providing "grouping keys" does not help, as the grouping key values seem to determine which previously pushed metrics get overwritten.
* Using unique job names or unique grouping key values avoids overwriting the earlier metrics on the Pushgateway, but it creates a separate time series every time I push the same metric, which would not be good.
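For reference, this is roughly what each batch job does at the end of its run. It is only a minimal sketch: the metric name, label name, job name, and gateway address are placeholders I picked for illustration, not our real values, and the helper functions are hypothetical.

```python
from prometheus_client import CollectorRegistry, Counter, pushadd_to_gateway


def build_registry(batch_job_name):
    """Create a fresh (non-default) registry holding one incremented Counter."""
    registry = CollectorRegistry()
    runs = Counter(
        "batch_job_runs",            # placeholder metric name
        "Completed batch job runs",
        ["batch_job"],               # placeholder label name
        registry=registry,
    )
    # The process is new on every run, so the Counter always goes from 0 to 1.
    runs.labels(batch_job=batch_job_name).inc()
    return registry


def push(registry, gateway="localhost:9091"):
    """Push the registry to the gateway (address is a placeholder)."""
    # Metrics with the same name and grouping key replace what was pushed before.
    pushadd_to_gateway(gateway, job="batch_jobs", registry=registry)
```

Each job calls something like push(build_registry("nightly_export")) just before it exits, so every push carries a value of 1 and the gateway ends up showing 1 no matter how many runs have happened.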
Is there an option with the Pushgateway to increment the previously pushed metrics every time I push from an ephemeral process, instead of overwriting them? How do I get the rate of batch jobs, latencies of batch jobs, etc. with the Pushgateway?
Thanks in advance.