So during a single day I might have a job "foobar" which is executed every hour. Thus every hour I will send, for example, a metric spark_job_duration{job_name="foobar", uuid="123"}, where the uuid changes every hour (in addition I'm sending JVM metrics and Spark-specific metrics, which all share these labels). Once the job is done, no other job will ever send metrics with the same label combination.
It seems that the Pushgateway doesn't expire the pushed metrics in any way, so my Pushgateway will fill up with different metrics until there are so many of them that Prometheus can't even scrape all of them before the HTTP request times out.
I've searched the GitHub issues and this mailing list, and there have been a few mentions of "ttl for pushgateway", but the use case has been a bit different.
Am I doing something wrong, or is there a simple toggle for clearing old, inactive metrics from the Pushgateway?
Thanks.
--
What MR said.
To add 2¢: Your scenario now really seems like one where you either
want to change things to fit the pull-based Prometheus collection
model, or you want to switch to a push-based monitoring
system. Turning Prometheus into a push-based monitoring system is
going to hit you with most of the combined problems of both
approaches, while giving you very few of the
benefits. That's why the Prometheus developers don't recommend it and
why we are reluctant to add features that will mostly serve that
discouraged use case.
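
On the narrower question of clearing old groups: the Pushgateway has no expiry toggle, but it does accept an HTTP DELETE on the same grouping-key URL that was used for the push. The following is only a minimal sketch of that call, under some assumptions: the uuid was pushed as part of the grouping key in the URL (not merely as a label inside the metric body), the host name is a placeholder, the helper names are made up for illustration, and label values containing slashes are not handled. Deleting a group before Prometheus has scraped it loses the data, so this would have to run some time after the job finishes.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def group_url(base_url, job, grouping):
    """Build the Pushgateway URL for one metric group (job plus grouping labels)."""
    parts = [base_url.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for label, value in grouping.items():
        # Each grouping label becomes a /<label>/<value> pair in the path.
        # Values containing "/" need special handling that this sketch omits.
        parts += [quote(label, safe=""), quote(value, safe="")]
    return "/".join(parts)


def delete_request(base_url, job, grouping):
    """Prepare (but do not send) the DELETE request for one metric group."""
    return Request(group_url(base_url, job, grouping), method="DELETE")


def delete_group(base_url, job, grouping, timeout=10):
    """Send the DELETE request and return the HTTP status code."""
    with urlopen(delete_request(base_url, job, grouping), timeout=timeout) as resp:
        return resp.status
```

For the example in the question, the call would look like delete_group("http://pushgateway.example:9091", "foobar", {"uuid": "123"}), where pushgateway.example is a hypothetical host and 9091 is the Pushgateway's default port. This only works around the accumulation; it does not change the point made above about push versus pull.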