I have an R&D environment where we use Prometheus to monitor about 200 test VMs. At any point, we may turf these VMs and create a new set with new host names.
Instead of changing the Prometheus YAML file each time, we opted to push metrics to a gateway every 15 seconds and have Prometheus scrape only the gateway. This gives us the flexibility to add or remove monitored systems without altering anything in the Prometheus configuration. It also makes on-prem/cloud hybrid setups easy, i.e., pushing out of a system is more secure and less painful than allowing something to pull from the box.
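To make the push side concrete, here is a minimal sketch of what one push could look like, using only the Python standard library. It is an illustration under assumptions, not our exact agent: the gateway address, the job name "node", the host name "vm-001", and the metric names are all hypothetical, and the payload is simplified to bare "name value" samples with no labels or TYPE lines. It relies on the Pushgateway convention that the grouping key lives in the URL path and that a PUT replaces every metric in that group.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway, job, instance, metrics):
    """Build a PUT request that replaces one Pushgateway group.

    `metrics` maps metric name -> numeric value (simplified: no labels,
    HELP, or TYPE lines). Assumes plain job/instance names without '/'.
    """
    # The grouping key is encoded in the path: /metrics/job/<job>/instance/<instance>
    url = "{}/metrics/job/{}/instance/{}".format(
        gateway.rstrip("/"), quote(job, safe=""), quote(instance, safe="")
    )
    # Text exposition format: one "name value" sample per line,
    # and the body has to end with a newline.
    body = "".join(
        "{} {}\n".format(name, value) for name, value in sorted(metrics.items())
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_once(gateway, job, instance, metrics, timeout=5):
    """Send one push; raises urllib.error.URLError if the gateway is unreachable."""
    req = build_push_request(gateway, job, instance, metrics)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Each VM would call something like push_once("http://gateway.example:9091", "node", "vm-001", {...}) from a loop that sleeps 15 seconds between pushes. Using the host name as the instance label keeps every VM in its own group on the gateway.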
However, I'm running into a staleness issue. Let's say one of my 200 VMs spontaneously combusts and no longer pushes metrics to the gateway. The gateway retains the most recent metrics that were pushed, and this data gets happily scraped at each interval. On a chart, it looks like the patient flatlined at whatever the most recent value was, say 42.
My ultimate goal would be for Prometheus to only scrape up-to-date data and not continue scraping the value of 42 each time.
One workaround was to flush the gateway every x minutes or hours, so that we would at least only get a limited amount of stale data. Of course, if Prometheus scrapes at that moment, it will get an error.
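A narrower variation on that workaround would be to delete only the groups that have gone quiet, rather than flushing everything. The sketch below is an assumption-laden illustration, not something we run today: the function names, the 60-second threshold, and the (job, instance) grouping are hypothetical. It assumes we can obtain a last-push timestamp per group (the gateway exposes a push_time_seconds sample for each group, which is one possible source) and that a DELETE on a group's path removes that group from the gateway.

```python
import time
import urllib.request
from urllib.parse import quote


def stale_group_paths(last_push, max_age_seconds, now=None):
    """Return DELETE paths for groups whose last push is too old.

    `last_push` maps (job, instance) -> Unix timestamp of the last push.
    """
    now = time.time() if now is None else now
    paths = []
    for (job, instance), pushed_at in sorted(last_push.items()):
        # A group is stale when nothing has been pushed within the window.
        if now - pushed_at > max_age_seconds:
            paths.append(
                "/metrics/job/{}/instance/{}".format(
                    quote(job, safe=""), quote(instance, safe="")
                )
            )
    return paths


def delete_groups(gateway, paths, timeout=5):
    """Issue one DELETE per stale group; raises on network errors."""
    for path in paths:
        req = urllib.request.Request(gateway.rstrip("/") + path, method="DELETE")
        with urllib.request.urlopen(req, timeout=timeout):
            pass
```

Run from cron or a small loop, this would cap stale data at roughly max_age_seconds plus the cleanup interval, and healthy VMs would never disappear from the gateway mid-scrape.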
Questions
Is using the push gateway a good way to avoid having to modify the Prometheus YAML file every time I need to change the hosts?
If not, what might a good alternative approach be that still keeps a push model?
If yes, is there a way to tell the gateway to discard any metrics that are greater than x seconds old?
Any help is very much appreciated!
Douglas