Hi Brian,

This has been mentioned to me before, and I'm not sure I fully understand. If I'm running a batch job, or some processes are pushing metrics to the push gateway, is there something fundamentally wrong with that? If thousands of batch jobs suddenly push metrics to the push gateway, we still need some sort of mechanism to delete stuff from its memory, is that not correct?

In addition, I noticed in the past that when the push gateway held a lot of metrics in memory, the Prometheus scrape took a long time and caused the CPU usage of the Prometheus process to climb.
On Tuesday, 23 January 2018 23:10:41 UTC, Brian Brazil wrote:

On 23 January 2018 at 22:32, Khusro Jaleel <kerne...@gmail.com> wrote:

Hi, I know that the push gateway accumulates metrics forever unless you clear them, either by restarting it or by using an API call.

We have several pods that send metrics to our push gateway periodically, but what would be the best mechanism for "clearing" it out on Kubernetes, where it runs as its own pod?

We could give it a small amount of memory, and when it runs out, Kubernetes would automatically restart it, but this might mean that we lose some metrics (if they have not yet been scraped).

I could create a Kubernetes cron job that calls the API and clears it out, but again, how will I know I'm not clearing something that has not been scraped yet and might still be needed?

It sounds like you're trying to use the pushgateway for something it's not intended for; see https://prometheus.io/docs/practices/pushing/

--
Brian Brazil
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a823b5b4-f77f-408b-a11c-267e518050e9%40googlegroups.com.
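For reference, the deletion API mentioned above is an HTTP DELETE against the job's grouping-key path on the Pushgateway. A minimal sketch of constructing that URL (the host, job name, and labels here are placeholders, not anything from this thread):

```python
from urllib.parse import quote

def pushgateway_delete_url(base, job, grouping=None):
    """Build the URL for an HTTP DELETE that removes all metrics
    pushed under the given job and optional grouping labels."""
    path = f"/metrics/job/{quote(job, safe='')}"
    for name, value in (grouping or {}).items():
        # Grouping labels extend the path as /<label>/<value> pairs.
        path += f"/{quote(name, safe='')}/{quote(value, safe='')}"
    return base.rstrip("/") + path

# e.g. `curl -X DELETE <url>` clears everything in that group:
url = pushgateway_delete_url("http://pushgateway:9091", "backup",
                             {"instance": "pod-1"})
```

A cron job issuing such a DELETE is what "clearing via an API call" amounts to, but as discussed below, routine clearing is usually a sign the Pushgateway is being used for the wrong job.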
On 23 January 2018 at 23:19, khusro.jaleel via Prometheus Users <promethe...@googlegroups.com> wrote the message quoted above. Brian Brazil replied:

The only reason to remove data from the pushgateway is if the batch job is never expected to run again, so deleting the data is one more step in your turndown docs. Beyond that, you want the pushed data to stay there forever.

Why do you have thousands of service-level batch jobs?

Brian
We don't actually have thousands of batch jobs; I was just using that as an example.

What we would like to do is capture metrics from our microservices running in Kubernetes right before they terminate: "last gasp" metrics, so to speak. These processes will disappear and not come back, although another pod may take their place, of course.

So that's the only use case for the push gateway for us. What we found in the past, however, is that if we never deleted metrics from the push gateway or flushed it somehow, it adversely affected the Prometheus process scraping it, resulting in ever higher scrape times, CPU usage, and timeouts. That's why we need to keep clearing it, but only after we are sure those metrics have been scraped.
On Tue, 23 Jan 2018 at 23:25, Brian Brazil <brian.brazil@robustperception.io> wrote the message quoted above.
On 23 January 2018 at 23:40, Khusro Jaleel <kerne...@gmail.com> wrote the message above. Brian Brazil replied:

That's not what the pushgateway is meant for, nor is it required. Scrape them normally, and rate() will do the right thing.

Brian
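As an aside on why normal scraping works even when pods terminate and are replaced: Prometheus counters only ever increase, and rate() treats any decrease between samples as a counter reset. A stdlib-only sketch of that reset handling (illustrative only, not Prometheus's actual implementation):

```python
def increases(samples):
    """Sum the per-interval increases of a counter series,
    treating any decrease as a counter reset (as rate() does)."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts near zero, so the
        # whole new value counts as increase since the reset.
        total += cur if cur < prev else cur - prev
    return total

# A pod serves 10 then 25 requests, restarts, and serves 5 more:
samples = [0, 10, 25, 5]
# increases(samples) counts 25 + 5 = 30 despite the restart.
```

This is why a replacement pod exposing a fresh counter does not lose the increase recorded from the old pod's scrapes, as long as the old pod was being scraped normally.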
The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.
Thank you for your response. I am quite new to Prometheus, so I'm still trying to wrap my head around this. Could you read the following and give some feedback, please? I would really appreciate it.

Let's assume we have a web app that runs asynchronous tasks in the background. There is a long-lived worker running in the background that schedules and runs assigned tasks (Python functions).

Suppose we would like to collect metrics on those background tasks. Since we are dealing with Python functions, the Prometheus client library would allow us to expose metrics at an endpoint rather than pushing them to the push gateway, even though the tasks (Python functions) themselves are short-lived. Would this eliminate the need for a push gateway?
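The key point in this scenario is that the *worker process* is long-lived, so it can own the metric state and expose it for scraping; the short-lived tasks just update in-process counters. A minimal stdlib-only sketch of the idea (the real prometheus_client library handles registration and exposition for you; the names here are made up for illustration):

```python
from collections import Counter

# Long-lived worker state: metrics survive across tasks because
# the process is long-lived, even though each task is not.
task_runs = Counter()

def run_task(name, fn, *args):
    """Run a short-lived task and record it in the worker's metrics."""
    result = fn(*args)
    task_runs[name] += 1
    return result

def render_metrics():
    """Render worker state in the Prometheus text exposition format,
    as a /metrics endpoint on the long-lived worker would serve it."""
    lines = ["# TYPE worker_task_runs_total counter"]
    for name, count in sorted(task_runs.items()):
        lines.append(f'worker_task_runs_total{{task="{name}"}} {count}')
    return "\n".join(lines) + "\n"

run_task("resize_image", lambda x: x * 2, 21)
run_task("resize_image", lambda x: x + 1, 1)
```

Because Prometheus scrapes the worker, not the tasks, no push gateway is needed here: the counters persist between scrapes for as long as the worker lives.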
"Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway."

Referencing the quote above: the jobs may not exist long enough to be scraped. Does this mean that when Prometheus scrapes an endpoint, the endpoint won't contain older metrics (from, say, 5 minutes ago), so a metric exposed there by a short-lived job might never be picked up by Prometheus at all? If that is the case, then I can see why we would need the Pushgateway as a "metrics cache."