So, looking at what the Python prometheus_client exports, I noticed that my counters (heck, all my metrics) have a "created" timestamp.

I thought to myself: hey, this might be used to signal that the metric was reset via process restart, so we should treat it as if it restarted from 0!

Except... it doesn't.

So if I start a process and very quickly increment a counter beyond zero, increase()/rate()/etc. still think that my value has increased by "0". Even worse: I launch 20 processes, all of which increase a few metrics by 1 or 2 before the first poll from the Prometheus server, then I increase(), then sum(), and my 25 events are now 0!

Searching the help pages gives no solution to this problem, simply "set the field to 0 before the first query", and other Q&As give bizarre incantations such as

    sum(increase(log_message_count{level="error"}[1m])) without (instance) > 0 or ((log_message_count{level="error"} != 0 unless log_message_count{level="error"} offset 1m))

or

    sum(max_over_time(workflow_action_executions_count{result="ok"}[1m]) or vector(0)) - sum(max_over_time(workflow_action_executions_count{result="ok"}[1m] offset 1m) or vector(0))

which sorta-kinda work in special situations but break down past those, and are terrifying to look at anyway.

Is there some way to configure Prometheus to perhaps store this _created field, and if it's changed, assume we restarted from zero? Some config I'm missing?
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/90068849-5d92-4d99-ad16-71739014d98c%40googlegroups.com.
On Tue, Aug 20, 2019 at 8:15 AM Khazhismel Kumykov <kha...@gmail.com> wrote:
> So if I start a process and very quickly increment a counter beyond zero, increase()/rate()/etc. still think that my value has increased by "0". Even worse: I launch 20 processes, all of which increase a few metrics by 1 or 2 before the first poll from the Prometheus server, then I increase(), then sum(), and my 25 events are now 0!

`increase(q[d])` is calculated as `vLast - vFirst` for each series `q` over the time range `d`, where `vLast` is the last value in the range and `vFirst` is the first. When Prometheus scrapes the first sample of a new time series, only `vLast` exists; `vFirst` doesn't. So Prometheus cannot calculate `increase` for the first sample of a new time series and returns 0, assuming it missed the previous scrape and the value didn't change. This is a valid assumption, but it leads to invalid calculations for the first sample in a time series, as in your example.

A possible fix is to assume that the time series had a zero value at the previous scrape. Such a fix has recently been implemented in VictoriaMetrics. It can break if the time series has long gaps because of failed scrapes: in that case it will show values that are too big for the first samples after each gap. Such invalid values can be filtered out with `clamp_max` as a temporary workaround until a better solution appears.
--
Best Regards,
Aliaksandr
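
To make the explanation above concrete, here is a minimal Python sketch of the simplified `vLast - vFirst` model and of the "assume zero before the first sample" variant. It is only an illustration: the function names and the sample data are made up for this example, the real PromQL `increase()` also extrapolates and handles counter resets, and the sketch models a missing `vFirst` as `None` (no result), which contributes nothing to a sum either way.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_increase(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Simplified model: vLast - vFirst over [start, end], or None if vFirst is missing."""
    window = [value for ts, value in samples if start <= ts <= end]
    if len(window) < 2:
        return None  # only one sample (or none): nothing to subtract from
    return window[-1] - window[0]


def increase_assuming_zero(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Hypothetical variant: a series whose very first sample is inside the window starts from 0."""
    window = [value for ts, value in samples if start <= ts <= end]
    if not window:
        return None
    series_is_new = samples[0][0] >= start  # assumes samples are sorted by time
    baseline = 0.0 if series_is_new else window[0]
    return window[-1] - baseline


# 20 short-lived processes, each scraped exactly once at t=15s:
# 15 of them counted one event, 5 counted two (25 events in total).
series = [[(15.0, 1.0)] for _ in range(15)] + [[(15.0, 2.0)] for _ in range(5)]

plain = [simple_increase(s, 0.0, 60.0) for s in series]
zeroed = [increase_assuming_zero(s, 0.0, 60.0) for s in series]

# Summing across processes, skipping series that produced no result.
total_plain = sum(r for r in plain if r is not None)
total_zeroed = sum(r for r in zeroed if r is not None)
print(total_plain, total_zeroed)
```

With the plain model every series yields no result, so the 25 events vanish from the sum; with the zero baseline they are all counted.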
On Wed, 21 Aug 2019 at 11:11, Aliaksandr Valialkin <val...@gmail.com> wrote:
> A possible fix is to assume that the time series had a zero value at the previous scrape. Such a fix has recently been implemented in VictoriaMetrics.

This is generally unsafe: we can't tell the difference between a metric that was just created and one that has existed for years but that Prometheus only started scraping now.

Brian
--
Brian Brazil
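
The objection above can be illustrated with a small Python sketch. It assumes the same simplified, hypothetical "zero baseline" model (not the actual Prometheus or VictoriaMetrics implementation) and made-up sample data: a counter that has existed for years but is scraped for the first time looks exactly like a freshly created one, so the zero baseline turns its whole lifetime value into a spike. The `clamp_max` helper here is a plain Python stand-in for the PromQL function of the same name.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase_assuming_zero(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Hypothetical model: a series first seen inside the window is assumed to start from 0."""
    window = [value for ts, value in samples if start <= ts <= end]
    if not window:
        return None
    series_is_new = samples[0][0] >= start  # assumes samples are sorted by time
    baseline = 0.0 if series_is_new else window[0]
    return window[-1] - baseline


def clamp_max(value: float, cap: float) -> float:
    """Stand-in for PromQL clamp_max: limit a value to an upper bound."""
    return min(value, cap)


window_start, window_end = 60.0, 120.0

# A counter in a process that really did just start.
fresh_counter = [(100.0, 2.0), (115.0, 5.0)]
# A counter that has been running for years; Prometheus only began scraping it at t=100s.
old_counter = [(100.0, 1_000_000.0), (115.0, 1_000_003.0)]

fresh = increase_assuming_zero(fresh_counter, window_start, window_end)
spike = increase_assuming_zero(old_counter, window_start, window_end)
capped = clamp_max(spike, 100.0)  # bounded, but still not the true increase of 3
print(fresh, spike, capped)
```

From the samples alone the two series cannot be told apart, and capping only limits the size of the error rather than removing it.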
On Wed, Aug 21, 2019 at 1:15 PM Brian Brazil <brian....@robustperception.io> wrote:
> This is generally unsafe: we can't tell the difference between a metric that was just created and one that has existed for years but that Prometheus only started scraping now.

So it looks like the best solution would be to skip results if `vFirst` is missing, since both approaches mentioned above have real-life issues.

--
Best Regards,
Aliaksandr

On Wed, Aug 21, 2019, 03:28 Aliaksandr Valialkin <val...@gmail.com> wrote:
> On Wed, Aug 21, 2019 at 1:15 PM Brian Brazil <brian....@robustperception.io> wrote:
> > This is generally unsafe: we can't tell the difference between a metric that was just created and one that has existed for years but that Prometheus only started scraping now.

_created + NTP/timestamps/cleverness + interpolation seems like it'd be a "safe" way to solve this. At least, better than data loss, without spikes necessarily. Hence asking if anyone uses this. Are there any plans for using this OpenMetrics field in Prometheus?
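
As a rough illustration of the `_created` idea, the Python sketch below uses a creation timestamp to decide when a zero baseline is justified and otherwise skips the result when `vFirst` is missing. Everything in it is an assumption made for the example: the function name, the sample data, and the premise that the target's clock and the server's clock are comparable. It does no interpolation, and it is not how Prometheus's `rate()` or `increase()` works.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase_with_created(
    samples: List[Sample], created: float, start: float, end: float
) -> Optional[float]:
    """Hypothetical: trust a zero baseline only if the counter was created inside the window."""
    window = [value for ts, value in samples if start <= ts <= end]
    if not window:
        return None
    if created >= start:
        # The counter began life inside the window, so it really did start from 0.
        return window[-1]
    if len(window) < 2:
        return None  # old counter, vFirst missing: skip instead of guessing
    return window[-1] - window[0]


window_start, window_end = 60.0, 120.0

# Fresh process: created at t=95s, scraped once at t=100s with 2 events.
fresh = increase_with_created([(100.0, 2.0)], 95.0, window_start, window_end)

# Long-lived counter (created at t=0s) seen for the first time: no result, no spike.
old_first_scrape = increase_with_created([(100.0, 1_000_000.0)], 0.0, window_start, window_end)

# The same long-lived counter once a second sample arrives.
old_two_scrapes = increase_with_created(
    [(100.0, 1_000_000.0), (115.0, 1_000_003.0)], 0.0, window_start, window_end
)
print(fresh, old_first_scrape, old_two_scrapes)
```

In this toy model the fresh counter keeps its 2 events while the long-lived one produces no spike, which is the distinction the sample values alone cannot provide.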
On Wed, 21 Aug 2019 at 19:00, Khazhismel Kumykov <kha...@gmail.com> wrote:
> Are there any plans for using this OpenMetrics field in Prometheus?

Prometheus already supports scraping OpenMetrics. Having rate() use _created is a more complicated question.