--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/f8560b07-9b00-4dfc-9671-667368ddd530%40googlegroups.com.
You can use something like `avg_over_time(node_processes_state{state='D'}[10m])` to smooth over missed scrapes. Depending on how sensitive you want the alert to be, you can also use `max_over_time()`.
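For instance, here is a sketch of the original rule rewritten with `avg_over_time` (the 500 threshold and 10m window are taken from the rule quoted below; the `labels` and `annotations` sections are illustrative assumptions, not part of the original):

```yaml
# Sketch only: averages the last 10 minutes of samples, so a few missed
# scrapes no longer drop the series and auto-resolve the alert.
- alert: Node_Process_In_D_State_Count_Critical
  expr: avg_over_time(node_processes_state{state='D'}[10m]) > 500
  for: 10m
  labels:
    severity: page   # hypothetical label
  annotations:
    summary: "More than 500 processes in D state (10m average)"  # hypothetical
```

Swapping `avg_over_time` for `max_over_time` makes the alert fire on the worst sample in the window instead of the average, which is more sensitive but also stickier.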
On Wed, Jun 3, 2020 at 9:49 AM 林浩 <haow...@gmail.com> wrote:
We use node_exporter to monitor processes in the D state. When the number of D-state processes exceeds 500, it triggers a pager via an alert rule like this:

```yaml
- alert: Node_Process_In_D_State_Count_Critical
  expr: node_processes_state{state='D'} > 500
  for: 10m
```

The problem is that when the OS gets into a bad state (too many D-state processes), the node_exporter agent also struggles and cannot reliably report the D-state metric to the Prometheus server. In the screenshot below you can see that some data points are missing. This causes the alert to flap: when data is missing, the alert gets resolved.

Is there any way to avoid the alert auto-resolving when some data points are missed?
OK - then as Ben says, use avg_over_time or max_over_time.
It would be helpful to have a timer-based resolve, similar to the `for` timer for firing.
OK, thanks for the clarification. It is a necessary feature rather than a nice-to-have. As Prometheus usage is spreading to more industries, would it be possible to consider https://github.com/prometheus/alertmanager/issues/204 under Prometheus?
The approach you mentioned doesn't scale well in large enterprise environments.

On Mon, Jun 8, 2020 at 12:06 PM Brian Candler <b.ca...@pobox.com> wrote:

On 08/06/2020 07:31, Rajesh Reddy Nachireddi wrote:
> Thanks, Brian. Is there an issue open for this under Prometheus, with
> working examples?
No. Brian Brazil considers this feature unnecessary.