host rebooted PromQL query


Deeraj V

Apr 27, 2022, 9:43:20 AM
to Prometheus Users
Hello,

I would like to monitor server reboots: when a server reboots, an alert should be sent via Alertmanager.

Currently I am using the following expression:

    expr: rate(node_boot_time{job="kubernetes-service-endpoints"}[1h]) * on(instance) group_left(nodename) group(node_uname_info{}) by (instance, nodename) > 0
      or rate(node_boot_time{job="kubernetes-service-endpoints"}[1h]) > 0

But wrong alerts keep arriving in my mailbox.

Could you please suggest an expression I could use instead of the one above?

Thanks
DV

Brian Candler

Apr 28, 2022, 1:31:37 AM
to Prometheus Users
I don't know what you mean by "wrong alerts", but I use the following:

      - alert: rebootNode
        expr: node_boot_time_seconds > (node_boot_time_seconds offset 5m + 5)
        labels:
          severity: warning
        annotations:
          summary: 'Device rebooted at {{ $value | humanizeTimestamp }}'


Julius Volz

Apr 28, 2022, 9:20:00 AM
to Brian Candler, Prometheus Users
You could also use something like:

    changes(node_boot_time_seconds[1h]) > 0

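...to tell you whether there was a reboot in the last hour. Note that all of these types of alerts in Prometheus will eventually auto-resolve, since they are evaluated over a steadily moving time window: once the reboot falls outside that window, the expression stops returning a result and the alert clears.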
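
As a rough sketch only, the changes() expression could be wrapped in an alerting rule along these lines. The group name, alert name, severity label and annotation text are placeholders chosen for illustration, not something taken from this thread, and the rule assumes your node_exporter exposes node_boot_time_seconds:

    groups:
      - name: reboot-alerts          # hypothetical group name
        rules:
          - alert: HostRebooted      # hypothetical alert name
            # Returns a result while the boot timestamp has changed
            # at least once during the past hour.
            expr: changes(node_boot_time_seconds[1h]) > 0
            labels:
              severity: warning
            annotations:
              summary: 'Host {{ $labels.instance }} rebooted within the last hour'

Here $value is the number of changes seen in the window rather than a timestamp, so the summary uses the instance label instead of humanizeTimestamp. With a 1h range the alert should clear by itself roughly an hour after the reboot.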


--
Julius Volz
PromLabs - promlabs.com