Calculate cluster uptime percentage


Shubham Shrivastav

Jul 19, 2022, 10:55:54 PM
to Prometheus Users
Hi all,

I have a two-node cluster that I'm trying to monitor.

Each node exposes a custom metric:
# HELP platform_uptime_state  Overall platform status is 1 when up, 0 otherwise
# TYPE platform_uptime_state gauge
platform_uptime_state 1


The cluster is expected to be UP when at least one of the nodes has platform_uptime_state set to 1.

I need to calculate the cluster uptime percent since the cluster was started, but I'm not able to formulate a query.



Any help is appreciated!

TIA,
Shubham



Ben Kochie

Jul 19, 2022, 11:51:58 PM
to Shubham Shrivastav, Prometheus Users
avg_over_time(platform_uptime_state[1h]) * 100


Shubham Shrivastav

Jul 20, 2022, 12:12:15 AM
to Prometheus Users
Thanks!

But avg_over_time(platform_uptime_state[1h]) * 100 only gives me the uptime of a single node.

I need to check uptime for the cluster (two nodes). Clustered nodes have the same environment_id label.

I use sum by (environment_id) (platform_uptime_state) to track the number of nodes connected.
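For comparison, a sketch of how the "at least one node is up" state could be expressed as a 0/1 value per cluster. It assumes both nodes carry the same environment_id label and that platform_uptime_state only ever takes the values 0 and 1, as described above. Using max instead of sum yields 1 as soon as any node reports 1:

```promql
# 1 if at least one node in the cluster reports platform_uptime_state == 1, else 0
max by (environment_id) (platform_uptime_state)
```

The sum-based count can be turned into the same 0/1 signal with a bool comparison, e.g. sum by (environment_id) (platform_uptime_state) > bool 0.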

I thought I could set up a formula like:
cluster_uptime % = (1 - (total seconds when both nodes were down / total seconds the nodes were connected, whether up or down)) * 100

Is that possible?

Also, I have some custom metrics coming in: https://pastebin.com/sdFcNucA

The cluster is expected to be up when at least one of these nodes is up. 
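One possible way to get a cluster-level percentage is to average the per-cluster 0/1 state over time with a subquery. This is only a sketch, not a confirmed answer from the thread. It assumes Prometheus 2.7 or later (for subquery support), a shared environment_id label on both nodes, and a 1m evaluation step that is picked here purely for illustration. The 1h range mirrors the earlier reply and would need to be widened to cover the period since the cluster was started:

```promql
# Percentage of the last hour during which at least one node reported 1.
# [1h:1m] is a subquery: evaluate the inner expression every 1m over 1h.
avg_over_time(
  (max by (environment_id) (platform_uptime_state))[1h:1m]
) * 100
```

Note that intervals in which neither node is scraped produce no samples at all, so they are skipped by the average rather than counted as downtime. If missed scrapes should count as down, the expression would need to account for absent series separately.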
