Windows Metrics Show the Same Output for All Servers


Freddy Mack

Jun 19, 2020, 1:06:44 AM
to Prometheus Users
Hello, I have queries to display CPU and filesystem usage for Windows clients in Grafana, but I see the same CPU and filesystem output on all the servers instead of the output for the currently selected server.

Please see the attachments for clarification.

As you can see, both servers show the same output.
This is the CPU query:   100 - avg (irate(windows_cpu_time_total{mode="idle"}[2m])) * 100
This is the filesystem query: 100.0 - 100 * ((windows_logical_disk_free_bytes{} / 1024 / 1024 ) / (windows_logical_disk_size_bytes{} / 1024 / 1024))
What needs to be adjusted in these queries?
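A minimal Python sketch of what the CPU query above computes, using made-up hostnames, core keys and counter samples (nothing here comes from the attachments). `irate()` uses only the last two samples of each `mode="idle"` series; `avg` with no `by` clause then averages over every matching series from every server, which is why each dashboard shows the same number unless the selector is restricted to one instance.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples, as PromQL irate() does."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    delta = v2 if v2 < v1 else v2 - v1  # a decrease means the counter was reset
    return delta / (t2 - t1)


def cpu_busy_percent(idle: Dict[Tuple[str, str], List[Sample]],
                     instance: Optional[str] = None) -> float:
    """100 - avg(irate(idle)) * 100; keys are hypothetical (instance, core) pairs."""
    rates = [irate(samples) for (inst, _core), samples in idle.items()
             if instance is None or inst == instance]
    return 100 - sum(rates) / len(rates) * 100


# Made-up idle-time counters for two Windows hosts with two cores each.
idle = {
    ("win1:9182", "0,0"): [(0, 100.0), (15, 114.25)],
    ("win1:9182", "0,1"): [(0, 200.0), (15, 213.5)],
    ("win2:9182", "0,0"): [(0, 50.0), (15, 53.0)],
    ("win2:9182", "0,1"): [(0, 80.0), (15, 84.5)],
}

print(round(cpu_busy_percent(idle), 2))               # no instance filter: 41.25
print(round(cpu_busy_percent(idle, "win1:9182"), 2))  # 7.5
print(round(cpu_busy_percent(idle, "win2:9182"), 2))  # 75.0
```

Without a filter, both hosts get the blended 41.25; with a per-instance filter, each host gets its own value.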

S1.JPG
S2.JPG

Brian Candler

Jun 19, 2020, 3:57:55 AM
to Prometheus Users
You have configured some sort of dashboard variable (the drop-down menu in the top-left corner), but you have not configured your queries to use it.

For example, if the variable is called "$instance" your queries might need to be:

 100 - avg (irate(windows_cpu_time_total{instance="$instance",mode="idle"}[2m])) * 100

100.0 - 100 * ((windows_logical_disk_free_bytes{instance="$instance"} / 1024 / 1024 ) / (windows_logical_disk_size_bytes{instance="$instance"} / 1024 / 1024))

Can't give a more specific answer without seeing the definition of the variable.
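To illustrate the filesystem query, here is a hedged Python sketch with invented series. A binary operator between two instant vectors in PromQL pairs up series whose label sets (apart from the metric name) are identical, so each volume's free bytes are divided by that same volume's size. Because the numerator and the denominator are both divided by 1024 twice, those terms cancel and could be dropped. The `volume` label name is an assumption based on windows_exporter's logical disk metrics.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]


def select(vector: Vector, **matchers: str) -> Vector:
    """Keep series whose labels equal every matcher, like {instance="..."}."""
    return [(labels, value) for labels, value in vector
            if all(labels.get(k) == v for k, v in matchers.items())]


def used_percent(free: Vector, size: Vector, **matchers: str) -> Vector:
    """100.0 - 100 * free / size, matching series one-to-one on identical labels."""
    size_by_labels = {frozenset(labels.items()): value
                      for labels, value in select(size, **matchers)}
    result = []
    for labels, free_bytes in select(free, **matchers):
        total = size_by_labels.get(frozenset(labels.items()))
        if total is not None:  # a series with no matching partner yields no output
            result.append((labels, 100.0 - 100 * free_bytes / total))
    return result


free = [({"instance": "win1:9182", "volume": "C:"}, 25e9),
        ({"instance": "win1:9182", "volume": "D:"}, 90e9),
        ({"instance": "win2:9182", "volume": "C:"}, 5e9)]
size = [({"instance": "win1:9182", "volume": "C:"}, 100e9),
        ({"instance": "win1:9182", "volume": "D:"}, 100e9),
        ({"instance": "win2:9182", "volume": "C:"}, 50e9)]

# Dividing numerator and denominator by 1024 twice does not change the ratio.
assert (25e9 / 1024 / 1024) / (100e9 / 1024 / 1024) == 25e9 / 100e9

for labels, pct in used_percent(free, size, instance="win1:9182"):
    print(labels["volume"], round(pct, 1))
```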

Also, you probably want to set a "legend" on your queries. For Linux node_exporter I use {{instance}} {{fstype}} {{mountpoint}}; check the labels of your metrics and you should be able to find something similar for Windows.
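A rough Python illustration of how a legend template like the one above is filled in from a series' labels. The regular expression mirrors the `{{ label }}` placeholder syntax and tolerates spaces inside the braces. Substituting an empty string for a missing label is an assumption of this sketch, not something confirmed for Grafana. The Windows label set in the example is invented.

```python
import re
from typing import Dict

# {{ label }} placeholders; whitespace inside the braces is ignored.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def render_legend(template: str, labels: Dict[str, str]) -> str:
    """Replace each placeholder with the label's value (empty if the label is absent)."""
    return PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), template)


linux = {"instance": "srv1:9100", "fstype": "ext4", "mountpoint": "/"}
print(render_legend("{{ instance}} {{fstype}} {{ mountpoint }}", linux))  # srv1:9100 ext4 /

windows = {"instance": "win1:9182", "volume": "C:"}  # hypothetical label set
print(render_legend("{{instance}} {{volume}}", windows))  # win1:9182 C:
```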

image-1.png

 

Freddy Mack

Jun 19, 2020, 3:33:07 PM
to Prometheus Users
Hello Brian,

I have tried with instance and it says "No data". For Linux the same queries work fine. Can we schedule some time to have a Google meeting to go over this?
I am in Chicago, so Central Time zone.

Brian Candler

Jun 20, 2020, 4:42:38 AM
to Prometheus Users
Sorry, but I am fully booked at the moment. Here are some companies that offer direct Prometheus support:

Basically, all you need to do is go into the Prometheus console, look at your metrics like "windows_cpu_time_total" and their full set of labels, check they have an "instance" label, and check the Grafana dashboard variable is generating the correct value.

If it's working for Linux but not for Windows it may be because your dashboard variable is querying a metric which only exists for Linux instances.  Try making a dashboard variable like this:

Name: instance
Data source: prometheus
Query: label_values(windows_cpu_time_total, instance)
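The idea behind label_values() can be sketched in Python as "the distinct values of one label across the series of one metric". With the invented series below, a variable built from a Linux-only metric never offers the Windows hosts, which would explain the "No data" result; building it from a Windows metric, or from `up` restricted to a job, does. The metric, host and job names are assumptions for illustration.

```python
from typing import Dict, List

series: List[Dict[str, str]] = [
    {"__name__": "node_cpu_seconds_total", "instance": "linux1:9100", "job": "node"},
    {"__name__": "windows_cpu_time_total", "instance": "win1:9182", "job": "windows"},
    {"__name__": "windows_cpu_time_total", "instance": "win2:9182", "job": "windows"},
    {"__name__": "up", "instance": "linux1:9100", "job": "node"},
    {"__name__": "up", "instance": "win1:9182", "job": "windows"},
    {"__name__": "up", "instance": "win2:9182", "job": "windows"},
]


def label_values(metric: str, label: str, **matchers: str) -> List[str]:
    """Sorted distinct values of `label` on series of `metric` matching all matchers."""
    return sorted({s[label] for s in series
                   if s["__name__"] == metric and label in s
                   and all(s.get(k) == v for k, v in matchers.items())})


print(label_values("node_cpu_seconds_total", "instance"))  # ['linux1:9100']
print(label_values("windows_cpu_time_total", "instance"))  # ['win1:9182', 'win2:9182']
print(label_values("up", "instance", job="windows"))       # ['win1:9182', 'win2:9182']
```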

Another issue sometimes is if your instance variable includes the port number (which will be different between Linux and Windows instances).  It shouldn't be a problem here, but long term, it's a good idea to get rid of the port numbers from the instance labels entirely:
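A small Python sketch of the rewrite involved, assuming the common approach of matching the address against a regex such as `([^:]+)(:[0-9]+)?` and keeping only the first group. In Prometheus itself this would be done with a relabeling rule in the scrape config; the Python below only reproduces the regex logic, including the fact that Prometheus relabel regexes are fully anchored (hence `fullmatch`).

```python
import re

# Host part, then an optional ":port"; fullmatch() anchors it at both ends.
ADDRESS = re.compile(r"([^:]+)(:[0-9]+)?")


def strip_port(address: str) -> str:
    """Return the host part of host:port, or the address unchanged if it doesn't match."""
    match = ADDRESS.fullmatch(address)
    return match.group(1) if match else address


for addr in ["win1:9182", "linux1:9100", "win2", "[::1]:9100"]:
    print(addr, "->", strip_port(addr))
```

With the port removed, the instance value for a host no longer depends on which exporter port was scraped. Addresses the regex cannot parse (such as the IPv6 example) are left unchanged, just as a non-matching relabel rule leaves a label alone.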



Freddy Mack

Jun 20, 2020, 10:38:05 PM
to Prometheus Users
Hello Brian,

Really appreciate your help.
The variable setting did the trick: now I can see the metrics for their respective servers.
But if I set the variable query to
Query: label_values(windows_cpu_time_total, instance)
will this work for all the other metrics, like server status, memory, file system, disk inodes, server network and system process monitoring?

Also, I have this query for the Linux dashboard:
label_values(up, instance)
Is this good for all the metrics?


Can I get the syntax for these metrics on Windows, for the alert.rules file and for Grafana:

    System messages (e.g. errors in /var/log/messages): alert on any critical errors reported in the log
    Disk inode utilization: warning at 90%, alarm at 95%
    System process monitoring (e.g. whether a set of services on a server is running): alarm if services are not running

Brian Candler

Jun 21, 2020, 3:32:05 AM
to Prometheus Users
On Sunday, 21 June 2020 03:38:05 UTC+1, Freddy Mack wrote:
But if I set the variable query to
Query: label_values(windows_cpu_time_total, instance)
will this work for all the other metrics, like server status, memory, file system, disk inodes, server network and system process monitoring?


It's only querying the values of the "instance" label which exist on this particular metric, but if this is a representative metric which is present for all Windows machines, it should be OK.

Another option: if all your Windows machines are in the same Prometheus scrape job, and that job is called "windows", say, then you could use

     label_values(up{job="windows"}, instance)
 
Also, I have this query for the Linux dashboard:
label_values(up, instance)
Is this good for all the metrics?


Yes, that's fine too, although it will let you select *all* instances, including ones which aren't Linux machines (e.g. scrape jobs for Windows machines, or for blackbox_exporter, snmp_exporter, etc.).

So if your dashboard shows metrics from node_exporter, and the scrape job is called "node", you could limit it to

label_values(up{job="node"}, instance)

 

Can I get the syntax for these metrics on Windows, for the alert.rules file and for Grafana:

    System messages (e.g. errors in /var/log/messages): alert on any critical errors reported in the log

You will need an exporter which parses your log files.  Options include mtail and grok_exporter.

Alternatively, store your logs in Loki. As well as storing/archiving your logs and making them visible in Grafana panels, Loki lets you use LogQL queries, which are similar to PromQL queries.
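Roughly what a log-parsing exporter such as mtail or grok_exporter does: match log lines against a pattern, count the hits, and expose the count as a counter in the Prometheus text format, so an alert rule can fire when it increases. The Python below is a toy batch version of that idea, not how either tool is configured; the metric name `log_critical_errors_total`, the `logfile` label and the sample lines are all hypothetical, and label values are not escaped.

```python
import re
from typing import Iterable


def count_matches(lines: Iterable[str], pattern: str) -> int:
    """Number of lines containing a match for `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def exposition(metric: str, logfile: str, count: int) -> str:
    """Render one counter sample in the Prometheus text exposition format."""
    return f"# TYPE {metric} counter\n" f'{metric}{{logfile="{logfile}"}} {count}\n'


lines = [
    "Jun 19 01:02:03 host kernel: error reading sector 1234",
    "Jun 19 01:02:04 host systemd: Started Session 42.",
    "Jun 19 01:02:05 host app: ERROR connection refused",
]
n = count_matches(lines, r"(?i)\berror\b")
print(exposition("log_critical_errors_total", "/var/log/messages", n), end="")
```

A real exporter tails the file and keeps the counter growing across scrapes rather than recounting a fixed batch.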

 
    Disk inode utilization: warning at 90%, alarm at 95%
    System process monitoring (e.g. whether a set of services on a server is running): alarm if services are not running


I don't really do Windows; you'll have to look at what metrics the wmi_exporter returns and find the ones which apply. Look at the exporter documentation, or just use the Prometheus GUI to explore all metrics whose names start with "windows_".
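Whatever Windows metric ends up being used, a "warning at 90%, alarm at 95%" requirement can be expressed as two separate alert rules over the same usage expression, one per threshold, each with its own severity. This Python sketch (invented volumes and values, placeholder severity names) shows the consequence: above 95% both conditions hold at once.

```python
from typing import Dict, List, Tuple

# Thresholds from the requirement list; severity names are placeholders.
THRESHOLDS = {"warning": 90.0, "alarm": 95.0}


def firing(used_percent: Dict[str, float]) -> List[Tuple[str, str]]:
    """(volume, severity) pairs for every threshold a volume exceeds."""
    return sorted((volume, severity)
                  for volume, value in used_percent.items()
                  for severity, limit in THRESHOLDS.items()
                  if value > limit)


print(firing({"C:": 91.0, "D:": 97.5, "E:": 40.0}))
# [('C:', 'warning'), ('D:', 'alarm'), ('D:', 'warning')]
```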

Good luck,

Brian.

Freddy Mack

Jun 23, 2020, 1:03:41 PM
to Prometheus Users
Thanks Brian,


1) The new issue: in both my Windows and Linux dashboards the network panel shows more than one network interface, although each server has only one. I'm using this query for both Linux and Windows (what should it be for Windows?):
node_network_up{device!="lo"}

2) The filesystem panel for Windows shows an extra disk: the server has 3 drives, but Grafana shows 4. Also, how can we show the drive names in Windows?
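For question 2, a hedged sketch with these assumptions: windows_exporter's logical disk metrics carry a `volume` label, and the extra fourth entry is a volume without a drive letter (for example one named like `HarddiskVolume1`; check the actual values in the Prometheus UI). Under those assumptions, the panel can be limited to volumes that look like drive letters. The Python below mimics an anchored regex label match such as `volume=~"[A-Z]:"`; the same `volume` label could then supply the drive name in the legend. All values are invented.

```python
import re
from typing import Dict, List

# PromQL =~ matchers are anchored at both ends, hence fullmatch().
DRIVE_LETTER = re.compile(r"[A-Z]:")


def drive_letter_volumes(series: List[Dict[str, str]]) -> List[str]:
    """Volume names that consist of a single drive letter and a colon."""
    return [s["volume"] for s in series if DRIVE_LETTER.fullmatch(s.get("volume", ""))]


disks = [{"instance": "win1:9182", "volume": v}
         for v in ["C:", "D:", "E:", "HarddiskVolume1"]]
print(drive_letter_volumes(disks))  # ['C:', 'D:', 'E:']
```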

3) The systemd panel on Linux shows DOWN for services which are active, and also shows a long list of services which have not been activated yet, all marked DOWN. This is the query:
node_systemd_unit_state{state="failed", instance=~"$instance"}

And what would be the Windows equivalent of the systemd query?
      



Brian Candler

Jun 23, 2020, 2:09:51 PM
to Prometheus Users
You haven't shown any of your queries, and I am not inclined to guess. But "node_network_up" will show interfaces across all instances. You probably want something like:

node_network_up{instance="$instance",device!="lo"}

As I said, it's up to you to explore the wmi_xxx or windows_xxx metrics, as I don't do Windows.  As far as I know, Windows doesn't have systemd.
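On the Windows side of question 3: Windows has services rather than systemd units, and windows_exporter has a service collector for them. Assume its `windows_service_state` metric follows the same layout as the systemd state metric: one series per possible state, value 1 for the current one, and `name` and `state` labels. Verify this in the Prometheus UI. Under that assumption, a "required services not running" check could look like this Python sketch. The service names and values are made up.

```python
from typing import Iterable, List, Tuple

Series = Tuple[dict, int]  # (labels, value)

samples: List[Series] = [
    ({"name": "spooler", "state": "running"}, 0),
    ({"name": "spooler", "state": "stopped"}, 1),
    ({"name": "w32time", "state": "running"}, 1),
    ({"name": "w32time", "state": "stopped"}, 0),
]


def not_running(series: Iterable[Series], required: Iterable[str]) -> List[str]:
    """Required services whose current (value == 1) state is not 'running'."""
    running = {labels["name"] for labels, value in series
               if value == 1 and labels["state"] == "running"}
    return sorted(set(required) - running)


print(not_running(samples, ["spooler", "w32time"]))  # ['spooler']
```

Using a set difference means a required service that is missing from the data altogether is also reported, which is usually what you want from a "services must be running" check.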

As for your problem with systemd metrics on Linux: it's a good idea to explore these in the Prometheus web UI (x.x.x.x:9090) or the Grafana "explore" panel.  If you do, you'll see that the systemd metrics have a separate metric for each state, with value 1 for the current one and 0 for the others. e.g.

node_systemd_unit_state{name="ssh.service",state="activating",type="notify"} 0
node_systemd_unit_state{name="ssh.service",state="active",type="notify"} 1
node_systemd_unit_state{name="ssh.service",state="deactivating",type="notify"} 0
node_systemd_unit_state{name="ssh.service",state="failed",type="notify"} 0
node_systemd_unit_state{name="ssh.service",state="inactive",type="notify"} 0

So your Grafana / PromQL query needs to take this into account. The query "node_systemd_unit_state == 1" will filter this to just the one which represents the current state.
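A short Python sketch of why the `== 1` filter matters, using the ssh.service states above plus an invented failed unit (the `type` label is left out). Selecting `state="failed"` alone returns one series per unit, mostly with value 0, which is why every unit showed up in the panel. Adding the value filter keeps only units that are actually in that state.

```python
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], int]

STATES = ["activating", "active", "deactivating", "failed", "inactive"]


def one_hot(unit: str, current: str) -> List[Series]:
    """One series per state, value 1 for the unit's current state, 0 for the rest."""
    return [({"name": unit, "state": s}, int(s == current)) for s in STATES]


series = one_hot("ssh.service", "active") + one_hot("backup.service", "failed")

# node_systemd_unit_state{state="failed"}: every unit appears, whatever its state.
failed_selector = [l["name"] for l, v in series if l["state"] == "failed"]
# node_systemd_unit_state{state="failed"} == 1: only units that really failed.
failed_now = [l["name"] for l, v in series if l["state"] == "failed" and v == 1]
# node_systemd_unit_state == 1: exactly one (current) state per unit.
current = {l["name"]: l["state"] for l, v in series if v == 1}

print(failed_selector)  # ['ssh.service', 'backup.service']
print(failed_now)       # ['backup.service']
print(current)          # {'ssh.service': 'active', 'backup.service': 'failed'}
```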

Freddy Mack

Jun 23, 2020, 3:58:19 PM
to Prometheus Users
Hello Brian,
The network issue is resolved with your query.
The systemd panel shows units for all the servers instead of just the current one with this query:
node_systemd_unit_state{instance=~"$instance"}

z1.JPG

Brian Candler

Jun 24, 2020, 6:11:40 AM
to Prometheus Users
On Tuesday, 23 June 2020 20:58:19 UTC+1, Freddy Mack wrote:
The systemd panel shows units for all the servers instead of just the current one with this query:
node_systemd_unit_state{instance=~"$instance"}


It should show all systemd units on a single server, with multiple entries for each unit with the different states.

You can trim it to a single state per unit like this:

node_systemd_unit_state{instance=~"$instance"} == 1
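Putting both filters together, here is a hedged Python sketch of what `node_systemd_unit_state{instance=~"$instance"} == 1` keeps, with a per-state tally added on top (the tally is an extra, comparable to a `count by (state)` aggregation). One detail worth knowing: `=~` has to match the whole label value, so a variable value without the `:port` suffix would match nothing when the instance label includes a port. The hosts, units and states are invented.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], int]


def current_states(series: List[Series], instance_regex: str) -> Counter:
    """Count units per current state on instances fully matching instance_regex."""
    regex = re.compile(instance_regex)
    return Counter(labels["state"] for labels, value in series
                   if value == 1 and regex.fullmatch(labels["instance"]))


series: List[Series] = [
    ({"instance": "srv1:9100", "name": "ssh.service", "state": "active"}, 1),
    ({"instance": "srv1:9100", "name": "ssh.service", "state": "failed"}, 0),
    ({"instance": "srv1:9100", "name": "cron.service", "state": "active"}, 1),
    ({"instance": "srv1:9100", "name": "backup.service", "state": "failed"}, 1),
    ({"instance": "srv2:9100", "name": "ssh.service", "state": "active"}, 1),
]

print(current_states(series, "srv1:9100"))  # Counter({'active': 2, 'failed': 1})
print(current_states(series, "srv1"))       # Counter() because the port is missing
```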

Freddy Mack

Jun 24, 2020, 12:43:23 PM
to Prometheus Users
Appreciated, Brian.