Monitoring GPU utilisation with Prometheus + Grafana?

jake.l....@gmail.com

Sep 14, 2016, 7:56:36 PM
to Prometheus Developers
Hi there!

I've got a large array of NVIDIA K80 GPUs in an HPC facility, and I'd like to know a lot more about their fine-grained utilisation. The only means I've seen to monitor GPU utilisation is NVIDIA's SMI utilities, like so:

[root@somewhere me]# nvidia-smi
Thu Sep 15 09:55:54 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.99     Driver Version: 352.99         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:08:00.0     Off |                    0 |
| N/A   56C    P8    29W / 149W |     55MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:09:00.0     Off |                    0 |
| N/A   52C    P0    73W / 149W |    158MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40c          Off  | 0000:86:00.0     Off |                    0 |
| 26%   51C    P0    67W / 235W |     98MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      2787    C   /usr/local/svi/bin/huygenspro.bin              101MiB |
|    2      2787    C   /usr/local/svi/bin/huygenspro.bin               73MiB |
+-----------------------------------------------------------------------------+


How might one go about using Prometheus to monitor GPU core utilisation?

Thanks!

-jc

Jeffrey Ollie

Sep 14, 2016, 8:54:03 PM
to jake.l....@gmail.com, Prometheus Developers
You'll need to write a custom exporter. It looks like the nvidia-smi command has a switch to export data as XML, so it shouldn't be too terribly hard to massage that into something that Prometheus can consume.
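
For illustration, here is a minimal sketch of what such an exporter might look like in Python, using only the standard library. It assumes `nvidia-smi -q -x` emits one `<gpu>` element per device with `utilization/gpu_util`, `fb_memory_usage/used` and `temperature/gpu_temp` children; check the XML from your driver version before relying on those paths. The metric names, the `serve` helper and port 9101 are hypothetical choices, not an established convention.

```python
import subprocess
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric names mapped to the XML paths assumed above.
WANTED = {
    "nvidia_gpu_utilization_percent": "utilization/gpu_util",
    "nvidia_gpu_memory_used_mib": "fb_memory_usage/used",
    "nvidia_gpu_temperature_celsius": "temperature/gpu_temp",
}


def leading_number(text):
    """Return the numeric part of values such as '0 %' or '55 MiB', else None."""
    try:
        return float(text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None  # missing element, empty text, or something like 'N/A'


def render_metrics(xml_text):
    """Convert nvidia-smi XML into the Prometheus text exposition format."""
    root = ET.fromstring(xml_text)
    lines = []
    for name, path in WANTED.items():
        lines.append(f"# TYPE {name} gauge")
        for gpu in root.iter("gpu"):
            gpu_id = gpu.get("id", "")
            value = leading_number(gpu.findtext(path))
            if value is not None:
                lines.append(f'{name}{{gpu="{gpu_id}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Run nvidia-smi on every scrape so the values are always fresh.
        xml_text = subprocess.run(
            ["nvidia-smi", "-q", "-x"],
            capture_output=True, text=True, check=True,
        ).stdout
        body = render_metrics(xml_text).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)


def serve(port=9101):
    """Serve the metrics over HTTP until interrupted (port is an arbitrary pick)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` on each GPU node and adding the node to a Prometheus scrape job would then expose one gauge per GPU. Invoking `nvidia-smi` per scrape keeps the sketch short; a busier setup might cache the result for a few seconds.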


--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Jeff Ollie
The majestik møøse is one of the mäni interesting furry animals in Sweden.

Tobias Schmidt

Sep 14, 2016, 8:56:59 PM
to Jeffrey Ollie, jake.l....@gmail.com, Prometheus Developers
As these are machine related metrics, you could also use the textfile collector and periodically update a textfile in the node_exporter's textfile directory.
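
A rough sketch of that textfile approach, again in Python with only the standard library. It assumes `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits` is available in your driver version; the metric names, the `nvidia_gpu.prom` filename and the helper functions are hypothetical. The file is written to a temporary name and then renamed, so the node_exporter never reads a half-written file.

```python
import os
import subprocess
import tempfile

# Assumed nvidia-smi invocation: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def csv_to_prom(csv_text):
    """Turn 'index, util, mem_used' rows into Prometheus text format."""
    util_name = "nvidia_gpu_utilization_percent"  # hypothetical metric name
    mem_name = "nvidia_gpu_memory_used_mib"       # hypothetical metric name
    util = [f"# TYPE {util_name} gauge"]
    mem = [f"# TYPE {mem_name} gauge"]
    for row in csv_text.strip().splitlines():
        fields = [field.strip() for field in row.split(",")]
        if len(fields) != 3:
            continue  # skip rows that do not match the expected layout
        index, gpu_util, mem_used = fields
        for series, name, raw in ((util, util_name, gpu_util),
                                  (mem, mem_name, mem_used)):
            try:
                series.append(f'{name}{{gpu="{index}"}} {float(raw)}')
            except ValueError:
                pass  # non-numeric field, e.g. an unsupported counter
    return "\n".join(util + mem) + "\n"


def write_atomically(text, directory, filename="nvidia_gpu.prom"):
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter must read it
    os.replace(tmp_path, os.path.join(directory, filename))


def collect(directory):
    """Query nvidia-smi once and refresh the .prom file in `directory`."""
    csv_text = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    write_atomically(csv_to_prom(csv_text), directory)
```

Running `collect("/path/to/textfile/dir")` from cron or a systemd timer, with the same directory passed to the node_exporter's textfile collector (`--collector.textfile.directory` in current releases), would publish the values alongside the other node metrics.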

