Prometheus on HPC Clusters?


LJ Medina

Jun 19, 2018, 1:12:23 AM

to Prometheus Users

Hello!

I'm testing Prometheus on both a "test" Hadoop cluster and one of our "small" HPC clusters (so far using node exporter). The possibilities have me dreaming. For HPC jobs (parallel batch jobs submitted via a scheduler that may run for hours or days), what would be the best way to gather both cumulative and individual host metrics for the node(s) a job runs on? For example, Job A is running on nodes 1, 2, and 3, and I want to see the metrics of those nodes for only the lifespan of Job A. I'm continuing to read and learn about Prometheus to see whether this scenario is possible, but I was hoping someone could point me in the right direction. Thanks!

Ben Kochie

Jun 19, 2018, 3:23:16 AM

to jose.me...@gmail.com, Prometheus Users

That depends a bit on the HPC cluster controller, the jobs, and what you want to monitor about them.

If the jobs run for more than a few minutes, you can easily monitor those applications directly. What you will need in this case is a service discovery method that can update itself automatically when jobs are created and complete.
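One way to get a service discovery source that follows job start and completion is Prometheus's file-based discovery (`file_sd_configs`), which re-reads a JSON target file whenever it changes. The following is only a minimal sketch, not a tested setup. It assumes node exporter listens on its default port 9100 on every node, and that some script of your own can ask the scheduler which nodes each running job occupies. The `running_jobs` mapping, the `hpc_job` label name, and both function names are hypothetical.

```python
import json
import os
import tempfile

NODE_EXPORTER_PORT = 9100  # node exporter's default listen port (assumed unchanged)


def build_target_groups(running_jobs):
    """Turn {job_id: [hostnames]} into a list of file_sd target groups.

    Each running job becomes one group whose targets carry an
    "hpc_job" label (a hypothetical label name chosen for this sketch).
    """
    groups = []
    for job_id, nodes in sorted(running_jobs.items()):
        groups.append({
            "targets": [f"{node}:{NODE_EXPORTER_PORT}" for node in sorted(nodes)],
            "labels": {"hpc_job": str(job_id)},
        })
    return groups


def write_file_sd(running_jobs, path):
    """Write the target file via a temp file and rename.

    The rename is atomic on POSIX filesystems, so Prometheus
    should never see a half-written file.
    """
    groups = build_target_groups(running_jobs)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    # mkstemp creates the file as 0600; relax it so another user can read it.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
    return groups
```

Calling `write_file_sd({"A": ["node1", "node2", "node3"]}, "hpc_jobs.json")` from a scheduler prolog/epilog hook or a periodic script would keep the file current, and a scrape config with `file_sd_configs` pointing at that file would pick up the changes. Once Job A finishes and is dropped from the mapping, its targets disappear from the file, so series labelled `hpc_job="A"` only cover the job's lifespan.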


