[slurm-users] Slurm node history / log ?

Bill Benedetto

Jul 5, 2023, 1:21:22 PM
to slurm...@lists.schedmd.com

Good day.

Is there some command that I can use in Slurm to see a node’s history?

Not the job history, but the state history.

Something like:

Jul  5 13:11:01 node01 taken offline by slurmctld because node01 not responding

And/Or:

Jul  5 13:11:01 node01 taken offline by USER1 state=DRAIN reason="System acting up, going to reboot"

And/Or:

Jul  5 13:11:01 node01 online by USER1

My goal/idea is to see if a node has been having problems according to Slurm itself.

Or if someone DOWNed a node for some reason.

Or to see if a node was down and just returned to service recently.

Does anything like that already exist in Slurm?

Thanks!

- Bill

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

Bill Benedetto     bbene...@goodyear.com    The Goodyear Tire & Rubber Co.

I don't speak for Goodyear and they don't speak for me.  We're both happy.

Roberto Monti

Jul 5, 2023, 1:28:12 PM
to Slurm User Community List

Hi Bill,

Your best bet is probably the slurmctld log (/var/log/slurmctld here, or wherever SlurmctldLogFile in slurm.conf points) on the server that is acting as the active controller.

Best,

--

Roberto P. Monti

DevOps Engineer I

robert...@jax.org

 

The Jackson Laboratory

United States | China | Japan

www.jax.org
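
A minimal sketch of searching the controller log for one node, assuming the log is a plain text file. The path is site-specific (`scontrol show config | grep SlurmctldLogFile` should report it), and the helper name `lines_mentioning_node` is hypothetical. The exact wording of slurmctld's log messages varies by version, so this only filters on the node name and leaves interpretation to the reader.

```python
import re
from typing import Iterable, Iterator


def lines_mentioning_node(lines: Iterable[str], node: str) -> Iterator[str]:
    """Yield log lines that mention `node` as a whole name.

    "node01" matches "node01" and "node01," but not "node010".
    Hostlist ranges such as "node[01-04]" are NOT expanded, so lines
    that only use a range expression are missed.
    """
    pattern = re.compile(r"(?<![\w-])" + re.escape(node) + r"(?![\w-])")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Usage (the path below is an assumption; use your site's SlurmctldLogFile):
#
#   with open("/var/log/slurmctld", errors="replace") as fh:
#       for hit in lines_mentioning_node(fh, "node01"):
#           print(hit)
```

Note that log rotation limits how far back this can look, and the log only lives on the controller host.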


Ford, Steven

Jul 5, 2023, 1:57:02 PM
to Slurm User Community List

Hi Bill,

I think the command you’re looking for is `sacctmgr show event`.

Best,

Steve
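
A rough sketch of pulling one node's state history out of `sacctmgr show event` from a script. It assumes accounting through slurmdbd is configured (the events come from the accounting database), and that the `nodes=`, `start=`, and `format=` options and the `-n`/`-P` flags behave as described in the sacctmgr man page for your version. The function names and the chosen field list are illustrative, not anything from this thread.

```python
import subprocess
from typing import Dict, List

# Reason goes last so that a "|" inside the reason text survives parsing.
FIELDS = ["NodeName", "Start", "End", "State", "User", "Reason"]


def build_command(node: str, start: str) -> List[str]:
    """Build the sacctmgr call: -n drops the header, -P gives '|'-delimited output."""
    return [
        "sacctmgr", "-n", "-P", "show", "event",
        f"nodes={node}",
        f"start={start}",  # e.g. "2023-07-01"
        "format=" + ",".join(FIELDS),
    ]


def parse_events(output: str) -> List[Dict[str, str]]:
    """Turn '|'-delimited sacctmgr output into one dict per event."""
    events = []
    for line in output.splitlines():
        if not line.strip():
            continue
        values = line.split("|", len(FIELDS) - 1)
        if len(values) == len(FIELDS):
            events.append(dict(zip(FIELDS, values)))
    return events


def node_events(node: str, start: str) -> List[Dict[str, str]]:
    """Run sacctmgr and return the parsed events (needs a working Slurm install)."""
    result = subprocess.run(
        build_command(node, start), capture_output=True, text=True, check=True
    )
    return parse_events(result.stdout)
```

Calling `node_events("node01", "2023-07-01")` should then give one entry per recorded state change since that date, including the reason text and who set it. Passing an explicit `start=` matters because, without a time range, sacctmgr only reports a limited default window.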

 

