[slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results

Analabha Roy

Feb 10, 2023, 2:07:40 PM
to slurm...@lists.schedmd.com
Hi,

I'm having some complex issues coordinating OpenMPI, SLURM, and DMTCP in my cluster. On a whim, I logged into ChatGPT and asked the AI about it. It told me things that I couldn't find in the current version of the SLURM docs (I looked). Since ChatGPT is not always reliable, I have reproduced the contents of my chat session in my GitHub repository for peer review and commentary by you fine folks:

https://github.com/hariseldon99/buparamshavak/blob/main/chatgpt.md

I apologize for the poor formatting. I did this in a hurry, and my knowledge of Markdown is rudimentary.

Please do comment on the veracity and reliability of the AI's response.

AR

--
Analabha Roy
Assistant Professor
Golapbag Campus, Barddhaman 713104
West Bengal, India

Diego Zuccato

Feb 13, 2023, 2:32:19 AM
to slurm...@lists.schedmd.com
Hi.

I'm no expert, but it seems ChatGPT is confusing "queued" and "running"
jobs. I assume you are interested in temporarily shutting down the
slurmctld node for maintenance.

If the jobs are still queued (== not yet running), what do you need to
save? The queue order is dynamically adjusted by slurmctld based on the
selected priority factors; there's nothing special to save.
For the running jobs, OTOH, you have multiple solutions:
1) drain the cluster: safest but often impractical
2) checkpointing: seems fragile, especially if jobs span multiple nodes
3) have a second slurmctld node (a small VM is sufficient) that takes
over cluster management when the master node is down (be *sure* the
state dir is shared and quite fast!)
4) just hope you'll be able to recover the slurmctld node before a job
completes *and* the timeouts expire

While 4 is relatively risky (you could end up with runaway jobs that
you'll have to fix afterwards), it does not directly impact users: their
jobs will run and complete/fail regardless of slurmctld state. At most
the users won't receive a completion mail and they will be billed less
than expected.
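(For option 3, Slurm can be told about a backup controller directly in slurm.conf. A minimal sketch, where the hostnames and the state path are hypothetical placeholders, not a tested config:)

```
# slurm.conf fragment (sketch; hostnames and path are hypothetical)
SlurmctldHost=master       # primary controller
SlurmctldHost=backupvm     # backup controller, takes over when master is down
StateSaveLocation=/shared/slurm/state   # must be shared between both, and fast
```

(On older Slurm releases the equivalent keywords were ControlMachine / BackupController.)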

Diego

Il 10/02/2023 20:06, Analabha Roy ha scritto:
> Hi,
>
> I'm having some complex issues coordinating OpenMPI, SLURM, and DMTCP in
> my cluster. On a whim, I logged into ChatGPT and asked the AI about it.
> It told me things that I couldn't find in the current version of the
> SLURM docs (I looked). Since ChatGPT is not always reliable, I reproduce
> the
> contents of my chat session in my GitHub repository for peer review and
> commentary by you fine folks.
>
> https://github.com/hariseldon99/buparamshavak/blob/main/chatgpt.md
>
> I apologize for the poor formatting. I did this in a hurry, and my
> knowledge of markdown is rudimentary.
>
> Please do comment on the veracity and reliability of the AI's response.
>
> AR
>
> --
> Analabha Roy
> Assistant Professor
> Department of Physics <http://www.buruniv.ac.in/academics/department/physics>
> The University of Burdwan <http://www.buruniv.ac.in/>
> Golapbag Campus, Barddhaman 713104
> West Bengal, India
> Emails: dan...@utexas.edu, ar...@phys.buruniv.ac.in, harise...@gmail.com
> Webpage: http://www.ph.utexas.edu/~daneel/

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Analabha Roy

Feb 18, 2023, 1:40:59 PM
to Slurm User Community List
Hi, 


On Mon, 13 Feb 2023, 13:04 Diego Zuccato, <diego....@unibo.it> wrote:
Hi.

I'm no expert, but it seems ChatGPT is confusing "queued" and "running"
jobs.

That's what I also suspected. 



Assuming you are interested in temporarily shutting down slurmctld
node for maintenance.


Temporarily, and daily.

If the jobs are still queued ( == not yet running) what do you need to
save? The queue order is dynamically adjusted by slurmctld based on the
selected factors, there's nothing special to save.
For the running jobs, OTOH, you have multiple solutions:
1) drain the cluster: safest but often impractical
2) checkpointing: seems fragile, especially if jobs span multiple nodes

I just have one node, but the bigger problem with checkpointing is that GPUs don't seem to be supported.


3) have a second slurmctld node (a small VM is sufficient) that takes
over cluster management when the master node is down (be *sure* the
state dir is shared and quite fast!)

I've just got that one "node" for compute, login, storage, and everything else.

It's a Tyrone server with 64 cores and a couple of RAID-ed HDDs. I just want to run some DFT/QM/MM simulations for myself and departmental colleagues, and do some exact diagonalization problems.




4) just hope you'll be able to recover the slurmctld node before a job
completes *and* the timeouts expire


I booted into GParted Live and beefed up the swap space to 200 GB (the RAM is 93 GB). I've set up a mandatory (through QOS settings) Slurm reservation that kills all jobs running in the normal QOS after 8:30 pm every day, plus a cron job that starts at 8:35 pm, drains the partitions, suspends all jobs running with elevated QOS privileges, and then hibernates the whole machine to swap. Another script runs whenever the machine comes out of hibernation, resets the Slurm partitions, and resumes the suspended jobs.
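Roughly, the cron side of this could look like the following. The script name, path, QOS name, and exact commands are my guesses at how one might wire this up, not the actual setup:

```
# crontab fragment (sketch; path and script name are hypothetical)
35 20 * * * /usr/local/sbin/nightly-hibernate.sh

# where nightly-hibernate.sh would do roughly:
#   scontrol update PartitionName=normal State=DRAIN
#   scontrol suspend $(squeue -h --qos=elevated -t R -o %i | paste -sd,)
#   systemctl hibernate
```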

It's an ugly jugaad, I know.

I guess it's tough noogies for the normal qos people if their jobs ran past the reservation or were not properly checkpointed before a blackout, but I don't see any other alternative.

My department refuses to let me run my thingie 24/7, and power outages occur frequently around here.

I'm concerned about implementing a failsafe in case this Rube Goldberg-like setup takes a hard left.

I was thinking about a systemd service that kills all running jobs, then simply runs "scontrol shutdown" to preserve the state of queued jobs, and then resumes a regular system shutdown. In that case, automatic checkpointing of the jobs with DMTCP/MANA would be cool, and I was encouraged when ChatGPT claimed that Slurm supported this. But the recent docs don't corroborate this claim, so I guess it got deprecated or something...
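A rough sketch of such a unit, entirely as an assumption of how it could be wired (untested): a oneshot service that stays "active" after boot, so that its ExecStop lines run on the way down, cancelling running jobs and then telling slurmctld to save its state and exit cleanly.

```
# /etc/systemd/system/slurm-graceful-stop.service (sketch; untested)
[Unit]
Description=Save Slurm queue state before system shutdown
After=slurmctld.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# ExecStop runs at shutdown, before slurmctld itself is stopped:
ExecStop=/usr/bin/scancel --state=RUNNING
ExecStop=/usr/bin/scontrol shutdown

[Install]
WantedBy=multi-user.target
```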

Christopher Samuel

Feb 18, 2023, 3:38:37 PM
to slurm...@lists.schedmd.com
On 2/10/23 11:06 am, Analabha Roy wrote:

> I'm having some complex issues coordinating OpenMPI, SLURM, and DMTCP in
> my cluster.

If you're looking to try checkpointing MPI applications you may want to
experiment with the MANA ("MPI-Agnostic, Network-Agnostic MPI") plugin
for DMTCP here: https://github.com/mpickpt/mana

We (NERSC) are collaborating with the developers and it is installed on
Cori (our older Cray system) for people to experiment with. The
documentation for it may be useful to others who'd like to try it out;
it also has a nice description of how it works, which even I, as a
non-programmer, can understand:
https://docs.nersc.gov/development/checkpoint-restart/mana/

Pay special attention to the caveats in our docs though!
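For the curious, the general shape of a MANA run, as I read those docs, is roughly as follows. The command names and the checkpoint-interval flag are my reading of the docs, and the job script itself is hypothetical, so check the docs for the exact invocation:

```
#!/bin/bash
# sketch of a MANA-checkpointed Slurm job (hypothetical; see the NERSC docs)
#SBATCH -N 2
#SBATCH -t 00:30:00

mana_coordinator -i 300         # start the coordinator, checkpoint every 300 s
srun mana_launch ./my_mpi_app   # first run: launch the MPI binary under MANA
# a later job can resume from the last checkpoint with:
#   mana_coordinator
#   srun mana_restart
```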

I've not used it myself, though I'm peripherally involved, giving advice
on system-related issues.

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA


Analabha Roy

Feb 19, 2023, 3:43:24 PM
to Slurm User Community List
Hi,

Thanks for the advice. I already tried out MANA, but at present it only works with MPICH, not Open MPI, which is what I've set up via Ubuntu.


AR