[slurm-users] Moving slurmctld and slurmdbd to a new host


Prentice Bisbal

Jan 15, 2021, 1:44:13 PM
to slurm...@lists.schedmd.com
Slurm users,

I'm planning on moving slurmctld and slurmdbd to a new host. I know how
to dump the MySQL DB from the old server and import it to the new
slurmdbd host, and I know how to copy the job state directories to the
new host. I plan on doing this during our next maintenance window when
there are no jobs running on the cluster.

However, there will be plenty of jobs in the queue, so my question is
this: What will happen to jobs in the queue when I do this? Is the queue
information stored in the database or the job state directories, or a
third location? How can I make sure I don't lose the state of the queue?

--
Prentice


Ryan Novosielski

Jan 15, 2021, 10:49:36 PM
to Slurm User Community List
My understanding is that the queue lives in the job state directory (StateSaveLocation in slurm.conf). In theory, if you back it up and then make a mistake and lose it, you can restore it and try again. If I'm not mistaken, the upgrade docs mention this: they suggest backing it up in case something goes wrong during the upgrade.
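For illustration, here is a minimal Python sketch of the "back it up first" step. It only archives a directory into a tarball, so it assumes slurmctld is already stopped and that you pass in whatever path your StateSaveLocation points to. The function names, the archive name, and the demo file layout are made up for this example, not anything Slurm provides.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_state_dir(state_dir, dest_dir, label="pre-move"):
    """Archive state_dir into dest_dir as a gzip tarball and return the archive path."""
    src = Path(state_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"slurm-state-{label}.tar.gz"  # hypothetical naming scheme
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores only the directory's own name, not its absolute path
        tar.add(src, arcname=src.name)
    return archive


def list_archive(archive):
    """Return the sorted member names of the tarball, as a quick sanity check."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Demo on a throwaway directory; on a real controller you would pass
    # the StateSaveLocation path from your own slurm.conf instead.
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "statesave"
        state.mkdir()
        (state / "job_state").write_bytes(b"placeholder")
        print(list_archive(backup_state_dir(state, Path(tmp) / "backup")))
```

Listing the archive afterwards is a cheap way to confirm the backup is readable before you touch anything on the old host.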

--
#BlackLivesMatter
____
|| \\UTGERS,       |---------------------------*O*---------------------------
||_// the State     |         Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
    `'

Michael Gutteridge

Jan 16, 2021, 1:44:26 PM
to Slurm User Community List
I can confirm that as well. The state directory has all of that information. We just upgraded from 18.05 to 20.02 and moved to a different host at the same time. The cluster was quiet (we had a maintenance reservation in place), but there were still running jobs, and they survived the upgrade.

I think the big thing to watch out for is setting SlurmdTimeout in your slurm.conf to a value that comfortably covers the downtime before you start the update. It might not be necessary depending on the exact steps you're using, but it's useful insurance against job loss.
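As a rough sketch of that check, the Python below reads SlurmdTimeout out of slurm.conf text and compares it with the time you expect the node daemons to be unreachable. The helper names are hypothetical, the default of 300 seconds is what I believe the slurm.conf man page documents (verify against your version), and "last occurrence wins" is a simplification of this sketch, not a statement about how Slurm parses the file.

```python
import re

# Assumed default in seconds; check the slurm.conf man page for your release.
DEFAULT_SLURMD_TIMEOUT = 300


def get_slurmd_timeout(conf_text):
    """Return SlurmdTimeout (seconds) found in slurm.conf text, else the assumed default."""
    value = DEFAULT_SLURMD_TIMEOUT
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        match = re.match(r"SlurmdTimeout\s*=\s*(\d+)", line, re.IGNORECASE)
        if match:
            value = int(match.group(1))  # last occurrence wins in this sketch
    return value


def timeout_covers_outage(timeout_s, outage_s):
    """True if the timeout is disabled (0) or longer than the expected outage."""
    return timeout_s == 0 or timeout_s > outage_s


if __name__ == "__main__":
    sample = "ClusterName=demo\nSlurmdTimeout=3600  # raised for maintenance\n"
    timeout = get_slurmd_timeout(sample)
    # Expected outage of 30 minutes is an illustrative figure.
    print(timeout, timeout_covers_outage(timeout, 30 * 60))
```

If the check comes back false, raise the value and reconfigure before the maintenance window, then put it back afterwards.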

HTH

 - Michael

Prentice Bisbal

Jan 19, 2021, 2:21:43 PM
to slurm...@lists.schedmd.com

Thanks to both of you for your replies. I did the move this morning, and it went off without a hitch. It does appear that the job state directory keeps track of the queue data, because as soon as I copied those dirs over, I was able to see the queue information on the new Slurm controller.
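One way to go beyond eyeballing the queue is to compare job IDs captured before and after the move. The sketch below assumes you saved the output of something like `squeue -h -o %i` (one job ID per line, no header) on the old controller and again on the new one; the function names and the sample IDs are invented for the example.

```python
def job_ids(squeue_output):
    """Parse one-job-ID-per-line text into a set, skipping blank lines."""
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def compare_queues(before, after):
    """Report job IDs that vanished or appeared between two squeue captures."""
    old, new = job_ids(before), job_ids(after)
    return {
        "missing": sorted(old - new),     # queued before the move, gone afterwards
        "unexpected": sorted(new - old),  # present only after the move
    }


if __name__ == "__main__":
    # Made-up job IDs standing in for two saved squeue captures.
    before_move = "101\n102\n103\n"
    after_move = "101\n103\n"
    print(compare_queues(before_move, after_move))
```

Job IDs are treated as opaque strings here, so array-job notation passes through untouched. An empty "missing" list is the result you want.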

I had done this operation once before, but it was a couple years ago, so I just wanted to be safe rather than sorry. Thanks for the help.

Prentice

-- 
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
http://www.pppl.gov