[slurm-users] Migration of slurm communication network / Steps / how to

Purvesh Parmar

Apr 23, 2023, 1:59:33 PM
to Slurm User Community List
Hello,

We run Slurm 21.08 on Ubuntu 20 on a cluster of 8 nodes. All Slurm communication happens over the 192.168.5.x network (LAN). We are now required to move the cluster to other premises, where the LAN is 172.16.1.x. I have to migrate everything, including slurmdbd (MariaDB), slurmctld, and slurmd. The cluster network is also changing from 192.168.5.x to 172.16.1.x, and each node will be assigned an IP address from the 172.16.1.x network.
The cluster has been running for the last 3 months, and we need to keep the old usage statistics as well.


Is the procedure below correct?

1) Stop slurm
2) suspend all the queued jobs
3) backup slurm database
4) change the Slurm and MUNGE configuration, i.e. munge conf, mariadb conf, slurmdbd.conf, slurmctld.conf, slurmd.conf (on compute nodes), gres.conf, service file
5) Later, update the Slurm database by executing the command below for all the nodes:
sacctmgr modify node where node=old_name set name=new_name
Also, I think the slurm server name and the slurmdbd server name need to be updated. I am still checking how to do that.
6) Finally, start slurmdbd and slurmctld on the server and slurmd on the compute nodes

Please help and guide me on the above.

Regards,

Purvesh Parmar
INHAIT

Ryan Novosielski

Apr 24, 2023, 12:32:45 AM
to Slurm User Community List
I think it’s easier than all of this. Are you actually changing the names of all of these things, or just the IP addresses? If they all resolve to an IP now and you can bring everything down and change the hosts files or DNS, it seems to me that if the names aren’t changing, that’s that. I know that “sacctmgr show cluster” will show the wrong IP address, but I think that updates itself.

The names of the servers are in slurm.conf, but again, if the names don’t change, that won’t matter. If you have IPs there, you will need to change them.
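
A minimal sketch of the hosts-file variant of this, assuming each node keeps its last octet when moving from 192.168.5.x to 172.16.1.x. The host names and the temporary files are made up for illustration; a real run would edit /etc/hosts (or the DNS zone) instead:

```shell
#!/bin/sh
# Sketch: remap 192.168.5.x addresses in a hosts file to 172.16.1.x.
# Assumption: every node keeps its last octet; the names below are hypothetical.
set -eu

work=$(mktemp -d)

# Stand-in for the old /etc/hosts
cat > "$work/hosts.old" <<'EOF'
127.0.0.1     localhost
192.168.5.10  slurm-head
192.168.5.11  node01
192.168.5.12  node02
EOF

# Rewrite only the network prefix at the start of a line; host names are untouched.
sed 's/^192\.168\.5\./172.16.1./' "$work/hosts.old" > "$work/hosts.new"

# Show the result for review before installing it anywhere
cat "$work/hosts.new"
```

Because only the address column changes, Slurm keeps seeing the same node names, which is the case where nothing in slurm.conf needs to be touched.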


Purvesh Parmar

Apr 24, 2023, 12:59:22 AM
to Slurm User Community List
Thank you, but the hostnames are changing as well as the IP addresses: those of the Slurm server, the database server, and the slurmd compute nodes.

Ole Holm Nielsen

Apr 24, 2023, 1:53:36 AM
to slurm...@lists.schedmd.com
On 4/24/23 06:58, Purvesh Parmar wrote:
> Thank you, but the hostnames are changing as well as the IP addresses:
> those of the Slurm server, the database server, and the slurmd compute
> nodes.

I suggest that you talk to your networking people and request that the old
DNS names be created in the new network's DNS for your Slurm cluster.
Then Ryan's solution will work. Changing DNS names is a very simple matter!

My 2 cents,
Ole



Purvesh Parmar

Apr 24, 2023, 2:09:45 AM
to Slurm User Community List
Thank you. However, because this is a change of data center and the server hostnames and FQDNs contain the data center name, I have to change both the hostnames and the IP addresses to the names assigned for the new DC.
 

Ole Holm Nielsen

Apr 24, 2023, 2:36:57 AM
to slurm...@lists.schedmd.com
On 4/24/23 08:09, Purvesh Parmar wrote:
> Thank you. However, because this is a change of data center and the
> server hostnames and FQDNs contain the data center name, I have to change
> both the hostnames and the IP addresses to the names assigned for the new
> DC.

Could your data center be persuaded to introduce DNS CNAME aliases for the
old names to point to the new DC names?

If you're forced to use new DNS names only, then it's simple to change DNS
names of compute nodes and partitions in slurm.conf:

NodeName=...
PartitionName=xxx Nodes=...

as well as the slurmdbd server name:

AccountingStorageHost=...

What I have never tried before is to change the DNS name of the slurmctld
host:

ControlMachine=...
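
As a rough sketch of that rename, assuming the old and new host names differ only in a data-center prefix (dc1- and dc2- are hypothetical, as are the node range and the rest of the sample file), a single sed pass over a copy of slurm.conf could look like this:

```shell
#!/bin/sh
# Sketch: rename hosts in a copy of slurm.conf by swapping a data-center prefix.
# Assumption: all old names start with "dc1-" and all new ones with "dc2-".
set -eu

work=$(mktemp -d)

# Stand-in for the existing slurm.conf (only the host-related lines)
cat > "$work/slurm.conf" <<'EOF'
ClusterName=mycluster
ControlMachine=dc1-head
AccountingStorageHost=dc1-db
NodeName=dc1-node[01-08] CPUs=16 State=UNKNOWN
PartitionName=batch Nodes=dc1-node[01-08] Default=YES State=UP
EOF

# Swap the prefix only where it starts a value (after "=" or ","),
# so unrelated text containing "dc1-" elsewhere is left alone.
sed 's/\([=,]\)dc1-/\1dc2-/g' "$work/slurm.conf" > "$work/slurm.conf.new"

# Review the changes; diff exits non-zero when files differ, hence "|| true".
diff "$work/slurm.conf" "$work/slurm.conf.new" || true
```

ClusterName is deliberately left unchanged in this sketch, on the assumption that the existing accounting records in slurmdbd should stay attached to the same cluster name.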

The critical aspect here is that you need to stop all batch jobs, plus
slurmdbd and slurmctld. Then you can backup (tar-ball) and transfer the
Slurm state directories:

StateSaveLocation=/var/spool/slurmctld

However, I don't know if the name of the ControlMachine is hard-coded in
the StateSaveLocation files?
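
A sketch of the tar-ball step, and of one way to probe the hard-coded-name question, run here against a scratch directory rather than a live /var/spool/slurmctld. The file contents and the name dc1-head are invented for illustration. On a real system the daemons would be stopped first, and a grep hit or miss in binary state files is only a hint, not proof:

```shell
#!/bin/sh
# Sketch: archive a StateSaveLocation-like directory, unpack it elsewhere,
# and list files that contain the old controller name.
set -eu

work=$(mktemp -d)
state="$work/var/spool/slurmctld"    # stand-in for StateSaveLocation
mkdir -p "$state"

# Placeholder files; real state files are binary and written by slurmctld.
printf 'job-state-placeholder\n' > "$state/job_state"
printf 'host=dc1-head\n' > "$state/example_state"   # hypothetical content

# Tar-ball the directory relative to its parent so it unpacks cleanly
tar -C "$work/var/spool" -czf "$work/slurmctld-state.tar.gz" slurmctld

# Unpack at the "new" location
mkdir -p "$work/restore"
tar -C "$work/restore" -xzf "$work/slurmctld-state.tar.gz"

# -a treats binary files as text, -l prints only the names of matching files
hits=$(grep -rla 'dc1-head' "$work/restore/slurmctld" || true)
printf 'files mentioning the old name:\n%s\n' "$hits"
```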

I strongly suggest that you try to make a test migration of the cluster to
the new DC to find out if it works or not. Then you can always make
multiple attempts without breaking anything.

Best regards,
Ole



Purvesh Parmar

Apr 24, 2023, 2:57:02 AM
to Slurm User Community List

Thank you. I will try this and get back. Is any other step missing here for the migration?


Thank you,


Purvesh 

Ole Holm Nielsen

Apr 24, 2023, 4:01:35 AM
to slurm...@lists.schedmd.com
On 4/24/23 08:56, Purvesh Parmar wrote:
> Thank you. I will try this and get back. Is any other step missing here
> for the migration?

I don't know if any steps are missing, because I never tried moving a
cluster like you want to do.

/Ole
