[slurm-users] GPU GRES verification and some really broad questions.


Shooktija S N via slurm-users

May 3, 2024, 8:55:18 AM5/3/24
to slurm...@lists.schedmd.com
Hi,

I am a complete Slurm-admin and sysadmin noob trying to set up a 3-node Slurm cluster. I have managed to get a minimum working example running, in which I am able to use a GPU (NVIDIA GeForce RTX 4070 Ti) as a GRES.

This is slurm.conf without the comment lines:
root@server1:/etc/slurm# grep -v "#" slurm.conf
ClusterName=DlabCluster
SlurmctldHost=server1
GresTypes=gpu
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/affinity,task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=debug3
SlurmdLogFile=/var/log/slurmd.log
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
This is gres.conf (a single line; on each node, NodeName is set to that node's own name):
root@server1:/etc/slurm# cat gres.conf
NodeName=server1 Name=gpu File=/dev/nvidia0
Those are the only config files I have.

I have a few general questions, loosely arranged in ascending order of generality:

1) I have enabled the allocation of GPU resources as a GRES and have tested this by running:
shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
2: server3
0: server1
1: server2
Is this a good way to check if the configs have worked correctly? How else can I easily check if the GPU GRES has been properly configured?
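For reference, a few standard commands that show what the controller actually registered for the GRES (output will of course vary with your setup):

```shell
# Show the Gres string recorded for each node (%G = generic resources)
sinfo -N -o "%N %G"

# Inspect one node's configured GRES in detail
scontrol show node server1 | grep -i gres

# Confirm the device is actually visible from inside a job on every node
srun --nodes=3 --gpus=3 --label nvidia-smi -L
```

If `sinfo` shows `gpu:1` on every node and `nvidia-smi -L` lists the card from within the job, the GRES plumbing is working.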

2) I want to reserve a few CPU cores and a few gigs of memory for use by non-Slurm tasks. According to the documentation, I should use CoreSpecCount and MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these resources (the default) or prevented from using these resources". How do I change this default behaviour so that the config specifies the cores reserved for non-Slurm work, rather than the cores Slurm may use?
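For context, a sketch of what the node definition might look like with those two parameters added (the values here are illustrative, not a recommendation; MemSpecLimit is in megabytes):

```
NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 State=UNKNOWN Gres=gpu:1
```

With this, 4 cores and 8 GiB per node are withheld from job allocation; whether slurmd itself is confined to those cores depends on the core-specialization behaviour described in the slurm.conf documentation.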

3) While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules', but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?

4) A very broad question: while managing the resources used by a program, Slurm might split them across multiple computers that do not necessarily have the files the program needs to run. For example, a Python script that requires the package 'numpy', which is not installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? Following on from my previous question: if users usually run a Python script simply with 'python3 someScript.py' rather than inside a conda environment, how should I enable Slurm to manage the resources required by that script? Would I have to install all the packages it requires on every computer in the cluster?

5) Related to the previous question: I have set up my 3 nodes so that all users' home directories are stored on a Ceph cluster built from the hard drives of all 3 nodes, which means a user's home directory is mounted at the same location on all 3 computers, making their data visible to all 3 nodes. Does this make managing a program's dependencies, as described in the previous question, easier? I realise that reading and writing files on the hard drives of a Ceph cluster is not the fastest, so I am planning to have users use the /tmp/ directory for speed-critical IO, as the OSes are installed on NVMe drives.

Loris Bennett via slurm-users

May 10, 2024, 5:00:52 AM5/10/24
to slurm...@lists.schedmd.com, Shooktija S N
Hi,
What do you mean by 'properly configured'? Ultimately you will want to
submit a job to the nodes and use something like 'nvidia-smi' to see
whether the GPUs are actually being used.
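A minimal sketch of such a test job, assuming the cluster from the original post (the job and output names are just illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check
#SBATCH --nodes=3
#SBATCH --gpus=3
#SBATCH --output=gpu-check.%j.out

# Each task should see exactly the GPU(s) Slurm granted it
srun --label bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'
```

If a task prints an empty CUDA_VISIBLE_DEVICES, or nvidia-smi lists devices the job was not allocated, the GRES or cgroup configuration needs another look.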

> 2) I want to reserve a few CPU cores, and a few gigs of memory for use by non slurm related tasks. According to the documentation, I am to use
> CoreSpecCount and MemSpecLimit to achieve this. The documentation for CoreSpecCount says "the Slurm daemon slurmd may either be confined to these
> resources (the default) or prevented from using these resources", how do I change this default behaviour to have the config specify the cores reserved for non
> slurm stuff instead of specifying how many cores slurm can use?

I am not aware that this is possible.

> 3) While looking up examples online on how to run Python scripts inside a conda env, I have seen that the line 'module load conda' should be run before
> running 'conda activate myEnv' in the sbatch submission script. The command 'module' did not exist until I installed the apt package 'environment-modules',
> but now I see that conda is not listed as a module that can be loaded when I check using the command 'module avail'. How do I fix this?

Environment modules and Conda are somewhat orthogonal to each other.

Environment modules is a mechanism for manipulating environment
variables such as PATH and LD_LIBRARY_PATH. It allows you to provide
easy access for all users to software which has been centrally installed
in non-standard paths. It is not used to provide access to software
installed via 'apt'.

Conda is another approach to providing non-standard software, but is
usually used by individual users to install programs in their own home
directories.

You can use environment modules to allow access to a different version
of Conda than the one you get via 'apt', but there is no necessity to do
that.
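As an illustration (all paths here are hypothetical), you could publish your own modulefile for a centrally installed conda and point environment-modules at it:

```shell
# Hypothetical locations for site modulefiles and a shared conda install
mkdir -p /opt/modulefiles/conda

# A minimal Tcl modulefile that puts the shared conda on PATH
cat > /opt/modulefiles/conda/23.11 <<'EOF'
#%Module1.0
prepend-path PATH /opt/miniconda3/bin
EOF

# Make the directory visible to environment-modules, then load the module
module use /opt/modulefiles
module avail conda
module load conda/23.11
```

`module avail` only lists modulefiles found on MODULEPATH, which is why a plain apt install of environment-modules shows no conda: nothing has been published there yet.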

> 4) A very broad question: while managing the resources being used by a program, slurm might happen to split the resources across multiple computers that
> might not necessarily have the files required by this program to run. For example, a python script that requires the package 'numpy' to function but that
> package was not installed on all of the computers. How are such things dealt with? Is the module approach meant to fix this problem? In my previous
> question, if I had a python script that users usually run just by running a command like 'python3 someScript.py' instead of running it within a conda
> environment, how should I enable slurm to manage the resources required by this script? Would I have to install all the packages required by this script on all
> the computers that are in the cluster?

In general a distributed or cluster file system, such as NFS, Ceph, or Lustre, is used to provide access from multiple nodes. /home would be on such a file system, as would a large part of the software. You can use something like EasyBuild, which will install software and generate the relevant module files.
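Since /home is already shared across the nodes (see question 5), one low-tech option is a Python virtual environment created once on the shared file system; every node then sees the same packages without any per-node installs. A sketch, with illustrative paths:

```shell
# Create the environment once, anywhere the shared /home is mounted
python3 -m venv "$HOME/envs/numpy-env"
"$HOME/envs/numpy-env/bin/pip" install numpy

# In the sbatch script, call the shared environment's interpreter directly
"$HOME/envs/numpy-env/bin/python3" someScript.py
```

Using the venv's python3 by absolute path avoids having to 'activate' anything inside the batch script.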

> 5) Related to the previous question: I have set up my 3 nodes in such a way that all the users' home directories are stored on a ceph cluster created using the
> hard drives from all the 3 nodes, which essentially means that a user's home directory is mounted at the same location on all 3 computers - making a user's
> data visible to all 3 nodes. Does this make the process of managing the dependencies of a program as described in the previous question easier? I realise that
> programs having to read and write to files on the hard drives of a ceph cluster is not really the fastest so I am planning on having users use the /tmp/ directory
> for speed critical reading and writing, as the OSs have been installed
> on NVME drives.

Depending on the IO patterns created by a piece of software, using the
distributed file system might be fine, or a local disk might be needed.
Note that you might experience problems with /tmp filling up, so it may
be better to have a separate /localscratch. In general you probably also
want people to use as much RAM as possible, in order to avoid file
system IO altogether where feasible.
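A sketch of the usual stage-in/stage-out pattern for a per-job local scratch area (the scratch root, file names, and RESULT_DIR variable are illustrative; it falls back to /tmp and the shell PID so it can also be tried outside Slurm):

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --output=scratch-demo.%j.out

# Per-job scratch directory on the local NVMe (here: /tmp as a stand-in)
SCRATCH_ROOT="${SCRATCH_ROOT:-/tmp}"
SCRATCH_DIR="$SCRATCH_ROOT/scratch-${SLURM_JOB_ID:-$$}"
mkdir -p "$SCRATCH_DIR"
trap 'rm -rf "$SCRATCH_DIR"' EXIT   # always clean up, even on failure

# Stage in from the slower Ceph-backed home, compute locally, stage back
printf 'input data\n' > "$SCRATCH_DIR/input.txt"
tr 'a-z' 'A-Z' < "$SCRATCH_DIR/input.txt" > "$SCRATCH_DIR/result.txt"
cp "$SCRATCH_DIR/result.txt" "${RESULT_DIR:-.}/result.txt"
```

The EXIT trap is what keeps /tmp (or /localscratch) from slowly filling with the remains of failed jobs.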

HTH

Loris

--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com
