Hi,
Did you configure your node definition with the output of slurmd -C? Ignore Boards. I don't know if it is still true, but several years ago declaring Boards made things difficult.
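The rough workflow looks like this (the output line below is only illustrative; field names and values vary by Slurm version and hardware):

# Run on the compute node to see the hardware as slurmd detects it
slurmd -C
# Prints something roughly like:
# NodeName=hpc306 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=512000
# Copy that line into slurm.conf as the node definition, dropping the Boards-related fields.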
Also, if you have hyperthreaded AMD or Intel processors, your partition declaration should include OverSubscribe=YES:2.
Start with a very simple job with a script containing sleep 100 or something else without any runtime issues.
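Something like this is my go-to (the partition and account names are just the ones from the partition line below):

#!/bin/bash
#SBATCH --job-name=sleeptest
#SBATCH --partition=a
#SBATCH --account=cowboys
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
sleep 100

Submit it with sbatch sleeptest.sh and confirm it runs to completion before adding anything fancier.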
PartitionName=a Nodes=hpc[301-308] Default=No OverSubscribe=YES:2 MaxTime=INFINITE State=UP AllowAccounts=cowboys
In the sbatch/srun submission the user needs to add an oversubscribe declaration (the --oversubscribe option) telling Slurm the job can run on both logical cores of a physical core. In the days of Knights Landing each core could handle four logical cores, but I don't believe any current AMD or Intel processors support more than two logical cores (hyperthreads) per core. The conversation about hyperthreads is difficult because Intel's terminology is "logical cores" for hyperthreading and "cores" for physical cores, but the tendency is to call the logical cores threads or hyperthreaded cores. This can be very confusing for consumers of the resources.
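In an sbatch script that looks something like the following sketch (the srun'd command is just a stand-in for the real workload):

#!/bin/bash
#SBATCH --partition=a
#SBATCH --account=cowboys
#SBATCH --oversubscribe
#SBATCH --ntasks=1
srun hostname   # placeholder; replace with the actual program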
In any case, if you create an array job of 1-100 sleep jobs, my simplest logical test job, then you can use scontrol show node <nodename> to see the node's resource configuration as well as its consumption. squeue -w <nodename> -i 10 will iterate every ten seconds to show you the node chomping through the jobs.
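Concretely, something along these lines (hpc306 is just an example node name):

# Submit 100 sleep tasks as a job array
sbatch --array=1-100 --partition=a --wrap="sleep 100"

# Check configured vs. allocated resources on the node
scontrol show node hpc306

# Re-run the queue listing for that node every ten seconds
squeue -w hpc306 -i 10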
Hope this helps. Once you are comfortable I would urge you to use the NodeName/Partition descriptor format above and encourage your users to declare oversubscription in their jobs. It is a little more work up front but far easier than correcting scripts later.
Doug
Hi,
Declaring cores=64 will absolutely work, but if you start running MPI you'll want a more detailed config description. The easy way to read it is "128 = 2 sockets * 32 cores per socket * 2 threads per core".
NodeName=hpc[306-308] CPUs=128 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=512000 TmpDisk=100
But if you just want to work with logical cores, CPUs=128 will work. If you go with the more detailed description then you need to declare oversubscription (hyperthreading) in the partition declaration.
By default Slurm will not let two different jobs share the logical cores comprising a physical core. For example, if Sue has an array of 1-1000, her array tasks could each take a logical core on a physical core. But if Jamal is also running, his jobs would not be able to share a physical core with hers (as I understand it).
PartitionName=a Nodes=hpc[301-308] Default=No OverSubscribe=YES:2 MaxTime=INFINITE State=UP AllowAccounts=cowboys
In the sbatch/srun submission the user needs to add an oversubscribe declaration (--oversubscribe) telling Slurm the job can run on both logical cores of a physical core.
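On the srun side it is just the flag on the command line, e.g. (hostname stands in for the real program, and the task count is arbitrary):

srun --partition=a --account=cowboys --oversubscribe --ntasks=4 hostname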
Hi,
I suggest removing "Boards=1". The docs say to include it, but in previous discussions with SchedMD we were advised to remove it.
When the job is running, execute "scontrol show node <nodename>" and look at the CfgTRES and AllocTRES lines. The former is what the maître d' believes is available, the latter what has been allocated. Then run "scontrol show job <jobid>" and look down at the NumNodes line, which will show you what the job requested. I suspect there is a syntax error in the submit.
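For example (the node name and job id here are placeholders):

scontrol show node hpc306 | grep -E 'CfgTRES|AllocTRES'
scontrol show job 12345 | grep NumNodes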
Hi,
One thing I forgot, which you didn't mention: when you change the node descriptors and partitions you also have to restart slurmctld. scontrol reconfigure works for the nodes, but the main daemon has to be told to reread the config. Until you restart the daemon it will be referencing the config from the last time it started.
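On a systemd-managed install that is roughly (service names can differ by site):

# Push the updated config out to the slurmd daemons
scontrol reconfigure

# Restart the controller so it rereads slurm.conf
sudo systemctl restart slurmctld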