Dear all,
we have a node with 2 x 64 cores (two threads per core) and 8
GPUs, running Slurm 22.05.5.
In order to make use of individual threads, we changed
SelectTypeParameters=CR_Core
NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2
to
SelectTypeParameters=CR_CPU
NodeName=nodename CPUs=256
We are now able to allocate individual threads to jobs, despite the following error in slurmd.log:
error: Node configuration differs from hardware: CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)
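The values in that log line can be reproduced with a small Python sketch. It assumes, as the log suggests, that when only CPUs= is given on the NodeName line, the omitted CoresPerSocket and ThreadsPerCore default to 1 and the socket count is derived from CPUs. Under that assumption Slurm models the node as 256 single-core sockets. `Topology`, `from_config` and `mismatches` are hypothetical helpers written for illustration, not Slurm code.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    cpus: int
    boards: int
    sockets_per_board: int
    cores_per_socket: int
    threads_per_core: int


def from_config(cpus, sockets=None, cores_per_socket=1, threads_per_core=1, boards=1):
    """Hypothetical model of the topology implied by a NodeName line.

    Assumption: omitted fields default to 1 and the socket count is
    inferred as CPUs / (Boards * CoresPerSocket * ThreadsPerCore).
    """
    if sockets is None:
        sockets = cpus // (boards * cores_per_socket * threads_per_core)
    return Topology(cpus, boards, sockets, cores_per_socket, threads_per_core)


def mismatches(cfg, hw):
    """Return 'Field=configured:hardware(hw)' strings for fields that differ."""
    fields = [
        ("Boards", "boards"),
        ("SocketsPerBoard", "sockets_per_board"),
        ("CoresPerSocket", "cores_per_socket"),
        ("ThreadsPerCore", "threads_per_core"),
    ]
    out = []
    for label, attr in fields:
        c, h = getattr(cfg, attr), getattr(hw, attr)
        if c != h:
            out.append(f"{label}={c}:{h}(hw)")
    return out


# Hardware as described: 2 sockets x 64 cores x 2 threads = 256 logical CPUs.
hardware = Topology(cpus=2 * 64 * 2, boards=1, sockets_per_board=2,
                    cores_per_socket=64, threads_per_core=2)

old_cfg = from_config(256, sockets=2, cores_per_socket=64, threads_per_core=2)
new_cfg = from_config(256)  # only CPUs=256, as in the new NodeName line

print(mismatches(old_cfg, hardware))
print(mismatches(new_cfg, hardware))
```

The first call prints an empty list; the second prints the same three mismatching fields as the slurmd.log error (the log also lists Boards=1:1, which matches and is therefore skipped here).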
However, it appears that since this change, we can only make use
of 4 out of the 8 GPUs.
The output of "sinfo -o %G" might be relevant.
Before the change it was:
$ sinfo -o %G
GRES
gpu:A100:8(S:0,1)
Now it is:
$ sinfo -o %G
GRES
gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
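The two outputs differ only in the socket-affinity list in parentheses. A minimal Python sketch that parses both strings makes the difference explicit. It assumes the name:type:count(S:list) layout seen above, and also accepts a-b ranges in case a Slurm version prints them that way; `parse_gres` is a hypothetical helper, not part of Slurm.

```python
import re


def parse_gres(gres):
    """Split a string like 'gpu:A100:8(S:0,1)' into (name, type, count, sockets)."""
    m = re.fullmatch(r"(\w+):(\w+):(\d+)\(S:([\d,\-]+)\)", gres)
    if m is None:
        raise ValueError(f"unrecognized GRES string: {gres!r}")
    name, gtype, count, socket_list = m.groups()
    sockets = []
    for part in socket_list.split(","):
        if "-" in part:  # assumed range form, e.g. "0-1"
            lo, hi = map(int, part.split("-"))
            sockets.extend(range(lo, hi + 1))
        else:
            sockets.append(int(part))
    return name, gtype, int(count), sockets


before = parse_gres("gpu:A100:8(S:0,1)")
after = parse_gres(
    "gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,"
    "44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,"
    "94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)"
)

print(before[3])                               # sockets before the change
print(len(after[3]), after[3][0], after[3][-1])  # count, first, last after the change
print(all(s % 2 == 0 for s in after[3]))       # are all indices even?
```

Before the change, the 8 GPUs are bound to both physical sockets (0 and 1). Afterwards they are bound to 64 even-numbered indices, all below 128, out of the 256 one-CPU "sockets" Slurm now assumes; this only restates the output and does not by itself show why only 4 GPUs are usable.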
Has anyone faced this or a similar issue and can
point me in the right direction?
Best wishes
Sebastian
Diego,
Not to start a debate, but I guess it depends on how you look at it.
From Intel's descriptions:
How does Hyper-Threading work?
When Intel® Hyper-Threading Technology is active, the CPU exposes two execution contexts per physical core. This means that one physical core now works like two “logical cores” that can handle different software threads. The ten-core Intel® Core™ i9-10900K processor, for example, has 20 threads when Hyper-Threading is enabled.
Two logical cores can work through tasks more efficiently than a traditional single-threaded core. By taking advantage of idle time when the core would formerly be waiting for other tasks to complete, Intel® Hyper-Threading Technology improves CPU throughput (by up to 30% in server applications).