[slurm-users] Multithreads config

david martin

Feb 16, 2018, 9:29:55 AM
to slurm...@lists.schedmd.com

Hi,

I have a single physical server with:

  • 63 CPUs (each CPU has 16 cores)
  • 480 GB total memory

NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=480000

This configuration does not work. What should it be?

Thanks,

David

Benjamin Redling

Feb 16, 2018, 9:40:05 AM
to slurm...@lists.schedmd.com
On 16.02.2018 at 15:28, david martin wrote:
> I have a single physical server with:
>
> * 63 CPUs (each CPU has 16 cores)
> * 480 GB total memory
>
> NodeNAME= Sockets=1 CoresPerSocket=16 ThreadsPerCore=1 Procs=63 REALMEMORY=480000
>
> This configuration does not work. What should it be?

A proper configuration, one that shows some basic effort went into
reading the documentation.

RTFM and use the configurator:
https://slurm.schedmd.com/configurator.html

You did not define a node name, and apart from that, just defining a node
isn't enough: you also need at least one partition that uses that node.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html

david martin

Feb 16, 2018, 10:17:24 AM
to Slurm User Community List
I have added the following to slurm.conf (based on the web configurator). I have 64 CPUs, not 63.

NodeName=obelix CPUs=64 RealMemory=480000  CoresPerSocket=16 ThreadsPerCore=1 state=UNKNOWN

>sinfo -Nl


sinfo: error: NodeNames=obelix CPUs=64 doesn't match Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs
Fri Feb 16 16:02:22 2018

NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
obelix         1    testq*     drained   64   4:16:1 480000        0      1   (null) Low socket*core*thre



What's wrong?



Ade Fewings

Feb 16, 2018, 10:26:10 AM
to Slurm User Community List

Log in to the compute node and run 'slurmd -C' to get Slurm's viewpoint:


e.g.

[root@cwc001 ~]# slurmd -C
NodeName=cwc001 CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=36138 TmpDisk=92680
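
A line like this can be cross-checked by multiplying its topology fields and comparing the result with CPUs=. The following is only a minimal sketch: the helper names `field` and `cpus_from_topology` are made up for illustration (they are not Slurm tools), and treating a missing field as 1 is an assumption of the sketch.

```shell
#!/bin/sh
# field LINE KEY: print the value of KEY=... found in LINE.
# Assumption of this sketch: a missing KEY counts as 1.
field() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | head -n 1)
    echo "${value:-1}"
}

# cpus_from_topology LINE: Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore
cpus_from_topology() {
    b=$(field "$1" Boards)
    s=$(field "$1" SocketsPerBoard)
    c=$(field "$1" CoresPerSocket)
    t=$(field "$1" ThreadsPerCore)
    echo $((b * s * c * t))
}

# The example line above, pasted as a string.
line='NodeName=cwc001 CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=36138 TmpDisk=92680'

cpus_from_topology "$line"   # prints 12, matching CPUs=12
field "$line" CPUs           # prints 12
```

On a real node the string would come from the output of `slurmd -C` instead of being pasted in.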

~~
A


Chris Samuel

Feb 17, 2018, 1:51:37 AM
to slurm...@lists.schedmd.com
On Saturday, 17 February 2018 2:16:35 AM AEDT david martin wrote:

> NodeName=obelix CPUs=64 RealMemory=480000 CoresPerSocket=16
> ThreadsPerCore=1 state=UNKNOWN
> >sinfo -Nl
>
> sinfo: error: NodeNames=obelix CPUs=64 doesn't match
> Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CPUs

Basic maths.

1 socket per node * 16 cores per socket = 16 CPUs per node
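
The same arithmetic as a small shell sketch. The function name `expected_cpus` is hypothetical, and the quad-socket case is only a guess at the hardware, not something the thread confirms.

```shell
#!/bin/sh
# expected_cpus SOCKETS CORES_PER_SOCKET THREADS_PER_CORE
# Prints the product that the sinfo error compares against CPUs=.
expected_cpus() {
    echo $(( $1 * $2 * $3 ))
}

expected_cpus 1 16 1   # the posted line with one socket: prints 16
expected_cpus 4 16 1   # a hypothetical quad-socket box: prints 64
```

If the product does not equal the CPUs= value in slurm.conf, the node definition is inconsistent and sinfo reports the error quoted above.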

As Ade said, see what slurmd says it sees.

Also validate it against "nproc" (for the total number of logical CPUs,
which counts hardware threads as well as cores) and "lscpu" output (for
more general node config info).

Is this a quad-socket box? Or are you confusing multiple threads per core
with CPUs?

Best of luck,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

