slurmctld reload.
AMD nodes remain stable and jobs continue unaffected. No system or Slurm configuration changes were made before the issue started.

error: Aborting JobID=1288 due to change in socket/core configuration of allocated nodes
NodeName=smc-h4-u19 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=1 RealMemory=510976 State=UNKNOWN Feature=model_SSG-1228-B
error: valid_job_resources: smc-h4-u19 sockets:2:2, cores 64,32
error: Node configuration differs from hardware: CPUs=128:128(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)
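
For anyone comparing the node definition with the errors above, here is a minimal sketch (not part of the original report) that parses a NodeName line and checks that CPUs equals Boards x SocketsPerBoard x CoresPerSocket x ThreadsPerCore. The helper names parse_node_line and expected_cpus are made up for illustration. The second layout (32 cores per socket, 2 threads per core) is purely hypothetical: it only shows that a different core count can still add up to CPUs=128, and it is not a diagnosis of this cluster.

```python
def parse_node_line(line: str) -> dict:
    """Split a slurm.conf-style NodeName line into a key/value dict."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def expected_cpus(node: dict) -> int:
    """CPU count implied by the declared topology (missing keys default to 1)."""
    return (
        int(node.get("Boards", 1))
        * int(node["SocketsPerBoard"])
        * int(node["CoresPerSocket"])
        * int(node.get("ThreadsPerCore", 1))
    )


# Node definition as quoted in this thread.
NODE_LINE = (
    "NodeName=smc-h4-u19 CPUs=128 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=64 ThreadsPerCore=1 RealMemory=510976 "
    "State=UNKNOWN Feature=model_SSG-1228-B"
)

node = parse_node_line(NODE_LINE)
print(node["NodeName"], "declared:", int(node["CPUs"]), "derived:", expected_cpus(node))

# HYPOTHETICAL alternative layout, for illustration only (not taken from the logs):
# half the cores per socket with two threads per core yields the same CPU total.
alt = dict(node, CoresPerSocket="32", ThreadsPerCore="2")
print("hypothetical layout derived:", expected_cpus(alt))
```

Running `slurmd -C` on the affected node prints the topology slurmd itself detects, which can be compared field by field with the parsed values.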
slurm.conf or slurm_nodes.conf
scontrol reconfigure) triggers job aborts only on Intel nodes