[slurm-users] Jobs aborting after slurmctld reload on Intel nodes - AMD unaffected

Pharthiphan Asokan via slurm-users

Apr 16, 2026, 8:42:59 AM
to slurm...@lists.schedmd.com
Hi team,
We’re observing job aborts on Intel-based nodes immediately after a slurmctld reload. AMD nodes remain stable and jobs continue unaffected. No system or Slurm configuration changes were made before the issue started.
Error observed:
error: Aborting JobID=1288 due to change in socket/core configuration of allocated nodes
Relevant node configuration (Intel node example):
NodeName=smc-h4-u19 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=1 RealMemory=510976 State=UNKNOWN Feature=model_SSG-1228-B
From logs:
error: valid_job_resources: smc-h4-u19 sockets:2:2, cores 64,32
error: Node configuration differs from hardware: CPUs=128:128(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)
What we’ve verified:
  • No changes in BIOS, firmware, or hardware topology
  • No edits to slurm.conf or slurm_nodes.conf
  • Reload (scontrol reconfigure) triggers job aborts only on Intel nodes
  • AMD nodes remain intact through reloads

Ole Holm Nielsen via slurm-users

Apr 16, 2026, 9:11:08 AM
to slurm...@lists.schedmd.com
On 4/16/26 14:05, Pharthiphan Asokan via slurm-users wrote:
> Hi team,
> We’re observing job aborts on Intel-based nodes immediately after a
> slurmctld reload. AMD nodes remain stable and jobs continue unaffected.
> No system or Slurm configuration changes were made before the issue started.
> Error observed:
>
> error: Aborting JobID=1288 due to change in socket/core configuration of
> allocated nodes

What's your Slurm version?

Please run "slurmd -C" on each type of node, and verify that your
slurm.conf NodeName=... lines agree with this output. Any deviation
could cause the problem that you're experiencing.

Example output:

$ slurmd -C
NodeName=a045 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
ThreadsPerCore=1 RealMemory=385045
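
One way to automate that comparison is sketched below in Python. It is only an illustration: the helper names (parse_node_line, topology_diff, slurmd_detected) and the "configured" line are made up for the example, while the "detected" line is the slurmd -C output shown above. RealMemory is deliberately left out of the comparison, on the assumption that a configured value lower than the detected one is intentional.

```python
import subprocess

# Topology fields compared between slurm.conf and `slurmd -C`.
TOPOLOGY_KEYS = ("CPUs", "Boards", "SocketsPerBoard",
                 "CoresPerSocket", "ThreadsPerCore")


def parse_node_line(text):
    """Turn 'Key=Value Key=Value ...' (possibly wrapped) into a dict of strings."""
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def topology_diff(configured, detected, keys=TOPOLOGY_KEYS):
    """Return {key: (configured_value, detected_value)} for keys that differ."""
    conf, hw = parse_node_line(configured), parse_node_line(detected)
    return {k: (conf.get(k), hw.get(k)) for k in keys if conf.get(k) != hw.get(k)}


def slurmd_detected():
    """Run `slurmd -C` on the local node and return its NodeName=... line."""
    out = subprocess.run(["slurmd", "-C"], capture_output=True,
                         text=True, check=True).stdout
    return next(line for line in out.splitlines() if line.startswith("NodeName="))


# Hypothetical slurm.conf entry with a deliberate mismatch, for illustration only.
configured = ("NodeName=a045 CPUs=40 Boards=1 SocketsPerBoard=2 "
              "CoresPerSocket=10 ThreadsPerCore=2 RealMemory=380000")
# The `slurmd -C` example output quoted above, joined onto one line.
detected = ("NodeName=a045 CPUs=40 Boards=1 SocketsPerBoard=2 "
            "CoresPerSocket=20 ThreadsPerCore=1 RealMemory=385045")

print(topology_diff(configured, detected))
# {'CoresPerSocket': ('10', '20'), 'ThreadsPerCore': ('2', '1')}
```

On a real node, slurmd_detected() would supply the second argument, and an empty result would mean the topology fields agree.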


IHTH,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark

--
slurm-users mailing list -- slurm...@lists.schedmd.com
To unsubscribe send an email to slurm-us...@lists.schedmd.com

John Hearns via slurm-users

Apr 16, 2026, 11:27:45 AM
to Pharthiphan Asokan, Slurm User Community List
Have you run lstopo on Intel and AMD nodes?

Run it in text mode and graphical mode.

It might be worth running lstopo in the job prolog and epilog and checking whether the output changes.
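
A rough sketch of the prolog/epilog idea in Python follows. It assumes that lstopo-no-graphics (the text-mode hwloc binary) is on the PATH of the compute node and that SLURM_JOB_ID is set in the prolog and epilog environment. The snapshot directory and the function names are hypothetical.

```python
import difflib
import os
import subprocess
import sys
from pathlib import Path

SNAPSHOT_DIR = Path("/var/tmp/lstopo-snapshots")  # hypothetical location


def capture_topology():
    """Return the text-mode topology printed by lstopo-no-graphics."""
    return subprocess.run(["lstopo-no-graphics"], capture_output=True,
                          text=True, check=True).stdout


def topology_changes(before, after):
    """Return unified-diff lines between two snapshots (empty list if identical)."""
    return list(difflib.unified_diff(before.splitlines(), after.splitlines(),
                                     "prolog", "epilog", lineterm=""))


def run(stage, job_id):
    """Save a snapshot in the prolog; diff against it (and remove it) in the epilog."""
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    snapshot = SNAPSHOT_DIR / f"{job_id}.txt"
    if stage == "prolog":
        snapshot.write_text(capture_topology())
        return []
    changes = topology_changes(snapshot.read_text(), capture_topology())
    snapshot.unlink()
    return changes


# Only acts when called as `script.py prolog` or `script.py epilog`.
if __name__ == "__main__" and len(sys.argv) == 2:
    for line in run(sys.argv[1], os.environ["SLURM_JOB_ID"]):
        print(line, file=sys.stderr)
```

Any diff lines printed for a job would show that the topology reported by hwloc changed while the job was running. Where that output ends up depends on how the site runs its prolog and epilog, so a real script would probably write to a log file instead.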
