Dear list,
I just upgraded my cluster from SLURM 20.11.8 to 21.08.4. Before the upgrade I updated my configuration based on this comment from the release notes¹:
> -- Removed AccountingStoreJobComment option. Please update your config to use
> AccountingStoreFlags=job_comment instead.
After updating slurm.conf, I upgraded SLURM but got this error:
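For reference, the change described by the release note looks roughly like this in slurm.conf. This is only a sketch: the YES value on the old line is an assumption for illustration, not copied from my actual file.

```
# Old option (removed in 21.08); the value YES is assumed for illustration
#AccountingStoreJobComment=YES

# Replacement named in the 21.08 release notes
AccountingStoreFlags=job_comment
```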
> slurmd[21264]: error: _parse_next_key: Parsing error at unrecognized key: AccountingStoreFlags
> slurmd[21264]: error: Parse error in file /etc/slurm/slurm.conf line 119: "AccountingStoreFlags=job_comment"
> slurmd[21264]: fatal: Unable to process configuration file
Then slurmctld drained all my nodes and all my jobs got cancelled. After I removed the invalid AccountingStoreFlags option and restarted the SLURM daemons on all nodes, the jobs got rescheduled, but now all nodes are drained due to "Duplicate jobid". *sigh*
What happened here? Is this a bug? This is the messiest SLURM upgrade I've had in years... Thank you for any advice,
--