[slurm-users] Weird one - deleting a user

Bill Wichser

Jul 27, 2021, 4:48:00 PM
to slurm...@lists.schedmd.com
[root@della5 bill]# sacctmgr -i delete user mable
Error with request: Job(s) active, cancel job(s) before remove
JobID = 602995 C = tukey A = politics U = mable

Yup, when a user has an active job they cannot be deleted from the
database. The thing is, this cluster tukey has been offline for maybe 5
years now. Probably more.
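
For anyone scripting around this error, here is a minimal Python sketch that pulls the blocking job's details out of the message above. It assumes the line always has the shape shown in this output (`JobID = ... C = ... A = ... U = ...`). Reading C, A, and U as cluster, account, and user is inferred from this one example, not taken from the Slurm documentation. The function name `parse_blocking_jobs` is hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as: "JobID = 602995 C = tukey A = politics U = mable"
# Assumption: C = cluster, A = account, U = user, inferred from the sample
# output in this thread rather than from Slurm documentation.
BLOCKING_JOB = re.compile(
    r"JobID\s*=\s*(?P<job_id>\S+)\s+C\s*=\s*(?P<cluster>\S+)\s+"
    r"A\s*=\s*(?P<account>\S+)\s+U\s*=\s*(?P<user>\S+)"
)


def parse_blocking_jobs(output: str) -> List[Dict[str, str]]:
    """Return one dict per blocking-job line found in sacctmgr's output."""
    return [m.groupdict() for m in BLOCKING_JOB.finditer(output)]


if __name__ == "__main__":
    sample = (
        "Error with request: Job(s) active, cancel job(s) before remove\n"
        "JobID = 602995 C = tukey A = politics U = mable\n"
    )
    for job in parse_blocking_jobs(sample):
        print(job["cluster"], job["job_id"], job["user"])
```

Knowing which cluster each blocking job belongs to makes it easy to spot records, like this one, that point at a cluster that is no longer running.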

I don't want to lose the old records in the database. Is there a way to
say, "Hey, that job you believe is still running on tukey, well, it
doesn't exist anymore, so please close this job"?

I haven't figured out a way to do this outside the database, so I suspect
that DB manipulation is the only answer.

Thanks,
Bill

Carlos Fenoy

Jul 27, 2021, 4:59:51 PM
to Slurm User Community List
Hi,

You can clean up those jobs with sacctmgr.

sacctmgr show RunawayJobs

This will list the runaway jobs and, if there are any, ask whether you want to fix them.

Regards,
Carlos
--
Carles Fenoy

Douglas Jacobsen

Jul 27, 2021, 5:06:14 PM
to Slurm User Community List
Try running `sacctmgr show runawayjobs` (or similar; see the manual to be sure). My bet is that the user has a job that is apparently still running according to the database, and this will at least tell you about it.
----
Doug Jacobsen, Ph.D.
NERSC Senior Computing Engineer
Group Lead, Computational Systems Group
National Energy Research Scientific Computing Center
dmjac...@lbl.gov

------------- __o
---------- _ '\<,_
----------(_)/  (_)__________________________

Bill Wichser

Jul 27, 2021, 5:07:50 PM
to slurm...@lists.schedmd.com

The cluster doesn't exist though. This was what I tried first.

[root@della5 bill]# sacctmgr show RunawayJobs cluster=tukey
sacctmgr: error: Slurmctld running on cluster tukey is not up, can't check running jobs

Bill