[slurm-users] Jobs stuck with BeginTime and prolog exit status 99:0

Chandler

May 17, 2022, 12:27:40 PM
to Slurm User Community List
Could you help me figure out why our jobs are stuck in the PD (pending) state with reason BeginTime? For example:

JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
24458      defq cromwell smrtanal PD  0:00     1 (BeginTime)

# scontrol show job 24458
JobId=24458 JobName=cromwell_d72d675a_dataset_filter
UserId=smrtanalysis(1002) GroupId=smrtanalysis(1002) MCS_label=N/A
Priority=4294892709 Nice=0 Account=(null) QOS=normal
JobState=PENDING Reason=BeginTime Dependency=(null)
Requeue=1 Restarts=784 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2022-05-17T09:23:03 EligibleTime=2022-05-17T09:25:04
AccrueTime=2022-05-17T09:25:04
StartTime=2022-05-17T09:25:04 EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-05-17T09:23:03
Partition=defq AllocNode:Sid=EagI:2725352
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
BatchHost=EagI
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/data2/pacbio/smrtlink/jobs
StdErr=/data2/pacbio/smrtlink/jobs/cromwell-executions/pb_export_ccs/441e90d6-263b-41a5-bbbb-5009d9a346d9/call-prepare_input/prepare_input/d72d675a-4df7-4e9f-8072-e722742f48e7/call-dataset_filter/execution/stderr
StdIn=/dev/null
StdOut=/data2/pacbio/smrtlink/jobs/cromwell-executions/pb_export_ccs/441e90d6-263b-41a5-bbbb-5009d9a346d9/call-prepare_input/prepare_input/d72d675a-4df7-4e9f-8072-e722742f48e7/call-dataset_filter/execution/stdout
Power=
#
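For anyone hitting the same symptom, here is a minimal sketch of how the `scontrol show job` output above could be checked for a requeue loop. The helper names (`parse_scontrol`, `looks_like_requeue_loop`) and the restart threshold are hypothetical, and the parser assumes values contain no spaces, which holds for the fields used here but not necessarily for paths. In this output, EligibleTime falls about two minutes after SubmitTime and Restarts is 784, which suggests the BeginTime reason comes from repeated requeues rather than a user-requested start time.

```python
from datetime import datetime

# A few lines copied from the scontrol output quoted above.
SAMPLE = """\
JobState=PENDING Reason=BeginTime Dependency=(null)
Requeue=1 Restarts=784 BatchFlag=1 Reboot=0 ExitCode=0:0
SubmitTime=2022-05-17T09:23:03 EligibleTime=2022-05-17T09:25:04
"""


def parse_scontrol(text):
    """Split whitespace-separated KEY=VALUE tokens into a dict.

    Assumption: values contain no spaces (hypothetical helper, not a Slurm API).
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def looks_like_requeue_loop(fields, min_restarts=10):
    """Heuristic: pending on BeginTime with many restarts.

    min_restarts is an arbitrary illustrative threshold.
    """
    return (
        fields.get("JobState") == "PENDING"
        and fields.get("Reason") == "BeginTime"
        and int(fields.get("Restarts", "0")) >= min_restarts
    )


def eligible_delay_seconds(fields):
    """Seconds between SubmitTime and EligibleTime as reported by scontrol."""
    submit = datetime.fromisoformat(fields["SubmitTime"])
    eligible = datetime.fromisoformat(fields["EligibleTime"])
    return int((eligible - submit).total_seconds())


if __name__ == "__main__":
    job = parse_scontrol(SAMPLE)
    print("requeue loop suspected:", looks_like_requeue_loop(job))
    print("eligible delay (s):", eligible_delay_seconds(job))
```

In practice the text would come from running `scontrol show job <jobid>` rather than from a pasted sample.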

/var/log/slurmctld:
[2022-05-17T09:20:44.366] Requeuing JobId=24458
[2022-05-17T09:23:03.068] backfill: Started JobId=24458 in defq on EagI
[2022-05-17T09:23:03.106] error: prolog_slurmctld JobId=24458 prolog exit status 99:0
[2022-05-17T09:23:03.114] Requeuing JobId=24458
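The log lines above show the cycle: the job is started, prolog_slurmctld fails with exit status 99:0, and the job is requeued. A rough sketch for counting how often that happens per job is below. The regular expressions are written against the exact lines quoted here and are an assumption about the log format in general; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Lines copied from the slurmctld log quoted above.
SAMPLE_LOG = """\
[2022-05-17T09:20:44.366] Requeuing JobId=24458
[2022-05-17T09:23:03.068] backfill: Started JobId=24458 in defq on EagI
[2022-05-17T09:23:03.106] error: prolog_slurmctld JobId=24458 prolog exit status 99:0
[2022-05-17T09:23:03.114] Requeuing JobId=24458
"""

# Patterns match the wording of the lines above; other Slurm versions may differ.
PROLOG_FAIL = re.compile(
    r"error: prolog_slurmctld JobId=(\d+) prolog exit status (\d+):(\d+)"
)
REQUEUE = re.compile(r"Requeuing JobId=(\d+)")


def summarize(log_text):
    """Return (prolog_failures, requeues) as Counters keyed by job ID string."""
    failures = Counter()
    requeues = Counter()
    for line in log_text.splitlines():
        match = PROLOG_FAIL.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = REQUEUE.search(line)
        if match:
            requeues[match.group(1)] += 1
    return failures, requeues


if __name__ == "__main__":
    failures, requeues = summarize(SAMPLE_LOG)
    for job_id in sorted(requeues):
        print(
            f"JobId={job_id}: {requeues[job_id]} requeues, "
            f"{failures[job_id]} prolog failures"
        )
```

If nearly every requeue is preceded by a prolog failure, the prolog script (or whatever it depends on) is the place to look rather than the scheduler configuration.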

Thanks
--
Chandler Sobel-Sorenson (he/him) / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu

Chandler Sobel-Sorenson

Jan 9, 2023, 12:44:58 PM
to Slurm User Community List
Sorry for forgetting to update this; it dropped off my radar after I received no responses. Hopefully my late reply here attaches properly to the original message (May 17, 2022, Message-ID: <338c62e0-c453-6cd3...@genome.arizona.edu>).

The problem with the jobs turned out to be caused by our Bright Cluster Manager license expiring. Bright manages the Slurm daemons, among many other things. Since we no longer needed paid support, I opted for the free license. After renewing it, Slurm began operating correctly again.

Best,
Chandler

--
*Chandler Sobel-Sorenson*
Systems Administrator, Senior
Arizona Genomics Institute
School of Plant Sciences, Research
THE UNIVERSITY OF ARIZONA

Thomas W. Keating Bioresearch Bldg. | Rm. 200A24
1657 E. Helen St. | Tucson, AZ 85721
Office: 520-626-9589 | Cell: 520-907-4352

chan...@genome.arizona.edu <mailto: chan...@genome.arizona.edu>
Pronouns: he/him/his
*www.genome.arizona.edu* <https://www.genome.arizona.edu/>

Integrity, Compassion, Exploration, Adaptation, Inclusion, Determination <https://brand.arizona.edu/signature>


