Jenkins LTS 2.319.1 failed to restart due to thread-deadlock from pending build in Build Queue

Do Hoang Khiem

Dec 15, 2021, 5:26:41 AM
to Jenkins Users
We upgraded to LTS 2.319.1 and hit an issue when restarting the Jenkins service while a job was stuck in the Build Queue: Jenkins startup hung and the instance never came up. We had to use kill -9 and then start the service again.

Steps to reproduce (on Ubuntu 20.04):
- Create a Jenkins pipeline that specifies a non-existent agent label, for example:

pipeline {
    agent { label 'non-existing' }
    stages {
        stage('build') {
            steps {
                sh 'echo Hello'
            }
        }
    }
}

- Trigger a build of the pipeline above; it will be put into the Build Queue.

- Restart the Jenkins service: sudo systemctl restart jenkins

- Jenkins then hangs during startup, and the log shows:

2021-12-15 02:56:49.087+0000 [id=248] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#8] locked on hudson.model.RunMap@206de104 (owned by pool-21-thread-1):
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
at hudson.model.Run.fromExternalizableId(Run.java:2483)
at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.runForDisplay(ExecutorStepExecution.java:527)
at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getCauseOfBlockage(ExecutorStepExecution.java:425)
at hudson.model.Queue.getCauseOfBlockageForTask(Queue.java:1236)
at hudson.model.Queue.getCauseOfBlockageForItem(Queue.java:1193)
at hudson.model.Queue.maintain(Queue.java:1601)
at hudson.model.Queue$MaintainTask.doRun(Queue.java:2944)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:90)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
at java...@11.0.10/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java...@11.0.10/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java...@11.0.10/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java...@11.0.10/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java...@11.0.10/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java...@11.0.10/java.lang.Thread.run(Thread.java:834)
, pool-21-thread-1 locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@2eaa1aa1 (owned by jenkins.util.Timer [#8]):
at java...@11.0.10/jdk.internal.misc.Unsafe.park(Native Method)
at java...@11.0.10/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
at java...@11.0.10/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
at hudson.model.Queue.schedule2(Queue.java:567)
at hudson.model.Queue.schedule2(Queue.java:693)
at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.start(ExecutorStepExecution.java:104)
at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.onResume(ExecutorStepExecution.java:210)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:265)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:243)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
at org.jenkinsci.plugins.workflow.flow.DirectExecutor.execute(DirectExecutor.java:33)
at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:985)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:243)
at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:84)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:567)
at hudson.model.RunMap.retrieve(RunMap.java:226)
at hudson.model.RunMap.retrieve(RunMap.java:58)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:506)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:488)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:386)
at hudson.model.RunMap.getById(RunMap.java:206)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:948)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:959)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:217)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:242)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:236)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:293)
at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getNode(ExecutorStepExecution.java:378)
at org.datadog.jenkins.plugins.datadog.listeners.DatadogQueueListener.lambda$getNodeAsync$0(DatadogQueueListener.java:141)
at org.datadog.jenkins.plugins.datadog.listeners.DatadogQueueListener$$Lambda$579/0x00000008410b9040.call(Unknown Source)
at java...@11.0.10/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java...@11.0.10/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java...@11.0.10/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java...@11.0.10/java.lang.Thread.run(Thread.java:834)
, Jenkins initialization thread locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@2eaa1aa1 (owned by jenkins.util.Timer [#8]):
at java...@11.0.10/jdk.internal.misc.Unsafe.park(Native Method)
at java...@11.0.10/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
at java...@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
at java...@11.0.10/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
at hudson.model.Queue._withLock(Queue.java:1388)
at hudson.model.Queue.withLock(Queue.java:1266)
at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:241)
at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1661)
at jenkins.model.Jenkins.<init>(Jenkins.java:1008)
at hudson.model.Hudson.<init>(Hudson.java:85)
at hudson.model.Hudson.<init>(Hudson.java:81)
at hudson.WebAppMain$3.run(WebAppMain.java:298)
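
Reading the trace, this looks like a classic lock-order inversion: jenkins.util.Timer [#8] owns the Queue's ReentrantLock and waits for the RunMap monitor, while pool-21-thread-1 owns the RunMap monitor and waits for the Queue's ReentrantLock. The initialization thread then queues up behind the same ReentrantLock. Below is a minimal, standalone Java sketch of that pattern, not Jenkins code. The class, method, thread, and lock names are all hypothetical stand-ins, and using ThreadMXBean.findDeadlockedThreads() to spot the cycle is only my assumption about how such a health check could be done.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from Jenkins.
final class LockOrderDemo {

    // Starts two daemon threads that take the same two locks in opposite
    // order, then polls the JVM and returns how many threads it reports
    // as deadlocked (0 if none are reported within about 5 seconds).
    static int provokeAndCountDeadlockedThreads() {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
        Object runMapMonitor = new Object();           // stand-in for the RunMap monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the Timer thread: holds the queue lock, then wants the monitor.
        Thread timerLike = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (runMapMonitor) {
                    // never reached while the other thread holds the monitor
                }
            } finally {
                queueLock.unlock();
            }
        }, "timer-like");

        // Like the pool thread: holds the monitor, then wants the queue lock.
        Thread loaderLike = new Thread(() -> {
            synchronized (runMapMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                queueLock.lock(); // blocks forever: owned by timer-like
                queueLock.unlock();
            }
        }, "loader-like");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        timerLike.setDaemon(true);
        loaderLike.setDaemon(true);
        timerLike.start();
        loaderLike.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            // Covers both object monitors and ownable synchronizers
            // such as ReentrantLock; returns null when no cycle exists.
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads reported: "
                + provokeAndCountDeadlockedThreads());
    }
}
```

The latch only makes the interleaving deterministic for the demo. In the sketch, taking both locks in the same order on every path would remove the cycle.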


Do Hoang Khiem

Dec 15, 2021, 5:42:34 AM
to Jenkins Users
This seems to be quite similar to https://issues.jenkins.io/browse/JENKINS-67351

dnus...@cloudbees.com

Dec 15, 2021, 5:12:49 PM
to Jenkins Users
Hi, thanks for the report.

I added some comments to JENKINS-67351. It looks like this is a bug in the workflow-api (Pipeline: API) and workflow-durable-task-step (Pipeline: Nodes and Processes) plugins caused by some recent changes I made.

While I work on a fix, you may be able to downgrade to workflow-api 2.47 and workflow-durable-task-step 1101.vf832bc1ac745 to avoid the issue.

Thanks,
Devin

Do Hoang Khiem

Dec 16, 2021, 3:20:07 AM
to Jenkins Users
Great, thanks Devin for the quick response.

Btw, if a fix for these plugins is coming soon, we might wait for it and upgrade; we just want to avoid any possible pitfalls from a downgrade.

Regards,
Khiem
