[JIRA] (JENKINS-60507) Pipeline stuck when allocating machine | node block appears to be neither running nor scheduled


stoiky@gmail.com (JIRA)

Dec 16, 2019, 10:34:05 AM
to jenkinsc...@googlegroups.com
Mihai Stoichitescu created an issue
 
Jenkins / Bug JENKINS-60507
Pipeline stuck when allocating machine | node block appears to be neither running nor scheduled
Issue Type: Bug
Assignee: Unassigned
Attachments: plugins_versions.txt, queue.logs.zip
Components: core, workflow-durable-task-step-plugin
Created: 2019-12-16 15:33
Priority: Minor
Reporter: Mihai Stoichitescu

 

Our build system sometimes shows the following in the Thread Dump of a Pipeline while it is waiting for a free executor:

 

Thread #94
at DSL.node(node block appears to be neither running nor scheduled)
at WorkflowScript.runOnNode(WorkflowScript:1798)
at DSL.timeout(body has another 3 hr 14 min to run)
at WorkflowScript.runOnNode(WorkflowScript:1783)
at DSL.retry(Native Method)
at WorkflowScript.runOnNode(WorkflowScript:1781)
at WorkflowScript.getClosure(WorkflowScript:1901)
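
As a rough aid for spotting this state across many builds, the sketch below scans Pipeline thread dump text for the marker shown above and reports the thread number plus the first WorkflowScript frame beneath it. It assumes the dump layout matches the excerpt above; the function name `find_stuck_node_steps` is hypothetical and not part of Jenkins.

```python
import re

# Marker text copied verbatim from the thread dump in this report.
STUCK_MARKER = "node block appears to be neither running nor scheduled"


def find_stuck_node_steps(dump):
    """Return one dict per stuck node step found in a Pipeline thread dump.

    Each dict holds the thread number and the first WorkflowScript frame
    that follows the stuck DSL.node frame (the call site in the script).
    """
    results = []
    thread = None    # number from the most recent "Thread #N" header
    pending = None   # entry still waiting for its call-site frame
    for raw in dump.splitlines():
        line = raw.strip()
        header = re.match(r"Thread #(\d+)", line)
        if header:
            thread = int(header.group(1))
            pending = None
        elif STUCK_MARKER in line:
            pending = {"thread": thread, "call_site": None}
            results.append(pending)
        elif pending is not None and pending["call_site"] is None:
            site = re.match(r"at (WorkflowScript\.\S+)", line)
            if site:
                pending["call_site"] = site.group(1)
    return results
```

Feeding it the text of a build's thread dump page returns an empty list when no node step is in this state.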

 

 

Blue Ocean shows the following message, even though the build queue is empty and executors with those labels are available:

 

Still waiting to schedule task
Waiting for next available executor on pr&&prod&&mac&&build
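
To cross-check the "queue is empty" observation outside the UI, one option is to look at the JSON returned by the Jenkins remote API at `/queue/api/json`. The sketch below only summarizes an already-fetched payload. The field names used (`items`, `task.name`, `stuck`, `why`) are assumptions about that payload's shape and should be verified against your Jenkins version; `summarize_queue` is a hypothetical helper.

```python
def summarize_queue(queue_json):
    """Return (task name, stuck flag, reason) for every queued item.

    queue_json is the decoded JSON document from /queue/api/json.
    An empty result means Jenkins itself reports nothing in the queue.
    """
    summary = []
    for item in queue_json.get("items", []):
        task = item.get("task") or {}
        summary.append((
            task.get("name", "<unknown>"),   # queued task's name, if present
            bool(item.get("stuck")),         # whether Jenkins flags the item as stuck
            item.get("why") or "",           # human-readable reason for waiting
        ))
    return summary
```

If this returns an empty list while a build still reports "Still waiting to schedule task", that matches the mismatch described in this report.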

 

 

The job can only be completed by aborting it or by waiting for the timeout step to do its work.

 

We have been observing this since v2.121.3 (workflow-durable-task-step v2.19). We recently updated to v2.190.1 (workflow-durable-task-step v2.28) and still see Pipelines stuck while waiting for executors.

 

The only reference I could find is the last comment of this issue: https://issues.jenkins-ci.org/browse/JENKINS-42556, and we have not found a way to reproduce the problem. We noticed this fix made by Jesse Glick but are not sure whether it will help us. We tried turning on Anonymous for a week and still saw the problem.

 

Please let me know if there is more information or logs I can provide to help track down the cause. Thanks.

 

I've attached FINEST-level logs for hudson.model.Queue, though I'm not sure how much they will help.

Our Jenkins runs on RedHat, on Tomcat/9.0.14 and Java 1.8.0_171.

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)

stoiky@gmail.com (JIRA)

Dec 16, 2019, 10:36:04 AM
to jenkinsc...@googlegroups.com
Mihai Stoichitescu updated an issue
Change By: Mihai Stoichitescu

DemenkovK@gmail.com (JIRA)

Jan 10, 2020, 7:04:02 AM
to jenkinsc...@googlegroups.com
Konstantin Demenkov commented on Bug JENKINS-60507
 
Re: Pipeline stuck when allocating machine | node block appears to be neither running nor scheduled

I have the same issue on the latest 2.201 LTS. It appears quite often (about 10% of jobs) when working with proxmox slaves over the proxmox cloud plugin and jnlp. I suspect some incompatibility in the timeout or connection logic between the master and the proxmox slaves, but I really don't know why it happens.

DemenkovK@gmail.com (JIRA)

Jan 10, 2020, 7:49:03 AM
to jenkinsc...@googlegroups.com
Konstantin Demenkov edited a comment on Bug JENKINS-60507
I have the same issue on the latest 2.204.1 LTS. It appears quite often (about 10% of jobs) when working with proxmox slaves over the proxmox cloud plugin and jnlp. I suspect some incompatibility in the timeout or connection logic between the master and the proxmox slaves, but I really don't know why it happens.

stoiky@gmail.com (JIRA)

Mar 18, 2020, 4:38:03 AM
to jenkinsc...@googlegroups.com

We are still hit by this issue from time to time. Any ideas, workarounds, or help with debugging would be appreciated. Thanks.
