[JIRA] (JENKINS-61779) Regression: Job stuck in queue waiting forever after upgrade

peter@softwolves.pp.se (JIRA)

Apr 2, 2020, 8:50:03 AM
to jenkinsc...@googlegroups.com
Peter Krefting updated an issue
 
Jenkins / Bug JENKINS-61779
Regression: Job stuck in queue waiting forever after upgrade
Change By: Peter Krefting
Summary: Regression: Job stuck in queue waiting forever after upgrade (corrected from "Job stock in queue")
 
This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)

dbeck@cloudbees.com (JIRA)

Apr 3, 2020, 8:42:03 PM
to jenkinsc...@googlegroups.com
Daniel Beck commented on Bug JENKINS-61779
 
Re: Regression: Job stuck in queue waiting forever after upgrade

May be caused by https://github.com/jenkinsci/jenkins/pull/3983 in Jenkins 2.205.

If the same setup works in 2.204 (the weekly release, not LTS) and doesn't work in 2.205, that would be a strong indication that this PR is the cause.

A bit weird we're only hearing about this now though. I cannot find similar recent bugs in throttle-concurrents either.

peter@softwolves.pp.se (JIRA)

Apr 6, 2020, 2:14:02 AM
to jenkinsc...@googlegroups.com

Is there a 2.205 DEB package that I can install to see if it has the same behaviour (assuming the configuration is backward compatible)?

dbeck@cloudbees.com (JIRA)

Apr 6, 2020, 4:12:05 AM
to jenkinsc...@googlegroups.com

peter@softwolves.pp.se (JIRA)

Apr 6, 2020, 7:10:02 AM
to jenkinsc...@googlegroups.com

dbeck@cloudbees.com (JIRA)

Apr 6, 2020, 7:26:03 AM
to jenkinsc...@googlegroups.com

Create a log recorder for the loggers:

  • hudson.plugins.throttleconcurrents
  • hudson.model.Queue
  • hudson.model.queue.QueueSorter

and see whether anything interesting is being logged.

Rocha@Stratovan.com (JIRA)

Apr 7, 2020, 12:25:04 PM
to jenkinsc...@googlegroups.com

Rocha@Stratovan.com (JIRA)

Apr 7, 2020, 12:26:03 PM
to jenkinsc...@googlegroups.com
John Rocha commented on Bug JENKINS-61779
 
Re: Regression: Job stuck in queue waiting forever after upgrade

Having the same problem. I update about once a month and after updating from 2.204 to 2.222 I see the same issue, but I'm on Windows systems.

grayaii@gmail.com (JIRA)

Apr 7, 2020, 8:26:02 PM
to jenkinsc...@googlegroups.com

grayaii@gmail.com (JIRA)

Apr 7, 2020, 8:27:03 PM
to jenkinsc...@googlegroups.com
Alex Gray commented on Bug JENKINS-61779
 
Re: Regression: Job stuck in queue waiting forever after upgrade

I have the same issue. I'm on 2.222.1. It ONLY started happening after I upgraded from 2.204.1 to 2.222.1.
I have a label_expression on a matrix project set to "master_node".
When I run it, it says "(pending—Waiting for next available executor on ‘master’)"
Note that it says "master", not "master_node". I have no idea why it's looking for something called "master", not "master_node".

I'm attaching a gif of what I see.
Here are the logs that Daniel Beck requested too:

Apr 08, 2020 12:16:14 AM FINEST hudson.model.Queue
Queue.Snapshot{waitingList=[];blockedProjects=[hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152871];buildables=[hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@598a8bc[util-check-for-container-spread]:152734, hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@4fe1c89a[util-generate-datadog-monitors]:152745];pendings=[]} → Queue.Snapshot{waitingList=[];blockedProjects=[hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152871];buildables=[hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@598a8bc[util-check-for-container-spread]:152734, hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@4fe1c89a[util-generate-datadog-monitors]:152745];pendings=[]}; leftItems={152845=hudson.model.Queue$LeftItem:hudson.matrix.MatrixProject@33385024[util-check-for-low-ips]:152845, 152866=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@2e88aabc[util-check-hung-ecs-tasks-prod-ca]:152866, 152865=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@1c17e4db[util-check-hung-ecs-tasks-prod]:152865, 152819=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152819, 152868=hudson.model.Queue$LeftItem:ExecutorStepExecution.PlaceholderTask{runId=util-alert-jenkins-vs-jumpcloud-users#4201,label=,context=CpsStepContext[4:node]:Owner[util-alert-jenkins-vs-jumpcloud-users/4201:util-alert-jenkins-vs-jumpcloud-users #4201],cookie=f8461e2f-66d2-48fb-9acc-79bb1b98a0ad,auth=null}:152868, 152867=hudson.model.Queue$LeftItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@13b79664[util-alert-jenkins-vs-jumpcloud-users]:152867, 152870=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@188db571[util-check-hung-ecs-tasks-stage]:152870, 152872=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@1ae51106[databases-without-termination-protection]:152872, 
152869=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@60957b04[util-check-asg-thrashing]:152869}
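
A Queue.Snapshot line like the one above is hard to read by eye. The following is a minimal sketch, assuming the `name=[...]` bucket layout seen in this particular log line, that pulls out which jobs sit in which bucket. The function name `summarize_snapshot` and the shortened `sample` string are illustrative, not part of Jenkins. It only looks at the first snapshot in the line.

```python
import re

BUCKETS = ("waitingList", "blockedProjects", "buildables", "pendings")


def summarize_snapshot(log_line):
    """Return {bucket: [(job_name, queue_id), ...]} for the first snapshot in the line."""
    summary = {}
    for bucket in BUCKETS:
        # Capture up to the "]" that is followed by ";" or "}", so the
        # "[job-name]" brackets inside each item do not end the match early.
        match = re.search(re.escape(bucket) + r"=\[(.*?)\](?=[;}])", log_line)
        body = match.group(1) if match else ""
        # In this log format each item ends in "[job-name]:queue-id".
        items = re.findall(r"\[([^\[\]]+)\]:(\d+)", body)
        summary[bucket] = [(name, int(queue_id)) for name, queue_id in items]
    return summary


# Shortened copy of the "before" snapshot from the log above.
sample = (
    "Queue.Snapshot{waitingList=[];"
    "blockedProjects=[hudson.model.Queue$BlockedItem:"
    "hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152871];"
    "buildables=[hudson.model.Queue$BuildableItem:"
    "hudson.matrix.MatrixProject@598a8bc[util-check-for-container-spread]:152734, "
    "hudson.model.Queue$BuildableItem:"
    "hudson.matrix.MatrixProject@4fe1c89a[util-generate-datadog-monitors]:152745];"
    "pendings=[]}"
)

for bucket, items in summarize_snapshot(sample).items():
    print(bucket, items)
```

In the log above, the two MatrixProject entries appear under buildables in both the "before" and "after" snapshot, which is consistent with jobs that are ready to run but never get an executor.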

grayaii@gmail.com (JIRA)

Apr 9, 2020, 8:38:03 AM
to jenkinsc...@googlegroups.com

One more piece of information:

If you configure your job, select "Restrict where this project can be run" and enter "master_node" as the node label, you'll get "Waiting for next available executor on 'master'", NOT "master_node". It's like it's tripping up on the underscore. If I change that to "masternode", then it correctly says "Waiting for next available executor on 'masternode'" (but it still doesn't work, even if I have a node with that label).

Hope this bit of information helps!
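
To check many jobs for the mismatch described above, the label shown in the queue message can be compared with the label the job was configured with. This is a minimal sketch that only parses the message text quoted in this thread. The helper names `label_in_reason` and `label_mismatch` are made up for illustration. Where the message comes from is left open: Jenkins shows it in the build queue, and it may also be available per queue item through the remote API (an assumption to verify on your own instance).

```python
import re

# Matches the label inside curly or straight quotes after "executor on".
_REASON = re.compile(r"executor on\s+[‘'\"]([^‘’'\"]+)[’'\"]")


def label_in_reason(why):
    """Return the quoted label from a 'Waiting for next available executor on ...' message."""
    match = _REASON.search(why)
    return match.group(1) if match else None


def label_mismatch(configured_label, why):
    """True when the queue message names a different label than the job was configured with."""
    shown = label_in_reason(why)
    return shown is not None and shown != configured_label


why = "Waiting for next available executor on ‘master’"
print(label_in_reason(why))                 # label named in the message
print(label_mismatch("master_node", why))   # configured label differs from the message
print(label_mismatch("master", why))        # configured label equals the message
```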

acampeau@hotmail.com (JIRA)

May 3, 2020, 8:31:05 PM
to jenkinsc...@googlegroups.com

Ran into the same problem. Everything is fine using version 2.204.6 but the problem shows up with version 2.205. So as previously suggested, https://github.com/jenkinsci/jenkins/pull/3983 seems the likely cause.

After some investigation, the cause of the problem seems to be using a label to refer to the master node in jobs (for node restriction purposes). For instance, we have the "DISPATCH" label configured for the "master" node and use it in multi-configuration jobs, which are the ones that get stuck.

If I modify a stuck job to use "master" instead of the "DISPATCH" label, the job gets triggered as before. Same thing if I just configure the job to no longer have node restrictions.

dbeck@cloudbees.com (JIRA)

May 3, 2020, 8:47:02 PM
to jenkinsc...@googlegroups.com

I tried to reproduce this issue, but failed to do so.

If any of you can work out how to reproduce this problem from scratch, please provide detailed and complete instructions.

dbeck@cloudbees.com (JIRA)

May 7, 2020, 4:10:03 PM
to jenkinsc...@googlegroups.com
Daniel Beck updated an issue
 
Change By: Daniel Beck
Labels: lts-candidate regression throttle

acampeau@hotmail.com (JIRA)

May 7, 2020, 6:46:05 PM
to jenkinsc...@googlegroups.com
Alain Campeau commented on Bug JENKINS-61779
 
Re: Regression: Job stuck in queue waiting forever after upgrade

I've managed to reproduce this from scratch on a newly set up Jenkins server (Windows) configured with a single node/agent (Windows). This way I've stripped out a lot of things from our production setup so as to eliminate as many possible causes as I could.

Here are the minimum steps I needed to repro:

  • Install latest Jenkins LTS release (2.222.3) on a Windows machine with the default set of plugins
  • Configure the master node with a label such as DISPATCH
  • Add a new node and configure it as a new agent. Due to limited resources I configured the new node/agent to run on the same Windows machine. Configure it with its own label such as WINDOWS and make sure "Usage" configuration is set to "Only build jobs with label expressions matching the node"
  • Create a new Multi-configuration project job:
    • Configure this job's "Restrict where this project can run" setting so its "Label Expression" value is the one specified for master, so DISPATCH
    • Configure this job's "Configuration Matrix" by adding:
      • a "User-defined matrix" axis with a "Name" of "TARGET" and "Values" of "XboxOne PS4 Switch" (any strings to mimic building for various platforms)
      • a "Slaves" axis with a "Name" of "TARGET_POOL" and make sure to check the "WINDOWS" checkbox
    • Configure this job's "Build" section by adding a dummy "Execute Windows batch command" whose content is simply an "@echo Hello world!"
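
As a rough illustration of what this setup is expected to do, here is a toy model of single-label matching. It is not Jenkins' scheduling code: the `Node` class, `eligible_nodes` and the agent name "agent-1" are hypothetical, and real label expressions support operators that this sketch ignores. It assumes only that a node answers to its own name plus its configured labels, and that the "Only build jobs with label expressions matching the node" usage setting rejects jobs without a label.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: set = field(default_factory=set)
    # True models "Only build jobs with label expressions matching the node".
    exclusive: bool = False

    def accepts(self, job_label):
        """Single-label matching only; operators in label expressions are not modeled."""
        if job_label is None:
            return not self.exclusive
        # A node's own name acts as a label too.
        return job_label == self.name or job_label in self.labels


def eligible_nodes(nodes, job_label):
    """Names of the nodes that may run a job restricted to job_label."""
    return [node.name for node in nodes if node.accepts(job_label)]


nodes = [
    Node("master", {"DISPATCH"}),
    Node("agent-1", {"WINDOWS"}, exclusive=True),  # hypothetical agent name
]

print(eligible_nodes(nodes, "DISPATCH"))  # the matrix parent job
print(eligible_nodes(nodes, "WINDOWS"))   # the individual configurations
print(eligible_nodes(nodes, None))        # a job with no label restriction
```

Under this model the parent job restricted to DISPATCH should land on master and each configuration on the WINDOWS agent, which matches the behavior reported for 2.204.6 and earlier.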

If I launch this job using Jenkins 2.222.3, 2.205 or anything in between, the job is stuck waiting for an executor on master even though 2 executors are available there and the agent using the WINDOWS label is free with at least a single executor.

If I launch this job using Jenkins 2.204.6 or earlier, the job successfully launches and sequentially runs all three XboxOne, PS4 and Switch configurations on the sole agent using the WINDOWS label while the job itself "runs" on the master node (even though all it does is dispatch really).

On our production server we use the "Dynamic Axis" plugin to dynamically build an axis of all platforms to build and have multiple Windows, Linux and Mac build machines using OS-specific labels. But for the sake of keeping these repro steps simple, I've dropped all but one OS and removed the "Dynamic Axis" plugin usage. It doesn't logically make much sense but shows the behavior difference starting with the Jenkins 2.205 release.
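
The good/bad boundary reported above (2.204.6 works, 2.205 does not) is the kind of result a bisection over releases gives. Below is a minimal sketch of that search, assuming every release before the regression passes and every later one fails. The `reported_bad` function is a stand-in for "install this version and run the repro job", and the candidate list only contains versions mentioned in this thread. Ordering purely by version number mixes the LTS and weekly lines, so treat the result as a hint rather than proof.

```python
def version_key(version):
    """Turn '2.204.6' into (2, 204, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def first_bad(versions, is_bad):
    """Binary search for the earliest version for which is_bad() is true.

    Assumes all versions before the regression are good and all later ones bad.
    Returns None when no version is bad.
    """
    ordered = sorted(versions, key=version_key)
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(ordered[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return ordered[lo] if lo < len(ordered) else None


def reported_bad(version):
    # Stand-in for a manual test, encoding the results reported in this thread.
    return version_key(version) >= version_key("2.205")


candidates = ["2.204", "2.204.6", "2.205", "2.222.1", "2.222.3"]
print(first_bad(candidates, reported_bad))
```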

dbeck@cloudbees.com (JIRA)

May 8, 2020, 5:09:04 AM
to jenkinsc...@googlegroups.com

Alain Campeau Thanks for these steps, I'll try to reproduce them when I have some time.

About

  • Configure this job's "Restrict where this project can run" setting so its "Label Expression" value is the one specified for master, so DISPATCH

What happens when you don't check that box, or specify "master" here? Would that be a viable workaround for this problem, and if not, why not?
