[JIRA] [multijob-plugin] (JENKINS-27371) Parent builds sometimes hang on successful child builds of same type


simon@simonmweber.com (JIRA)

May 5, 2015, 3:12:02 PM
to jenkinsc...@googlegroups.com
Simon Weber commented on Bug JENKINS-27371
 
Re: Parent builds sometimes hang on successful child builds of same type

@mcantin, I just installed your changes from source and unfortunately we're still seeing the problem. I've only seen hangs when one of the child builds failed, in case that's useful.

 
This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

mmerritt1010@gmail.com (JIRA)

Jun 10, 2015, 5:14:01 PM
to jenkinsc...@googlegroups.com
Michael Merritt edited a comment on Bug JENKINS-27371
We're also seeing the same issue Simon describes, and it's causing deploy headaches for us. We frequently have to kill the job, rebuild it, and roll the dice to see whether it will work. Our logs look similar to those mentioned above. Let me know if you need more info.

Jenkins 1.607
Multijob 1.16

{code:java}
website-deploy-production-download #10432 main build action completed: SUCCESS
Jun 10, 2015 8:50:19 PM SEVERE hudson.model.Executor finish1
Executor threw an exception
java.util.NoSuchElementException
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:76)
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:63)
at java.util.AbstractMap$2$1.next(AbstractMap.java:385)
at hudson.util.RunList.subList(RunList.java:137)
at hudson.tasks.LogRotator.perform(LogRotator.java:124)
at hudson.model.Job.logRotate(Job.java:465)
at hudson.model.Run.execute(Run.java:1805)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:374)
{code}
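The trace shows RunList.subList (called from LogRotator.perform) hitting NoSuchElementException inside the lazy run-map iterator. One plausible reading, consistent with the later note that only the most recent 1000 builds are kept, is that the set of runs was shorter than the code walking it expected. As a simplified illustration only, the sketch below uses hypothetical class and method names and is not Jenkins' actual RunList code. It shows how a sub-list walk that trusts an expected element count fails when the source is shorter, and how guarding with hasNext() avoids the exception. It does not cover a true race in which hasNext() itself returns a stale answer.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not Jenkins' RunList): copies elements [from, to) from an
 * iterable that may contain fewer elements than the caller assumes, e.g. because
 * old builds were discarded.
 */
class DefensiveSubList {

    // Naive version: assumes the iterator yields at least `to` elements,
    // so next() throws NoSuchElementException when the source is shorter.
    static <T> List<T> naiveSubList(Iterable<T> source, int from, int to) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        for (int i = 0; i < to; i++) {
            T item = it.next();
            if (i >= from) {
                out.add(item);
            }
        }
        return out;
    }

    // Defensive version: stops early instead of throwing when elements run out.
    static <T> List<T> safeSubList(Iterable<T> source, int from, int to) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        for (int i = 0; i < to && it.hasNext(); i++) {
            T item = it.next();
            if (i >= from) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> builds = List.of(10432, 10431, 10430);
        System.out.println(safeSubList(builds, 1, 5)); // prints [10431, 10430]
        try {
            naiveSubList(builds, 1, 5);
        } catch (NoSuchElementException e) {
            System.out.println("naive version threw NoSuchElementException");
        }
    }
}
{code}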

harcher81@gmail.com (JIRA)

Jun 18, 2015, 4:10:02 PM
to jenkinsc...@googlegroups.com

mmerritt1010@gmail.com (JIRA)

Jun 18, 2015, 4:21:01 PM
to jenkinsc...@googlegroups.com

simon@simonmweber.com (JIRA)

Jun 19, 2015, 5:47:01 PM
to jenkinsc...@googlegroups.com

Mathieu Cantin, that fix didn't work for us: it made all builds hang.

I found this error in the logs:

Jun 19, 2015 9:34:03 PM WARNING hudson.triggers.Trigger checkTriggers
org.jenkinsci.plugins.ghprb.GhprbTrigger.run() failed for hudson.model.FreeStyleProject@b347b18[venmo_platform_auto_pr]
java.lang.NullPointerException
	at org.jenkinsci.plugins.ghprb.GhprbTrigger.run(GhprbTrigger.java:155)
	at hudson.triggers.Trigger.checkTriggers(Trigger.java:266)
	at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

harcher81@gmail.com (JIRA)

Jun 19, 2015, 7:58:01 PM
to jenkinsc...@googlegroups.com
Mathieu Cantin edited a comment on Bug JENKINS-27371
Simon Weber [~simonmweber]
I'm sorry. Did you update the "GitHub pull request builder plugin" (https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin) recently? The error came from that plugin, in a "Free Style Project". Which version do you have?

simon@simonmweber.com (JIRA)

Jun 19, 2015, 8:00:01 PM
to jenkinsc...@googlegroups.com

Whoops! I pasted the wrong thing. Let me do some debugging and get back to you.

simon@simonmweber.com (JIRA)

Jun 19, 2015, 10:04:03 PM
to jenkinsc...@googlegroups.com

OK, I had to fix a few unrelated problems, haha. The hang I was seeing before was caused by Jenkins taking a long time to discard old builds. I'm not sure why updating the plugin and restarting triggered that; maybe I just got unlucky?

Anyway, after fixing all that other stuff and upgrading, things look good! I'll know for sure next week once we have some real load on it, but this looks promising =)

simon@simonmweber.com (JIRA)

Jun 22, 2015, 11:39:04 AM
to jenkinsc...@googlegroups.com

Mathieu Cantin, bummer; we just had a build hang.

Here are the logs, with my comments marked <"like this"> so JIRA highlights them.

From the build itself:

Started by upstream project "platform_deploy_listener" build number 219
<"snip">
[MultiJob] Starting job platform_parallel.
[MultiJob] Starting job platform_parallel.
[MultiJob] Starting job venmo_platform_external_integration.
[MultiJob] Starting job venmo_platform_external_integration.
[MultiJob] Starting job scope_frontend.
[MultiJob] Finished Build : #96 - deploy of Job : scope_frontend with status : SUCCESS
[MultiJob] Finished Build : #6837 of Job : platform_parallel with status : SUCCESS
[MultiJob] Finished Build : #6836 of Job : platform_parallel with status : SUCCESS
[MultiJob] Finished Build : #1248 - deploy of Job : venmo_platform_external_integration with status : SUCCESS
Build timed out (after 20 minutes). Marking the build as aborted.
[MultiJob] Aborting all subjobs.
[MultiJob] Aborting platform_parallel.
[MultiJob] Aborting platform_parallel.
[MultiJob] Aborting venmo_platform_external_integration.
[MultiJob] Aborting venmo_platform_external_integration.
[MultiJob] Aborting scope_frontend.
Build was aborted
<"snip">
[MultiJob] Finished Build : #1247 - deploy of Job : venmo_platform_external_integration with status : ABORTED <"here's the build that hung">

From Jenkins:

Jun 22, 2015 2:58:55 PM INFO
venmo_platform_test_and_deploy - #863 Started by ...
Jun 22, 2015 3:09:10 PM INFO hudson.model.Run execute
venmo_platform_external_integration #1247 main build action completed: SUCCESS <"here we see it completed successfully">
Jun 22, 2015 3:09:11 PM INFO hudson.model.Run execute
venmo_platform_external_integration #1248 main build action completed: SUCCESS
Jun 22, 2015 3:09:20 PM SEVERE hudson.model.Executor finish1
Executor threw an exception
java.util.NoSuchElementException <"perhaps this is related? We keep the most recent 1000 builds.">
	at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:76)
	at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:63)
	at java.util.AbstractMap$2$1.next(AbstractMap.java:385)
	at hudson.util.RunList.subList(RunList.java:139)
	at hudson.tasks.LogRotator.perform(LogRotator.java:125)
	at hudson.model.Job.logRotate(Job.java:467)
	at hudson.model.Run.execute(Run.java:1808)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:374)

<"these look to be expected">
Jun 22, 2015 3:18:48 PM INFO hudson.model.Run execute
platform_deploy_listener #219 aborted
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
	at hudson.plugins.parameterizedtrigger.TriggerBuilder.perform(TriggerBuilder.java:135)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
	at hudson.model.Build$BuildExecution.build(Build.java:205)
	at hudson.model.Build$BuildExecution.doRun(Build.java:162)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
	at hudson.model.Run.execute(Run.java:1744)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:374)


Jun 22, 2015 3:18:57 PM INFO hudson.model.Run execute
venmo_platform_test_and_deploy #863 aborted
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
	at com.tikal.jenkins.plugins.multijob.MultiJobBuilder.perform(MultiJobBuilder.java:220)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
	at hudson.model.Build$BuildExecution.build(Build.java:205)
	at hudson.model.Build$BuildExecution.doRun(Build.java:162)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
	at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:137)
	at hudson.model.Run.execute(Run.java:1744)
	at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:76)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:374)
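The last trace shows the parent multijob build parked in a timed ArrayBlockingQueue.poll inside MultiJobBuilder.perform when the 20-minute timeout aborted it. Both child builds had already logged SUCCESS at that point. A wait like this can only end when every child's result actually reaches the queue. The sketch below is hypothetical: ResultCollector and awaitResults are invented names, and this is not the plugin's code. It polls a result queue in short slices but also enforces an overall deadline, so a lost result surfaces as an explicit TimeoutException instead of an indefinite hang.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical result collector for a parent build waiting on N child builds. */
class ResultCollector {

    /**
     * Waits for `expected` results, polling in short slices, and gives up once
     * `overallMillis` has elapsed by throwing a TimeoutException that says how
     * many results are still missing.
     */
    static List<String> awaitResults(BlockingQueue<String> results, int expected,
                                     long pollMillis, long overallMillis)
            throws InterruptedException, TimeoutException {
        List<String> collected = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(overallMillis);
        while (collected.size() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException((expected - collected.size())
                        + " child result(s) never arrived");
            }
            // Timed poll: returns null if nothing arrived during this slice.
            String r = results.poll(pollMillis, TimeUnit.MILLISECONDS);
            if (r != null) {
                collected.add(r);
            }
        }
        return collected;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(10);
        q.put("#1248 SUCCESS");
        // The second child's result is never enqueued, simulating a lost completion.
        try {
            awaitResults(q, 2, 50, 300);
        } catch (TimeoutException e) {
            System.out.println("Gave up: " + e.getMessage()); // Gave up: 1 child result(s) never arrived
        }
    }
}
{code}

The per-slice timeout keeps the loop responsive. The overall deadline is what turns "waiting forever" into a reportable failure.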

harcher81@gmail.com (JIRA)

Jul 13, 2015, 5:46:01 PM
to jenkinsc...@googlegroups.com

It's difficult to debug. I added debug logging in this build: https://jenkins.ci.cloudbees.com/job/plugins/job/tikal-multijob-plugin/166/ (lines starting with [MultiJob][Debug]). This could help with debugging.

simon@simonmweber.com (JIRA)

Jul 13, 2015, 5:54:05 PM
to jenkinsc...@googlegroups.com

Sounds good. I'll update to that revision and hopefully have an example in the next few days.

simon@simonmweber.com (JIRA)

Jul 22, 2015, 10:32:01 AM
to jenkinsc...@googlegroups.com

So I've got my first example, but it's actually not for a multijob build! In this case, the parent build used https://wiki.jenkins-ci.org/display/JENKINS/Parameterized+Trigger+Plugin to trigger and block on the downstream build (which itself was a multijob build). The parent build timed out after 20 minutes even though the child succeeded.

So, perhaps the problem is actually with that plugin instead? My understanding is that it's a dependency of the multijob plugin. My installed version was 2.25 (I've since upgraded to 2.27 in case that fixes it).

Here are the logs, but they're not too interesting:

parent build:

Waiting for the completion of platform_parallel
Build timed out (after 20 minutes). Marking the build as aborted.
Build was aborted

child (multijob) build:

[MultiJob] Starting job venmo_platform.
[MultiJob] Starting job venmo_platform.
[MultiJob] Starting job venmo_platform.
[MultiJob][Debug] Create executor service.
[MultiJob][Debug] Create task venmo_platform
[MultiJob][Debug] Create task venmo_platform
[MultiJob][Debug] Create task venmo_platform
[MultiJob][Debug] Shutdown.
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] Timeoutexception, continue
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] future is started
[MultiJob][Debug] future is started
[MultiJob][Debug] future is started
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] future is started
[MultiJob][Debug] Thy to start future for venmo_platform
[MultiJob][Debug] future is started
<"snip; this repeats for a long time">
[MultiJob][Debug] Result value is SUCCESS
[MultiJob] Finished Build : #29045 -  of Job : venmo_platform with status : SUCCESS
[MultiJob][Debug]  Finish execution of SUCCESS
[MultiJob][Debug] Check phase termination.
<"one of these for each subbuild">
[MultiJob][Debug] Continuation calculation.
Finished: SUCCESS
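The [MultiJob][Debug] lines ("Thy to start future", "Timeoutexception, continue", "future is started") suggest a loop that repeatedly waits on each child's future with a timeout and retries on TimeoutException. That reading is an assumption about the general pattern, not a statement about the plugin's code. The sketch below uses the standard Future.get(long, TimeUnit) and adds a cap on retries, so a future that never completes ends the wait instead of looping forever. A future might never complete, for example, if a downstream result is never propagated back. The class and method names are hypothetical.

{code:java}
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical bounded retry loop around Future.get with a timeout. */
class BoundedFutureWait {

    /**
     * Returns the future's value, or Optional.empty() if it is still not done
     * after maxAttempts waits of sliceMillis each (or if it completed with null).
     */
    static <T> Optional<T> waitWithRetries(Future<T> future, long sliceMillis, int maxAttempts)
            throws InterruptedException, ExecutionException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(future.get(sliceMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Like "Timeoutexception, continue", but the retries are counted.
                System.out.println("attempt " + attempt + ": still waiting");
            }
        }
        return Optional.empty(); // caller decides whether to abort or re-check the child build
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> done = CompletableFuture.completedFuture("SUCCESS");
        System.out.println(waitWithRetries(done, 10, 3)); // Optional[SUCCESS]

        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println(waitWithRetries(neverCompleted, 10, 3)); // Optional.empty after 3 attempts
    }
}
{code}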

simon@simonmweber.com (JIRA)

Jul 22, 2015, 10:56:01 AM
to jenkinsc...@googlegroups.com

I just triggered an identical hang with the upgraded plugin, so it doesn't look like it's fixed.

elgalu3@gmail.com (JIRA)

Aug 18, 2015, 2:33:02 PM
to jenkinsc...@googlegroups.com

harcher81@gmail.com (JIRA)

Oct 5, 2015, 4:42:03 PM
to jenkinsc...@googlegroups.com

elgalu3@gmail.com (JIRA)

Oct 6, 2015, 3:42:03 AM
to jenkinsc...@googlegroups.com

simon@simonmweber.com (JIRA)

Oct 8, 2015, 1:53:02 PM
to jenkinsc...@googlegroups.com

We haven't seen this issue in a few months. We're now running Jenkins 1.628, trigger plugin 2.27, and multijob from source at https://github.com/jenkinsci/tikal-multijob-plugin/pull/65.

abridges@blackberry.com (JIRA)

Jan 25, 2016, 7:10:02 PM
to jenkinsc...@googlegroups.com

We've seen this a few times, most recently in a MultiJob run calling a maven job, but previously in a Freestyle job calling maven jobs.
We are running 1.609.3.1, CBE.
I've opened a ticket with CB Support and cited this defect.

abridges@blackberry.com (JIRA)

Jan 25, 2016, 7:14:04 PM
to jenkinsc...@googlegroups.com

Does the pull request conclusively fix the issue? If so, maybe we can get it rolled into a new plugin release?

simon@simonmweber.com (JIRA)

Jan 27, 2016, 11:26:04 PM
to jenkinsc...@googlegroups.com

Tony Bridges, I don't think this was solved for us until I upgraded Jenkins. I'm not sure which combination of the Jenkins, trigger plugin, and multijob plugin upgrades is necessary to solve it, but the environment I described earlier has been rock solid for months now.

ifernandezcalvo@cloudbees.com (JIRA)

Feb 15, 2016, 11:10:01 AM
to jenkinsc...@googlegroups.com

How to reproduce it:

  • Configure a FreeStyle Project - freeStyle01
  • Configure a Maven Project - maven01
  • Configure a Multijob Project with 30 subjobs (enough to leave time to stop the slave agent) - Multijob01
  • Configure one slave agent with 1 executor - slave01
  • Run Multijob01
  • Go to Manage Jenkins/Manage Nodes and put slave01 offline, or kill "java -jar slave.jar" on the slave agent
  • Multijob01 hangs waiting for the task to finish
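Ivan's steps amount to removing the only executor while subjobs are still queued on it. The sketch below is a loose analogy only, using plain java.util.concurrent rather than Jenkins' node or queue model. A single-thread ExecutorService stands in for the one-executor slave01, and shutdownNow() stands in for taking it offline. Tasks that had not started are drained from the queue and never run, so their Futures never complete. An untimed get() on one of them would block forever, which mirrors the reported hang, while a timed get() at least makes the stranded subjobs visible. OfflineAgentAnalogy and countStrandedSubjobs are hypothetical names.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Analogy: a one-executor "slave" is taken offline while subjobs are queued on it. */
class OfflineAgentAnalogy {

    /** Returns how many subjob futures never complete after the "agent" goes away. */
    static int countStrandedSubjobs(int subjobs) throws InterruptedException {
        ExecutorService slave01 = Executors.newSingleThreadExecutor(); // 1 executor
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 1; i <= subjobs; i++) {
            final int n = i;
            futures.add(slave01.submit(() -> {
                Thread.sleep(1_000); // simulated build work
                return "subjob " + n + " SUCCESS";
            }));
        }
        Thread.sleep(50);      // let the first subjob start running
        slave01.shutdownNow(); // "put slave01 offline": interrupts the running task, drains queued ones
        slave01.awaitTermination(1, TimeUnit.SECONDS);

        int stranded = 0;
        for (Future<String> f : futures) {
            try {
                f.get(10, TimeUnit.MILLISECONDS); // timed, so this loop cannot hang
            } catch (TimeoutException e) {
                stranded++;                       // never ran, so it never completes
            } catch (ExecutionException e) {
                // the interrupted subjob: it did complete, exceptionally
            }
        }
        return stranded;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stranded subjobs: " + countStrandedSubjobs(30));
    }
}
{code}

On a typical run, every subjob except the one that was already running is reported as stranded. The exact count depends on timing.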

ataylor@cloudbees.com (JIRA)

Dec 21, 2016, 12:27:27 PM
to jenkinsc...@googlegroups.com

So, going by Ivan's last comment, I would expect this to happen even with a brief disconnect of the slave.

Is there any more information that can be provided here?


dansirbu101@yahoo.ca (JIRA)

Jan 11, 2017, 11:16:01 PM
to jenkinsc...@googlegroups.com

I believe I have a similar issue.

A log shows:

Polling SCM changes on master
>> Job status: [ECM DB] subjob has no changes since last build.
FATAL: SCM polling aborted
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Unknown Source)
at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:257)
at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:236)
at hudson.model.AbstractProject.pollWithWorkspace(AbstractProject.java:1475)
at hudson.model.AbstractProject._poll(AbstractProject.java:1452)
at hudson.model.AbstractProject.poll(AbstractProject.java:1363)
at com.tikal.jenkins.plugins.multijob.MultiJobBuilder.getScmChange(MultiJobBuilder.java:190)
at com.tikal.jenkins.plugins.multijob.MultiJobBuilder.perform(MultiJobBuilder.java:279)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
at hudson.model.Build$BuildExecution.build(Build.java:205)
at hudson.model.Build$BuildExecution.doRun(Build.java:162)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:136)
at hudson.model.Run.execute(Run.java:1729)
at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:73)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:404)
>> Job status: [ECM Setup] subjob has no changes since last build.
>> Job status: [ECM JARs] subjob does not contain lastbuild.
Starting build job ECM JARs.

SCM polling is disabled for all subjobs, including the multijob parent itself.

What is interesting is that this happens just when switching from the multijob to a subjob.

In my case, the multijob checks out the source code and the subjobs inherit its workspace.

Maybe this info could help. It is 100% reproducible.
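The trace shows a subjob's SCM polling blocked in WorkspaceList.acquire, which waits in Object.wait(), and then interrupted. Because the subjobs reuse the workspace that the multijob parent checked out, one plausible explanation is that the poll is waiting for a workspace the still-running parent holds. The sketch below is a simplified, hypothetical analog of such a lock and is not Jenkins' WorkspaceList implementation. A second acquirer waits with wait() while the path is in use, and an interrupt turns that wait into the InterruptedException seen in the log. The workspace path is made up.

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified workspace lock: one holder per path at a time. */
class SimpleWorkspaceLock {
    private final Set<String> inUse = new HashSet<>();

    /** Blocks until `path` is free, then marks it as in use. */
    synchronized void acquire(String path) throws InterruptedException {
        while (inUse.contains(path)) {
            wait(); // throws InterruptedException if the waiting thread is interrupted
        }
        inUse.add(path);
    }

    /** Frees `path` and wakes any waiters. */
    synchronized void release(String path) {
        inUse.remove(path);
        notifyAll();
    }

    /** Runs the scenario and reports whether the waiting "poll" was interrupted. */
    static boolean pollInterruptedWhileParentHoldsWorkspace() throws InterruptedException {
        SimpleWorkspaceLock lock = new SimpleWorkspaceLock();
        String ws = "workspace/multijob"; // hypothetical shared workspace path
        lock.acquire(ws);                 // the parent multijob holds the workspace
        AtomicBoolean aborted = new AtomicBoolean(false);
        Thread subjobPoll = new Thread(() -> {
            try {
                lock.acquire(ws);         // subjob SCM polling wants the same workspace
                lock.release(ws);
            } catch (InterruptedException e) {
                aborted.set(true);        // analogous to "FATAL: SCM polling aborted"
            }
        });
        subjobPoll.start();
        Thread.sleep(100);                // give the poll time to start waiting
        subjobPoll.interrupt();
        subjobPoll.join();
        lock.release(ws);
        return aborted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("poll aborted: " + pollInterruptedWhileParentHoldsWorkspace()); // poll aborted: true
    }
}
{code}

The result is deterministic: if the interrupt arrives before the poll thread reaches wait(), the pending interrupt status makes wait() throw immediately.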

dansirbu101@yahoo.ca (JIRA)

Jan 11, 2017, 11:21:01 PM
to jenkinsc...@googlegroups.com

owen@nerdnetworks.org (JIRA)

Feb 14, 2018, 12:15:03 AM
to jenkinsc...@googlegroups.com
Owen Mehegan assigned an issue to Chen Cohen
Change By: Owen Mehegan
Assignee: Chen Cohen

yorammi@tikalk.com (JIRA)

Jun 8, 2018, 12:40:03 PM
to jenkinsc...@googlegroups.com
Yoram Michaeli closed an issue as Postponed
 

Closing issue as part of tikal-multijob-plugin issues cleanup.
If still relevant, please open a matching issue in https://github.com/jenkinsci/tikal-multijob-plugin/issues (you can refer to this issue in its description)

Change By: Yoram Michaeli
Status: Open → Closed
Resolution: Postponed