[JIRA] [core] (JENKINS-33219) Job.updateNextBuildNumber can cause deadlock

10 views
Skip to first unread message

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 11:42:02 AM2/29/16
to jenkinsc...@googlegroups.com
Alexander Komarov created an issue
 
Jenkins / Bug JENKINS-33219
Job.updateNextBuildNumber can cause deadlock
Issue Type: Bug Bug
Assignee: Unassigned
Attachments: jenkins-deadlock-threaddump.txt
Components: core
Created: 29/Feb/16 4:41 PM
Labels: jenkins
Priority: Minor Minor
Reporter: Alexander Komarov

The attached thread dump shows two deadlocked threads:

"jenkins.util.Timer 7" daemon prio=5 BLOCKED
"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED)

Jenkins 1.650

This happens as follows:

  1. I use the Job DSL plugin to create a bunch of jobs. Many have SCM Poll trigger enabled, causing them to run instantly as soon as they are created (possibly also when updated, looking at the stacktrace).
  2. As part of job creation, Job.updateNextBuildNumber is called (Next Build Number plugin integrated with Job DSL)

The two events above create deadlock.

The first one locks AbstractLazyLoadRunMap.getByNumber(). The synchronized block in this method is new as of *1.646 *(https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09)

The second one locks Job, aka AbstractProject.

Then the two attempt to acquire the other lock and deadlock. Here are the relevant parts for convenience:

"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:356)
jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.model.Job.updateNextBuildNumber(Job.java:422)
org.jvnet.hudson.plugins.nextbuildnumber.JobDslExtension.notifyItemUpdated(JobDslExtension.java:30)

"jenkins.util.Timer 7" daemon prio=5 BLOCKED
hudson.model.AbstractProject.save(AbstractProject.java:305)
hudson.model.Job.addProperty(Job.java:523)
hudson.model.AbstractProject.addProperty(AbstractProject.java:785)
hudson.plugins.disk_usage.DiskUsageUtil.addProperty(DiskUsageUtil.java:58)
hudson.plugins.disk_usage.BuildDiskUsageAction.<init>(BuildDiskUsageAction.java:38)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:31)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:21)
hudson.model.Actionable.createFor(Actionable.java:107)
hudson.model.Actionable.getAllActions(Actionable.java:98)
hudson.model.Run.onLoad(Run.java:343)
hudson.model.RunMap.retrieve(RunMap.java:224)
hudson.model.RunMap.retrieve(RunMap.java:56)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.views.AbstractBuildTrendFilter.matches(AbstractBuildTrendFilter.java:71)
hudson.views.AbstractIncludeExcludeJobFilter.doFilter(AbstractIncludeExcludeJobFilter.java:68)
hudson.views.AbstractIncludeExcludeJobFilter.filter(AbstractIncludeExcludeJobFilter.java:57)
hudson.model.ListView.getItems(ListView.java:195)
hudson.model.ListView.getItems(ListView.java:67)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.isJobInView(ViewBasedJobInclusionStrategy.java:182)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.contains(ViewBasedJobInclusionStrategy.java:149)
jenkins.advancedqueue.PriorityConfiguration.getJobGroup(PriorityConfiguration.java:241)
jenkins.advancedqueue.PriorityConfiguration.getPriorityInternal(PriorityConfiguration.java:225)
jenkins.advancedqueue.PriorityConfiguration.getPriority(PriorityConfiguration.java:203)
jenkins.advancedqueue.sorter.AdvancedQueueSorter.onNewItem(AdvancedQueueSorter.java:136)
jenkins.advancedqueue.sorter.AdvancedQueueSorterQueueListener.onEnterWaiting(AdvancedQueueSorterQueueListener.java:46)
hudson.model.Queue$WaitingItem.enter(Queue.java:2348)
hudson.model.Queue.scheduleInternal(Queue.java:599)
hudson.model.Queue.schedule2(Queue.java:555)
jenkins.model.ParameterizedJobMixIn.scheduleBuild2(ParameterizedJobMixIn.java:138)
jenkins.model.ParameterizedJobMixIn.scheduleBuild(ParameterizedJobMixIn.java:94)
hudson.model.AbstractProject.scheduleBuild(AbstractProject.java:836)

My Job DSL setup is rather complicated and I have not been able to extract a simple test for this - if I do, I'll attach it.

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)
Atlassian logo

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:00:01 PM2/29/16
to jenkinsc...@googlegroups.com
Alexander Komarov updated an issue
Change By: Alexander Komarov
The attached thread dump shows two deadlocked threads:

"jenkins.util.Timer [#7]" daemon prio=5 BLOCKED 

"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED)

Jenkins 1.650

This happens as follows:

# I use the *Job DSL plugin* to create a bunch of jobs.  Many have SCM Poll trigger enabled, causing them to run instantly as soon as they are created (possibly also when updated, looking at the stacktrace).
# As part of job creation, *Job.updateNextBuildNumber* is called (Next Build Number plugin integrated with Job DSL)


The two events above create deadlock.  

The first one locks *AbstractLazyLoadRunMap.getByNumber()*.  The synchronized block in this method is new as of *1.646 *(https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09)


The second one locks Job, aka AbstractProject.   

Then the two attempt to acquire the other lock and deadlock. Here are the relevant parts for convenience:
{quote}

"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
jenkins.model.lazy.AbstractLazyLoadRunMap.*getByNumber*(AbstractLazyLoadRunMap.java:356)

jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.model. * Job. updateNextBuildNumber updateNextBuildNumbe*r (Job.java:422)
org.jvnet.hudson.plugins.nextbuildnumber.JobDslExtension.notifyItemUpdated(JobDslExtension.java:30)
{quote}

{quote}
"jenkins.util.Timer [#7]" daemon prio=5 BLOCKED
hudson.model.*AbstractProject*.save(AbstractProject.java:305)

hudson.model.Job.addProperty(Job.java:523)
hudson.model.AbstractProject.addProperty(AbstractProject.java:785)
hudson.plugins.disk_usage.DiskUsageUtil.addProperty(DiskUsageUtil.java:58)
hudson.plugins.disk_usage.BuildDiskUsageAction.<init>(BuildDiskUsageAction.java:38)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:31)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:21)
hudson.model.Actionable.createFor(Actionable.java:107)
hudson.model.Actionable.getAllActions(Actionable.java:98)
hudson.model.Run.onLoad(Run.java:343)
hudson.model.RunMap.retrieve(RunMap.java:224)
hudson.model.RunMap.retrieve(RunMap.java:56)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
jenkins.model.lazy.AbstractLazyLoadRunMap.*getByNumber*(AbstractLazyLoadRunMap.java:367)

jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.views.AbstractBuildTrendFilter.matches(AbstractBuildTrendFilter.java:71)
hudson.views.AbstractIncludeExcludeJobFilter.doFilter(AbstractIncludeExcludeJobFilter.java:68)
hudson.views.AbstractIncludeExcludeJobFilter.filter(AbstractIncludeExcludeJobFilter.java:57)
hudson.model.ListView.getItems(ListView.java:195)
hudson.model.ListView.getItems(ListView.java:67)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.isJobInView(ViewBasedJobInclusionStrategy.java:182)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.contains(ViewBasedJobInclusionStrategy.java:149)
jenkins.advancedqueue.PriorityConfiguration.getJobGroup(PriorityConfiguration.java:241)
jenkins.advancedqueue.PriorityConfiguration.getPriorityInternal(PriorityConfiguration.java:225)
jenkins.advancedqueue.PriorityConfiguration.getPriority(PriorityConfiguration.java:203)
jenkins.advancedqueue.sorter.AdvancedQueueSorter.onNewItem(AdvancedQueueSorter.java:136)
jenkins.advancedqueue.sorter.AdvancedQueueSorterQueueListener.onEnterWaiting(AdvancedQueueSorterQueueListener.java:46)
hudson.model.Queue$WaitingItem.enter(Queue.java:2348)
hudson.model.Queue.scheduleInternal(Queue.java:599)
hudson.model.Queue.schedule2(Queue.java:555)
jenkins.model.ParameterizedJobMixIn.scheduleBuild2(ParameterizedJobMixIn.java:138)
jenkins.model.ParameterizedJobMixIn.scheduleBuild(ParameterizedJobMixIn.java:94)
hudson.model.AbstractProject.scheduleBuild(AbstractProject.java:836)
{quote}


My Job DSL setup is rather complicated and I have not been able to extract a simple test for this - if I do, I'll attach it.

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:01:01 PM2/29/16
to jenkinsc...@googlegroups.com
hudson.model.*Job. updateNextBuildNumbe updateNextBuildNumber * r (Job.java:422)

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:05:02 PM2/29/16
to jenkinsc...@googlegroups.com
Alexander Komarov updated an issue
The attached thread dump shows two deadlocked threads:

"jenkins.util.Timer [#7]" daemon prio=5 BLOCKED 
"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED)

Jenkins 1.650

This happens as follows:

# I use the *Job DSL plugin* to create a bunch of jobs.  Many have SCM Poll trigger enabled, causing them to run instantly as soon as they are created (possibly also when updated, looking at the stacktrace).
# As part of job creation, *Job.updateNextBuildNumber* is called (Next Build Number plugin integrated with Job DSL)

The two events above create deadlock.  

The first one locks * Job* since updateNextBuildNumber() is synchronized.  Then it calls * AbstractLazyLoadRunMap.getByNumber()*.  The synchronized block in this method is new as of *1.646 *(https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09)

The second one
 locks Job  calls *AbstractLazyLoadRunMap.getByNumber()* first ,  aka  which causes a call to  AbstractProject.    

Then the two attempt to acquire the other lock and
save(), but Job is an AbstractProject... so,  deadlock.

 Here are the relevant parts for convenience , bold elements are causing deadlock :

{quote}
"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
jenkins.model.lazy. * AbstractLazyLoadRunMap. * getByNumber*(AbstractLazyLoadRunMap.java:356)

jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.model.*Job.updateNextBuildNumber*(Job.java:422)

org.jvnet.hudson.plugins.nextbuildnumber.JobDslExtension.notifyItemUpdated(JobDslExtension.java:30)
{quote}

{quote}
"jenkins.util.Timer [#7]" daemon prio=5 BLOCKED
hudson.model.*AbstractProject * .save * (AbstractProject.java:305)

hudson.model.Job.addProperty(Job.java:523)
hudson.model.AbstractProject.addProperty(AbstractProject.java:785)
hudson.plugins.disk_usage.DiskUsageUtil.addProperty(DiskUsageUtil.java:58)
hudson.plugins.disk_usage.BuildDiskUsageAction.<init>(BuildDiskUsageAction.java:38)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:31)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:21)
hudson.model.Actionable.createFor(Actionable.java:107)
hudson.model.Actionable.getAllActions(Actionable.java:98)
hudson.model.Run.onLoad(Run.java:343)
hudson.model.RunMap.retrieve(RunMap.java:224)
hudson.model.RunMap.retrieve(RunMap.java:56)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
jenkins.model.lazy. * AbstractLazyLoadRunMap. * getByNumber*(AbstractLazyLoadRunMap.java:367)

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:08:01 PM2/29/16
to jenkinsc...@googlegroups.com
Alexander Komarov updated an issue
The attached thread dump shows two deadlocked threads:

* "jenkins.util.Timer [#7]" daemon prio=5 BLOCKED 
* "Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED)

* Jenkins 1.650 *

This happens as follows:

# I use the *Job DSL plugin* to create a bunch of jobs.  Many have SCM Poll trigger enabled, causing them to run instantly as soon as they are created (possibly also when updated, looking at the stacktrace).
# As part of job creation, *Job.updateNextBuildNumber* is called (Next Build Number plugin integrated with Job DSL)

The two events above create deadlock.  

The first one locks *Job* since updateNextBuildNumber() is synchronized.  Then it calls *AbstractLazyLoadRunMap.getByNumber()*.  The synchronized block in this method is new as of *1.646 *(https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09)

The second one calls *AbstractLazyLoadRunMap.getByNumber()* first, which causes a call to
 *  AbstractProject.save() * , but Job is an AbstractProject... so, deadlock.

Here are the relevant parts for convenience, bold elements are causing deadlock:

{quote}
"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
jenkins.model.lazy.*AbstractLazyLoadRunMap.getByNumber*(AbstractLazyLoadRunMap.java:356)

jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.model.*Job.updateNextBuildNumber*(Job.java:422)
org.jvnet.hudson.plugins.nextbuildnumber.JobDslExtension.notifyItemUpdated(JobDslExtension.java:30)
{quote}

{quote}
"jenkins.util.Timer [#7]" daemon prio=5 BLOCKED
hudson.model.*AbstractProject.save*(AbstractProject.java:305)

hudson.model.Job.addProperty(Job.java:523)
hudson.model.AbstractProject.addProperty(AbstractProject.java:785)
hudson.plugins.disk_usage.DiskUsageUtil.addProperty(DiskUsageUtil.java:58)
hudson.plugins.disk_usage.BuildDiskUsageAction.<init>(BuildDiskUsageAction.java:38)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:31)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:21)
hudson.model.Actionable.createFor(Actionable.java:107)
hudson.model.Actionable.getAllActions(Actionable.java:98)
hudson.model.Run.onLoad(Run.java:343)
hudson.model.RunMap.retrieve(RunMap.java:224)
hudson.model.RunMap.retrieve(RunMap.java:56)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
jenkins.model.lazy.*AbstractLazyLoadRunMap.getByNumber*(AbstractLazyLoadRunMap.java:367)

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:43:01 PM2/29/16
to jenkinsc...@googlegroups.com
Alexander Komarov updated an issue
The attached thread dump shows two deadlocked threads:

* "jenkins.util.Timer [#7]" daemon prio=5 BLOCKED 
* "Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED)

*Jenkins 1.650*

This happens as follows:

# I use the *Job DSL plugin* to create a bunch of jobs.  Many have SCM Poll trigger enabled, causing them to run instantly as soon as they are created (possibly also when updated, looking at the stacktrace).
# As part of job creation, *Job.updateNextBuildNumber* is called (Next Build Number plugin integrated with Job DSL)

The two events above create deadlock.  

The first one locks *Job* since updateNextBuildNumber() is synchronized.  Then it calls *AbstractLazyLoadRunMap.getByNumber()*.  The synchronized block in this method is new as of *1.646*(https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09)

The second one calls *AbstractLazyLoadRunMap.getByNumber()* first, which causes a call to *AbstractProject.save()*, but Job is an AbstractProject... so, deadlock.



Update: 1.645 also deadlocks.  AbstractLazyLoadRunMap.load() is synchronized.   I am confused about why this started happening once I upgraded from 1.645 but continues to happen once I downgraded back.

regs@akom.net (JIRA)

unread,
Feb 29, 2016, 12:49:01 PM2/29/16
to jenkinsc...@googlegroups.com
Update:  *  1.645 also deadlocks * .  AbstractLazyLoadRunMap.load() is synchronized.   I am confused about why this started happening once I upgraded from 1.645 but continues to happen once I downgraded back.   Maybe the newer version of some other plugin is making this more likely (ie SVN plugin polls more agressively, etc)

ogondza@gmail.com (JIRA)

unread,
Mar 2, 2016, 9:16:05 AM3/2/16
to jenkinsc...@googlegroups.com
Oliver Gondža commented on Bug JENKINS-33219
 
Re: Job.updateNextBuildNumber can cause deadlock

How did you captured the threaddump (it does not contain the monitor info)?

regs@akom.net (JIRA)

unread,
Mar 2, 2016, 9:18:06 AM3/2/16
to jenkinsc...@googlegroups.com

ogondza@gmail.com (JIRA)

unread,
Mar 2, 2016, 9:23:04 AM3/2/16
to jenkinsc...@googlegroups.com

I am not sure about

{kill -3}

but jstack -l is a better option for sure.

ogondza@gmail.com (JIRA)

unread,
Mar 2, 2016, 9:25:02 AM3/2/16
to jenkinsc...@googlegroups.com
Oliver Gondža edited a comment on Bug JENKINS-33219

regs@akom.net (JIRA)

unread,
Mar 2, 2016, 9:27:02 AM3/2/16
to jenkinsc...@googlegroups.com

I'll try that the next time I can screw up the production Jenkins (I'll have to restart it after). I can't reproduce this on any other installation, probably because they are not as busy.

jglick@cloudbees.com (JIRA)

unread,
Mar 18, 2016, 6:56:02 AM3/18/16
to jenkinsc...@googlegroups.com
Jesse Glick updated an issue
Change By: Jesse Glick
Labels: jenkins deadlock regression

jglick@cloudbees.com (JIRA)

unread,
Mar 18, 2016, 7:00:02 AM3/18/16
to jenkinsc...@googlegroups.com
Jesse Glick updated an issue
Change By: Jesse Glick
Priority: Minor Major

jglick@cloudbees.com (JIRA)

unread,
Mar 18, 2016, 7:03:02 AM3/18/16
to jenkinsc...@googlegroups.com
Jesse Glick assigned an issue to Lucie Votypkova
 

As with many threading bugs, it is difficult to assign blame conclusively¹, but the fact that DiskUsageUtil.addProperty is attempting to mutate a Job while inside the constructor of an Action attached to one of its Run is what draws my attention.

¹Software transactional memory is no panacea but it at least helps modular applications in this kind of situation.

Change By: Jesse Glick
Component/s: disk-usage-plugin
Component/s: core
Assignee: Lucie Votypkova

jglick@cloudbees.com (JIRA)

unread,
Mar 18, 2016, 7:07:03 AM3/18/16
to jenkinsc...@googlegroups.com

Whereas getByNumber is newly synchronized, load was before as well, which might explain why you could observe this deadlock in 1.645—though less frequently (only during a cache miss).

mszpak@wp.pl (JIRA)

unread,
May 24, 2016, 7:00:02 AM5/24/16
to jenkinsc...@googlegroups.com
Marcin Zajączkowski updated an issue

Thread dump

Change By: Marcin Zajączkowski
Attachment: jstack-thread-dump-blocked-AbstractLazyLoadRunMap-JENKINS-33219.txt

mszpak@wp.pl (JIRA)

unread,
May 24, 2016, 7:04:03 AM5/24/16
to jenkinsc...@googlegroups.com
I've had similar situation  (Jenkins 1.642.3)  and have thread dump with information of acquired locks (attached).

disk-usage-plugin thread so some "computation" in synchronized section

{code}
        (...)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:156)
at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:133)
at hudson.model.RunMap.retrieve(RunMap.java:223)
at hudson.model.RunMap.retrieve(RunMap.java:56)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
- locked <0x00000005f57899b0> (a hudson.model.RunMap)
at hudson.model.RunMap.getById(RunMap.java:204)
at hudson.model.RunMap.getById(RunMap.java:56)
        (...)
{code}

while other requests are blocked on synchronized section:

{code}
"Handling GET / from 192.168.72.4 : RequestHandlerThread[#1110] View/index.jelly Dashboard/main.jelly LatestBuilds/portlet.jelly LatestBuilds/latestbuilds.jelly" #1979423 daemon prio=5 os_prio=0 tid=0x00007f84b0008000 nid=0x147de waiting for monitor entry [0x00007f85728f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:356)
{code}

I don't see a classic deadlock there. My suspicion is rather that the reason (in additional to newly added synchronized block to fix JENKINS-22767) was large number of historical builds (7K+). It could just take some time for disk-usage-plugin to calculate required data (which the other several request threads for that project just was waiting to the first one to finish).

However I'm not sure if it is related only to disk-usage-plugin. After its removal (and restart) I observed the same situation with the thread handling side panel request (below). I'm not sure why sidepanel.jelly should also take much time in that place.

{code}
"Handling GET /job/somejob-pr/ from 9.9.9.9 : RequestHandlerThread[#15] Job/index.jelly AbstractProject/sidepanel.jelly HistoryWidget/index.jelly" #97 daemon prio=5 os_prio=0 tid=0x00007f8104008800 nid=0x15ad2 runnable [0x00007f81e28e0000]
   java.lang.Thread.State: RUNNABLE
at java.lang.Object.hashCode(Native Method)
at java.util.HashMap.hash(HashMap.java:338)
at java.util.HashMap.get(HashMap.java:556)
at com.thoughtworks.xstream.mapper.DefaultImplementationsMapper.defaultImplementationOf(DefaultImplementationsMapper.java:69)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.AnnotationMapper.defaultImplementationOf(AnnotationMapper.java:143)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.mapper.AnnotationMapper.defaultImplementationOf(AnnotationMapper.java:143)
at hudson.util.xstream.MapperDelegate.defaultImplementationOf(MapperDelegate.java:59)
at com.thoughtworks.xstream.mapper.MapperWrapper.defaultImplementationOf(MapperWrapper.java:46)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:54)
at hudson.util.RobustReflectionConverter.unmarshalField(RobustReflectionConverter.java:352)
        (...)
at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1189)
at hudson.util.XStream2.unmarshal(XStream2.java:114)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1173)
at hudson.XmlFile.unmarshal(XmlFile.java:163)
at hudson.model.Run.reload(Run.java:326)
at hudson.model.Run.<init>(Run.java:314)
at hudson.model.AbstractBuild.<init>(AbstractBuild.java:175)
at hudson.model.Build.<init>(Build.java:103)
at hudson.model.FreeStyleBuild.<init>(FreeStyleBuild.java:38)
at sun.reflect.GeneratedConstructorAccessor160.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:156)
at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:133)
at hudson.model.RunMap.retrieve(RunMap.java:223)
at hudson.model.RunMap.retrieve(RunMap.java:56)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
- locked <0x00000005eda16310> (a hudson.model.RunMap)
at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
at jenkins.model.lazy.LazyBuildMixIn$RunMixIn.getPreviousBuild(LazyBuildMixIn.java:355)
at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:196)
at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:104)
at hudson.model.RunMap$1.next(RunMap.java:109)
at hudson.model.RunMap$1.next(RunMap.java:98)
at org.apache.commons.collections.IteratorUtils.toList(IteratorUtils.java:849)
at org.apache.commons.collections.IteratorUtils.toList(IteratorUtils.java:825)
at hudson.widgets.BuildHistoryWidget.getHistoryPageFilter(BuildHistoryWidget.java:81)
        (...)
{code}

mszpak@wp.pl (JIRA)

unread,
May 25, 2016, 1:44:00 PM5/25/16
to jenkinsc...@googlegroups.com

I've had similar situation and have thread dump with information of acquired locks (attached).

disk-usage-plugin thread so some "computation" in synchronized section

        (...)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at jenkins.model.lazy.LazyBuildMixIn.loadBuild(LazyBuildMixIn.java:156)
	at jenkins.model.lazy.LazyBuildMixIn$1.create(LazyBuildMixIn.java:133)
	at hudson.model.RunMap.retrieve(RunMap.java:223)
	at hudson.model.RunMap.retrieve(RunMap.java:56)
	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
	- locked <0x00000005f57899b0> (a hudson.model.RunMap)
	at hudson.model.RunMap.getById(RunMap.java:204)
	at hudson.model.RunMap.getById(RunMap.java:56)
        (...)

while other requests are blocked on synchronized section:

"Handling GET / from 192.168.72.4 : RequestHandlerThread[#1110] View/index.jelly Dashboard/main.jelly LatestBuilds/portlet.jelly LatestBuilds/latestbuilds.jelly" #1979423 daemon prio=5 os_prio=0 tid=0x00007f84b0008000 nid=0x147de waiting for monitor entry [0x00007f85728f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:356)

I don't see a classic deadlock there. My suspicion is rather that the reason (in additional to newly added synchronized block to fix

JENKINS-22767 ) was large number of historical builds (7K+). It could just take some time for disk-usage-plugin to calculate required data (which the other several request threads for that project just was waiting to the first one to finish).

However I'm not sure if it is related only to disk-usage-plugin. After its removal (and restart) I observed the same situation with the thread handling side panel request (below). I'm not sure why sidepanel.jelly should also take much time in that place.

ogondza@gmail.com (JIRA)

unread,
Feb 11, 2020, 8:30:13 AM2/11/20
to jenkinsc...@googlegroups.com
Oliver Gondža assigned an issue to Unassigned
Change By: Oliver Gondža
Assignee: Lucie Votypkova
This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
Atlassian logo
Reply all
Reply to author
Forward
0 new messages