[JIRA] [core] (JENKINS-27514) Jenkins leaks thousands of Computer.threadPoolForRemoting threads leading to eventual server OOM

sagi.sinai-glazer@ericsson.com (JIRA)

Aug 4, 2015, 1:31:02 PM
to jenkinsc...@googlegroups.com
Sagi Sinai-Glazer updated an issue
 
Change By: Sagi Sinai-Glazer
Attachment: support_2015-08-04_14.10.32.zip
This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

sagi.sinai-glazer@ericsson.com (JIRA)

Aug 4, 2015, 1:31:02 PM
Sagi Sinai-Glazer commented on Bug JENKINS-27514
 
Re: Jenkins leaks thousands of Computer.threadPoolForRemoting threads leading to eventual server OOM

Same here; see the attached support bundle (support_2015-08-04_14.10.32.zip) for more details.
This also seems to be related to JENKINS-23560 and JENKINS-26769.

stephenconnolly@java.net (JIRA)

Aug 6, 2015, 5:03:01 AM

SSHLauncher is supposed to have a separate instance for each and every Node. Cloud implementations are supposed to use SSHConnector.launch() to produce the concrete instance for connecting to the specific slave. If a Cloud implementation reuses a single shared instance, then that implementation is broken. (Aside: all Cloud implementations are broken for other reasons, so there is nothing new in asserting that Cloud implementations are broken.)
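A minimal Java sketch of why one instance per Node matters, assuming that the launcher's disconnect cleanup is `synchronized` on the launcher instance. `Launcher`, `newLauncherFor`, and the 100 ms "slow cleanup" are hypothetical stand-ins, not the SSH Slaves plugin API. With one shared instance, every node's cleanup is serialized on a single monitor; with per-node instances, each node has its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LauncherScopeDemo {
    /** Hypothetical launcher whose cleanup locks the launcher instance itself. */
    static final class Launcher {
        // Shared counters so we can observe how many cleanups overlap across all instances.
        private static final AtomicInteger inCleanup = new AtomicInteger();
        private static final AtomicInteger peak = new AtomicInteger();

        synchronized void afterDisconnect() throws InterruptedException {
            int now = inCleanup.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(100); // simulated slow cleanup while holding this instance's monitor
            } finally {
                inCleanup.decrementAndGet();
            }
        }
    }

    /** Hypothetical factory: one fresh launcher per node (the node name is unused here). */
    static Launcher newLauncherFor(String nodeName) {
        return new Launcher();
    }

    /** Runs one cleanup per node concurrently and returns the peak number of overlapping cleanups. */
    static int peakConcurrentCleanups(boolean shareOneLauncher, int nodes) throws InterruptedException {
        Launcher.peak.set(0);
        Launcher shared = new Launcher();
        List<Launcher> launchers = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            launchers.add(shareOneLauncher ? shared : newLauncherFor("node-" + i));
        }
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        for (Launcher l : launchers) {
            pool.submit(() -> {
                l.afterDisconnect();
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return Launcher.peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared launcher, peak concurrent cleanups: " + peakConcurrentCleanups(true, 4));
        System.out.println("per-node launchers, peak concurrent cleanups: " + peakConcurrentCleanups(false, 4));
    }
}
```

With the shared instance the peak is always 1: every other node's cleanup thread waits for the monitor, which a thread dump would show as BLOCKED.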

clark.boylan@gmail.com (JIRA)

Aug 28, 2015, 2:24:04 PM

To clarify: we are not using any cloud plugins, so there is no broken single shared instance here. We run an external process that boots nodes in clouds and then adds them as Jenkins slaves via the Jenkins API. When the job running on that slave is done, the external process is notified and removes the slave from Jenkins. As a result, we add and remove a large number of slaves, but all of this happens via the API talking to the SSH Slaves plugin. We do this because the cloud plugins do indeed have problems. But this means that the bug has to be somewhere in the SSH Slaves plugin.

clark.boylan@gmail.com (JIRA)

Sep 4, 2015, 4:42:02 PM
Clark Boylan updated an issue
 

Thread dump from the Jenkins master after upgrading to SSH Slaves plugin 1.10; otherwise the setup is the same as in the previous thread dump example. We thought that 1.10 might have fixed the bug, but it doesn't appear to have done so. Attaching this thread dump in case the information is new and different enough to be helpful.

Change By: Clark Boylan
Attachment: 20150904-jenkins03.txt

kravenscroft@micron.com (JIRA)

Jan 25, 2016, 5:15:05 PM

Has any silent progress been made on this issue? I currently have to reboot weekly because of it. I'm running Jenkins 1.645 and SSH plugin 1.10.

stephenconnolly@java.net (JIRA)

Jan 26, 2016, 7:30:06 AM

clark.boylan@gmail.com (JIRA)

Jan 26, 2016, 3:37:02 PM

kravenscroft@micron.com (JIRA)

Jan 27, 2016, 10:15:03 AM

My setup is slightly different from the original description. I don't do any substantial adding or deleting of slaves. I have narrowed my cause down to a large number of slaves (30+) being power-cycled at the same time, over and over. I am using Jenkins to perform hardware tests, and one set of tests requires that each slave be rebooted ~200 times. Last night I ran an experiment with the following preventive measures, and they seem to prevent my issue:

1) Switch all slaves to the Availability setting "Keep this slave on-line as much as possible, but don't reconnect if temporarily marked offline by the user." This option is added by this plugin: https://github.com/daniel-beck/jenkins-keep-slave-disconnected-plugin

2) Make the script that power-cycles the slaves mark each slave as offline using the Jenkins CLI jar file.

For me, these steps stop Jenkins from trying to reconnect to each slave whenever it comes back online, so the SSH plugin doesn't keep connecting and disconnecting. I'm not sure this will help in your case of adding/deleting slaves, but I figured I'd throw it out there.
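Step 2 could look roughly like the Java sketch below, which builds (and optionally runs) a Jenkins CLI call before the power cycle. It assumes the CLI's `offline-node` command with its `-m` message option; the jar path, master URL, node name, and class name are hypothetical, and authentication options are left out because they depend on the Jenkins version and security setup. Bringing the node back online afterwards (for example with the CLI's `online-node` command) is not described in the comment and is not shown.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class MarkOfflineBeforePowerCycle {
    /** Builds the CLI invocation; all argument values are hypothetical placeholders. */
    static List<String> offlineCommand(String cliJar, String jenkinsUrl, String node, String reason) {
        return Arrays.asList("java", "-jar", cliJar, "-s", jenkinsUrl, "offline-node", node, "-m", reason);
    }

    /** Runs the command, forwarding its output to this process, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = offlineCommand(
                "jenkins-cli.jar", "http://jenkins.example.com/", "hw-test-01",
                "Power cycling for hardware test");
        System.out.println(String.join(" ", cmd));
        // Only actually call Jenkins when asked to, so the sketch is safe to run as-is.
        if (args.length > 0 && args[0].equals("--run")) {
            System.exit(run(cmd));
        }
    }
}
```

The power-cycling script would run this before cutting power, so that with the Availability setting from step 1 Jenkins does not try to reconnect while the machine reboots.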

sjursky@gratex.com (JIRA)

Jun 15, 2016, 6:14:01 AM
Stanislav Jursky updated an issue
 
Change By: Stanislav Jursky
Attachment: file-leak-detector.log

sjursky@gratex.com (JIRA)

Jun 15, 2016, 6:14:03 AM
Stanislav Jursky commented on Bug JENKINS-27514
 

Our setup is also different: no slave restarts and no dynamically added slaves, just one slave running a Maven Sonar analysis ({{ clean install sonar:sonar }}).
The File Leak Detector plugin reports many Maven plugins with nearly the same stack trace:

{{ at hudson.remoting.ResourceImageDirect.<init>(ResourceImageDirect.java:29) }}
or
{{ at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch3(RemoteClassLoader.java:810) }}

See the attached file-leak-detector.log.

sjursky@gratex.com (JIRA)

Jun 15, 2016, 6:15:01 AM
Stanislav Jursky edited a comment on Bug JENKINS-27514

dilipm79@gmail.com (JIRA)

Jun 29, 2016, 4:47:02 PM

dilipm79@gmail.com (JIRA)

Jun 29, 2016, 4:47:03 PM
Dilip M commented on Bug JENKINS-27514
 

Jenkins ver. 1.625.2
SSH Slaves plugin: 1.10
Thread dump attached: thread-dump.txt
Support bundle attached: support_2016-06-29_13.17.36 (2).zip
We have hit a similar issue: we had 5k+ blocked Computer.threadPoolForRemoting threads.
{code:java}
Computer.threadPoolForRemoting [#175318] - threadId:2655322 (0x28845a) - state:BLOCKED
stackTrace:
- waiting to lock <0x6e0864fe> (a hudson.plugins.sshslaves.SSHLauncher)
owned by Computer.threadPoolForRemoting [#175203] id=2653481
at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1226)
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:603)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Locked synchronizers: count = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@729cb958
...
{code}
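To see how a single lock owner can turn into a pile of BLOCKED pool threads, here is a minimal, self-contained Java sketch. It assumes only what the dump shows, namely that `afterDisconnect` waits for the monitor of one `SSHLauncher` instance. `SharedLauncher`, the latch, and the local thread pool are hypothetical stand-ins; the pool just borrows the `Computer.threadPoolForRemoting` name so that `countBlocked` filters threads the way you would scan a dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockedPoolDemo {
    // Thread-name prefix copied from the dump; the pool below is only a local imitation.
    static final String POOL_PREFIX = "Computer.threadPoolForRemoting";

    /** Hypothetical launcher: only the "synchronized on the instance" shape mirrors the dump. */
    static final class SharedLauncher {
        synchronized void afterDisconnect(CountDownLatch release) throws InterruptedException {
            release.await(); // the first caller parks here while still owning this monitor
        }
    }

    /** Counts live threads whose name starts with prefix and whose state is BLOCKED. */
    static long countBlocked(String prefix) {
        long n = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(prefix) && t.getState() == Thread.State.BLOCKED) {
                n++;
            }
        }
        return n;
    }

    /** Submits one cleanup per disconnect against a single shared launcher; returns how many end up BLOCKED. */
    static long reproduce(int disconnects) throws InterruptedException {
        AtomicInteger seq = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, POOL_PREFIX + " [#" + seq.incrementAndGet() + "]");
            t.setDaemon(true);
            return t;
        });
        SharedLauncher launcher = new SharedLauncher();
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < disconnects; i++) {
            pool.submit(() -> {
                launcher.afterDisconnect(release);
                return null;
            });
        }
        // Poll until every thread except the lock owner is BLOCKED, or give up after 5 s.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        long blocked = countBlocked(POOL_PREFIX);
        while (blocked < disconnects - 1 && System.nanoTime() < deadline) {
            Thread.sleep(10);
            blocked = countBlocked(POOL_PREFIX);
        }
        release.countDown(); // the owner returns, then the waiting threads drain one at a time
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("BLOCKED pool threads: " + reproduce(50));
    }
}
```

`countBlocked` relies on `Thread.getAllStackTraces()`, so it only sees threads in its own JVM; on a real master the same information comes from a thread dump like the one attached.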

david@makewhatis.com (JIRA)

Jun 29, 2016, 5:06:01 PM

So, we were hitting this issue a while back: threads stacked up until we had to restart. I tracked it down to some automation I had written that uses http://javadoc.jenkins-ci.org/hudson/model/Computer.html#connect(boolean). I was calling connect(true), which is supposed to spawn a new thread if the agent is "disconnected" and cancel the current, disconnected thread. It doesn't cancel it right away, though.

Anyway, when a host was not in a state where it could be connected, my automation was spawning new threads much more frequently than Jenkins was killing the old ones.

I changed my call to connect(false) so that it does not force a reconnect when the agent is unavailable, and we stopped seeing the runaway thread stacking. It sounds a lot like what you are seeing, but either way I figured I'd share what we did.
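The difference between the two calls can be modelled with a small Java sketch. `FakeComputer` is hypothetical, and its semantics are an assumption based on the description above, not Jenkins's implementation of `Computer.connect(boolean)`: `connect(true)` cancels the pending attempt and starts a new one, `connect(false)` reuses an attempt that is still in flight, and each attempt against an unreachable host keeps running for a while after cancellation, as a blocking connect might.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectDemo {
    /** Hypothetical stand-in for an agent's Computer; this is NOT hudson.model.Computer. */
    static final class FakeComputer {
        private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "connect-attempt");
            t.setDaemon(true); // lingering attempts must not keep the JVM alive
            return t;
        });
        final AtomicInteger live = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        private Future<?> pending;

        /** Assumed semantics: force=true always starts a new attempt; force=false reuses one in flight. */
        synchronized Future<?> connect(boolean forceReconnect) {
            if (!forceReconnect && pending != null && !pending.isDone()) {
                return pending;
            }
            if (pending != null) {
                pending.cancel(true); // only requests cancellation; the old thread may keep running
            }
            pending = pool.submit(this::attemptUnreachableHost);
            return pending;
        }

        private void attemptUnreachableHost() {
            peak.accumulateAndGet(live.incrementAndGet(), Math::max);
            try {
                sleepIgnoringInterrupts(1000); // models a blocking connect to a host that is down
            } finally {
                live.decrementAndGet();
            }
        }

        void shutdown() {
            pool.shutdownNow();
        }
    }

    /** Sleeps for the full duration even if interrupted, then restores the interrupt flag. */
    static void sleepIgnoringInterrupts(long millis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        boolean interrupted = false;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(remaining);
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    /** Calls connect() repeatedly, as polling automation might, and returns the peak number of live attempts. */
    static int peakAttempts(boolean forceReconnect, int calls) throws InterruptedException {
        FakeComputer computer = new FakeComputer();
        for (int i = 0; i < calls; i++) {
            computer.connect(forceReconnect);
            Thread.sleep(20); // give the submitted attempt time to start
        }
        int peak = computer.peak.get();
        computer.shutdown();
        return peak;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("connect(true)  x10 -> peak live attempts: " + peakAttempts(true, 10));
        System.out.println("connect(false) x10 -> peak live attempts: " + peakAttempts(false, 10));
    }
}
```

Against a host that stays down, the forced variant leaves roughly one live attempt per call, while the non-forced variant keeps a single attempt in flight.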

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 4:44:02 PM
Oleg Nenashev updated an issue
 
Change By: Oleg Nenashev
Component/s: remoting

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 4:46:02 PM
Oleg Nenashev commented on Bug JENKINS-27514
 

From what I see in the File Leak Detector output, this requires a fix on the Remoting side, similar to the fix for JENKINS-37332. BTW, it seems to be a standalone issue.

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 4:50:08 PM

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 4:51:08 PM