[JIRA] [core] (JENKINS-24155) Jenkins Slaves Go Offline In Large Quantities and Don't Reconnect Until Reboot


vladimir.lazarev@intel.com (JIRA)

May 18, 2015, 3:43:01 AM
to jenkinsc...@googlegroups.com
Vladimir Lazarev commented on Bug JENKINS-24155
 
Re: Jenkins Slaves Go Offline In Large Quantities and Don't Reconnect Until Reboot

We have the "peer reconnecting" issue once every 2 weeks. After applying the workaround proposed by Amal, connections are crashing during heavy jobs (high CPU load and long duration) on a regular basis.

So be aware...

This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265)

vladimir.lazarev@intel.com (JIRA)

May 18, 2015, 4:28:01 AM
to jenkinsc...@googlegroups.com
Vladimir Lazarev edited a comment on Bug JENKINS-24155
We have the "peer reconnecting" issue once every 2 weeks. After applying the workaround proposed by Amal, connections are crashing during heavy jobs (high CPU load and long duration) on a regular basis.

So be aware...

stephenconnolly@java.net (JIRA)

Jun 24, 2015, 4:25:01 AM
to jenkinsc...@googlegroups.com

sgannon200@gmail.com (JIRA)

Aug 24, 2015, 1:09:02 PM
to jenkinsc...@googlegroups.com

I've been told that this issue is the same as JENKINS-28844 and has been resolved in the 1.609.3 LTS release.

mohittater@bluejeansnet.com (JIRA)

Apr 4, 2016, 8:18:02 AM
to jenkinsc...@googlegroups.com

We are facing this issue on Jenkins ver. 1.605.

On most of the offline slaves I am seeing:
"JNLP agent connected from /x.y.z.a" in the node log.

Here is the threadDump link of the affected Jenkins instance.
http://pastebin.com/9hUR1Awf

Please provide a temporary workaround for this so that it can be avoided in the future.

Note:
We are using 50+ nodes on a single master.
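For anyone else gathering data for this issue: a thread dump like the one linked above can be captured with the JDK's jstack tool. This is only a sketch, assuming a JDK (not just a JRE) is installed on the master and that the master was launched from jenkins.war; the process-lookup pattern may differ per install.

```shell
# Find the PID of the Jenkins master JVM (the pattern may differ per install)
JENKINS_PID=$(pgrep -f jenkins.war | head -n1)

# Capture a thread dump of the master with the JDK's jstack tool
jstack "$JENKINS_PID" > /tmp/jenkins-threaddump-$(date +%s).txt
```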

o.v.nenashev@gmail.com (JIRA)

Aug 19, 2016, 4:33:03 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
 
Jenkins / Bug JENKINS-24155
Change By: Oleg Nenashev
Component/s: remoting

aubert.aa@gmail.com (JIRA)

Sep 30, 2016, 10:48:02 AM
to jenkinsc...@googlegroups.com

aubert.aa@gmail.com (JIRA)

Sep 30, 2016, 10:51:07 AM
to jenkinsc...@googlegroups.com
Alexandre Aubert commented on Bug JENKINS-24155
 
Re: Jenkins Slaves Go Offline In Large Quantities and Don't Reconnect Until Reboot

Same problem for several days with Jenkins 2.23; here is an extract of the log showing:

  • first an 'outofmemory' error
  • then 'java.lang.OutOfMemoryError: unable to create new native thread' everywhere
  • then disconnection of all slaves

log.txt

Hope this helps.

aubert.aa@gmail.com (JIRA)

Sep 30, 2016, 10:59:09 AM
to jenkinsc...@googlegroups.com
Alexandre Aubert edited a comment on Bug JENKINS-24155
Same problem for several days with Jenkins 2.23; here is an extract of the log showing:
- first an 'outofmemory' error
- then 'java.lang.OutOfMemoryError: unable to create new native thread' everywhere
- then disconnection of all slaves

[^log.txt]

Two slaves were not disconnected: slave.jar is more recent on those. I will update slave.jar on all of them and check whether it happens again (also waiting for the auto-update of slave.jar files, which is pending in another ticket).

Hope this helps.

aubert.aa@gmail.com (JIRA)

Oct 4, 2016, 3:00:04 AM
to jenkinsc...@googlegroups.com

In my case this was an out-of-memory problem: I fixed it by increasing -Xmx in the Jenkins JVM arguments, and all seems to be OK since.
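For reference (not part of the original comment): where the -Xmx flag goes depends on how Jenkins is installed. A sketch of two common cases; the 4g value is an illustrative assumption, not a recommendation, and the config path applies to Debian/Ubuntu package installs.

```shell
# Case 1: launching the war directly - pass -Xmx as a plain JVM flag
java -Xmx4g -jar jenkins.war

# Case 2: Debian/Ubuntu package install - edit /etc/default/jenkins, e.g.
#   JAVA_ARGS="-Xmx4g"
# then restart the service:
#   sudo service jenkins restart
```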

trushar.scm@gmail.com (JIRA)

Nov 7, 2016, 12:14:04 PM
to jenkinsc...@googlegroups.com

We are also facing the same issue on Jenkins 1.624. I had to reboot it. Please, someone help. This looks like it's been going on for a while.

nelu.vasilica@cirrus.com (JIRA)

Dec 15, 2016, 6:46:02 AM
to jenkinsc...@googlegroups.com

Just seen the same issue on a Jenkins 1.642.1 Linux master. The fix was to restart Tomcat, and the Windows slaves reconnected automatically.
Found several instances of "Ping started at xxxxxx hasn't completed by xxxxxxx" in the logs.
Is setting the jenkins.slaves.NioChannelSelector.disabled property to true a viable workaround?
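For anyone trying that workaround: jenkins.slaves.NioChannelSelector.disabled is a Java system property read at master startup, so it has to be passed to the master JVM rather than set in the UI. A sketch, assuming a war-based launch (for a Tomcat deployment it would instead go into the container's JVM options, e.g. CATALINA_OPTS):

```shell
# Pass the system property to the Jenkins master JVM at startup;
# -D flags must come before -jar to reach the JVM, not the application
java -Djenkins.slaves.NioChannelSelector.disabled=true -jar jenkins.war
```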

hailcesos@gmail.com (JIRA)

Dec 19, 2016, 5:30:02 PM
to jenkinsc...@googlegroups.com

Same issue here. Only this time, my tests never finish. The slaves are always dropping during the tests. Please help!!

l.heche@hotmail.fr (JIRA)

Nov 30, 2018, 8:31:04 AM
to jenkinsc...@googlegroups.com
Louis Heche updated an issue
 
Change By: Louis Heche
Attachment: masterJenkins.log
Attachment: jenkins-slave.0.err.log

l.heche@hotmail.fr (JIRA)

Nov 30, 2018, 8:31:06 AM
to jenkinsc...@googlegroups.com
Louis Heche commented on Bug JENKINS-24155
 
Re: Jenkins Slaves Go Offline In Large Quantities and Don't Reconnect Until Reboot

I'm having what seems to be this issue with Jenkins 2.138.3.

Every 3-4 days all the slave nodes go offline, although there seems to be no network problem. They come back online once the master has been restarted.

Attached you'll find the logs: jenkins-slave.0.err.log and masterJenkins.log

jenkins2jrw@nym.hush.com (JIRA)

Jul 10, 2019, 11:22:06 PM
to jenkinsc...@googlegroups.com

Louis Heche Oleg Nenashev Cesos Barbarino

Can one of you do the following? To help narrow down the possible leak areas, it will be useful to capture process memory usage and JVM heap usage. Start your master process as normal, then start two tools on the system and redirect their output to separate files. Both tools have low system resource usage.

Memory stats can be captured using pidstat, specifically the resident set size:

$ pidstat -r -p <pid> 8 > /tmp/pidstat-capture.txt

JVM heap size and GC behavior can be captured using jstat, specifically the percentage of reclaimed heap space after a full collection:

$ jstat -gcutil -t -h12 <pid> 8s > /tmp/jstat-capture.txt

Please attach the generated files to this issue.
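The two captures above can be started together with a small wrapper. This is just a convenience sketch, not part of the original request: the script name and output paths are mine, and it assumes pidstat (from the sysstat package) and the JDK's jstat are installed on the master.

```shell
#!/bin/sh
# capture-jenkins-stats.sh <jenkins-master-pid>
# Runs the two captures requested above side by side until interrupted.
PID="$1"
[ -n "$PID" ] || { echo "usage: $0 <jenkins-master-pid>" >&2; exit 1; }

# Resident set size every 8 seconds
pidstat -r -p "$PID" 8 > /tmp/pidstat-capture.txt &
# GC utilization with timestamps, repeating the header every 12 rows
jstat -gcutil -t -h12 "$PID" 8s > /tmp/jstat-capture.txt &

# Stop both background captures when this script is interrupted
trap 'kill 0' INT TERM
wait
```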
