[JIRA] (JENKINS-43038) Intermittent error "Cannot contact node123: java.lang.InterruptedException " in jenkins

manish.savlani@gmail.com (JIRA)

Mar 22, 2017, 11:37:02 AM
to jenkinsc...@googlegroups.com
Manish Sawlani created an issue
 
Jenkins / Bug JENKINS-43038
Intermittent error "Cannot contact node123: java.lang.InterruptedException " in jenkins
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2017/Mar/22 3:36 PM
Environment: Jenkins version: 2.48
OS on master: RHEL 5.4
OS on slave: RHEL 6.6
Java version on slave: jdk1.7.0_80
Priority: Minor
Reporter: Manish Sawlani

We intermittently see the connection error below while running jobs on node123.

The error shown in the build log is: Cannot contact node123: java.lang.InterruptedException

I don't see any errors in the thread dump or in any other logs related to this node.

I also see no connection drop between the master and the node.

The slave has been running for more than 24 hours now.

 

 

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

o.v.nenashev@gmail.com (JIRA)

Mar 23, 2017, 4:11:02 AM
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

Mar 24, 2017, 3:54:01 AM
to jenkinsc...@googlegroups.com

levtar@gmail.com (JIRA)

Mar 27, 2017, 7:07:01 AM
to jenkinsc...@googlegroups.com
Lev Tartakovsky edited a comment on Bug JENKINS-43038
The problem also persists on Ubuntu 16.04 with Jenkins 2.32.3.
Unfortunately, I cannot find any exception stack trace.
While troubleshooting this, I used an SSH Jenkins slave running on the same server as the master. I've managed to work around the problem by switching my jobs to run on the master instead of the slave.

o.v.nenashev@gmail.com (JIRA)

Mar 27, 2017, 8:56:01 AM
to jenkinsc...@googlegroups.com

OK. If you see no exception, please at least provide the full Jenkins system logs. Without that information I cannot diagnose the issue.

levtar@gmail.com (JIRA)

Mar 27, 2017, 10:19:03 AM
to jenkinsc...@googlegroups.com

levtar@gmail.com (JIRA)

Mar 27, 2017, 10:22:03 AM
to jenkinsc...@googlegroups.com
Lev Tartakovsky commented on Bug JENKINS-43038
 
Re: Intermittent error "Cannot contact node123: java.lang.InterruptedException " in jenkins

I've just uploaded my Jenkins log.
Please note that most exceptions in the log refer to the slave disconnecting and reconnecting.

levtar@gmail.com (JIRA)

Mar 28, 2017, 2:25:02 AM
to jenkinsc...@googlegroups.com

This could be related to another issue that I reported recently:
https://issues.jenkins-ci.org/browse/JENKINS-43106

In that issue's description you can find more logs, including a thread dump, that may shed some light on the root cause of the problem.

levtar@gmail.com (JIRA)

Mar 29, 2017, 4:55:02 AM
to jenkinsc...@googlegroups.com

levtar@gmail.com (JIRA)

Mar 29, 2017, 6:12:02 AM
to jenkinsc...@googlegroups.com
Lev Tartakovsky edited a comment on Bug JENKINS-43038
I've managed to capture the exception, which may shed some light on the problem.

Please review the attached pipeline_log.txt.

At the same time, the build log printed:
[Pipeline] stage
[Pipeline] { (Create GIT TAG)
[Pipeline] sh
10:45:26 [CISystem_generic@2] Running shell script
10:45:37 Cannot contact ##############: java.lang.InterruptedException
10:45:47 Cannot contact ##############: java.lang.InterruptedException
10:45:57 Cannot contact ##############: java.lang.InterruptedException
10:46:07 Cannot contact ##############: java.lang.InterruptedException
10:46:18 Cannot contact ##############: java.lang.InterruptedException
10:46:28 Cannot contact ##############: java.lang.InterruptedException
10:46:38 Cannot contact ##############: java.lang.InterruptedException
10:46:48 Cannot contact ##############: java.lang.InterruptedException
10:46:59 Cannot contact ##############: java.lang.InterruptedException
10:47:09 Cannot contact ##############: java.lang.InterruptedException
10:47:19 Cannot contact ##############: java.lang.InterruptedException
10:47:29 Cannot contact ##############: java.lang.InterruptedException
10:47:40 Cannot contact ##############: java.lang.InterruptedException
10:47:50 Cannot contact ##############: java.lang.InterruptedException
10:48:00 Cannot contact ##############: java.lang.InterruptedException
10:48:10 Cannot contact ##############: java.lang.InterruptedException
10:48:21 Cannot contact ##############: java.lang.InterruptedException
10:48:31 Cannot contact ##############: java.lang.InterruptedException
10:48:41 Cannot contact ##############: java.lang.InterruptedException
10:48:51 Cannot contact ##############: java.lang.InterruptedException
10:49:02 Cannot contact ##############: java.lang.InterruptedException
10:49:12 Cannot contact ##############: java.lang.InterruptedException
10:49:22 Cannot contact ##############: java.lang.InterruptedException
10:49:32 Cannot contact ##############: java.lang.InterruptedException
10:49:43 Cannot contact ##############: java.lang.InterruptedException
10:49:53 Cannot contact ##############: java.lang.InterruptedException
10:50:03 Cannot contact ##############: java.lang.InterruptedException
10:50:13 Cannot contact ##############: java.lang.InterruptedException
10:50:24 Cannot contact ##############: java.lang.InterruptedException
10:50:34 Cannot contact ##############: java.lang.InterruptedException
10:50:44 Cannot contact ##############: java.lang.InterruptedException
10:50:54 Cannot contact ##############: java.lang.InterruptedException
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
10:51:00 + git tag -a ############## -m Created by Jenkins
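
A minimal Python sketch (an illustration only, assuming the timestamped build-log format shown above) for measuring how long the node stayed unreachable. The function name `summarize_outage` and the regular expression are hypothetical helpers, not part of Jenkins. In the log above the message repeats roughly every ten seconds from 10:45:37 to 10:50:54, after which the shell output appears.

```python
import re
from datetime import datetime, timedelta

# Assumed line format, taken from the build log above:
# "10:45:37 Cannot contact <node>: java.lang.InterruptedException"
CONTACT_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) Cannot contact .*InterruptedException")


def summarize_outage(log_text):
    """Return (count, first_time, last_time, span) for 'Cannot contact' lines.

    Returns None when the log contains no such line.
    """
    stamps = []
    for line in log_text.splitlines():
        match = CONTACT_RE.match(line.strip())
        if match:
            # Only the wall-clock time is logged, so all lines are
            # assumed to fall on the same day.
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    if not stamps:
        return None
    span = stamps[-1] - stamps[0]
    return len(stamps), stamps[0].time(), stamps[-1].time(), span


# Shortened excerpt of the log above, used only as sample input.
sample = """\
10:45:26 [CISystem_generic@2] Running shell script
10:45:37 Cannot contact ##############: java.lang.InterruptedException
10:45:47 Cannot contact ##############: java.lang.InterruptedException
10:50:54 Cannot contact ##############: java.lang.InterruptedException
10:51:00 + git tag -a ############## -m Created by Jenkins
"""

print(summarize_outage(sample))
```

Timestamps are treated as same-day wall-clock times, so a log that crosses midnight would need extra handling.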

ttmost@gmail.com (JIRA)

Feb 11, 2018, 7:13:03 AM
to jenkinsc...@googlegroups.com

I see this issue as well during testing, where a single shell script can run for about 10-20 minutes.

I suppose it happens when the agent gets disconnected for a split second. Is there any way to create a workaround that protects the shell script from this? At the moment I have to manually abort the running test.

Thanks,

Tsvi

shahmishal@gmail.com (JIRA)

Feb 21, 2018, 5:10:09 PM
to jenkinsc...@googlegroups.com

dor982@gmail.com (JIRA)

Mar 1, 2018, 4:47:05 AM
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

Mar 13, 2018, 10:33:16 PM
to jenkinsc...@googlegroups.com
Oleg Nenashev assigned an issue to Unassigned
 

Unfortunately I have no capacity to work on Remoting in the medium term, so I will unassign this issue and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.

Change By: Oleg Nenashev
Assignee: Oleg Nenashev → Unassigned

svanoort@cloudbees.com (JIRA)

Apr 3, 2018, 2:22:05 PM
to jenkinsc...@googlegroups.com
Sam Van Oort closed an issue as Fixed
 

Manish Sawlani, mishal shah, Tsvi Mostovicz: If you update to the latest Pipeline plugins, and especially the support-core plugin, and use the suggested GC settings (https://jenkins.io/blog/2016/11/21/gc-tuning/), you should find that the InterruptedExceptions are pretty much gone; they are generally the result of timeouts in remoting-related operations. The only cases where they should still happen, I believe, are actual hardware/system/network issues.

In the last quarter of 2017 we made a big change to the way Pipeline's durable tasks interact with remoting that should avoid many of these issues.

Explanation: the support-core plugin, in version 2.42, added heap histogram analysis for diagnostics, but this had the side effect of introducing periodic, catastrophically long GC pauses that made the Jenkins master unresponsive for long periods and triggered timeouts (hence the InterruptedException here when the timeouts kick in).

Please see https://issues.jenkins-ci.org/browse/JENKINS-49931 for more details on that.

For now I'm going to transition this to "closed" because, when working with several users showing this among other symptoms, the suggestions above successfully resolved the issues. I'm happy to re-open this if you still experience problems after applying the above (please reply to say so).

Change By: Sam Van Oort
Status: Open → Closed
Resolution: Fixed

svanoort@cloudbees.com (JIRA)

Apr 3, 2018, 2:23:02 PM
to jenkinsc...@googlegroups.com

svanoort@cloudbees.com (JIRA)

Apr 3, 2018, 2:32:02 PM
to jenkinsc...@googlegroups.com
Sam Van Oort edited a comment on Bug JENKINS-43038
 
Re: Intermittent error "Cannot contact node123: java.lang.InterruptedException " in jenkins
Explanation (edit): An additional issue around support-core caused problems and was recently fixed. Specifically, the support-core plugin, in version 2.42, added heap histogram analysis for diagnostics, but this had the unexpected side effect of introducing periodic, catastrophically long GC pauses that made the Jenkins master unresponsive for long periods and triggered timeouts (hence the InterruptedException here when the timeouts kick in).


joe.barber@genband.com (JIRA)

Jul 19, 2018, 9:36:02 AM
to jenkinsc...@googlegroups.com

Hi, I have recently been seeing the same "Cannot contact node123: java.lang.InterruptedException" error, but only during parallel stages in a Pipeline job.

I have created a brand new Jenkins environment (Jenkins version 2.121.1) with all plugins updated, and I have configured the GC settings according to the gc-tuning page from the comment above.
The issue is intermittent (about 1 in every 8 builds or so).

Support-Core version 2.48
Pipeline version 2.5

Any other advice?

 

Thanks,

 


svanoort@cloudbees.com (JIRA)

Jul 19, 2018, 10:11:02 AM
to jenkinsc...@googlegroups.com

Joe Barber: What you describe sounds a lot like https://issues.jenkins-ci.org/browse/JENKINS-46507, but we have not had a consistent way to reproduce the issue, so it's very hard to debug. If you can provide a simple, self-contained sample Pipeline that reproduces the issue in the comments of that ticket, that would be very helpful. Thanks!

oxygenxo@gmail.com (JIRA)

Oct 18, 2019, 5:00:05 PM
to jenkinsc...@googlegroups.com

We're experiencing the same issue when our Java agent gets killed by OOM or when the machine on which the agent is running is rebooted. Is there any way to reduce the amount of time Jenkins waits before the build is marked as failed?
