[JIRA] (JENKINS-48865) JNLP Agents/Slaves Disconnecting Unpredictably


piotr.plenik+jenkinsio@gmail.com (JIRA)

Oct 26, 2018, 6:54:02 AM
to jenkinsc...@googlegroups.com
Piotr Plenik commented on Bug JENKINS-48865
 
Re: JNLP Agents/Slaves Disconnecting Unpredictably

Oleg Nenashev, indeed JENKINS-48865 and JENKINS-48865 are precisely the same issue.

I guess that you meant JENKINS-44132, didn't you?

 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

jthompson@cloudbees.com (JIRA)

Oct 26, 2018, 1:58:02 PM
to jenkinsc...@googlegroups.com

I suspect Oleg meant JENKINS-48895.

 

Ping failures on the agent can occur because of an issue on the master, such as a restart or resource exhaustion that delays its response to the ping, or because of some other system or networking issue.

jthompson@cloudbees.com (JIRA)

Dec 11, 2018, 2:09:02 PM
to jenkinsc...@googlegroups.com
Jeff Thompson closed an issue as Cannot Reproduce
 

Closing for lack of sufficient diagnostics and information to reproduce after no response for quite a while.

Jenkins / Bug JENKINS-48865
JNLP Agents/Slaves Disconnecting Unpredictably
Change By: Jeff Thompson
Status: Open → Closed
Resolution: Cannot Reproduce

awong29@ford.com (JIRA)

Apr 1, 2019, 5:20:06 PM
to jenkinsc...@googlegroups.com
Alfred Wong reopened an issue
 

I have an issue very similar to this one. My observation is that the slave loses connectivity and tries to re-establish a connection, but the master rejects it because the master thinks it already has the connection. At the same time, the master is trying to ping the slave and waiting for the 4-minute timeout. I think the error condition could be handled a bit differently: if the ping is not being answered and a new connection request comes in, the master should accept the new connection instead of waiting 4 minutes before destroying the old one. I have attached a log file from the master. The only thing I am not sure about is why the slave needs to request a new connection; maybe the connection to the master is not very stable. It would be nice to have more slave logs to see why the connection is dropped.
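
To make the proposal above concrete, here is a minimal sketch written as a toy Python model. It is not Jenkins Remoting code: the `Master` and `Channel` classes, the `eager_replace` flag, and the agent name are all hypothetical, and the only value taken from the report is the 4-minute ping timeout.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PING_TIMEOUT_S = 240.0  # the 4-minute timeout mentioned in the report


@dataclass
class Channel:
    """Hypothetical record of one agent connection as the master sees it."""
    agent: str
    ping_sent_at: Optional[float] = None  # None means no ping is outstanding

    def ping_outstanding(self) -> bool:
        return self.ping_sent_at is not None


class Master:
    """Toy model only; this is not how Jenkins Remoting is implemented."""

    def __init__(self, eager_replace: bool) -> None:
        self.eager_replace = eager_replace
        self.channels: Dict[str, Channel] = {}

    def on_connect(self, agent: str, now: float) -> bool:
        """Return True if the new connection request is accepted."""
        old = self.channels.get(agent)
        if old is not None:
            timed_out = (
                old.ping_outstanding()
                and now - old.ping_sent_at >= PING_TIMEOUT_S
            )
            replace_early = self.eager_replace and old.ping_outstanding()
            if not (timed_out or replace_early):
                return False  # master still believes the old channel is alive
        self.channels[agent] = Channel(agent)
        return True


for eager in (False, True):
    master = Master(eager_replace=eager)
    master.on_connect("win-agent", now=0.0)
    master.channels["win-agent"].ping_sent_at = 10.0  # ping sent, never answered
    accepted = master.on_connect("win-agent", now=30.0)  # agent retries 20 s later
    print(f"eager_replace={eager}: reconnect accepted = {accepted}")
```

With `eager_replace=False` the reconnect at 30 s is rejected, which corresponds to the behavior described in the report; with `eager_replace=True` it is accepted because a ping is already unanswered. Whether replacing a channel this early is safe in a real master is a separate question that the sketch does not address.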

 
The Jenkins version is 2.150.3, running under Kubernetes, and the slaves are Windows slaves started using JNLP.

Change By: Alfred Wong
Resolution: Cannot Reproduce
Status: Closed → Reopened

awong29@ford.com (JIRA)

Apr 1, 2019, 5:20:06 PM
to jenkinsc...@googlegroups.com
Alfred Wong updated an issue
Change By: Alfred Wong
Attachment: jswum_jenkins_log.txt

jthompson@cloudbees.com (JIRA)

Apr 25, 2019, 1:10:02 PM
to jenkinsc...@googlegroups.com

Alfred Wong, your description sounds different from the original report. The original report was about unpredictable disconnects. These can happen for many reasons, but often occur because of system, network, or environmental issues. Your description concerns re-connection problems. I think it would be better for you to create a separate ticket for your issue.

Could you share more information about what is occurring, such as how you launch your agents and anything relevant about their configuration? Agent logs would be essential.

awong29@ford.com (JIRA)

Apr 25, 2019, 3:15:02 PM
to jenkinsc...@googlegroups.com

Sure, I can create a new JIRA. I think the original problem I had was the disconnect, and it is still happening a few times a day. Our vendor OpenShift and our container team have spent the last few weeks investigating the issue. I will put the re-connection issue in another JIRA. Thanks.

jthompson@cloudbees.com (JIRA)

Apr 25, 2019, 4:34:02 PM
to jenkinsc...@googlegroups.com

Yes, disconnect issues can be very difficult to track down. They're usually due to something closing the connection at the TCP layer, or to one end being overloaded and unable to maintain its side.

I think we should re-close this ticket.

awong29@ford.com (JIRA)

Apr 26, 2019, 12:25:03 PM
to jenkinsc...@googlegroups.com
Alfred Wong closed an issue as Fixed
 

I will update if we learn anything more from our IT about why the disconnections happen. Thanks.

Change By: Alfred Wong
Status: Reopened → Closed
Resolution: Fixed

tomahawk1187@gmail.com (JIRA)

Mar 21, 2020, 8:05:02 PM
to jenkinsc...@googlegroups.com

Why is this closed? I have the same problem. Remoting v3.36, Jenkins v2.213

This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)

mark.earl.waite@gmail.com (JIRA)

Mar 22, 2020, 6:02:03 PM
to jenkinsc...@googlegroups.com

Anargyros Tomaras there is a comment from Jeff Thompson which says that he is closing it for lack of information that will allow the problem to be duplicated. If you can provide a set of steps which will allow someone else to duplicate the failures, I'm sure he'd be delighted to see those steps and experiment with them.


cvalean@cloudbasesolutions.com (JIRA)

Mar 22, 2020, 6:09:03 PM
to jenkinsc...@googlegroups.com

Mark Waite you need to understand this was not an issue of "step 1, 2, 3, repro".

Everyone's environment is different, and errors do go away after people try whatever they can to get past this issue.

Personally, I'm going to unsubscribe from this thread as it's no longer relevant to me. Reporting issues here is very disappointing when trash gets hidden under the "no repro" tag, rather than anyone trying to understand the problem and offer any type of suggestions.

mark.earl.waite@gmail.com (JIRA)

Mar 22, 2020, 7:43:05 PM
to jenkinsc...@googlegroups.com

Chris Valean I accept that many issues are not "step 1, 2, 3, repro", many environments are different, and that workarounds often help users find ways to avoid issues. I was trying to answer the question from Anargyros Tomaras.

I'm open to any suggestions about what a volunteer maintainer should do to fix a bug that can't be duplicated. What would persuade a volunteer maintainer to be more interested in this issue than the other issues they are investigating or the other features they are adding?

I've spent many hours making guesses about bug reports, trying various experiments in hopes of seeing the problem that the user reported. The investigations are usually focused on helping a user find an alternative which will allow them to avoid an issue they have detected. Those investigations have the added hope that if I understand how to duplicate the problem, I can assess how many other users will see the problem. The investigations may also help me understand how to fix the problem. The investigations are done on personal time and for personal passion.

I empathize with user frustration that the issue they are seeing is not visible to the maintainer. I don't see what maintainers can do to fix a problem they cannot see.

I empathize with maintainers that don't receive enough information from submitters. I understand that users may not want to spend any more time reporting an issue than is absolutely necessary.

I don't see a lot of benefit to leaving an issue open as a maintainer when I've tried my best to duplicate it and I cannot duplicate it. If it is left open, it may mislead users that someone might work on it. If I can't duplicate the problem, it is much less likely that I will work on the problem. I don't see any loss of information in marking an issue as "Cannot reproduce" and closing it. If others find a way to duplicate the problem, they can provide the detailed information to duplicate the problem and reopen the issue.

jthompson@cloudbees.com (JIRA)

Mar 23, 2020, 10:44:02 AM
to jenkinsc...@googlegroups.com

As I mentioned previously, these sorts of issues are almost always caused by some problem in the local environment. Something to do with system, network, or environment configuration. Sometimes it results from a conflict between plugins or job execution errors, which mistakenly appear as Remoting issues. All of these types of issues require troubleshooting in the local environment. Without providing a substantial amount of troubleshooting data, which usually ends up identifying the configuration issue anyway, there is nothing that anyone else can do.

With these issues, when someone reports they have the same issue, it often turns out to be something quite different. Alfred's report, earlier in this thread, is an excellent example. On another similar ticket, there were multiple reports from different people as to how they resolved the issue, most of them different.

If someone can provide sufficient diagnostics or reproduction steps, I'd be happy to take a look. Even better, submit a PR, as several people have done.

gopal.ahir@motorolasolutions.com (JIRA)

May 8, 2020, 1:14:04 AM
to jenkinsc...@googlegroups.com

Any fix for this? I am also facing the same issue. Jenkins v2.204.1, ssh plugin version 1.31.0.

Slave OS: Windows Server 2016

I am facing this issue only when a build is in progress and there are no logs in the job output for some time. The build then fails.

 
12:29:24 Z:\>rem \\zmy19nap01\HOME\pcrscm\PuTTY\plink.exe -ssh -i \\zmy19nap01\home\pcrscm\.ssh\pcrscm.ppk pcrscm@zmy33lxclient04 "/usr/atria/bin/cleartool setview -exec 'perl /view/cars_CARS_PCR_SU_PLIGHT1.1.50_SCM/vobs/ltd_tools/cars/common/cleartool_lscheckout.pl' pcrscm_Crete_host_I9998"
12:29:24
12:29:24 Z:\>exit 0
12:39:16 Agent went offline during the build (https://pcrsub-jenkins.mot-solutions.com/computer/ZMY33-WIN2016/log)
12:39:16 ERROR: Connection was broken: java.util.concurrent.TimeoutException: Ping started at 1588912516061 hasn't completed by 1588912756062
12:39:16     at hudson.remoting.PingThread.ping(PingThread.java:133)
12:39:16     at hudson.remoting.PingThread.run(PingThread.java:89)
12:39:16
12:39:16 Build step 'Console output (build log) parsing' marked build as failure
12:39:16 ERROR: ZMY33-WIN2016 is offline; cannot locate JAVA_HOME
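
The two numbers in the TimeoutException message can be checked with a little arithmetic. The sketch below assumes they are epoch timestamps in milliseconds (an assumption, though the values are consistent with the date of this report); the helper names `parse_ping_window` and `to_utc` are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Exception text copied from the console output above.
LOG_LINE = (
    "java.util.concurrent.TimeoutException: "
    "Ping started at 1588912516061 hasn't completed by 1588912756062"
)

PATTERN = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ping_window(line):
    """Return (start_ms, deadline_ms) parsed from a ping timeout message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("no ping timeout message found")
    return int(match.group(1)), int(match.group(2))


def to_utc(ms):
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


start_ms, deadline_ms = parse_ping_window(LOG_LINE)
elapsed_ms = deadline_ms - start_ms
print("ping started (UTC):", to_utc(start_ms).isoformat(timespec="seconds"))
print("gave up (UTC):     ", to_utc(deadline_ms).isoformat(timespec="seconds"))
print("elapsed:            %d ms (%.1f min)" % (elapsed_ms, elapsed_ms / 60000))
```

The difference is 240001 ms, which lines up with the 4-minute ping timeout mentioned earlier in this thread. So the agent was declared offline after a ping went unanswered for the whole window, and the cause of the silence has to be sought in what happened on the agent or the network during those four minutes.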