[JIRA] (JENKINS-49309) All SSH slaves unexpectedly disconnect when one job finishes


o.v.nenashev@gmail.com (JIRA)

Mar 13, 2018, 7:46:04 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev assigned an issue to Unassigned
 

Bulk issue update: The plugin's connectivity is still unstable, judging by this and other reports. The recent patches in 1.24-1.25 probably added some instability by removing the interlocks between the agent connection and termination logic. This apparently affects some reconnection scenarios because of race conditions.

Unfortunately, I do not have the capacity to work on the plugin in the medium term, so for now I am unassigning issues from myself. Ivan Fernandez Calvo has kindly taken ownership of the plugin and is handling some of its workload. He will probably have some capacity to review the backlog I was unable to triage.

Jenkins / Bug JENKINS-49309
All SSH slaves unexpectedly disconnect when one job finishes
Change By: Oleg Nenashev
Assignee: Oleg Nenashev → Unassigned
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

ifernandezcalvo@cloudbees.com (JIRA)

Apr 18, 2018, 2:31:02 PM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-49309
 
Re: All SSH slaves unexpectedly disconnect when one job finishes
Connection to 127.0.0.1 closed by remote host.

127.0.0.1 is the loopback device. Could you attach the config file of this agent? Is it running on the same host as the Jenkins instance? Did you set the Xmx and Xms JVM parameters?

diongonano@gmail.com (JIRA)

Apr 18, 2018, 2:46:03 PM
to jenkinsc...@googlegroups.com

Ivan Fernandez Calvo, can you be more specific about which config you mean, or how to get it? I attached a picture of the slave config page in Jenkins.

diongonano@gmail.com (JIRA)

Apr 18, 2018, 2:47:02 PM
to jenkinsc...@googlegroups.com
Dion Gonano edited a comment on Bug JENKINS-49309
[~ifernandezcalvo] Can you be more specific about which config you mean, or how to get it? I attached a picture of the slave config page in Jenkins.

What are Xmx and Xms?

ifernandezcalvo@cloudbees.com (JIRA)

Apr 18, 2018, 2:49:01 PM
to jenkinsc...@googlegroups.com

diongonano@gmail.com (JIRA)

Apr 18, 2018, 2:50:01 PM
to jenkinsc...@googlegroups.com

If Xms and Xmx are the Java memory parameters, I haven't configured anything. I thought Jenkins SSH'd in and started the slave JVM with the correct parameters.

diongonano@gmail.com (JIRA)

Apr 18, 2018, 2:53:02 PM
to jenkinsc...@googlegroups.com

slave configuration

<slave>
 <name>vagrant</name>
 <description/>
 <remoteFS>/media/disk1/jenkins/</remoteFS>
 <numExecutors>1</numExecutors>
 <mode>NORMAL</mode>
 <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
 <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-s...@1.24">
   <host>192.168.11.10</host>
   <port>22</port>
   <credentialsId>vagrant</credentialsId>
   <maxNumRetries>0</maxNumRetries>
   <retryWaitTime>0</retryWaitTime>
   <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy">
     <requireInitialManualTrust>false</requireInitialManualTrust>
   </sshHostKeyVerificationStrategy>
 </launcher>
 <label>vagrant-slave</label>
 <nodeProperties/>
 </slave>

diongonano@gmail.com (JIRA)

Apr 18, 2018, 2:54:01 PM
to jenkinsc...@googlegroups.com

I've been running it with one executor for a while now, with no disconnection issues.

ifernandezcalvo@cloudbees.com (JIRA)

Apr 18, 2018, 3:16:02 PM
to jenkinsc...@googlegroups.com

Try setting the JVM options to

-Xmx512m -Xms512m

and increase the number of executors. I think the slave.jar process is throwing an out-of-memory error.
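
As a rough illustration of the suggestion above, the following sketch checks whether an agent's config XML already sets both heap flags on its launcher. It assumes the SSH launcher stores the options in a `<jvmOptions>` child element, which does not appear in the config posted in this thread; the function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def launcher_jvm_options(config_xml: str) -> str:
    """Return the launcher's JVM options, or '' if none are configured."""
    root = ET.fromstring(config_xml)
    # Assumption: the options live in <launcher><jvmOptions>...</jvmOptions></launcher>.
    node = root.find("./launcher/jvmOptions")
    if node is None or node.text is None:
        return ""
    return node.text.strip()


def has_heap_limits(jvm_options: str) -> bool:
    """True only if both an -Xmx and an -Xms flag are present."""
    flags = jvm_options.split()
    has_max = any(flag.startswith("-Xmx") for flag in flags)
    has_min = any(flag.startswith("-Xms") for flag in flags)
    return has_max and has_min


# Hypothetical, trimmed-down agent config used only for illustration.
SAMPLE = """<slave>
  <name>vagrant</name>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>192.168.11.10</host>
    <jvmOptions>-Xmx512m -Xms512m</jvmOptions>
  </launcher>
</slave>"""

if __name__ == "__main__":
    options = launcher_jvm_options(SAMPLE)
    print(options, has_heap_limits(options))
```

A config with no `<jvmOptions>` element, like the one posted earlier, would yield an empty string and fail the heap check.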

diongonano@gmail.com (JIRA)

Apr 18, 2018, 4:45:01 PM
to jenkinsc...@googlegroups.com

diongonano@gmail.com (JIRA)

Apr 19, 2018, 11:04:02 AM
to jenkinsc...@googlegroups.com

ifernandezcalvo@cloudbees.com (JIRA)

May 1, 2018, 8:08:01 AM
to jenkinsc...@googlegroups.com

kuisathaverat@gmail.com (JIRA)

Apr 8, 2019, 5:26:01 AM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo commented on Bug JENKINS-49309
 
Re: All SSH slaves unexpectedly disconnect when one job finishes

Recently we detected disconnections that are related to the https://wiki.jenkins.io/display/JENKINS/Slave+To+Master+Access+Control setting. Here we do not have the agent logs, but if they show a serialization warning, you should try disabling this feature and report a bug against the plugin that contains the class that fails to serialize. See https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#selenium-grid-agents-failed-to-connect

The warning would look something like this one, with a different class:


{{Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn

WARNING: Attempt to (de-)serialize anonymous class hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/}}
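
To find out which class to report, one option is to scan the agent log for this warning and collect the class names. The sketch below is a minimal illustration that assumes the warning text has the same shape as the sample above; the function name and the sample log are hypothetical.

```python
import re

# Matches the class name between "anonymous class " and the following ";".
WARNING_RE = re.compile(r"Attempt to \(de-\)serialize anonymous class (\S+?);")


def anonymous_class_warnings(log_text: str) -> list:
    """Return the sorted, de-duplicated class names named in the warnings."""
    return sorted(set(WARNING_RE.findall(log_text)))


# Sample log text copied from the warning quoted in this thread.
SAMPLE_LOG = (
    "Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn\n"
    "WARNING: Attempt to (de-)serialize anonymous class "
    "hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; "
    "see: https://jenkins.io/redirect/serialization-of-anonymous-classes/\n"
)

if __name__ == "__main__":
    for class_name in anonymous_class_warnings(SAMPLE_LOG):
        print(class_name)
```

The package prefix of each reported class usually hints at the plugin that owns it.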

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

kuisathaverat@gmail.com (JIRA)

Jul 21, 2019, 9:03:02 AM
to jenkinsc...@googlegroups.com
Ivan Fernandez Calvo closed an issue as Cannot Reproduce
 
Status: Open → Closed
Resolution: Cannot Reproduce