[JIRA] (JENKINS-50458) JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller


regis.maura@infotel.com (JIRA)

Mar 28, 2018, 9:41:03 AM
to jenkinsc...@googlegroups.com
Régis Maura created an issue
 
Jenkins / Bug JENKINS-50458
JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 2018-03-28 13:40
Environment: Master is Jenkins 2.112 as windows service with command :
{quote}
-Xrs -Xmx512m -Djava.net.preferIPv4Stack=true -Dhudson.remoting.ClassFilter=java.util.Formatter,javax.mail.internet.InternetAddress -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080 --webroot="%BASE%\war"
{quote}

Agent jar is 3.17 launched as windows service with args :
{quote}
-Xrs -jar "%BASE%\agent.jar" -jnlpUrl http://mycompany:8080/computer/Windows_Agent/slave-agent.jnlp -secret XYZ
{quote}

Both the master and the Windows agent run in Windows 7 32-bit VMs.
Both use JRE 1.8.0_162.
The JNLP agent port is fixed, and only JNLP4 is allowed.
Labels: agent.jar jnlp-slave
Priority: Minor
Reporter: Régis Maura

First, the agent starts correctly and is identified on the master:

mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Locating server among [http://xxxxxxxxxx:8080/]
mars 22, 2018 5:40:04 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFOS: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Agent discovery successful
  Agent address: xxxxxxxxxx
  Agent port:    9999
  Identity:      xxxxxxxxxx
mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Handshaking
mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Connecting to topvm09.sesame.infotel.com:9999
mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Trying protocol: JNLP4-connect
mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Remote identity confirmed: xxxxxxxxxx
mars 22, 2018 5:40:05 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Connected
mars 22, 2018 5:40:06 PM com.youdevise.hudson.slavestatus.SlaveListener call
INFOS: Slave-status listener starting
mars 22, 2018 5:40:06 PM com.youdevise.hudson.slavestatus.SocketHTTPListener waitForConnection
INFOS: Slave-status listener ready on port 3141

Then the master became unavailable (many OutOfMemory errors) and was restarted.

In the meantime, the JNLP agent tries to reconnect to the master until the connection succeeds:

mars 28, 2018 1:49:25 PM hudson.slaves.ChannelPinger$1 onDead
INFOS: Ping failed. Terminating the channel JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999.
java.util.concurrent.TimeoutException: Ping started at 1522237525477 hasn't completed by 1522237765505
    at hudson.remoting.PingThread.ping(PingThread.java:134)
    at hudson.remoting.PingThread.run(PingThread.java:90)

[... Repeated multiple times...]

mars 28, 2018 2:26:45 PM hudson.remoting.jnlp.Main$CuiListener status
INFOS: Terminated
mars 28, 2018 2:26:45 PM hudson.util.ProcessTree getKillers
AVERTISSEMENT: Failed to obtain killers
hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999 failed. The channel is closing down or has closed down
    at hudson.remoting.Channel.call(Channel.java:945)
    at hudson.util.ProcessTree.getKillers(ProcessTree.java:159)
    at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:220)
    at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:436)
    at hudson.util.ProcessTree.killAll(ProcessTree.java:146)
    at hudson.Proc$LocalProc.destroy(Proc.java:384)
    at hudson.Proc$LocalProc.join(Proc.java:357)
    at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:927)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:901)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:850)
    at hudson.remoting.UserRequest.perform(UserRequest.java:210)
    at hudson.remoting.UserRequest.perform(UserRequest.java:53)
    at hudson.remoting.Request$2.run(Request.java:364)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at hudson.remoting.Engine$1$1.run(Engine.java:94)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
    ... 4 more

[... Repeated multiple times...]

mars 28, 2018 2:26:46 PM hudson.remoting.Request$2 run
AVERTISSEMENT: Failed to send back a reply to the request hudson.remoting.Request$2@34a893f6
hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@71af25fc:JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999": channel is already closed
    at hudson.remoting.Channel.send(Channel.java:715)
    at hudson.remoting.Request$2.run(Request.java:377)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at hudson.remoting.Engine$1$1.run(Engine.java:94)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
    at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
    ... 4 more

mars 28, 2018 2:27:00 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out

[... Repeated multiple times...]

mars 28, 2018 2:31:49 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
mars 28, 2018 2:32:00 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
mars 28, 2018 2:32:15 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException Read timed out
mars 28, 2018 2:32:30 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException Read timed out
mars 28, 2018 2:32:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
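
For readers unfamiliar with this log pattern, the retry behaviour above can be approximated by a simple polling loop. This is only a minimal sketch and not the actual Remoting implementation: the class `WaitForReadySketch`, the method `waitForReady`, and its parameters are hypothetical names chosen for illustration, and the HTTP probe is abstracted as an `IntSupplier` so the example runs without a network.

```java
import java.util.function.IntSupplier;

final class WaitForReadySketch {

    /**
     * Calls the probe until it reports HTTP 200.
     * Returns the number of attempts needed, or -1 if the limit was reached
     * or the waiting thread was interrupted.
     */
    static int waitForReady(IntSupplier probe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int code = probe.getAsInt();
            if (code == 200) {
                return attempt;
            }
            System.out.println("Not ready, response code=" + code + ". Will try again.");
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated responses: two 503s while the master restarts, then 200.
        int[] responses = {503, 503, 200};
        int[] index = {0};
        int attempts = waitForReady(() -> responses[index[0]++], 5, 10L);
        System.out.println("Ready after " + attempts + " attempts");
    }
}
```

In a real agent the probe would be an HTTP request to the `tcpSlaveAgentListener` URL shown in the log; here it is replaced by canned values.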

 

But when the master came back, the agent died with the following stack trace:

mars 28, 2018 2:32:50 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
mars 28, 2018 2:33:01 PM hudson.remoting.jnlp.Main$CuiListener error
GRAVE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
    at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
    at hudson.remoting.Engine.innerRun(Engine.java:662)
    at hudson.remoting.Engine.run(Engine.java:469)
Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 4 more

Please note that the changelog of 2.112 says Remoting has been updated to 3.18, and I am using the previous version of the agent (3.17).

If the agent version mismatch is the root cause, I would expect Jenkins to complain about the outdated agent version.

PS: I don't know if this is a "core" component issue.

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

regis.maura@infotel.com (JIRA)

Mar 28, 2018, 9:48:02 AM
to jenkinsc...@googlegroups.com
Régis Maura updated an issue
Change By: Régis Maura

o.v.nenashev@gmail.com (JIRA)

Apr 5, 2018, 5:55:01 AM
to jenkinsc...@googlegroups.com
Oleg Nenashev commented on Bug JENKINS-50458
 
Re: JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller

JnlpSlaveRestarterInstaller is a part of the Jenkins core, yes.
But it propagates the restart logic to the master.

From what I see, the restart logic depends on the JnlpSlaveRestarterInstaller class instance, which cannot be propagated correctly when the agent is not connected, so this error happens.

Why does it happen? https://github.com/jenkinsci/jenkins/blob/f5bd70936d60503127d49597a08ec496db293dac/core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java#L91 is a non-static anonymous class, so it pulls in data. Serializable extension points are also a potential root cause.
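
To illustrate the general Java mechanism behind "a non-static anonymous class pulls in data", here is a minimal standalone sketch. It is not the Jenkins code: `OuterCaptureDemo`, `Installer`, `Callback`, and `StaticCallback` are hypothetical names, and plain Java serialization stands in for the Remoting channel. The anonymous class reads a field of its enclosing object, so it carries a hidden reference to that object, while the static nested class only carries the data it was given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class OuterCaptureDemo {

    /** Hypothetical serializable callback type. */
    interface Callback extends Serializable {
        String describe();
    }

    /** Deliberately not Serializable: an enclosing object never meant to travel. */
    static final class Installer {
        private final String name = "installer";

        /** Anonymous inner class: uses a field of Installer, so it references the Installer. */
        Callback anonymousCallback() {
            return new Callback() {
                @Override
                public String describe() {
                    return "callback of " + name;
                }
            };
        }

        /** Static nested class: gets a copy of the data, no reference to Installer. */
        Callback staticCallback() {
            return new StaticCallback(name);
        }
    }

    static final class StaticCallback implements Callback {
        private static final long serialVersionUID = 1L;
        private final String name;

        StaticCallback(String name) {
            this.name = name;
        }

        @Override
        public String describe() {
            return "callback of " + name;
        }
    }

    /** Returns true if the object can be written with Java serialization. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Installer installer = new Installer();
        System.out.println("anonymous: " + canSerialize(installer.anonymousCallback())); // false
        System.out.println("static:    " + canSerialize(installer.staticCallback()));    // true
    }
}
```

The anonymous callback fails to serialize because its hidden reference drags the non-serializable `Installer` along with it. Whether the same effect explains the `NoClassDefFoundError` in this issue is not confirmed here; the sketch only shows why anonymous inner classes are a common suspect for objects sent across a channel.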

jenkins@garbe.io (JIRA)

Apr 10, 2018, 4:49:03 PM
to jenkinsc...@googlegroups.com

regis.maura@infotel.com (JIRA)

Apr 11, 2018, 3:26:03 AM
to jenkinsc...@googlegroups.com

Philipp Garbe, thank you for the feedback. I have updated the agent but can't test the fix right now.

However, it would be smart to warn the administrator when an agent has a lower version than the master's version requires.
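
As a rough idea of what such a check could look like, here is a minimal sketch that compares dotted version strings such as the 3.17 and 3.18 mentioned in this issue. It is an illustration only: `RemotingVersionCheck`, `compareVersions`, and `isAgentOutdated` are hypothetical names, not existing Jenkins or Remoting APIs, and the sketch assumes purely numeric version components.

```java
final class RemotingVersionCheck {

    /**
     * Compares two dotted numeric versions, e.g. "3.17" and "3.18".
     * Returns a negative number, zero, or a positive number when a is
     * lower than, equal to, or higher than b. Missing components count as 0.
     */
    static int compareVersions(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True when the agent version is lower than the version the master expects. */
    static boolean isAgentOutdated(String agentVersion, String requiredVersion) {
        return compareVersions(agentVersion, requiredVersion) < 0;
    }

    public static void main(String[] args) {
        System.out.println(isAgentOutdated("3.17", "3.18")); // true
        System.out.println(isAgentOutdated("3.19", "3.18")); // false
    }
}
```

Comparing component by component as integers avoids the pitfall of plain string comparison, which would rank "3.9" above "3.17".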

regis.maura@infotel.com (JIRA)

Apr 11, 2018, 3:27:03 AM
to jenkinsc...@googlegroups.com

jonathan_tancer@colpal.com (JIRA)

Apr 25, 2018, 10:51:04 AM
to jenkinsc...@googlegroups.com
Jon Tancer commented on Improvement JENKINS-50458
 
Re: JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller

I see the same error as the original poster, although I am using agent version 3.19.  Jenkins slaves connected via JNLP agent are unable to reconnect to Jenkins after the Jenkins web app reboots.  The error below is logged.

My server's startup script always pulls down the latest agent file, so I should never have an issue relating to a mismatch in versions.

Apr 25, 2018 11:54:01 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)

 

jonathan_tancer@colpal.com (JIRA)

Apr 25, 2018, 10:52:03 AM
to jenkinsc...@googlegroups.com
Jon Tancer edited a comment on Improvement JENKINS-50458
I see the same error as the original poster, although I am using agent version 3.19.  Jenkins slaves connected via JNLP agent are unable to reconnect to Jenkins after the Jenkins web app reboots.  The error below is logged.

My build slaves have a boot-up script which always pulls down the latest agent file before establishing the connection to Jenkins.

For this reason, I should never have an issue relating to a mismatch in versions because all the slaves reboot once daily.
{quote}Apr 25, 2018 11:54:01 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
{quote}
 

jonathan_tancer@colpal.com (JIRA)

Apr 25, 2018, 10:53:03 AM
to jenkinsc...@googlegroups.com
Jon Tancer edited a comment on Improvement JENKINS-50458
I see the same error as the original poster, although I am using agent version 3.19.  Jenkins slaves connected via JNLP agent are unable to reconnect to Jenkins after the Jenkins web app reboots.  Rebooting, then reconnecting the slaves fixes the error... temporarily.  The error below is logged.

My build slaves have a boot-up script which always pulls down the latest agent file before establishing the connection to Jenkins.

For this reason, I should never have an issue relating to a mismatch in versions because all the slaves reboot once daily.

{quote}Apr 25, 2018 11:54:01 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
{quote}
 

jonathan_tancer@colpal.com (JIRA)

Apr 25, 2018, 10:53:03 AM
to jenkinsc...@googlegroups.com
Jon Tancer edited a comment on Improvement JENKINS-50458
I see the same error as the original poster, although I am using agent version 3.19.  Jenkins slaves connected via JNLP agent are unable to reconnect to Jenkins after the Jenkins web app reboots.  Rebooting, then reconnecting the slaves fixes the error... temporarily.  The error is logged below.

My build slaves have a boot-up script which always pulls down the latest agent file before establishing the connection to Jenkins.

For this reason, I should never have an issue relating to a mismatch in versions because all the slaves reboot once daily.
{quote}Apr 25, 2018 11:54:01 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
{quote}
 

jonathan_tancer@colpal.com (JIRA)

May 26, 2018, 8:03:02 AM
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

May 26, 2018, 8:36:03 AM
to jenkinsc...@googlegroups.com

regis.maura@infotel.com (JIRA)

Jun 5, 2018, 5:11:02 AM
to jenkinsc...@googlegroups.com

Oleg Nenashev, we are using Java 8 for both master and agent.
Note: I have not tried to reproduce the bug since the agent update to 3.19.

jthompson@cloudbees.com (JIRA)

Aug 17, 2018, 3:54:02 PM
to jenkinsc...@googlegroups.com
Jeff Thompson assigned an issue to Jeff Thompson
 
Change By: Jeff Thompson
Assignee: Jeff Thompson

jthompson@cloudbees.com (JIRA)

Aug 17, 2018, 3:57:02 PM
to jenkinsc...@googlegroups.com
Jeff Thompson commented on Improvement JENKINS-50458
 
Re: JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller

Régis Maura, it looks like this has been working fine for you so we should probably just close it.

From the provided information, I don't have enough to figure out what is going on, particularly without any steps to reproduce and with the reported variability.

I see a couple of other similar reports, JENKINS-50730 and JENKINS-52283, but certainly no indication that it is a widespread problem. There might be some similarities with Cloud or particularly Kubernetes environments.

In some cases the causes appear to be environment or version related. Getting the correct Remoting, Jenkins, or Java versions seems to have resolved it in some cases. In one case it appears to have been due to memory issues.

ashton.treadway@gmail.com (JIRA)

Sep 21, 2018, 4:46:02 PM
to jenkinsc...@googlegroups.com
Ashton Treadway updated Improvement JENKINS-50458
 

Per Oleg Nenashev, closing as resolved with no response from submitter.

Change By: Ashton Treadway
Status: Open -> Fixed but Unreleased
Resolution: Fixed

o.v.nenashev@gmail.com (JIRA)

Oct 3, 2018, 5:57:25 PM
to jenkinsc...@googlegroups.com