[JIRA] (JENKINS-49956) Kubernetes agent fails to reconnect on master restarts

400 views
Skip to first unread message

jenkins-ci@carlossanchez.eu (JIRA)

unread,
Mar 6, 2018, 12:02:02 PM3/6/18
to jenkinsc...@googlegroups.com
Carlos Sanchez updated an issue
 
Jenkins / Bug JENKINS-49956
Kubernetes agent fails to reconnect on master restarts
Change By: Carlos Sanchez
Summary: Kubernetes agent fail fails to reconnect on master restarts
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)
Atlassian logo

jenkins-ci@carlossanchez.eu (JIRA)

unread,
Mar 6, 2018, 12:05:03 PM3/6/18
to jenkinsc...@googlegroups.com
Carlos Sanchez updated an issue
After restarting the master for plugin upgrades the agent fails to reconnect

master logs

{code}
Mar 06, 2018 4:50:23 PM jenkins.model.Jenkins$24 run
SEVERE: Restarting VM as requested by xxx
Mar 06, 2018 4:50:23 PM jenkins.model.Jenkins cleanUp
INFO: Stopping Jenkins
Mar 06, 2018 4:50:23 PM jenkins.model.Jenkins$19 onAttained
INFO: Started termination
Mar 06, 2018 4:50:23 PM jenkins.model.Jenkins$19 onAttained
INFO: Completed termination
Mar 06, 2018 4:50:23 PM jenkins.model.Jenkins _cleanUpDisconnectComputers
INFO: Starting node disconnection
Mar 06, 2018 4:50:23 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
WARNING: safe-restart thread for artifact-manager-s3-c7701-0rmp3 terminated
java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
at hudson.remoting.Channel.close(Channel.java:1405)
at hudson.remoting.Channel.close(Channel.java:1358)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:737)
at hudson.slaves.SlaveComputer.kill(SlaveComputer.java:704)
at hudson.model.AbstractCIBase.killComputer(AbstractCIBase.java:88)
at jenkins.model.Jenkins.access$2000(Jenkins.java:304)
at jenkins.model.Jenkins$20.run(Jenkins.java:3367)
at hudson.model.Queue._withLock(Queue.java:1373)
at hudson.model.Queue.withLock(Queue.java:1250)
at jenkins.model.Jenkins._cleanUpDisconnectComputers(Jenkins.java:3361)
at jenkins.model.Jenkins.cleanUp(Jenkins.java:3237)
at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
at jenkins.model.Jenkins$24.run(Jenkins.java:4218)

Mar 06, 2018 4:50:24 PM jenkins.model.Jenkins _cleanUpShutdownPluginManager
INFO: Stopping plugin manager
Mar 06, 2018 4:50:24 PM jenkins.model.Jenkins _cleanUpPersistQueue
INFO: Persisting build queue
Mar 06, 2018 4:50:24 PM jenkins.model.Jenkins _cleanUpAwaitDisconnects
INFO: Waiting for node disconnection completion
Mar 06, 2018 4:50:24 PM jenkins.model.Jenkins cleanUp
INFO: Jenkins stopped
VM settings:
    Max. Heap Size: 4.00G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 06, 2018 4:50:27 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Mar 06, 2018 4:50:27 PM org.eclipse.jetty.util.log.Log initialized
INFO: Logging initialized @1153ms to org.eclipse.jetty.util.log.JavaUtilLog
Mar 06, 2018 4:50:28 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Mar 06, 2018 4:50:28 PM org.eclipse.jetty.server.handler.ContextHandler setContextPath
WARNING: Empty contextPath
Mar 06, 2018 4:50:28 PM org.eclipse.jetty.server.Server doStart
INFO: jetty-9.4.z-SNAPSHOT
Mar 06, 2018 4:50:29 PM org.eclipse.jetty.webapp.StandardDescriptorProcessor visitServlet
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Mar 06, 2018 4:50:29 PM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: DefaultSessionIdManager workerName=node0
Mar 06, 2018 4:50:29 PM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: No SessionScavenger set, using defaults
Mar 06, 2018 4:50:29 PM org.eclipse.jetty.server.session.HouseKeeper startScavenging
INFO: Scavenging every 660000ms
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 06, 2018 4:50:30 PM org.eclipse.jetty.server.handler.ContextHandler doStart
INFO: Started w.@239a307b{/,file:///var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Mar 06, 2018 4:50:30 PM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started ServerConnector@30f842ca{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
Mar 06, 2018 4:50:30 PM org.eclipse.jetty.server.Server doStart
INFO: Started @4439ms
Mar 06, 2018 4:50:30 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v4.0 running: controlPort=disabled
Mar 06, 2018 4:50:33 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Mar 06, 2018 4:50:35 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
[csanchez@mbook-retina
// here
kubernetes (master killed jenkins pod because Liveness probe failed

$ % u=)]$ kubectl logs -f jenkins-0
VM settings:
    Max. Heap Size: 4.00G
    Ergonomics Machine Class: server
    Using VM: OpenJDK 64-Bit Server VM

Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 06, 2018 4:50:54 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Mar 06, 2018 4:50:54 PM org.eclipse.jetty.util.log.Log initialized
INFO: Logging initialized @1119ms to org.eclipse.jetty.util.log.JavaUtilLog
Mar 06, 2018 4:50:54 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Mar 06, 2018 4:50:54 PM org.eclipse.jetty.server.handler.ContextHandler setContextPath
WARNING: Empty contextPath
Mar 06, 2018 4:50:54 PM org.eclipse.jetty.server.Server doStart
INFO: jetty-9.4.z-SNAPSHOT
Mar 06, 2018 4:50:55 PM org.eclipse.jetty.webapp.StandardDescriptorProcessor visitServlet
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Mar 06, 2018 4:50:56 PM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: DefaultSessionIdManager workerName=node0
Mar 06, 2018 4:50:56 PM org.eclipse.jetty.server.session.DefaultSessionIdManager doStart
INFO: No SessionScavenger set, using defaults
Mar 06, 2018 4:50:56 PM org.eclipse.jetty.server.session.HouseKeeper startScavenging
INFO: Scavenging every 600000ms
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Mar 06, 2018 4:50:57 PM org.eclipse.jetty.server.handler.ContextHandler doStart
INFO: Started w.@239a307b{/,file:///var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Mar 06, 2018 4:50:57 PM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started ServerConnector@30f842ca{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
Mar 06, 2018 4:50:57 PM org.eclipse.jetty.server.Server doStart
INFO: Started @4295ms
Mar 06, 2018 4:50:57 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v4.0 running: controlPort=disabled
Mar 06, 2018 4:50:59 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Mar 06, 2018 4:51:00 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Mar 06, 2018 4:51:21 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Mar 06, 2018 4:51:21 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Mar 06, 2018 4:51:23 PM hudson.ExtensionFinder$GuiceFinder$FaultTolerantScope$1 error
INFO: Failed to instantiate optional component hudson.plugins.build_timeout.operations.AbortAndRestartOperation$DescriptorImpl; skipping
Mar 06, 2018 4:51:26 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Mar 06, 2018 4:51:27 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Agent has already been launched, activating: {}
Mar 06, 2018 4:51:28 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Mar 06, 2018 4:51:28 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Mar 06, 2018 4:51:29 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Mar 06, 2018 4:51:29 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 91 ms
Mar 06, 2018 4:51:30 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@5ef7d416: display name [Root WebApplicationContext]; startup date [Tue Mar 06 16:51:30 GMT 2018]; root of context hierarchy
Mar 06, 2018 4:51:30 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@5ef7d416]: org.springframework.beans.factory.support.DefaultListableBeanFactory@3f128517
Mar 06, 2018 4:51:30 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@3f128517: defining beans [filter,legacy]; root of factory hierarchy
Mar 06, 2018 4:51:30 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Mar 06, 2018 4:51:34 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done
Mar 06, 2018 4:52:16 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted JNLP4-connect connection #1 from /100.96.6.38:47260
Mar 06, 2018 4:52:25 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent artifact-manager-s3-c7701-0rmp3
Mar 06, 2018 4:52:25 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
WARNING: Computer.threadPoolForRemoting [#9] for artifact-manager-s3-c7701-0rmp3 terminated
java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
at hudson.remoting.Channel.close(Channel.java:1405)
at hudson.remoting.Channel.close(Channel.java:1358)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:737)
at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:96)
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:655)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Mar 06, 2018 4:52:25 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@78147134
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:671)
at hudson.remoting.Request$2.run(Request.java:370)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:19)
at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
at hudson.remoting.Channel.close(Channel.java:1405)
at hudson.remoting.Channel.close(Channel.java:1358)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:737)
at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:96)
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:655)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more

Mar 06, 2018 4:52:25 PM org.csanchez.jenkins.plugins.kubernetes.pipeline.PodTemplateStepExecution$PodTemplateCallback finished
INFO: Removing pod template and deleting pod artifact-manager-s3-c7701 from cloud kubernetes
Mar 06, 2018 4:52:27 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Mar 06, 2018 4:52:27 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Mar 06, 2018 4:52:27 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminated Kubernetes instance for agent jenkins/artifact-manager-s3-c7701-0rmp3
Terminated Kubernetes instance for agent jenkins/artifact-manager-s3-c7701-0rmp3
Mar 06, 2018 4:52:27 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Disconnected computer artifact-manager-s3-c7701-0rmp3
Mar 06, 2018 4:52:27 PM org.csanchez.jenkins.plugins.kubernetes.pipeline.PodTemplateStepExecution$PodTemplateCallback finished
WARNING: Failed to delete pod for agent jenkins/artifact-manager-s3-c7701: not found
Mar 06, 2018 4:52:27 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: test #2 completed: FAILURE
{code}


agent logs

{code}
Mar 06, 2018 4:41:48 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Mar 06, 2018 4:50:23 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Mar 06, 2018 4:50:33 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Master isn't ready to talk to us on http://jenkins/tcpSlaveAgentListener/. Will try again: response code=503
Mar 06, 2018 4:50:48 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:51:03 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:51:18 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:51:33 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:51:48 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:52:03 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
INFO: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
Mar 06, 2018 4:52:13 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter@2d4c4c79
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: artifact-manager-s3-c7701-0rmp3
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 06, 2018 4:52:15 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins/]
Mar 06, 2018 4:52:15 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins
  Agent port:    50000
  Identity:      6b:d5:db:ec:8b:c9:46:b3:00:40:72:91:da:6f:6c:31
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins:50000
Mar 06, 2018 4:52:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Mar 06, 2018 4:52:16 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 6b:d5:db:ec:8b:c9:46:b3:00:40:72:91:da:6f:6c:31
Mar 06, 2018 4:52:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Mar 06, 2018 4:52:25 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Mar 06, 2018 4:52:25 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@4fbffa99
hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@452b851a:JNLP4-connect connection to jenkins/100.67.180.57:50000": channel is already closed
at hudson.remoting.Channel.send(Channel.java:672)
at hudson.remoting.Request$2.run(Request.java:371)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1$1.run(Engine.java:94)
at java.lang.Thread.run(Thread.java:748)
Caused by: hudson.remoting.Channel$OrderlyShutdown: Command close created at
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1226)
at hudson.remoting.Channel$1.handle(Channel.java:561)
at hudson.remoting.AbstractByteBufferCommandTransport.processCommand(AbstractByteBufferCommandTransport.java:202)
at hudson.remoting.AbstractByteBufferCommandTransport.receive(AbstractByteBufferCommandTransport.java:188)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:186)
at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:207)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:48)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:283)
... 4 more
Caused by: Command close created at
at hudson.remoting.Command.<init>(Command.java:60)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1219)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1217)
at hudson.remoting.Channel.close(Channel.java:1391)
at hudson.remoting.Channel.close(Channel.java:1358)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:737)
at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:96)
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:655)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
{code}

o.v.nenashev@gmail.com (JIRA)

unread,
Mar 7, 2018, 3:59:02 AM3/7/18
to jenkinsc...@googlegroups.com

o.v.nenashev@gmail.com (JIRA)

unread,
Mar 7, 2018, 4:01:02 AM3/7/18
to jenkinsc...@googlegroups.com
Oleg Nenashev commented on Bug JENKINS-49956
 
Re: Kubernetes agent fails to reconnect on master restarts

Carlos Sanchez

1. does the Remoting process die after the error in the log? I'd guess so, it should be a fatal error
2. Do the docker images you use have any failover configured outside the remoting executable?

jenkins-ci@carlossanchez.eu (JIRA)

unread,
Mar 7, 2018, 4:04:02 AM3/7/18
to jenkinsc...@googlegroups.com

1. yes, the docker agent dies after the last log Failed to send back a reply to the request
2. No, it's the standard https://github.com/jenkinsci/docker-jnlp-slave image

jenkins-ci@carlossanchez.eu (JIRA)

unread,
Mar 7, 2018, 4:08:02 AM3/7/18
to jenkinsc...@googlegroups.com

My guess is that it has to do with the double restart because there are tests that exercise durable pipeline at https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/RestartPipelineTest.java

In my instance I increased the time for health checks to avoid the double restart

o.v.nenashev@gmail.com (JIRA)

unread,
Mar 7, 2018, 5:36:02 AM3/7/18
to jenkinsc...@googlegroups.com

From Remoting perspective it is a default behavior for fatal errors. What is "fatal" is separate topic, e.g. see JENKINS-46515.

Generally the advanced failover behavior should be implemented by users of slave.jar, e.g. see https://github.com/jenkinsci/windows-slave-installer-module/blob/master/src/main/resources/org/jenkinsci/modules/windows_slave_installer/jenkins-slave.xml#L52 . Feel free to submit feature requests to Remoting though.

jenkins-ci@carlossanchez.eu (JIRA)

unread,
May 25, 2018, 4:39:01 AM5/25/18
to jenkinsc...@googlegroups.com
Carlos Sanchez resolved as Cannot Reproduce
 

This seems to work after increasing the liveness checks making the master not restart twice

Change By: Carlos Sanchez
Status: Open Resolved
Resolution: Cannot Reproduce
Reply all
Reply to author
Forward
0 new messages