We've been facing a couple of build failure because of agent disconnection for no apparent reasons. Failures are following the same pattern: The build logs say (time of error): {{ 9:45:50 FATAL: command execution failed java.nio.channels.ClosedChannelException at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141) at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91) ... Caused: java.io.IOException: Backing channel 'our-windows10-agent' is disconnected. at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)}} On the agent, the logs say: {{ Apr 09, 2020 9:45:50 AM hudson.Launcher$RemoteLaunchCallable$1 join INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@11010e5e:our-windows10-agent hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11010e5e:our-windows10-agent": Remote call on our-windows10-agent failed. The channel is closing down or has closed down at hudson.remoting.Channel.call(Channel.java:991) at hudson.remoting.Channel.syncIO(Channel.java:1730) ... Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11010e5e:our-windows10-agent": channel is already closed at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590) at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251) Apr 09, 2020 9:45:50 AM hudson.remoting.UserRequest perform WARNING: LinkageError while performing UserRequest:UserRPCRequest:hudson.Launcher$RemoteProcess.join[](4) java.lang.LinkageError: Failed to load hudson.util.ProcessTree at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:391) ... Caused by: java.lang.NoClassDefFoundError: hudson/util/ProcessTreeRemoting$IProcessTree at java.lang.ClassLoader.defineClass1(Native Method) ... Caused by: java.lang.ClassNotFoundException: hudson.util.ProcessTreeRemoting$IProcessTree at java.net.URLClassLoader.findClass(URLClassLoader.java:382) }} On the master, the agent logs say (file slave.log.1, mtime = Apr 9 09:45): {{ Inbound agent connected from x.x.x.x Remoting version: 4.3 This is a Windows agent onOnline: class org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl reported an exception: java.net.MalformedURLException: no protocol: jnlpJars/slave.jar Agent successfully connected and online ERROR: Connection terminated java.nio.channels.ClosedChannelException at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141) at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91) at jenkins.websocket.WebSockets$$Lambda$282/00000000940934C0.invoke(Unknown Source) at com.sun.proxy.$Proxy74.onWebSocketClose(Unknown Source) at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:119) at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:389) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:317) at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86) at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359) at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288) at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280) at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293) at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264) at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193) at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:584) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:511) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:441) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938) at java.lang.Thread.run(Unknown Source) }} I see no error in the nginx (reverse proxy) logs. I only see the GET /wsagent/ in the access logs. Weirdly, the last GET I see in the access logs on the the wsagents endoint does not fit with the error. It's like the agent did not reconnect. The agent logs in the master say that the "Connection terminated" (see above) and it's still connected (the mtime of the latest agent log file - slave.log - on the master is Apr 9 09:46). We used to run this agent with a direct TCP connection (with Jenkins 2.204.5 and remoting 3.36.1) and we were not facing the issue. We need to switch to websocket connection to avoid streaming the TCP connection in the nginx reverse-proxy. Is there anything I can do to better analyze what's going on? |