While investigating a spike in the build queue size, we found that the TcpSlaveAgentListener thread had died with the following logs:
2019-10-23 09:02:17.236+0000 [id=200815] SEVERE h.TcpSlaveAgentListener$ConnectionHandler#lambda$new$0: Uncaught exception in TcpSlaveAgentListener ConnectionHandler Thread[TCP agent connection handler #1715 with /10.125.100.99:47700,5,main]
java.lang.UnsupportedOperationException: Network layer is not supposed to call isSendOpen
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:730)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:738)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.isSendOpen(SSLEngineFilterLayer.java:237)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:738)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.isSendOpen(ConnectionHeadersFilterLayer.java:514)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doSend(ProtocolStack.java:690)
at org.jenkinsci.remoting.protocol.ApplicationLayer.write(ApplicationLayer.java:157)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.start(ChannelApplicationLayer.java:230)
at org.jenkinsci.remoting.protocol.ProtocolStack.init(ProtocolStack.java:201)
at org.jenkinsci.remoting.protocol.ProtocolStack.access$700(ProtocolStack.java:106)
at org.jenkinsci.remoting.protocol.ProtocolStack$Builder.build(ProtocolStack.java:554)
at org.jenkinsci.remoting.engine.JnlpProtocol4Handler.handle(JnlpProtocol4Handler.java:153)
at jenkins.slaves.JnlpSlaveAgentProtocol4.handle(JnlpSlaveAgentProtocol4.java:203)
at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:271)
2019-10-23 09:02:17.237+0000 [id=200815] WARNING hudson.TcpSlaveAgentListener$1#run: Connection handler failed, restarting listener
java.lang.UnsupportedOperationException: Network layer is not supposed to call isSendOpen
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:730)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:738)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.isSendOpen(SSLEngineFilterLayer.java:237)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:738)
at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:340)
at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.isSendOpen(ConnectionHeadersFilterLayer.java:514)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doSend(ProtocolStack.java:690)
at org.jenkinsci.remoting.protocol.ApplicationLayer.write(ApplicationLayer.java:157)
at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.start(ChannelApplicationLayer.java:230)
at org.jenkinsci.remoting.protocol.ProtocolStack.init(ProtocolStack.java:201)
at org.jenkinsci.remoting.protocol.ProtocolStack.access$700(ProtocolStack.java:106)
at org.jenkinsci.remoting.protocol.ProtocolStack$Builder.build(ProtocolStack.java:554)
at org.jenkinsci.remoting.engine.JnlpProtocol4Handler.handle(JnlpProtocol4Handler.java:153)
at jenkins.slaves.JnlpSlaveAgentProtocol4.handle(JnlpSlaveAgentProtocol4.java:203)
at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:271)
This was followed by errors like the following in the logs of nodes created by the Jenkins Kubernetes Plugin:
SEVERE: http://jenkins-master.example.com/ provided port:50000 is not reachable
java.io.IOException: http://jenkins-master.example.com/ provided port:50000 is not reachable
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:287)
at hudson.remoting.Engine.innerRun(Engine.java:523)
at hudson.remoting.Engine.run(Engine.java:474)
Changing the JNLP port from 50000 to 50001 and back in the Jenkins settings restored connectivity, after which nodes were able to connect to the master again.
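To notice this state before the build queue grows, we could probe the agent port from outside. Below is a minimal sketch, assuming the agent-side "not reachable" message corresponds to a failed plain TCP connect. `AgentPortProbe` and `isReachable` are hypothetical names and not part of Jenkins or remoting; the host and port in the usage note come from the agent log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper for illustration; not part of Jenkins or remoting. */
final class AgentPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all treated as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: AgentPortProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

For our setup this would be run as `java AgentPortProbe jenkins-master.example.com 50000` from a scheduled job. A successful connect only shows that something accepts connections on the port; it does not prove that the JNLP4 handshake would complete.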
A few questions:
1. How can I debug this further?
2. Could this be an issue with Jenkins 2.190.1? (We have hit this twice since upgrading from the previous LTS in September.)
3. Is there a way to notify an administrator when errors like these appear in the logs?
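To illustrate the kind of check meant in the last question, here is a rough sketch of scanning the master log for the two messages quoted above. `ListenerFailureScan` and `findFailures` are hypothetical names; the message fragments are copied from the log in this report, and the log file path is an assumption supplied on the command line. How to deliver the notification (mail, chat, monitoring system) is left open.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Jenkins. */
final class ListenerFailureScan {

    // Message fragments copied from the master log in this report.
    static final List<String> SIGNATURES = Arrays.asList(
            "Uncaught exception in TcpSlaveAgentListener ConnectionHandler",
            "Connection handler failed, restarting listener");

    /** Returns the log lines that contain any of the known failure messages. */
    static List<String> findFailures(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String signature : SIGNATURES) {
                if (line.contains(signature)) {
                    hits.add(line);
                    break; // count each line once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: ListenerFailureScan <path-to-jenkins-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String hit : findFailures(lines)) {
            System.out.println("ALERT: " + hit);
        }
    }
}
```

This only matches on message text, so it would need updating if the wording of these log messages changes between Jenkins versions.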