Frequent connection losses: java.io.IOException

Andreas Tscharner

May 22, 2017, 4:14:03 AM
to Jenkins Users
Hello World,

Our setup:
* Jenkins 2.32.2 on Debian Linux
  libvirt Plugin for Jenkins
* 8 VMs on the same server, running Windows 7, building our software
  KVM/QEMU VMs, using libvirt
* 4 Ubuntu machines containing 2 VMs each (=8), running Windows 10, testing our software
  KVM/QEMU VMs, using libvirt

We've experienced more and more connection problems lately, all having
the same exception:


FATAL: Befehlsausführung fehlgeschlagen   ["command execution failed"]
java.io.IOException: Backing channel 'JNLP4-connect connection from build-vm1/192.168.168.160:1033' is disconnected.
    at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:191)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:256)
    at com.sun.proxy.$Proxy60.isAlive(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
    at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:154)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:108)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:65)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
    at hudson.model.Build$BuildExecution.build(Build.java:205)
    at hudson.model.Build$BuildExecution.doRun(Build.java:162)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
    at hudson.model.Run.execute(Run.java:1728)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)
Caused by: java.nio.channels.ClosedChannelException
    at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
    at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
    at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
    at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
    at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
    at hudson.remoting.Channel.close(Channel.java:1287)
    at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:115)
    at hudson.remoting.PingThread.ping(PingThread.java:130)
    at hudson.remoting.PingThread.run(PingThread.java:86)


These appear randomly on all nodes; we've found no pattern and no way to reproduce them. There is also no further information in the logs.

Has anyone seen the same problems? Any hints on what I should check?

TIA and best regards
Andreas
--
Andreas Tscharner <sterne...@gmail.com>
----------------------------------------------------------------------
"Intruder on level one. All Aliens please proceed to level one."
-- Call in "Alien: Resurrection"

Andreas Tscharner

May 22, 2017, 4:22:45 AM
to Jenkins Users
On 22.05.2017 10:13, Andreas Tscharner wrote:
> Hello World,

Hello again,

Some more information:
>
> Our setup:
> * Jenkins 2.32.2 on Debian Linux

jenkins@buildserver:~$ uname -a
Linux buildserver 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2
(2016-10-19) x86_64 GNU/Linux

jenkins@buildserver:~$ java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-3-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

> libvirt Plugin for Jenkins
> * 8 VMs on the same server, running Windows 7, building our software
> KVM/QEMU VMs, using libvirt
Java version: 1.8.0_121, using the new Jenkins service

> * 4 Ubuntu machines containing 2 VMs each (=8) running Windows 10,
> testing our software
> KVM/QEMU VMs, using libvirt
Java Version: 1.01 (it worked better with this version), auto-started
batch file running "java -jar slave.jar -jnlpCredentials ... -jnlpUrl ..."

Andreas Tscharner

May 29, 2017, 3:58:23 AM
to Jenkins Users
On 22.05.2017 10:13, Andreas Tscharner wrote:
> Hello World,
>

[snip]
> FATAL: Befehlsausführung fehlgeschlagen
> java.io.IOException: Backing channel 'JNLP4-connect connection from
> build-vm1/192.168.168.160:1033' is disconnected.

No one? No ideas?
Or any pointer to a similar, possibly older, issue?

Andreas Tscharner

May 29, 2017, 9:47:31 AM
to Jenkins Users
On 22.05.2017 10:13, Andreas Tscharner wrote:
> Hello World,
>

Hello again,

[snip]
> FATAL: Befehlsausführung fehlgeschlagen
> java.io.IOException: Backing channel 'JNLP4-connect connection from
> build-vm1/192.168.168.160:1033' is disconnected.
[snip]

I have found some more information in /var/log/jenkins/jenkins.log:


Mai 28, 2017 2:02:59 PM hudson.slaves.ChannelPinger$1 onDead
INFORMATION: Ping failed. Terminating the channel JNLP4-connect connection from build-vm3/192.168.168.162:1033.
java.util.concurrent.TimeoutException: Ping started at 1495972739294 hasn't completed by 1495972979294
    at hudson.remoting.PingThread.ping(PingThread.java:130)
    at hudson.remoting.PingThread.run(PingThread.java:86)


I've calculated the difference between those two timestamps: 240000. I suppose those are milliseconds, which would be 4 minutes.
Is there anything I could do? Why would the ping be dead for more than 4
minutes?
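
For reference, the arithmetic checks out with a few lines of plain Groovy (nothing Jenkins-specific; it could be pasted into the script console):

long started  = 1495972739294L   // "Ping started at ..."
long deadline = 1495972979294L   // "... hasn't completed by ..."
println(deadline - started)                  // prints 240000, i.e. milliseconds
println((deadline - started).intdiv(60000))  // prints 4, i.e. minutes

So the channel was torn down exactly when a four-minute ping deadline expired.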

Kristian

May 29, 2017, 11:02:48 AM
to jenkins...@googlegroups.com
Have you already looked at
https://wiki.jenkins-ci.org/display/JENKINS/Ping+Thread ?

Stephen Connolly

May 29, 2017, 11:31:20 AM
to jenkins...@googlegroups.com
On 29 May 2017 at 06:47, Andreas Tscharner <sterne...@gmail.com> wrote:
> On 22.05.2017 10:13, Andreas Tscharner wrote:
>> Hello World,
>
> Hello again,
>
> [snip]
>> FATAL: Befehlsausführung fehlgeschlagen
>> java.io.IOException: Backing channel 'JNLP4-connect connection from
>> build-vm1/192.168.168.160:1033' is disconnected.
> [snip]
>
> I have found some more information in /var/log/jenkins/jenkins.log:
>
> Mai 28, 2017 2:02:59 PM hudson.slaves.ChannelPinger$1 onDead
> INFORMATION: Ping failed. Terminating the channel JNLP4-connect connection from build-vm3/192.168.168.162:1033.
> java.util.concurrent.TimeoutException: Ping started at 1495972739294 hasn't completed by 1495972979294
>     at hudson.remoting.PingThread.ping(PingThread.java:130)
>     at hudson.remoting.PingThread.run(PingThread.java:86)
>
> I've calculated the difference between those two timestamps: 240000. I suppose those are milliseconds, which would be 4 minutes.
> Is there anything I could do? Why would the ping be dead for more than 4 minutes?

The ping is an application-level ping, i.e. it submits a command into the command queue and waits for the response to be generated.
The command channel can be functioning, but if there is a backlog in the command queue on either side and the command takes longer than 4 minutes (configurable), then the channel is disconnected.

The command queue is single-threaded, so there are multiple things that can cause the issue, e.g.:

* Transferring large artifacts between the master and the agent
* A plugin causing a deadlock on the agent, preventing queued commands from being processed
* etc.

You may want to investigate the agent logs

You may want to monitor Disk I/O and CPU and memory usage (especially if running all things on the one machine)

You may want to monitor Network I/O if running agents over a remote network

HTH
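
For what it's worth, here is a rough sketch for the master's script console that times a trivial round trip through each agent's command queue (hudson.util.RemotingDiagnostics.executeGroovy is the helper Jenkins itself uses to run script-console scripts over a channel). It is only a diagnostic sketch: a healthy channel should answer in well under a second, while a channel whose queue is backed up will make the loop hang at that agent, which is essentially the same symptom the PingThread turns into a disconnect:

import hudson.util.RemotingDiagnostics
import jenkins.model.Jenkins

Jenkins.instance.computers.each { c ->
    def ch = c.channel
    if (ch == null) {
        println "${c.displayName}: offline"
        return   // continue with the next computer
    }
    long t0 = System.currentTimeMillis()
    RemotingDiagnostics.executeGroovy("return 'pong'", ch)   // travels through the same command queue as the ping
    println "${c.displayName}: round trip ${System.currentTimeMillis() - t0} ms"
}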




Andreas Tscharner

May 30, 2017, 2:45:17 AM
to jenkins...@googlegroups.com, kristian...@gmail.com
On 29.05.2017 17:02, Kristian wrote:
> Have you already looked at
> https://wiki.jenkins-ci.org/display/JENKINS/Ping+Thread ?

No, I hadn't; thank you for the pointer. I didn't realize until yesterday that the ping was what caused the connection loss (I am not a Java developer, so I missed that).


Unfortunately, if I execute the line given on that page
(Jenkins.instance.injector.getInstance(hudson.slaves.ChannelPinger.class).@pingInterval=-1)
in the Jenkins script console, I get the following exception:

groovy.lang.GroovyRuntimeException: Cannot set the property 'pingInterval' because the backing field is final.
    at org.codehaus.groovy.reflection.CachedField.setProperty(CachedField.java:71)
    at groovy.lang.MetaClassImpl.setAttribute(MetaClassImpl.java:2858)
    at groovy.lang.MetaClassImpl.setAttribute(MetaClassImpl.java:3770)
    at org.codehaus.groovy.runtime.InvokerHelper.setAttribute(InvokerHelper.java:161)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.setField(ScriptBytecodeAdapter.java:332)
    at Script1.run(Script1.groovy:1)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
    at hudson.remoting.LocalChannel.call(LocalChannel.java:45)
    at hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111)
    at jenkins.model.Jenkins._doScript(Jenkins.java:4283)
    at jenkins.model.Jenkins.doScript(Jenkins.java:4255)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:335)
    at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:175)
    at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:108)
    at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:124)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:746)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:876)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:649)
    at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:135)
    at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
    at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:126)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:49)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
    at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
    at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:76)
    at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I can, however, read that value if I execute
println(Jenkins.instance.injector.getInstance(hudson.slaves.ChannelPinger.class).@pingInterval)
(returns 5)
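
One possible workaround, as a sketch only: Groovy's .@ assignment refuses to write a final field, but plain java.lang.reflect can still set a non-static final instance field once setAccessible(true) has been called. This assumes pingInterval is still a plain, non-static int field in this Jenkins version, and note that channels which are already connected keep the ping they were set up with:

import java.lang.reflect.Field
import jenkins.model.Jenkins

def pinger = Jenkins.instance.injector.getInstance(hudson.slaves.ChannelPinger.class)

Field f = hudson.slaves.ChannelPinger.class.getDeclaredField('pingInterval')
f.setAccessible(true)      // sufficient to write a non-static final field via reflection
f.setInt(pinger, -1)       // -1 disables the ping, as in the wiki one-liner
println f.getInt(pinger)   // should now print -1 instead of 5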

Andreas Tscharner

May 30, 2017, 2:49:25 AM
to jenkins...@googlegroups.com, Stephen Connolly
On 29.05.2017 17:31, Stephen Connolly wrote:

[snip]
> The ping is an application level ping, i.e. it submits a command into
> the command queue and waits for the response to be generated.
> The command channel can be functioning, but if there is a backlog in the
> command queue on either side and the command takes longer than 4 minutes
> (configurable) then the channel is disconnected.
>
> The command queue is single threaded. So there are multiple things that
> can cause the issue, e.g.:
>
> * Transferring large artifacts between the master and the agent
> * Plugin causing deadlock on the agent and prevent
> * etc
>
> You may want to investigate the agent logs

I have already had a short look at them and did not find anything special.
>
> You may want to monitor Disk I/O and CPU and memory usage (especially if
> running all things on the one machine)

OK, I will do that. Jenkins and all 8 VMs are running on the same machine.
>
> You may want to monitor Network I/O if running agents over a remote network

The VMs are on the local network, using a simple bridge.

Thank you very much and best regards