"hudson.remoting.ChannelClosedException: channel is already closed"
indicates an unexpected loss of connection to the slave. The nested
"Caused by: java.io.EOFException" indicates that the slave side has shut
down the communication with the slave.
The thing is, the communication from the slave (the InputStream that
Channel reads) is tunneled over several layers, and this part of the
code only discovers the problem when InputStream.read() returns -1.
This design of InputStream gives us no way to report the underlying
cause of the communication problem through a chained exception, so we
really can't properly report the root cause.
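To illustrate the gap (a minimal sketch, not the actual remoting code; TunneledStream is a made-up name): whatever exception kills the transport internally, the caller only ever sees -1, which carries no cause to chain.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a tunneled slave connection. The transport
// layer dies with a real IOException, but the InputStream contract forces
// it to surface the failure as a bare end-of-stream.
class TunneledStream extends InputStream {
    // The real root cause of the disconnection, known only internally.
    private final IOException transportFailure =
            new IOException("connection reset by peer");

    @Override
    public int read() {
        // read() can only return -1 for end-of-stream; there is no way
        // to hand transportFailure to the caller as a chained cause.
        return -1;
    }
}

public class EofDemo {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new TunneledStream()) {
            int b = in.read();
            // All the reader learns is "the stream ended" (-1); the
            // IOException that actually broke the channel is lost.
            System.out.println(b);
        }
    }
}
```

This is why the best the Channel reader can do is throw EOFException with no useful cause attached.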
The slave console log does normally capture the last dying message from
the slave JVM or transport-level errors, but it gets rotated as soon as
the next connection attempt starts, and while the file is still
available in $JENKINS_HOME, there's no way to look at it from the web
UI. Jenkins auto-reconnects failed slaves quite aggressively, and it
takes some time for someone to notice a build failure caused by
ChannelClosedException and try to understand what's going on, which
makes the trouble-shooting even trickier.
I was just sweeping the ssh-slaves plugin ticket backlog, and there are
many reports of this same issue, so this is clearly a gap in the
diagnosability of slave connectivity.
If anyone has a good idea of how to capture the errors, that'd be
greatly appreciated.
One approach I'm thinking about is to introduce a proper log rotation
mechanism (one that handles LargeText.doProgressText() correctly), and
somehow use that to let people scroll back through the slave console
log. Another possibility is to let the ComputerLauncher record a
connection loss as an Exception on the failing Channel.
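The second idea could look roughly like this (a hypothetical sketch; DiagnosableChannel and its methods are invented names, not real Jenkins/remoting APIs): the channel remembers the exception that terminated it, and later callers get it chained as the cause instead of a bare "channel is already closed".

```java
import java.io.IOException;

// Hypothetical sketch: a channel that records the exception which
// terminated the connection, so subsequent failures can chain it.
class DiagnosableChannel {
    private volatile Throwable terminationCause; // set when the transport dies

    // Called by the launcher/transport when the connection is lost.
    void terminate(Throwable cause) {
        this.terminationCause = cause;
    }

    // Any later use of the dead channel now reports the root cause.
    void send(Object request) throws IOException {
        if (terminationCause != null) {
            throw new IOException("channel is already closed", terminationCause);
        }
        // ... actual send logic elided ...
    }
}

public class ChannelDemo {
    public static void main(String[] args) {
        DiagnosableChannel ch = new DiagnosableChannel();
        // The transport observed the real error and recorded it.
        ch.terminate(new IOException("connection reset by peer"));
        try {
            ch.send("build step");
        } catch (IOException e) {
            // The build log would now show the recorded root cause.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

The point is just that the information exists at disconnection time; something has to hold onto it long enough for the failing build to report it.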
--
Kohsuke Kawaguchi | CloudBees, Inc. |
http://cloudbees.com/
Try Nectar, our professional version of Jenkins