The last 15 or so versions of Hudson really do not work well with slaves. We are stuck on 1.377, which also has bad issues with hanging slaves and jobs; the behavior is very similar to what is described in the good ol' HUDSON-5977 - "Dead lock condition due to pipe clogging".
In versions after 1.377, the Hudson logs get spammed with connection and pipe errors, and sometimes the slaves disconnect. Various plug-ins also no longer work as they used to (CopyArtifact, Emma, Perforce, Copy-to-slave and others).
There are already tons of JIRA issues related to remoting problems, so I'm not going to create yet another one. But after just a quick glance I found these (short log snippets included):
[HUDSON-7745] CopyArtifact plugin triggers intermittent remoting failures
hudson.util.IOException2: hudson.util.IOException2: Not in GZIP format
stream=636366470c8b4e5080b5c73ed5afaad3f4e6f07892 [CUT]
at hudson.FilePath.copyRecursiveTo(FilePath.java:1474)
[HUDSON-7664] Exceptions when communicating with Perforce "No output for:" (probably slave-only)
WARNING: Failed to write to stream
java.io.IOException: Pipe is already closed
[HUDSON-8241] Performance problem : CPU at 100% and IHM not available for over than 1 hour (Blocker)
WARNING: Failed to write to stream
java.io.IOException: Pipe is already closed
[HUDSON-7871] "java.io.IOException: Bad file descriptor" when file copied from slave
Oct 21, 2010 4:40:03 AM hudson.remoting.ProxyOutputStream$Chunk$1 run
WARNING: Failed to write to stream
java.io.IOException: Bad file descriptor
[HUDSON-7809] Remote Launcher randomly returns no data. (Blocker)
Oct 15, 2010 10:12:17 AM hudson.remoting.ProxyOutputStream$Chunk$1 run
WARNING: Failed to write to stream
java.io.IOException: Pipe closed
and also
java.lang.NegativeArraySizeException
at hudson.remoting.ProxyOutputStream$Chunk.<init>(ProxyOutputStream.java:175)
at hudson.remoting.ProxyOutputStream._write(ProxyOutputStream.java:123)
[HUDSON-3412] - For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:412)
at hudson.remoting.Request.call(Request.java:105)
at hudson.remoting.Channel.call(Channel.java:555)
at hudson.FilePath.act(FilePath.java:742)
... 13 more
FATAL: channel is already closed
hudson.remoting.ChannelClosedException: channel is already closed
[HUDSON-5073] - hudson.util.IOException2: Failed to join the process - on a Windows slave
[HUDSON-7715] - java.io.IOException: Could not open channel (The connection is being shutdown)
[HUDSON-8384] - Copying a compressed tar file from a 64 bit master to a slave running SUSE Linux fails
--------
There is a chance that some of the issues listed above are no longer relevant, but I'm also sure that there are more remoting-related issues that are not listed.
I would like to use this thread to discuss the remoting issues a bit. I'm trying to track some of them down, but after the ProxyStreams were reworked to be threaded (and given a new windowing mechanism), I'm having a bit of trouble completely wrapping my head around the code. I do have some experience with it from my proposed patch for the HUDSON-5977 deadlock issue, but that was before the refactoring/threading of the remoting streams.
So, let's start with a common issue seen in recent versions (here 1.390) when running the Emma plug-in with a slave. The Hudson log gets spammed with thousands of lines stating:
Jan 10, 2011 1:36:51 PM hudson.remoting.ProxyOutputStream$Chunk$1 run
WARNING: Failed to write to stream
java.io.IOException: Pipe is already closed
at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:147)
at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:131)
at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:185)
and also
Jan 10, 2011 1:36:56 PM hudson.remoting.ProxyOutputStream$Chunk$1 run
WARNING: Failed to write to stream
java.io.IOException: Reader side has already been abandoned
I'm not sure how it affects the build, but other plug-ins seem to have a similar (the same?) issue, for instance [HUDSON-7664] and [HUDSON-8241].
It seems like the ProxyOutputStream/FastPipedOutputStream tries to fill its buffer (which the reader returned from FilePath.read() eventually reads from) even after the stream has been remotely closed. In the case of the Emma plug-in, the issue can easily be reproduced with a publisher plug-in containing just the following lines of code:
@Override
public boolean perform(AbstractBuild build, Launcher launcher, BuildListener listener) throws IOException {
    // getName() returns the name of a file available in the workspace...
    InputStream in = build.getWorkspace().child(getName()).read();
    System.out.println(in.read()); // read just a single byte...
    in.close();                    // ...and close the stream early
    return true;
}
This spams the log with tens of thousands of "Pipe is already closed" exceptions and sometimes also makes the slave disconnect.
This is because FastPipedOutputStream tries to write long after the stream is closed:
146 if(s.closed!=null) {
147 throw (IOException)new IOException("Pipe is already closed").initCause(s.closed);
148 }
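The failure mode can be seen in isolation, outside Hudson, with the JDK's own java.io.PipedInputStream/PipedOutputStream, which have the same contract: once the reader side is closed, every further write throws an IOException. A minimal sketch (this uses plain JDK pipes only as an analogy for FastPipedOutputStream; it is not Hudson code):

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeCloseDemo {
    public static void main(String[] args) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        out.write(42);                 // writer side sends one byte
        System.out.println(in.read()); // reader consumes it, then loses interest
        in.close();                    // reader side is closed...

        try {
            out.write(43);             // ...but the writer keeps writing
        } catch (IOException e) {
            // same symptom as FastPipedOutputStream's "Pipe is already closed"
            System.out.println("write after close: " + e.getMessage());
        }
    }
}
```

The difference in Hudson is that the writer is a remote slave, so each doomed write arrives as a new ProxyOutputStream$Chunk and is logged as a fresh warning.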
Suggestions on where to start? Any core developers who could give some hints?
Best regards
Gustaf Lundh – Sony Ericsson
I'll try to clarify a bit more:
Whenever a plug-in (or other code running on the master) uses FilePath.read() to read data from a remote file, the slave process starts to write ProxyOutputStream$Chunks until it has sent the full file. The "full file" part is the main issue here.
Here is why: when you close the InputStream returned by FilePath.read(), there is no mechanism to let the slave know this. So the poor slave will continue to send data in chunks until the full file is "sent".
The FastPipedOutputStream (on the master) will continue to receive the data and write it to our InputStream's buffer. Only at this point will it notice that the pipe is closed, and therefore throw a "Pipe is already closed..." exception. Then the next chunk (sent from the slave) comes in. Rinse and repeat.
This means that if you have a plug-in that reads only a small chunk of data from a larger (remotely located) file and then closes the stream, the slave will still continue to send chunk after chunk, and the FastPipedOutputStream (on the master) keeps trying to write it down, generating loads of exceptions.
A few plug-ins suffer from this behavior, the Emma plug-in among others.
So the main issue here is that there is no mechanism to let the slave know that it should stop sending. Looking at the remoting code, I'm wondering if FilePath.read() was ever supposed to be used outside the unzip() and untar() methods (where only full reads are performed).
If this is the case, then perhaps it should be documented and read() should be made private.
If it is a bug, a few hints on how to achieve the correct behavior would be appreciated.
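For what it's worth, the missing back-channel could conceptually look like this: the reader publishes an "abandoned" signal when the stream is closed, and the chunk-sending loop checks it before each chunk instead of blindly streaming the whole file. This is only a sketch of the idea with hypothetical names (notifyAbandoned, sendChunks); Hudson's real remoting layer would have to carry the signal back over the channel as a command:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ChunkSenderSketch {
    // In Hudson this flag would live on the slave and be flipped by a
    // command sent from the master when the InputStream is closed.
    static final AtomicBoolean readerAbandoned = new AtomicBoolean(false);

    /** Hypothetical: invoked when the InputStream from FilePath.read() is closed. */
    static void notifyAbandoned() {
        readerAbandoned.set(true);
    }

    /** Hypothetical slave-side loop: stop as soon as the reader gives up. */
    static int sendChunks(byte[] file, int chunkSize) {
        int sent = 0;
        for (int off = 0; off < file.length; off += chunkSize) {
            if (readerAbandoned.get()) {
                break; // reader closed the pipe: no point in sending more chunks
            }
            // writeChunk(file, off, chunkSize) would go here
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10 * 1024];
        notifyAbandoned(); // reader closes immediately, as in the Emma repro above
        System.out.println("chunks sent: " + sendChunks(file, 1024));
    }
}
```

With such a check in place, the "Pipe is already closed" spam would stop at the first chunk after the close, instead of continuing for the rest of the file.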
Gustaf Lundh - Sony Ericsson
--
View this message in context: http://hudson.361315.n4.nabble.com/Regarding-the-remoting-issues-seen-in-later-Hudson-releases-Core-Emma-plug-in-etc-tp3207205p3216166.html
Sent from the Hudson dev mailing list archive at Nabble.com.