Regarding the remoting issues seen in later Hudson releases [Core/Emma plug-in etc]


Lundh, Gustaf

Jan 10, 2011, 8:58:25 AM
to hudso...@googlegroups.com

The last 15 versions (or so) of Hudson really do not work with the slaves. We are stuck with 1.377, which also has bad issues with hanging slaves and jobs, very similar in behavior to what is described in the good ol' HUDSON-5977 - "Dead lock condition due to pipe clogging".

 

In versions after 1.377, the Hudson logs get spammed with connection and pipe errors, and sometimes slaves disconnect. Various plug-ins do not really work as they used to (CopyArtifact, Emma, Perforce, Copy-to-slave, and others).

 

There are already tons of JIRA issues related to remoting problems, so I'm not going to create yet another one. But after just a quick glance I found these (short log snippets included):

 

[HUDSON-7745] CopyArtifact plugin triggers intermittent remoting failures

  hudson.util.IOException2: hudson.util.IOException2: Not in GZIP format
  stream=636366470c8b4e5080b5c73ed5afaad3f4e6f07892 [CUT]
        at hudson.FilePath.copyRecursiveTo(FilePath.java:1474)

 

[HUDSON-7664] Exceptions when communicating with Perforce "No output for:" (probably slave-only)

  WARNING: Failed to write to stream

  java.io.IOException: Pipe is already closed

 

[HUDSON-8241] Performance problem : CPU at 100% and IHM not available for over than 1 hour (Blocker)

  WARNING: Failed to write to stream

  java.io.IOException: Pipe is already closed

 

[HUDSON-7871] "java.io.IOException: Bad file descriptor" when file copied from slave

  Oct 21, 2010 4:40:03 AM hudson.remoting.ProxyOutputStream$Chunk$1 run

  WARNING: Failed to write to stream

  java.io.IOException: Bad file descriptor

 

[HUDSON-7809] Remote Launcher randomly returns no data. (Blocker)

  Oct 15, 2010 10:12:17 AM hudson.remoting.ProxyOutputStream$Chunk$1 run

  WARNING: Failed to write to stream

  java.io.IOException: Pipe closed

and also

  java.lang.NegativeArraySizeException
        at hudson.remoting.ProxyOutputStream$Chunk.<init>(ProxyOutputStream.java:175)
        at hudson.remoting.ProxyOutputStream._write(ProxyOutputStream.java:123)

 

[HUDSON-3412] - For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process

  Caused by: hudson.remoting.ChannelClosedException: channel is already closed

  at hudson.remoting.Channel.send(Channel.java:412)

  at hudson.remoting.Request.call(Request.java:105)

  at hudson.remoting.Channel.call(Channel.java:555)

  at hudson.FilePath.act(FilePath.java:742)

  ... 13 more

  FATAL: channel is already closed

  hudson.remoting.ChannelClosedException: channel is already closed

 

[HUDSON-5073] - hudson.util.IOException2: Failed to join the process - on a Windows slave

 

[HUDSON-7715] - java.io.IOException: Could not open channel (The connection is being shutdown)

 

[HUDSON-8384] - Copying a compressed tar file from a 64 bit master to a slave running SUSE Linux fails

--------

 

Some of the issues listed above may no longer be relevant, but I'm also sure there are more issues related to the remoting library that are not listed here.

 

I would like to use this thread to discuss the remoting issues a bit. I'm trying to track some of them down, but after the ProxyStreams were reworked to be threaded (and given a new windowing mechanism), I'm having a bit of trouble completely wrapping my head around the code. I do have some experience with it from my proposed patch for the HUDSON-5977 deadlock issue, but that was before the refactoring/threading of the remoting streams.

 

So, let's start with a common issue seen in recent versions (here 1.390) when running the Emma plug-in with a slave. The Hudson log gets spammed with thousands of lines stating:

 

    Jan 10, 2011 1:36:51 PM hudson.remoting.ProxyOutputStream$Chunk$1 run
    WARNING: Failed to write to stream
    java.io.IOException: Pipe is already closed
            at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:147)
            at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:131)
            at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:185)

 

and also

 

    Jan 10, 2011 1:36:56 PM hudson.remoting.ProxyOutputStream$Chunk$1 run
    WARNING: Failed to write to stream
    java.io.IOException: Reader side has already been abandoned

I'm not sure how it affects the build, but other plug-ins seem to have a similar (the same?) issue, for instance [HUDSON-7664] and [HUDSON-8241].

 

It seems like the ProxyOutputStream/FastPipedOutputStream tries to fill its buffer (that the reader returned from FilePath.read() eventually reads from) even after the stream is remotely closed. In the case of the Emma plug-in, the issue can easily be reproduced by creating a publisher plug-in with just the following lines of code:

 

    @Override
    public boolean perform(AbstractBuild build, Launcher launcher, BuildListener listener) throws IOException {
        // getName() returns the name of a file available in the workspace...
        InputStream in = build.getWorkspace().child(getName()).read();
        System.out.println(in.read());
        in.close();
        return true;
    }

                            

This will spam the log with tens of thousands of "Pipe is already closed" exceptions and also sometimes make the slave disconnect.

 

This is because FastPipedOutputStream keeps trying to write long after the stream has been closed:

 

    146             if(s.closed!=null) {
    147                 throw (IOException)new IOException("Pipe is already closed").initCause(s.closed);
    148             }
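
Until the protocol can tell the slave to stop sending, the only plug-in-side
mitigation I can think of (an assumption on my part, not a confirmed fix) is
to drain the stream to EOF before closing it, so the slave finishes its
transfer and nothing is ever written into a closed pipe:

    // Workaround sketch (assumption, not a confirmed fix): read to EOF
    // before closing, at the cost of transferring the whole file.
    InputStream in = build.getWorkspace().child(getName()).read();
    try {
        System.out.println(in.read()); // the byte we actually wanted
        byte[] scratch = new byte[8192];
        while (in.read(scratch) != -1) {
            // discard the rest of the file
        }
    } finally {
        in.close();
    }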

 

Suggestions on where to start? Are there any core developers who could give some hints?

 

Best regards

Gustaf Lundh – Sony Ericsson

 

GLundh

Jan 13, 2011, 11:46:53 AM
to hudso...@googlegroups.com

Sorry about the messed-up formatting in the previous mail. I will try to avoid
Outlook for future posts. Please check Nabble for a reformatted version of the
previous mail.

I'll try to clarify a bit more:

Whenever a plug-in (or other code running on the master) uses
FilePath.read() to read data from a remote file, the slave process starts
writing ProxyOutputStream$Chunks until it has sent the full file. The "full
file" part is the main issue here.

I'll try to explain why: when you close the InputStream returned by
FilePath.read(), there is no mechanism to let the slave know.

So the poor slave will continue to send data in chunks until the full file
has been "sent".

The FastPipedOutputStream (at the master) will continue to receive the data
and write it to our InputStream's buffer. Only at this point will it notice
that the pipe is closed, and therefore throw a "Pipe is already closed"
exception. Then the next chunk (sent from the slave) comes in. Rinse and
repeat.

This means that if a plug-in reads only a small chunk of data from a larger
(and remotely located) file and then closes the stream, the slave will still
send chunk after chunk, and the FastPipedOutputStream (at the master) will
keep trying to write it out, generating loads of exceptions.

A few plug-ins suffer from this behavior, the Emma plug-in among others.

So the main issue here is that there is no mechanism to let the slave know
that it should stop sending. Looking at the remoting code, I'm wondering if
FilePath.read() was ever supposed to be used outside the unzip() and untar()
methods (where only full reads are performed).

If this is the case, then perhaps it should be documented and read() should
be changed to private.

If it is a bug, a few hints on how to achieve the correct behavior would be
appreciated.
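
In the meantime, a plug-in can sidestep partial reads entirely by buffering
the whole file locally; a sketch using FilePath.copyTo(OutputStream), which
always performs a full transfer (fine for small files; large ones would want
a temp file instead):

    // Sketch of a full-read alternative: copy the remote file into a local
    // buffer, then read from the buffer. Nothing is abandoned mid-stream.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    build.getWorkspace().child(getName()).copyTo(buf);
    InputStream in = new ByteArrayInputStream(buf.toByteArray());
    System.out.println(in.read());
    in.close();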

Gustaf Lundh - Sony Ericsson
--
View this message in context: http://hudson.361315.n4.nabble.com/Regarding-the-remoting-issues-seen-in-later-Hudson-releases-Core-Emma-plug-in-etc-tp3207205p3216166.html
Sent from the Hudson dev mailing list archive at Nabble.com.

Rob Petti

Jan 13, 2011, 11:58:28 AM
to hudso...@googlegroups.com
I'm not sure that's the only issue here. You're talking about the slave
sending too much data, when a lot of plugins are failing because not enough
data is getting through. For example, the Cobertura (I think) and the
Perforce plugins both suffer from occasional truncation of their data
streams from slaves. Obviously, incomplete files or command output from
slaves will cause all sorts of havoc, as evidenced by the plethora of bugs
filed for issues caused by this.