How to robustly use stream piping, and how to write robust streams?

Thomas Riccardi

Mar 5, 2015, 12:02:51 PM

to nod...@googlegroups.com

Hi all,

I haven't found a general guide on how to properly use and implement
streams, so here are some questions about the Node.js stream API (pipe,
error handling) and the core classes that use it.
I work on Node.js 0.10.36, but I usually checked the behavior on Node.js
0.12.0 too.

The use-case is:
http.IncomingMessage (from http.Server) > some Stream.Transform > Amazon
AWS S3 upload

With Stream2, pipes work well and handle backpressure; everything is
great except for error handling.
Some actions need to be taken when everything has finished successfully
(send the HTTP response). That's easy: listen to the 'finish' event on
the last stream in the pipeline.
Other actions need to be taken when an error happens: send an error HTTP
response, but also abort the AWS S3 upload (otherwise pending multipart
upload data is billed indefinitely).
Error handling has already been discussed on this ML, and the
recommended solution seems to be to listen to 'error' events on each
stream of the pipe, and do a global cleanup.
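
As a rough sketch of that approach: the helper below pipes a list of
streams together, listens for 'error' on each of them and for 'finish'
on the last one, and calls a callback exactly once. The name pipeAll and
its callback signature are made up for illustration; the global cleanup
(such as aborting the S3 upload) would go in the callback.

```javascript
'use strict';

// Hypothetical helper (not part of the stream API): pipe the given
// streams together and call `done` exactly once, with the first error,
// or with no argument once the last stream has finished.
function pipeAll(streams, done) {
  let called = false;
  function once(err) {
    if (called) return;
    called = true;
    done(err);
  }
  streams.forEach(function (stream, i) {
    // .pipe() does not forward 'error' events, so listen on every stream.
    stream.on('error', once);
    if (i > 0) streams[i - 1].pipe(stream);
  });
  // 'finish' on the last stream means all data has been written to it.
  streams[streams.length - 1].on('finish', function () {
    once();
  });
}
```

It would be used as pipeAll([request, transform, upload], function (err) { ... }),
where err is set only if one of the streams failed.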

However, there is no standard way to abort a stream: it's not part of
the Stream2 (or Stream3) API.
Some streams have a 'destroy()' method to force the end of the stream.
- For example net.Socket: it emits the 'close' event with a boolean
argument that tells whether an error caused this 'close' event (in which
case an 'error' event was previously emitted).
This 'destroy()' method in fact takes an optional error argument that
will be emitted as 'error' after the destruction. This is not
documented; it should be, or it should be removed.
- 'through2'
(https://github.com/rvagg/through2/blob/0fa42c010aa58b562fd3eabfb53c0bc5ddaa60f7/through2.js#L12-L22)
does something similar, although without the boolean argument to 'close'.
- 'http.ClientRequest' (which implements Stream.Writable) has no
'destroy()', but has an 'abort()' method. It only calls 'socket.destroy()'
(if there is a socket), which (usually) emits 'close'; ClientRequest
listens for it and emits 'close' in turn, and in some cases 'aborted' too
(which is not documented).
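
To illustrate how ad hoc this is, here is a minimal sketch of a helper
that tries the abort-like methods listed above in order. The name
abortStream is made up, and the helper says nothing about which events
('error', 'close', 'aborted') will follow: that still depends on the
stream.

```javascript
// Hypothetical helper: call whichever abort-like method the stream has.
// Returns the name of the method used, or null if there is none.
function abortStream(stream, err) {
  if (typeof stream.destroy === 'function') {
    // e.g. net.Socket or a through2 stream; the error argument is optional.
    stream.destroy(err);
    return 'destroy';
  }
  if (typeof stream.abort === 'function') {
    // e.g. http.ClientRequest
    stream.abort();
    return 'abort';
  }
  // No known way to abort this stream.
  return null;
}
```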

If there is no direct method to abort a stream (and maybe even if there
is), the streams down the pipe from the one that failed need to finish.
A solution is to call 'end()' on the first downstream stream (see
http://www.bennadel.com/blog/2692-you-have-to-explicitly-end-streams-after-pipes-break-in-node-js.htm).
But this doesn't tell the downstream stream that the upstream stream
failed, it only tells it that the input ended: all downstream streams
should be properly aborted instead (in my initial use-case I don't want
to write incomplete data to AWS S3, which is what would happen if we
just called .end()).
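
The difference can be sketched with a stand-in for the upload. FakeUpload
and its abort() method are invented for this example (they are not the
AWS SDK API): a plain end() would commit whatever was written so far,
while abort() discards it.

```javascript
'use strict';
const { Writable } = require('stream');

// Hypothetical stand-in for an upload target; not the AWS SDK API.
class FakeUpload extends Writable {
  constructor() {
    super();
    this.parts = [];
    this.committed = false;
    this.aborted = false;
    // A normal end() leads here: the data written so far is committed.
    this.on('finish', () => {
      if (!this.aborted) this.committed = true;
    });
  }

  _write(chunk, encoding, callback) {
    if (!this.aborted) this.parts.push(chunk);
    callback();
  }

  // Made-up abort(): discard pending data instead of committing it.
  abort() {
    this.aborted = true;
    this.parts = [];
  }
}

// On upstream failure, detach and abort the upload rather than end() it.
function abortOnError(source, upload) {
  source.on('error', function () {
    source.unpipe(upload);
    upload.abort();
  });
}
```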

To sum up, there is no standard way to abort a stream, and when a stream
does offer one, it doesn't behave the same as the others (it may emit
'error' and/or 'close', and maybe some specific events like 'aborted').
So it's hard to clean up a failed pipeline, and to deal with the
consequences of the cleanup (the cleanup itself may emit some events, in
a non-standardized way).

What is the recommended way to do this?

Related to that, http.IncomingMessage emits neither 'error' nor 'end'
when the HTTP request finishes but is incomplete.
From the Stream.Readable documentation on 'error' and 'end' events, I
would expect one or both of 'error' and 'end' events in this case:
- the http request is incomplete, this is an error, the request should
not be treated as successful: there is missing input data => should emit
'error'
- there will be no more data to read from the request object => should
emit 'end'
In fact this reasoning could be applied to all fatal 'error' events:
after them there will be no more data to read => both 'error' then 'end'
should be emitted.
If we don't want this behavior then the documentation should be changed
to reflect this.

To come back to http.IncomingMessage: on an aborted request, 'aborted'
then 'close' events are emitted.
'aborted' is not even documented; it should be, or it should be removed.
Without documentation we cannot rely on it, so we have to fall back to
listening to 'close' and 'end', and treat the request as aborted if 'end'
was not emitted before 'close'. This may not be robust (as it's
implementation dependent), and may even be wrong. ('close' doesn't even
seem to be emitted for normal requests, probably because of HTTP
keep-alive).
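
A minimal sketch of that fallback, assuming that 'close' without a prior
'end' does mean the request was incomplete (which, as said, is
implementation dependent). The helper name onIncomplete is made up.

```javascript
// Hypothetical helper: call `callback` if 'close' arrives before 'end'.
function onIncomplete(readable, callback) {
  let ended = false;
  readable.on('end', function () {
    ended = true;
  });
  readable.on('close', function () {
    // Assumption: no 'end' before 'close' means the input was cut short.
    if (!ended) callback(new Error('stream closed before end'));
  });
}
```

With an http.Server it would be called as onIncomplete(request, cleanup)
inside the 'request' handler.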

What is the recommended way to deal with an incomplete
http.IncomingMessage? Shouldn't the current behavior be modified?

Finally, from a Stream.Writable implementer's point of view, how should
errors be dealt with (internal ones, or external ones coming from an
abort/destroy() call)?
There are multiple ways to signal an error:
- emit the 'error' event
- call pending callbacks with an error (writable._write(chunk, encoding,
callback), transform._flush(callback))
Calling the pending callbacks with an error will eventually also emit an
'error' event. But there is not always a pending callback.
The 'error' event will notify upstream streams in the pipe, but how can
downstream streams be notified too?
Which of the two should be preferred?
Also, could it be useful to guarantee that 'close' is emitted after
'error' in case of aborts?
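
For reference, the two ways of signaling an error can be sketched as
follows. Both function names are hypothetical, and the sketch does not
answer which one should be preferred.

```javascript
'use strict';
const { Writable } = require('stream');

// Hypothetical writable that rejects the chunk 'bad'.
function createPickyWritable() {
  return new Writable({
    write(chunk, encoding, callback) {
      if (chunk.toString() === 'bad') {
        // Way 1: pass the error to the pending callback; the stream
        // then emits 'error' itself.
        return callback(new Error('rejected chunk'));
      }
      callback();
    }
  });
}

// Way 2: no callback is pending (for example on an external abort),
// so the error is emitted directly.
function failFromOutside(writable, err) {
  writable.emit('error', err);
}
```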

Thank you,

Thomas
