Re: [nodejs] New Streams confusion


Nathan Rajlich

Mar 18, 2013, 3:08:24 PM
to nod...@googlegroups.com
I also find that implementing a custom Writable stream (one callback) is easier than the .read() function that we have now (event, synchronous function with inconsistent return value). I've brought up my concerns about .read() before[0] but unfortunately it was too late and now we're stuck with what we got.

So in short, *no*, I don't think there's any benefit to using the .read() function directly rather than implementing a separate Writable class. As far as I'm concerned, the only API for consuming readables is .pipe().

0: https://github.com/joyent/node/pull/4835#issuecomment-14024535 

On Mon, Mar 18, 2013 at 11:06 AM, Sigurgeir Jonsson <ziggy.jo...@gmail.com> wrote:
The new streams have excellent support for high/low watermarks and auto-pausing/resuming, but the documentation confuses me a little... particularly the read method.

When I read the new docs for the first time I was under the impression that the optimal way to become a consumer of a stream is to write loops around the read function. However, in practice I find myself simply writing custom Writable streams and using the callback to control upstream pressure (in addition to source watermarks if needed). Here is an example where I move the output to a queue that executes a custom function in parallel (i.e. uploading to a database): https://gist.github.com/ZJONSSON/5189249

Are there any benefits to using the read method directly on a stream vs. piping to a custom Writable stream?  

--
--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en
 
---
You received this message because you are subscribed to the Google Groups "nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nodejs+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Bruno Jouhier

Mar 18, 2013, 4:24:23 PM
to nod...@googlegroups.com

Nathan Rajlich

Mar 18, 2013, 4:41:14 PM
to nod...@googlegroups.com
You have been very vocal indeed Bruno, I'm sorry that your proposal hasn't gained more traction.

Bruno Jouhier

Mar 19, 2013, 4:57:36 AM
to nod...@googlegroups.com


On Monday, March 18, 2013 9:41:14 PM UTC+1, Nathan Rajlich wrote:
You have been very vocal indeed Bruno, I'm sorry that your proposal hasn't gained more traction.

@Nathan

Fewer than 10 posts in 8 months. I've been rather quiet :-)

My proposal was exactly what you describe in your post (https://github.com/joyent/node/pull/4835#issuecomment-14024535). I don't really feel sorry for us but for the community. There was a great opportunity to simplify the streams API and it didn't happen. Also, this is not speculation: we now have almost 2 years of hands-on experience with the simple callback API and it just works!

Floby

Mar 19, 2013, 5:12:38 AM
to nod...@googlegroups.com
I have a question regarding the new streams to which I could not find any answer.

Is it okay to use streams as before (aka "old mode")?

Floby

Mar 19, 2013, 5:13:17 AM
to nod...@googlegroups.com
The real question being: will the pain of using it go away?

Fedor Indutny

Mar 19, 2013, 5:17:05 AM
to nod...@googlegroups.com
Floby,

It'll generally work just fine, but please consider using streams2 in the new code.

Cheers,
Fedor.


Message has been deleted

Marco Rogers

Mar 20, 2013, 4:45:32 AM
to nod...@googlegroups.com
@Nathan's response is right. Creating a writable stream is preferable in most cases, but I wanted to add a little context. If you're dealing with a base readable stream, it's just pushing chunks of data at you off the wire. Your first task is to collect those chunks into meaningful data. So IMO the reason creating a writable stream is preferable is that it prompts you not just to read off the stream, but to create semantics around what the new stream is supposed to be. The API reflects this opinion, and that's why creating writable streams feels like the more natural way; the ugliness of dealing with read() is wrapped up in the pipe() method. It was kind of designed that way.

But the read() API was also designed for a use case. It's meant to handle low/high water marks effectively, as well as enable more optimized special parsing by reading off specific lengths of chunks. These were things that people kept needing, but the old API didn't support well. If you were writing a library for a special parser, you might write a custom Writable stream, and inside it you would use the read(n) API to control *how* you read data off the socket. I hope that makes sense.

:Marco

Sigurgeir Jonsson

Mar 21, 2013, 5:01:41 PM
to nod...@googlegroups.com
Thanks for all the answers. I almost forgot to look back at this thread, as the custom Writable streams have exceeded the already high expectations I had for Streams2.
For me, the reference manual was a little confusing: there are complete examples of using the read method, but no mention of "reading" through a Writable endpoint.

Marco, I agree that read gives more detailed control over minimum incoming content. However, I wonder if it would be more efficient to default pipe.chunkSize to a "lowWatermark" of the receiver (if defined). This lowWatermark could be adjusted dynamically, and the callback in the Writable should keep the sequence of events under control.

Anyway, thanks Node team, I'm very impressed!

Isaac Schlueter

Mar 21, 2013, 7:27:02 PM
to nodejs
re old-mode

Yes, that's fine. If you just want to get all the data asap, use
on('data', handler). It'll work great, and it's still very fast.
pause()/resume(), the whole bit. (The difference is that it won't
emit data until you're listening, and pause() will *actually* pause.)


Re read(cb)

It's problematic for reasons that I've discussed all of the places
where it's been brought up. That horse is dead, let's stop beating
it. (There were a few other proposals as well, btw. Reducibles and
some other monadic approaches come to mind.)


Re pipe() vs looping around read() vs custom Writable vs on('data')

Whatever works for your case is fine. It's flexible on purpose, and
allows more types of consumption than streams1, and creating custom
writables is easier than it was in streams1.

If you find something that the API can't do for you, or find yourself
doing a lot of backflips or overriding a lot of methods to get your
stuff working, then let's chat about it in a github issue. You might
be missing something, or you might have found a genuine shortcoming in
the API.

Michael Jackson

Mar 25, 2013, 4:28:51 PM
to nod...@googlegroups.com
Is it correct to assume that a Readable won't emit the "readable" event until you've registered for it?

Reading through the streams2 docs, I was under the impression that all streams start out paused and don't start emitting data until you add either a "data" (for old streams) or a "readable" listener. For new streams, this should mean that they don't emit "readable" until at least one listener is registered. Otherwise we still need to do some buffering in order to capture all the data.

For example, this code misses the readable event on node 0.10:

    var http = require('http');

    http.get('http://www.google.com', function (response) {
      console.log('got response with status ' + response.statusCode);

      setTimeout(function () {
        response.on('readable', function () {
          console.log('readable');
        });

        response.on('end', function () {
          console.log('end');
        });
      }, 5);
    });

Here's my shell session:

$ node -v
v0.10.0
$ node http-test.js 
got response with status 200
$

Is this the correct behavior?

--
Michael Jackson
@mjackson

Luke Arduini

Mar 25, 2013, 4:31:16 PM
to nod...@googlegroups.com
new streams don't emit data events without a listener

Dan Milon

Mar 25, 2013, 4:32:16 PM
to nod...@googlegroups.com, Michael Jackson
readable is emitted after you've actually started reading.
In your example, you don't ever `response.read()`, so no readable event
is ever emitted.

As you said, streams start in a paused state, ready to be read.

Michael Jackson

Mar 25, 2013, 4:42:49 PM
to Dan Milon, nod...@googlegroups.com
readable is emitted after you've actually started reading.

That's not what it says in the docs.

###
Event: 'readable'
When there is data ready to be consumed, this event will fire.
When this event emits, call the read() method to consume the data.
###

Calling stream.read *before* you get the "readable" event is totally counterintuitive.

--
Michael Jackson
@mjackson

Dean Landolt

Mar 25, 2013, 4:49:45 PM
to nod...@googlegroups.com, Dan Milon
You can always call `stream.read`, at any time. This is how data is pulled off the stream (instead of it being pushed to you, whether you're ready or not). Because of this you won't lose any data. With new streams there's no real notion of a paused state -- it's always paused. Once you grok that it may not seem so counter-intuitive.

The `readable` event is like a corollary to `drain` -- there to tell you that it's worth bothering with a call to read. You don't have to listen for it -- a (needlessly inefficient) stream reader could just as easily poll stream.read for new data periodically.

Dan Milon

Mar 25, 2013, 4:50:44 PM
to Michael Jackson, nod...@googlegroups.com
You're right, my bad.

But still, data stays in the buffer until someone tries to `.read()`.
So, if you're handed a stream and you don't know whether the first
`readable` event has fired, you can try to actually read from it.
If it returns null, then you wait for `readable`.

On 03/25/13 22:42, Michael Jackson wrote:
> readable is emitted after you've actually started reading.
>
>
> That's not what it says in the docs
> <http://nodejs.org/api/stream.html#stream_event_readable>.

Mikeal Rogers

Mar 25, 2013, 4:53:13 PM
to nod...@googlegroups.com, Michael Jackson
You *must* try to read from it. Otherwise it's likely to remain "paused."

Marco Rogers

Mar 25, 2013, 5:01:04 PM
to nod...@googlegroups.com, Dan Milon
I haven't experimented with streams2 as much as I should have. But I remember talking to Isaac about it early on. The way I think about it is still the same.

It feels like the semantics of how node streams produce data are much more consistent and predictable now. Node still starts by reading data off the fd fast. Instead of pushing it through to application code immediately (the old way), it starts buffering it in memory so it's ready for you to read. If you're ready, you can read in a pull fashion by calling stream.read(). The buffering is controlled by the high and low water marks. This is a mechanism for making sure we don't fill up memory, and we can be more efficient with slow and fast consumers/producers. If the in-memory boundaries are reached, then node will start exerting backpressure. You don't have to ask for this or manage it; it's consistent and controlled by the high/low water marks.

So this makes sense to me. It's really the semantics of the API around this that we need to play around with. So "readable" is an event that says there is data to read. But even if you missed the readable event, or you're early and it hasn't been fired, there may still be data to read. The stream.read() method is decoupled from that convenience event. If you call stream.read() it'll block until there's data or until the stream closes for some other reason. The convenience of "readable" is that it gives you a framework for getting around the blocking nature of read(). Because blocking when there's nothing to do is bad. "Readable" lets us still consume data in a way that's semantically more understandable than push "data" events, but also still efficient in a way that pure blocking calls are not. NOT using "readable" is still viable, but it puts you in a situation where you don't know how to be most efficient with reads.

Does that make sense? I think it fits with other answers here as well. I'm sure folks will correct me where I'm off base.

:Marco



--
Marco Rogers
marco....@gmail.com | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz

Dean Landolt

Mar 25, 2013, 5:23:35 PM
to nod...@googlegroups.com, Dan Milon
On Mon, Mar 25, 2013 at 5:01 PM, Marco Rogers <marco....@gmail.com> wrote:
I haven't experimented with streams2 as much as I should have. But I remember talking to Isaac about it early on. The way I think about it is still the same.

It feels like the semantics of how node streams produce data is much more consistent and predictable now. Node still starts by reading data off the fd fast. Instead of pushing it through to application code immediately (the old way), it starts buffering that in memory so it's ready for you to read. If you're ready to read that, then you can do it in a pull fashion by called stream.read(). The buffering is now controlled by the high and low water marks. This is a mechanism for making sure we don't fill up memory. We can more efficient with slow and fast consumers/producers. If the in memory boundaries are reached, then node will start exerting back pressure. You don't have to ask for this or manage it. It's consistent and controlled by the high/low water marks.

So this makes sense to me. It's really the semantics of the api around this that we need to play around with. So "readable" is an event that says there is data to read. But even if you missed the readable event, or you're early and it hasn't been fired, there may still be data to read. The stream.read() method is decoupled from that convenience event. If you call stream.read() it'll block until there's data or until the stream closes for some other reason.

I like this as a general metaphor, but it's important to note that read will return `null` if there's no data, not block indefinitely. It's up to you to call it again later, when it may have data. The best way to do that is to listen for the `readable` event, preferably with a `once` handler.

Michael Jackson

Mar 25, 2013, 5:29:44 PM
to Dan Milon, nod...@googlegroups.com
If stream.read returns null that could mean one of two things:

  1) the stream doesn't currently have any data, but still might have some in the future
  2) the stream is already ended

AFAICT, the only way you can know which state you're in is by checking the stream.readable property which is marked as being a "legacy" property in the code, so it seems like it's not a good idea to rely on that property either.

--
Michael Jackson
@mjackson

Marco Rogers

Mar 25, 2013, 5:36:20 PM
to nod...@googlegroups.com, Dan Milon
You're absolutely right. I totally forgot about that aspect of the semantics. It changes my metaphor a little. So stream.read() isn't a blind blocking call that can get you into lots of trouble, which I think my previous message implied. Instead it's a peek into the underlying semantics of how node is managing data off the fd. As I said, managing that data now happens under the covers in node and is handled consistently. Calling stream.read() gives you a hook into that consistent process and comes with a few expectations.

1) If there is data returned from read(), then the stream is active and you should keep calling read() until it returns null.
2) Calling read() is a signal to node that, if backpressure is being exerted (e.g. you're in a paused state), it should stop: you're ready to resume pulling off the fd.
3) If read returns null, you can assume you've left the stream in an active state and node wants to give you more data when it becomes available. The way node will signal that to you is the next "readable" event. So you should probably listen for that.
4) By calling read(), you are explicitly pulling out of the in memory buffer and thus explicitly affecting the high/low watermark calculations. This is where things still get a little murky for me. I'm not sure what the implications are here in terms of how application code should react.

Again, please add to this. I like talking to people about streams and may do more talks soon. So this is helpful to frame my understanding and help me convey it to other people.

:Marco

Marco Rogers

Mar 25, 2013, 5:40:05 PM
to nod...@googlegroups.com, Dan Milon
I'm more and more convinced that I need to go back and read all the available info about streams2. Answering these detailed semantics questions is pretty important.

:Marco


Mikeal Rogers

Mar 25, 2013, 5:41:29 PM
to nod...@googlegroups.com
This thread is pretty huge.

At this point, would people say there is more confusion about streams2 than old streams? I know that some of this is a little hard to get our heads around, but I always got the feeling that only about 10 people really understood all of old streams.

-Mikeal


Marco Rogers

Mar 25, 2013, 5:45:26 PM
to nod...@googlegroups.com
Yeah, I think the old streams still had lots of confusion around them, and adoption wasn't great. If we can't beat that with streams2 then we are doing a lot of work for not much gain. I think there are common themes that are a barrier to understanding in both approaches. I've been trying to think about what those are, in between the billion other things I've got going.

:Marco

Michael Jackson

Mar 25, 2013, 5:49:59 PM
to Dan Milon, nod...@googlegroups.com
Ah, looking through the Readable code a bit more it seems that the end event won't ever fire until read is called at least once.

So I guess what I could do is call read() *and* register an "end" event handler in the same tick. That way I'll be sure to get the data flowing.

Thanks!

--
Michael Jackson
@mjackson



Michael Jackson

Mar 25, 2013, 5:55:20 PM
to nod...@googlegroups.com, Dan Milon
Ok, that makes sense.

So the readable event is more of an advisory event. The docs should probably say something about how you can miss the event entirely if you're doing some other IO before you try to read from the stream.

For posterity's sake, I adjusted my previous example:

    var http = require('http');

    http.get('http://www.google.com', function (response) {
      console.log('got response with status ' + response.statusCode);

      setTimeout(function () {
        bufferStream(response, function (err, buffer) {
          console.log(buffer.toString());
        });
      }, 1000);
    });

    function bufferStream(stream, callback) {
      var chunks = [];

      var chunk = stream.read();
      if (chunk) {
        chunks.push(chunk);
      }

      stream.on('readable', function () {
        chunks.push(stream.read());
      });

      stream.on('error', function (error) {
        callback(error);
      });

      stream.on('end', function () {
        callback(null, Buffer.concat(chunks));
      });
    }

You can use the bufferStream function to catch all data on the stream, no matter how far in the future you are.

--
Michael Jackson
@mjackson

Dan Milon

Mar 25, 2013, 6:20:53 PM
to nod...@googlegroups.com
That's not guaranteed to work.

You're assuming that `stream.read()` will return the whole internal
buffer, which is not documented anywhere.

The right approach is to call `.read()` until it returns null.
Something like this:

function collectStream(stream, cb) {
  var bufs = []

  function read() {
    var chunk

    while ((chunk = stream.read()) != null) {
      bufs.push(chunk)
    }
  }

  stream.on('error', cb)

  stream.on('readable', read)

  stream.on('end', function () {
    cb(null, Buffer.concat(bufs))
  })

  read()
}
Michael Jackson

Mar 25, 2013, 6:48:19 PM
to nod...@googlegroups.com
I can see what you're saying, but the node docs do say that if you don't pass a size argument to stream.read then the entire contents of the internal buffer are returned.

In any case, this would all be a lot easier if the readable event were guaranteed to fire when a new readable listener is registered for the first time.

--
Michael Jackson
@mjackson



Mark Hahn

Mar 25, 2013, 6:51:08 PM
to nodejs
How did this get so bloody complicated?

Michael Jackson

Mar 25, 2013, 6:55:10 PM
to nod...@googlegroups.com, Dan Milon
A small amendment to #3: if read returns null, it could also mean that the stream has ended. So you need to register for the "end" event as well. It looks like the stream internals refuse to fire "end" until at least one call to stream.read has happened, so you're guaranteed to get that event.

It's the fact that you're not guaranteed to get the readable event that bugs me a little...

--
Michael Jackson
@mjackson

Michael Jackson

Mar 25, 2013, 7:40:26 PM
to nod...@googlegroups.com
I don't think there's necessarily any *more* confusion over streams2 than there ever was over old streams, just a different set of questions to answer.

Part of the reason I was so excited about streams2 is that I thought it meant I could finally deprecate BufferedStream in favor of 0.10's Readable. I wrote BufferedStream back in the 0.4 days when streams would spew data regardless of whether or not you had registered for "data" events or called pause(). The basic idea behind the library was that you could pass whatever you wanted to the constructor (a stream, a string, a Buffer, or what have you) and that you would get back a predictable interface for handling streaming data without losing any events.

It was incredibly difficult to get all the semantics of streams right, but I think we came pretty close. But streams2 opens a whole new can of worms. Now we're not just keeping compatibility with the old behavior (well, pausing seems to work a lot better now) but introducing an entirely new method of consuming streams on top of that. I've found myself diving into the _stream_readable.js code a lot more than I would have liked over the past few weeks just to figure out how things work. It's an incredibly complex piece of code. Hats off to the core team - I know how tricky it can get. But honestly, when you start cracking open node core code just to figure out how things work you start relying on implementation details that may not even be specified fully. So it's kind of a precarious place to be.

Looking back, the data/pause/resume/end/error API isn't too bad. The main problem with it was that a lot of streams didn't really care if you paused them. Fixing that piece and leaving the rest (buffering, duplex, transform, etc.) up to npm modules would have been a great baby step without introducing an entirely new API.

Related: the most intuitive streams implementation I've ever used is (gasp!) PHP's. You just call fopen with a URL (file://, http://, zlib://, etc.) and boom, you've got a handle to a stream of data that every other function in the stdlib knows how to use. You can specify custom wrappers that know how to open different URL schemes, stack as many filters as you like on top of them. You never need to worry about data events, or what state different streams are in. You can seek to different positions in the stream, read just a certain number of bytes, whatever. It's just super simple and super powerful.

--
Michael Jackson
@mjackson



Mikeal Rogers

Mar 25, 2013, 7:53:37 PM
to nod...@googlegroups.com
On Mar 25, 2013, at 4:40PM, Michael Jackson <mjija...@gmail.com> wrote:

I don't think there's necessarily any *more* confusion over streams2 than there ever was over old streams, just a different set of questions to answer.

Part of the reason I was so excited about streams2 is that I thought it meant I could finally deprecate BufferedStream in favor of 0.10's Readable. I wrote BufferedStream back in the 0.4 days when streams would spew data regardless of whether or not you had registered for "data" events or called pause(). The basic idea behind the library was that you could pass whatever you wanted to the constructor (a stream, a string, a Buffer, or what have you) and that you would get back a predictable interface for handling streaming data without losing any events.

It was incredibly difficult to get all the semantics of streams right, but I think we came pretty close. But streams2 opens a whole new can of worms. Now we're not just keeping compatibility with the old behavior (well, pausing seems to work a lot better now)

Is this true?

In request I'm just abusing the backwards compatibility mode by calling .resume() on stuff a lot.

In a future release I'll pull in readable-stream, inherit from it, and use the old stream wrappers so that all of request's code is using the new API.

It seems like we get to pick one API or the other. I can make all of my code use the new API or I can abuse the reverse compatibility, either way I'm only using one API.

Michael Jackson

Mar 25, 2013, 8:03:44 PM
to nod...@googlegroups.com
The use case I always use when testing this kind of thing is http's ClientResponse object. It used to (in the 0.4 days) completely ignore pause(), as did ServerRequest. I wrote BufferedStream so I could just ignore it and move forward. However, ClientResponse seems to be able to pause() just fine in 0.10.

I would agree that it's wise to pick one API or another. These days I'm in the process of upgrading to 0.10 and ditching all my old usage patterns, so that's why I'm running into the corner cases.

--
Michael Jackson
@mjackson

Isaac Schlueter

Mar 25, 2013, 10:13:33 PM
to nodejs
If you add a `readable` handler, then `'readable'` (and potentially
`'end'`, if there's no data) will be emitted.

> So the readable event is more of an advisory event. The docs should probably say something about how you could possibly miss the event entirely if you're doing some other IO before you try and read from the stream.

The docs don't say that because it's not true. (Unless, of course,
someone *else* is listening for it, or reading from it, but then it's
not really *your* stream, now, is it?)

This is perfectly safe:

```
net.createServer(function(sock) {
  doSomeIOandTHEN(function() {
    sock.on('readable', function() {
      readStuff(sock)
    })
  })
})
```

> It seems like we get to pick one API or the other. I can make all of my code use the new API or I can abuse the reverse compatibility, either way I'm only using one API.

I don't see how you can call this "abuse", really. It's not as if
there's "new API" and "old API" living side by side here. There's
just "The API". It's not bad to use any part of it.

One thing that sort of is unfortunate is that there's no way to switch
out of flowing-mode (pause/resume) without wrapping. So far, this
isn't much of a problem, because you usually decide what you're going
to do with a stream, and then do it. (Either pipe, consume in chunks,
or pause/resume/on('data') it.)

So, for a module like request, you probably want to be a bit less
opinionated. If you know you're going to just collect up the entire
response body, and pass to a callback, then fine. But if you're going
to pass the stream along, then setting it into flowing-mode is not so
great, because you don't know the user wants that.

> Part of the reason I was so excited about streams2 is that I thought it meant I could finally deprecate BufferedStream in favor of 0.10's Readable.

Part of the reason we wrote streams2 was so that you could deprecate
BufferedStream :)

> streams would spew data regardless of whether or not you had registered for "data" events or called pause().

Both bugs are fixed now.

> But honestly, when you start cracking open node core code just to figure out how things work you start relying on implementation details that may not even be specified fully.

Well, when RTFM doesn't give you the answers you need, then RTFS is
the next step. It could mean that the docs are lacking, and as you
know, help is always welcome there. If you find a question not
answered by the documentation, send a doc patch with the answer when
you do find it.

> Looking back, the data/pause/resume/end/error API isn't too bad.

And it's still there. It's not wrong to use it. Go nuts, really.

> The main problem with it was that a lot of streams didn't really care if you paused them.

There were other problems as well. Node-core has a ton of streams,
and each of them had completely different implementations of all the
streams semantics, with hardly any code reuse at all, and in many
cases, completely unique and opposite bugs and behavior. It was a
nightmare to maintain.

Streams are a core API. Like all core APIs, they're a node paradigm
that it pays to follow and use. But ultimately, like callbacks and
EventEmitters and all the rest, they're there to facilitate the APIs
that node presents. If you'd rather build your own modules in
different ways, go crazy. Really. That's what innovation means.

> Related: the most intuitive streams implementation I've ever used is (gasp!) PHP's. You just call fopen with a URL (file://, http://, zlib://, etc.) and boom, you've got a handle to a stream of data that every other function in the stdlib knows how to use.

Well, of course, we're not going to rely on magic strings, protocol
registration, or have a method called `fopen` that you pass http://
urls. That stuff is just completely crazy.

But if you present a stream for anything (say, a file, http, zlib,
etc.) then you can .pipe() to/from it from/to any other stream. If
you just use .pipe() (or some data collection method like was
presented earlier in this thread) then you have a similar sort of
portability. You can't seek() in streams, but you can read() just as
many bytes as you like. You can even create your own streams or apply
as many filters as you like, and have something that everything else
in the stdlib knows how to deal with.

Michael Jackson

Mar 26, 2013, 12:33:45 AM
to nod...@googlegroups.com
If you add a `readable` handler, then `'readable'` (and potentially
`'end'`, if there's no data) will be emitted.

Then there's a bug.
 
> So the readable event is more of an advisory event. The docs should probably say something about how you could possibly miss the event entirely if you're doing some other IO before you try and read from the stream.

The docs don't say that because it's not true.  (Unless, of course,
someone *else* is listening for it, or reading from it, but then it's
not really *your* stream, now, is it?)

This is perfectly safe:

```
net.createServer(function(sock) {
  doSomeIOandTHEN(function() {
    sock.on('readable', function() {
      readStuff(sock)
    })
  })
})
```

Thanks for confirming. I was under the impression that this is how the API is supposed to work, but was confused because that's not what I'm seeing on node 0.10.1. I've filed an issue so we can continue the discussion there. The example code in the ticket confirms that nobody else is listening for the `readable` event or reading from the stream, so we can safely assume that the stream is in fact *my* stream and rule that out.
 
> It seems like we get to pick one API or the other. I can make all of my code use the new API or I can abuse the reverse compatibility, either way I'm only using one API.

I don't see how you can call this "abuse", really.  It's not as if
there's "new API" and "old API" living side by side here.  There's
just "The API".  It's not bad to use any part of it.

Then we probably shouldn't call it an "old" API in the docs. In a fast-moving target like node, using any API that is called "old" in the docs just feels like instant technical debt.
 
> Part of the reason I was so excited about streams2 is that I thought it meant I could finally deprecate BufferedStream in favor of 0.10's Readable.

Part of the reason we wrote streams2 was so that you could deprecate
BufferedStream :)

Gladly, once I can see that it's working! :D
 
It could mean that the docs are lacking, and as you
know, help is always welcome there.  If you find a question not
answered by the documentation, send a doc patch with the answer when
you do find it.

Absolutely. But if we all can't agree what the expected behavior is, then it seems a bit premature to send pull requests for the docs. That's in part what I'm using this mailing list for. To ping other people and see how *they* think the APIs work. So far in this thread we've already had several misunderstandings about how things work in streams2. Once I can actually *confirm* something one way or the other I'm more than happy to put it in the docs.
 
> Looking back, the data/pause/resume/end/error API isn't too bad.

And it's still there.  It's not wrong to use it.

If I can't see the readable event (see the ticket I filed above) then it's not only "not wrong" to use pause/resume but actually *necessary* in certain cases.

--
Michael Jackson
@mjackson
 

Michael Jackson

Mar 26, 2013, 2:40:26 PM
to nod...@googlegroups.com
On Mon, Mar 25, 2013 at 4:53 PM, Mikeal Rogers <mikeal...@gmail.com> wrote:

On Mar 25, 2013, at 4:40PM, Michael Jackson <mjija...@gmail.com> wrote:

(well, pausing seems to work a lot better now)

Is this true?

In request I'm just abusing the backwards compatibility mode by calling .resume() on stuff a lot.

I ran quite a few tests on various node core streams last night and this morning and it seems that pause/resume actually do what they're supposed to in 0.10. Streams start out "paused" so you shouldn't need to pause/resume yourself. At least in every case that I can think of.

However, until #5141 is fixed the most reliable way to read from a stream is still the "data" event, not "readable" and stream.read. To illustrate, here is the simplest example that I can come up with to reliably buffer the entire contents of a stream in 0.10, whether called immediately or after some IO has occurred.

```
function bufferStream(stream, callback) {
  var chunks = [];

  stream.on('data', function (chunk) {
    chunks.push(chunk);
  });

  stream.on('end', function () {
    callback(null, Buffer.concat(chunks));
  });

  stream.on('error', callback);
}
```

Instead of relying on the "readable" event, we use the "data" event, which puts the stream into "old" mode. Then we let node do all the stream.read()s for us. We don't have to check the return value of stream.read and we don't have to worry about missing anything.

--
Michael Jackson
@mjackson

Dean Landolt

Mar 26, 2013, 2:59:49 PM
to nod...@googlegroups.com
On Tue, Mar 26, 2013 at 2:40 PM, Michael Jackson <mjija...@gmail.com> wrote:
On Mon, Mar 25, 2013 at 4:53 PM, Mikeal Rogers <mikeal...@gmail.com> wrote:

On Mar 25, 2013, at 4:40PM, Michael Jackson <mjija...@gmail.com> wrote:

(well, pausing seems to work a lot better now)

Is this true?

In request I'm just abusing the backwards compatibility mode by calling .resume() on stuff a lot.

I ran quite a few tests on various node core streams last night and this morning and it seems that pause/resume actually do what they're supposed to in 0.10. Streams start out "paused" so you shouldn't need to pause/resume yourself. At least in every case that I can think of.

However, until #5141 is fixed the most reliable way to read from a stream is still the "data" event, not "readable" and stream.read. To illustrate, here is the simplest example that I can come up with to reliably buffer the entire contents of a stream in 0.10, whether called immediately or after some IO has occurred.

You're still doing it wrong. This is like complaining that you can't tell a stream is ended because you waited to attach an `end` event handler. If you've deferred to the event loop, you have to check whether the stream has already ended first. Likewise, you should call read and, only when it's exhausted, add a readable handler. Here's your example from the issue you filed, reworked to use an adaptation of code ripped straight from a streams2 example (totally untested, of course):


    var http = require('http');

    http.get('http://joyent.com/', function (response) {
      setTimeout(function () {

        function flow() {
          var chunk;
          // read() returns null when the internal buffer is empty,
          // so check for that rather than calling .length on it
          while ((chunk = response.read()) !== null) {
            doSomething(chunk);
          }
          response.once('readable', flow);
        }
        flow();

        response.on('end', function () {
          // oh noes -- using this contrived example this event could be missed too!
          console.log('end');
        });
      }, 10);
    });


Dan Milon

Mar 26, 2013, 3:05:04 PM
to nod...@googlegroups.com
inline

On 26/03/2013 04:13 AM, Isaac Schlueter wrote:
> If you add a `readable` handler, then `'readable'` (and
> potentially `'end'`, if there's no data) will be emitted.
>
>> So the readable event is more of an advisory event. The docs
>> should probably say something about how you could possibly miss
>> the event entirely if you're doing some other IO before you try
>> and read from the stream.
>
> The docs don't say that because it's not true. (Unless, of
> course, someone *else* is listening for it, or reading from it, but
> then it's not really *your* stream, now, is it?)
>
> This is perfectly safe:
>
> ```
> net.createServer(function(sock) {
>   doSomeIOandTHEN(function() {
>     sock.on('readable', function() {
>       readStuff(sock)
>     })
>   })
> })
> ```
>

Could you provide a functional test case where attaching a readable
listener late works? According to #5141 [1], it doesn't, at least for
http.request.

[1] https://github.com/joyent/node/issues/5141

Michael Jackson

Mar 26, 2013, 4:15:02 PM
to nod...@googlegroups.com
You're still doing it wrong. This is like complaining that you can't tell a stream is ended because you waited to attach an `end` event handler. If you've deferred to the event loop you have to check if a stream isn't ended first. Likewise, you should call read and, only when it's exhausted, add a readable handler.

I understand what you're saying and I know that works. But unless I'm misinterpreting Isaac that's not how the API was designed to work. I shouldn't have to call stream.read *before* I get a readable event just to start triggering them. It should be perfectly safe to wait a while before registering a readable handler.

--
Michael Jackson
@mjackson

Isaac Schlueter

Mar 26, 2013, 5:38:24 PM
to nodejs
> You're still doing it wrong. This is like complaining that you can't tell a stream is ended because you waited to attach an `end` event handler.

But that's supported, now :) If you don't consume the stream, it
never emits 'end', even if it never has any data. You still have to
read() or on('data') or resume() it.

You could argue that the 'readable' event is only for state transitions, I
suppose, and the correct approach is to try to read, and only wait for
'readable' if there's nothing there. But there's little harm in
re-emitting the event, at least the first time, and it's an easy
change to make. (We already special-case 'data' and 'readable' for
other reasons.)

Part of the reason why this is an issue for you is that the HTTP
implementation's interaction with the TCP layer is utter shit. Look
for changes there in 0.12. In the meantime, we should make this work,
if only because some other crappily-built stream is likely to be
similarly broken in this surprising way ;)

Taking the bug report to an issue is 100% the correct move, thanks.
Let's continue the discussion of that specific point there.

Dominic Tarr

Apr 1, 2013, 1:41:12 PM
to nod...@googlegroups.com
@mikeal when I last checked a few weeks ago there were over 350 stream modules in npm.




Mikeal Rogers

Apr 1, 2013, 1:45:19 PM
to nod...@googlegroups.com
350 is a big number until you factor out the number of those written by 3 people (you, substack, and Raynos). Many modules might exist but, it is my impression, most of them are written by about a dozen people. This impression could be wrong, but I've been around since we first started defining this API, and every time I sit down to write a new stream I get bit by something I forgot I needed to do.

-Mikeal

Isaac Schlueter

Apr 1, 2013, 8:39:25 PM
to nodejs
Compared with the stuff that you needed to do with streams1, streams2
is much much easier. Basically:

1. Pick the class you're extending: Readable, Writable, Duplex, Transform
2. Implement the constructor, and one method. (Well, 2 for Duplex,
but that's to be expected.)

If you are implementing a non-byte stream, then set `objectMode: true` in
the appropriate options object. This is not something I personally
recommend, but hey, some people like it.

For 99% or more of use cases, this is all you need to do.

Compare with streams1:

1. For readable streams, implement pause, resume, emit 'data' and
'end' events at the appropriate time. (Do you buffer while paused?
Answers vary from "always" to "never".)
2. For writable streams, implement write() (which has to return the
appropriate value if it buffers), make sure to emit 'drain' if you
ever return false, and implement end() (which can also take a
chunk argument).
3. Make sure to not do anything even slightly wrong!

Jake Verbaten

Apr 3, 2013, 2:06:01 PM
to nod...@googlegroups.com
3. Make sure to not do anything even slightly wrong!

This part is really hard. I don't think I've written a single stream module that complies with it. (And I've written 60. It's really hard.)

Isaac Schlueter

Apr 3, 2013, 3:40:40 PM
to nodejs
Dude, tell me about it! Even the streams in core were a total mess!

Having a single place where most of the implementation lives is a huge win.

Floby

Apr 4, 2013, 12:26:03 PM
to nod...@googlegroups.com, i...@izs.me
What I found is that when you do something even slightly wrong, you hardly ever notice,
because when piping streams together, thrown exceptions are silent and don't crash the process. There's no way to tell where your error comes from.