Is Request.pause useless?


Pau

Jan 4, 2011, 6:17:40 PM
to nod...@googlegroups.com
I'm trying to deal with some file uploads, but before that I need to do a blocking operation (db access).
I have to pause the request so I can attach the 'data' handler afterwards. It doesn't work at all.

Example:

server = require('http').createServer(function(req, res) {
  req.pause();
  req.on('data', function(buffer) {
    console.log('data parsed');
  });

  setTimeout(function () {
    console.log('fffuuuuuuu!');
    req.resume();
  }, 1000);
});

server.listen(3000);

"data parsed" should come after "ffuuuuuuu!" but it doesn't.

Thank's in advance!

Marco Rogers

Jan 4, 2011, 6:32:02 PM
to nodejs
In short, you're doing it wrong :) The first thing to do in node is
start getting used to everything being non-blocking.

Your db operation shouldn't block. It should be asynchronous and have
a callback. In that callback, you write whatever you need to the
response object with res.write(data). Then you call res.end() to send
the response.
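For example, something like this (a rough sketch; db.query is a stand-in for whatever async database call you're using, not a real API):

var http = require('http');

var server = http.createServer(function (req, res) {
  // Non-blocking db access: the response is written in the callback.
  db.query('SELECT ...', function (err, rows) {
    if (err) {
      res.writeHead(500);
      return res.end('db error');
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.write(JSON.stringify(rows));
    res.end();
  });
});

server.listen(3000);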

Another handy tip: console.log is non-blocking as well. So it's not
exactly accurate for determining what order things happen in. For
convenience you can use console.warn or console.error. They write to
the stderr stream and they are blocking.

:Marco

Pau

Jan 4, 2011, 6:53:36 PM
to nod...@googlegroups.com
I don't agree with your answer. Maybe I didn't explain it well enough.

Let's change the example above a little bit.
Imagine I want to process the request AFTER some asynchronous operation (db, file access, whatever).
As you said, I provide a callback, but by the time I'm in that callback, the request has already emitted all the 'data' events.

The node documentation recommends using request.pause for these scenarios: "Pauses request from emitting events. Useful to throttle back an upload."
But something is wrong, because the events are emitted anyway.

// Example where "data parsed" is never shown
server = require('http').createServer(function(req, res) {
  req.pause();

  setTimeout(function () {
    req.on('data', function(buffer) {
      console.log('data parsed');
    });
    req.resume();
  }, 1000);
});

server.listen(3000);

I also raised the issue with Felix (author of formidable, https://github.com/felixge/node-formidable) and got this answer:

> Unfortunately request.pause() currently does not pause the request right away. A few more data events may still be emitted. That's a problem in node and Ryan wants to address it on the event loop level, but hasn't done so yet.

Shimon Doodkin

Jan 4, 2011, 6:55:43 PM
to nodejs
Could it be that there is a problem with the request buffer?

The first thing I do is parse the uploads.

tjholowaychuk

Jan 4, 2011, 10:18:50 PM
to nodejs
You are not alone with this issue. In connect we need this for the
middleware to act as "proper" serial layers.
Currently we have a utility (aka hack) that buffers the events in memory
and re-emits them when resumed. It's pretty sketchy.
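Something along these lines (a minimal sketch of the buffer-and-re-emit idea, not Connect's actual utility):

// Capture 'data'/'end' events while the app isn't ready, then replay them.
function pauseStream(stream) {
  var events = [];
  function onData(chunk) { events.push(['data', chunk]); }
  function onEnd() { events.push(['end']); }

  stream.on('data', onData);
  stream.on('end', onEnd);
  stream.pause(); // advisory: events may still arrive, which is why we capture

  return function resume() {
    stream.removeListener('data', onData);
    stream.removeListener('end', onEnd);
    // Replay to whatever handlers the app has attached by now.
    events.forEach(function (args) {
      stream.emit.apply(stream, args);
    });
    stream.resume();
  };
}

You call it at the top of the request handler, do your async setup, attach your real 'data' listeners, and then invoke the returned resume function.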

Charlie Robbins

Jan 4, 2011, 11:18:43 PM
to nod...@googlegroups.com
I don't know if I'd call event buffering a hack, but it is non-standard in nodejs right now. Could be interesting to see events.EventEmitter.buffer and events.EventEmitter.flush in core. 



Marco Rogers

Jan 4, 2011, 11:52:14 PM
to nodejs
@Pau: I didn't misunderstand your issue. But maybe I didn't do a
good job of explaining why it works that way. The pause() method is a
little confusing because it doesn't take effect immediately. What it
comes down to is that setting a callback on the "data" event is the
*first* thing you should do. Before doing any other async calls or
yielding anything back to the event loop. This ensures that you won't
miss any data. If you're not ready to do something with that data
yet, then yes, you have to buffer it.

One of the philosophies behind node is that data should move through
it in streams. You should actively process data as it comes in and
move it to the next stage. That keeps memory usage low and keeps
throughput high. But it's not a perfect system obviously. Sometimes
you have to do some setup before you're ready to process incoming
data. That's when you need to buffer the data yourself until your
other callbacks return.

This situation could be improved. But in my opinion, I'm not sure
making pause kill the flow immediately is the right idea. It leaves
the door open for people to abuse it. They'll just call pause() at
the beginning of every request and wait until everything else is
done. I'm sure the reason this is still an open issue is that a
better solution hasn't yet been presented.

@Charlie I don't think buffer() and flush() should be part of the
EventEmitter api. Emitters can be based on anything. Data streams are
just the most common example in node right now. We could make it part
of the "data" stream api, but I also don't think it's a good idea to
start specializing interfaces on top of the basic Stream interface.

I wouldn't be opposed to making it an inner feature of socket streams
though. So the req stream would buffer internally by default until at
least one "data" event handler is set. Then it would play back the
buffered chunks in order before resuming real time streaming. This is
what everyone's doing when they run into this problem anyway. It
should just be part of the internal net.Stream implementation. That
way the Stream interface stays pretty pure, all the current semantics
still work as expected, but you have a little more flexibility when
handling connections. And it can be removed transparently later when
a better solution presents itself.

:Marco


Jorge

Jan 5, 2011, 4:48:01 AM
to nod...@googlegroups.com
On 05/01/2011, at 00:32, Marco Rogers wrote:
> (...)

> Another handy tip: console.log is non-blocking as well. So it's not
> exactly accurate for determining what order things happen in. (...)

Wrt this, I think you're mistaken, Marco.
--
Jorge.

Marco Rogers

Jan 5, 2011, 1:25:36 PM
to nodejs
Actually I think you're right Jorge. Someone correct me if I'm wrong,
but the issue is that stdout isn't flushed immediately when you write
to it using console.log. But stderr is flushed when you use
console.error.

Either way, you see console.error data immediately while console.log
data could potentially be delayed. That I'm pretty sure of.

:Marco

Felix Geisendörfer

Jan 5, 2011, 1:55:10 PM
to nod...@googlegroups.com
I discussed this problem with Ryan just the other day - stream.pause() needs to be fixed. At this point it will require a little more work on the http parser, but it sounds like a patch is in the works.

fs.ReadStream is already behaving properly btw.

--fg

Pau

Jan 5, 2011, 2:43:04 PM
to nod...@googlegroups.com
Cool! I'll be checking the new versions of node to see when this patch is available. :)

Jorge

Jan 5, 2011, 2:47:06 PM
to nod...@googlegroups.com
Yes, right. What I meant to say was that wrt the order, while this

var fs = require('fs');

var ONE = new Buffer("ONE");
var TWO = new Buffer("TWO");

function cb(err, written) { if (err) throw err; }

fs.write(process.stdout.fd, ONE, 0, 3, -1, cb);
fs.write(process.stdout.fd, TWO, 0, 3, -1, cb);

might eventually produce the surprising but correct (!) output TWO ONE, one can be confident that

console.log("ONE");
console.log("TWO");

is going to produce the output ONE TWO in the expected order, always.
--
Jorge.

Ryan Dahl

Jan 5, 2011, 3:04:03 PM
to nod...@googlegroups.com

So, just to be clear, req.pause() currently does work - kind of.
It will /eventually/ stop the req from emitting 'data' events,
just not immediately. It's arguable that this is the correct API - the
packet has already arrived - it's being parsed - there's not much to
"pause".

That said, many people have found this to be ugly and I'm 70% through
fixing it. The todo item is
https://github.com/ry/node/blob/bb7bf58cc77c07b6421c44872926460479aee496/TODO#L31

Matt Ranney

Jan 5, 2011, 10:45:59 PM
to nod...@googlegroups.com
On Wed, Jan 5, 2011 at 12:04 PM, Ryan Dahl <r...@tinyclouds.org> wrote:
> That said, many people have found this to be ugly and I'm 70% through
> fixing it.

I think fixing this is a good idea.

I should point out though, that with this fix it'll be even more important to understand what pause actually does.  Just because your program stops getting "data" events, it doesn't necessarily mean that the sender is getting any kind of backpressure.  What it really does is to stop pulling data out of the socket buffer.  In my testing, I've found that pausing an uploader will often allow the sender to keep on sending another 512KB, end the HTTP request, and possibly even close the TCP connection, all while the server is "paused" and not getting any more data events.

Marco Rogers

Jan 5, 2011, 10:56:57 PM
to nodejs
Yeah, I definitely don't have a good mental model of things at that
low level. Ry, can you give a quick explanation of how you're
planning to address this issue? How will the behavior change exactly?

:Marco

Joran Greef

Jan 21, 2011, 11:11:58 AM
to nodejs
This would be terrific. I'm about to write a wrapper for
http.ServerRequest to handle this until the fix comes.


Ryan Dahl

Jan 21, 2011, 4:00:37 PM
to nod...@googlegroups.com
On Wed, Jan 5, 2011 at 7:56 PM, Marco Rogers <marco....@gmail.com> wrote:
> Yeah, I definitely don't have a good mental model of things at that
> low level.  Ry, can you give a quick explanation of how you're
> planning to address this issue?  How will the behavior change exactly?

https://github.com/ry/node/tree/http_parser_refactor

It will make it so no data events will happen after a call to pause().
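In other words, with the fix Pau's reordered example should behave as he expected (a sketch of the intended semantics, not tested against that branch):

var http = require('http');

http.createServer(function (req, res) {
  req.pause(); // after the fix: no further 'data' events until resume()

  setTimeout(function () {
    req.on('data', function (chunk) {
      console.error('data parsed'); // fires only after resume()
    });
    req.resume();
  }, 1000);
}).listen(3000);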

Ryan Dahl

Jan 21, 2011, 4:02:13 PM
to nod...@googlegroups.com

Joran Greef

Feb 24, 2011, 12:27:47 PM
to nodejs
Has pause been changed yet?

Ryan Dahl

Feb 24, 2011, 3:25:08 PM
to nod...@googlegroups.com
On Thu, Feb 24, 2011 at 9:27 AM, Joran Greef <joran...@gmail.com> wrote:
> Has pause been changed yet?

no

Mikeal Rogers

Feb 24, 2011, 6:35:08 PM
to nod...@googlegroups.com
I actually like pause in streams being advisory, because when streams are implemented as a general interface it's better in many cases to emit any data events that still get put into memory before the file handler actually messages back to the client to stop sending.

Felix Geisendörfer

Feb 25, 2011, 5:16:54 AM
to nod...@googlegroups.com
> I actually like pause in streams being advisory, because when streams are implemented as a general interface it's better in many cases to emit any data events that still get put into memory before the file handler actually messages back to the client to stop sending.

I strongly disagree. To me that behavior is very unexpected, and it's a PITA to program around. It's also the #1 source of bug reports I get for formidable, and even when I explain what's going on, people don't really "get it".

However, I would be curious to hear about the use cases where you think the current `pause()` behavior is more desirable.

--fg

Joran Greef

Feb 25, 2011, 7:26:17 AM
to nodejs
Agreed.

At the moment my server has to wrap every request object to patch the
pause method, before dispatching the request to the relevant
controller, since middleware may need to do something asynchronous
before the controller gets the request and all request handling is
otherwise up to the controller. It would be better to have the server
go "request.pause()", then hand off to middleware and have the
controller go "request.resume()". A few kilobytes in memory in the
Node layer is not a problem since it would otherwise be taken up in
the application layer albeit with more complexity.

There's also a side effect of using pause (which must be documented or
fixed to save people hours of debugging it): if the controller forgets
to resume (say because it decides not to use the request body), then
subsequent requests on keep-alive connections will be delayed 2
minutes before reaching the server. It's understandable from a bird's
eye point of view but when you're writing a controller which gets
"request, response" you sometimes forget that keep-alive connections
may be shared between controllers, and not resuming a paused request
may have disastrous effects. See:
http://groups.google.com/group/nodejs-dev/browse_thread/thread/2f0da77c6d39e06a
for a gist.
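A hypothetical controller shows the trap (lookupRoute and the controller object here are made up for illustration):

var http = require('http');

http.createServer(function (req, res) {
  req.pause();
  // Hypothetical async middleware step that picks a controller.
  lookupRoute(req.url, function (err, controller) {
    if (!controller.wantsBody) {
      // Forgot req.resume() here: the unread body stalls the connection, and
      // the next request on this keep-alive connection is delayed until timeout.
      res.end('ok');
    } else {
      req.resume();
      controller.handle(req, res);
    }
  });
}).listen(3000);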

Cheers

Joran

Mikeal Rogers

Feb 28, 2011, 1:16:58 PM
to nod...@googlegroups.com
If you have a chain of pumped streams that are outputting to an endpoint (say you're rendering rows from a database into html and sending them to a client), it's better to let the data events come through and send them back out to the client rather than have them sitting in memory just because you needed to pause the database stream.

Even if you're waiting for a drain event on the client, it's better to process the data that arrives between calling pause and it actually pausing, and have it ready next to the client socket, so that once the client does drain it's already ready to send.

Sure, there are some warts in the programming model when you are this close to the metal, but I don't think it's the place of node core to make that decision. It's trivial to implement buffering in your own stream interface if that is the model you desire, and if that is your main source of complaints maybe you should implement it in the form stream.

It's also worth noting that even if we implement this buffering for http.Client it's still not going to be the case for other stream interfaces.

Mikeal Rogers

Feb 28, 2011, 1:24:18 PM
to nod...@googlegroups.com
I really don't understand this complaint.

The issue here is that you cannot process data because you wrote middleware that doesn't support streaming. It is your responsibility to handle the streaming by buffering, not the other way around (having node buffer for you so that you have an easier way to lazily pause streaming).

Wrapping every request object to buffer data is not that hard; this is part of the programming model that you have to deal with.

Also, you should try at all costs not to incur IO in order to handle the initial request/response events in HTTP; you should try to keep your routing and application rules in memory to handle this event and update them asynchronously.

Also, what you are suggesting, pausing every input stream, means that every connection incurs two roundtrips to signal pause on the TCP stream and then resume it, unless you have some crazy half-meg socket buffer like OS X keeps. This probably exacerbates TCP slow start for all the clients that connect to you. If you instead just buffer in memory in the application space and don't signal pause and resume on the socket, it will be faster and more efficient.

Dean Landolt

Feb 28, 2011, 1:33:26 PM
to nod...@googlegroups.com
Amen. This inconsistency is arguably worse than having to explain (over and over, no doubt) that pause is advisory. As a simple alternative, why not stick a stream buffering helper into core for the best of both worlds?

Isaac Schlueter

Feb 28, 2011, 1:56:58 PM
to nod...@googlegroups.com
Why not add .cork() which is like .pause() but more forceful? The
default Stream#pipe case would use .pause(), since it's efficient and
good, but it'd be easier to handle the cases when you want to pause as
well as buffer any chunks that come through.

--i


Mikeal Rogers

Feb 28, 2011, 2:13:22 PM
to nod...@googlegroups.com
Rather than increase the surface area of the default stream interface to include buffering, might it be better to add a corkable stream object that people can either use as their base prototype or pipe through?

I like the idea of a new method though, one that buffers data but does *not* signal a pause event, which seems like the thing people really want.

Keeping pause/resume mapped to calls to file descriptors also seems like a good idea.

Joran Greef

Feb 28, 2011, 2:27:26 PM
to nodejs
Thank you Mikeal for the explanation. You are more familiar with these
issues than I am.

What if the current pause implementation was extended with an in-memory
buffer that could smooth over a quick succession of pause/resume calls,
without incurring roundtrips unless necessary?

Mikeal Rogers

Feb 28, 2011, 2:39:51 PM
to nod...@googlegroups.com
That's just too much magic for core.

What we need to do is separate the file handler messaging from the need for application-layer buffering of the data, because they are totally different concerns and we're trying to leverage one for the other and it's just not working.

For the most part, you only want to cause a pause() on the incoming file handler if the one you're writing to returns false, then wait for drain and resume, blah blah blah. I think we have a good handle on this workflow and a working solution in stream.Stream.pipe().

If we want to add application-layer buffering between hooking that stream up to another socket (like if you need to do IO in order to know where to proxy some data), then we should add it in another stream that can buffer data up to a certain cap without triggering pause, and then can write all the buffered data to the stream it's piped to once your application layer knows where to route to.

That way all you need to do is:

function (req, res) {
  var b = new BufferedStream(4096);
  req.pipe(b);
  route(req.url, function (e, location) {
    var client = http.request(location);
    b.pipe(client);
    client.pipe(res);
  });
}

I think that's right.
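For reference, here is a rough sketch of what such a BufferedStream could look like against the 0.4-era stream API (illustrative only, not the eventual morestreams code; the limit handling is simplified):

var stream = require('stream');
var util = require('util');

// Queue chunks until someone is listening (via pipe/resume), then replay them.
function BufferedStream(limit) {
  stream.Stream.call(this);
  this.readable = true;
  this.writable = true;
  this.limit = limit || Infinity;
  this.queue = [];
  this.queued = 0;
  this.flowing = false;
  this.ended = false;
}
util.inherits(BufferedStream, stream.Stream);

BufferedStream.prototype.write = function (chunk) {
  if (this.flowing) {
    this.emit('data', chunk);
    return true;
  }
  this.queue.push(chunk);
  this.queued += chunk.length;
  // Returning false makes the source's pipe() pause it once we pass the cap.
  return this.queued < this.limit;
};

BufferedStream.prototype.pause = function () {
  this.flowing = false;
};

BufferedStream.prototype.resume = function () {
  this.flowing = true;
  while (this.flowing && this.queue.length) {
    this.emit('data', this.queue.shift());
  }
  this.queued = 0;
  this.emit('drain'); // lets a source paused by pipe() resume
  if (this.ended) this.emit('end');
};

BufferedStream.prototype.end = function () {
  if (this.flowing) return this.emit('end');
  this.ended = true;
};

// Replay the queue as soon as a destination is attached.
BufferedStream.prototype.pipe = function (dest, opts) {
  var ret = stream.Stream.prototype.pipe.call(this, dest, opts);
  this.resume();
  return ret;
};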

Marco Rogers

Feb 28, 2011, 3:13:17 PM
to nod...@googlegroups.com, Mikeal Rogers
+1 for having the buffering functionality be wrapped in its own little stream.  I thought that was the idea from the beginning, to be honest.  I don't want the buffering to be in the bowels of the net code where I can't have some type of control over it.

This feels like one of those things where some solutions should be developed in user land, e.g. a connect middleware, custom Stream module, etc.  Then people would use them and see what actually makes the most sense.  Several people on this thread admit that they have developed solutions for it already.  Why are these not more generalized and released to the community?  If we settle on something that seems to work really well for the vast majority of people, then maybe it can be adopted by node as a custom Stream in the stream module.

Maybe I'm missing some reason why this needs to be done at the core level.  As soon as it's in core and it doesn't work the way you want, it's either "tough shit" or you have to do more work to work around it. I would prefer to avoid that.

:Marco



tjholowaychuk

Mar 1, 2011, 1:05:16 AM
to nodejs
Even with things like connect we COULD do it without buffering the
events, but it would complicate the API quite substantially.


Marco Rogers

Mar 1, 2011, 9:19:19 AM
to nod...@googlegroups.com

I thought we were talking about buffering explicitly. Nobody wants a more complicated API.

Felix Geisendörfer

Mar 1, 2011, 3:48:53 PM
to nod...@googlegroups.com
How about changing the API to this?

.pause(buffer = false)

My biggest concern about the API is consistency. Right now the behavior of pause() is very inconsistent between net streams and fs streams.

--fg

Mikeal Rogers

Mar 1, 2011, 3:50:03 PM
to nod...@googlegroups.com
How is it inconsistent right now between fs and net streams?

Marco Rogers

Mar 1, 2011, 4:09:43 PM
to nod...@googlegroups.com, Mikeal Rogers
@Mikeal

I'm assuming Felix is referring to the behavior you can expect when you call pause on an fs stream or a net stream.  With fs there's little to no delay from when you call pause to when it stops emitting data, because it just stops reading from the fd.  But with a net stream it's much more likely that you'll get several more data events before it's actually paused, because there is a pipeline between when the data is actually read off the socket and when it actually gets emitted.  This is even more pronounced with http because it's going through the parser.

Essentially this issue with pause/resume doesn't come up as often with fs streams because they generally exhibit the "immediate pause" behavior that everyone seems to expect.

If I have the wrong idea about this please let me know.

@Felix

Could you comment on why you prefer having the buffering handled in core instead of in userland or at least in a js module in stream?  The pause(buffer = true) idea would throw a wrench in the simplicity of stream.pipe.  My idea is if you know you want it buffered set it up that way.

    socket.pipe( new BufferedStream(4096) ).pipe( process.stdout );

or even

    ( new BufferedStream( socket, 4096 ) ).pipe( process.stdout );

:Marco




Felix Geisendörfer

Mar 1, 2011, 5:19:46 PM
to nod...@googlegroups.com, Mikeal Rogers
> Essentially this issue with pause/resume doesn't come up as often with fs streams because they generally exhibit the "immediate pause" behavior that everyone seems to expect.

fs.ReadStream pause() is *always* immediate. It buffers. However, it is guaranteed that the maximum buffer size is identical to the buffer size used for chunking the reads, so it seems like a no-brainer there.
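A quick way to see this (a sketch against the 0.4-era API; the file path is arbitrary):

var fs = require('fs');

var rs = fs.createReadStream('/etc/passwd'); // any readable file
rs.pause();
rs.on('data', function (chunk) {
  console.error('data: ' + chunk.length + ' bytes');
});

setTimeout(function () {
  console.error('resuming');
  rs.resume();
}, 1000);

// With fs.ReadStream the pause takes effect immediately, so "resuming"
// prints before any data lines; an http request stream (at the time of
// this thread) could emit several 'data' events before the pause landed.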
 
> Could you comment on why you prefer having the buffering handled in core instead of in userland or at least in a js module in stream?

I don't want anything to be in core, I merely want the core to be consistent and follow the stated goals of the project:

Goal: Because nothing blocks, less-than-expert programmers are able to develop fast systems.

Consistency with this goal means: pause() behaves the same way on all streams

If I have to know about the semantics of the underlying stream, and that there is a latency problem when calling pause(), it gets really complex. Now, if that's the route we decide to go, fine. But let's be consistent about it.

--fg

Mikeal Rogers

Mar 1, 2011, 5:32:45 PM
to nod...@googlegroups.com

Felix Geisendörfer wrote:

> I don't want anything to be in core, I merely want the core to be consistent and follow the stated goals of the project:
>
> Goal: Because nothing blocks, less-than-expert programmers are able to develop fast systems.
If you write your code directly on top of node core you're going to need to know how node core works. I doubt that in the future anyone will need this level of understanding to write most node apps because we're solving the abstractions for them right now in this thread.


> Consistency with this goal means: pause() behaves the same way on all streams
Then we should emit that last data chunk on fs streams to make it consistent, not add additional abstractions to net streams.


> If I have to know about the semantics of the underlying stream, and that there is a latency problem when calling pause(), it gets really complex. Now, if that's the route we decide to go, fine. But let's be consistent about it.
You want pause() to be an abstract method rather than just a signal to the underlying fd in net streams. That is an extra abstraction for the sake of making net streams more consistent with fs streams, when the fs stream is already an abstraction and we can freely change its behavior without removing the more direct signal system we have with pause() on net streams.

It's quite simple. pause() and resume() are methods (and events in a pump chain) that correspond to fd signals to stop and resume pulling data. If you want to buffer data, run it through a buffering stream, which we can easily provide.

Why is there an obsession with making the current pause method buffer?


Marco Rogers

Mar 1, 2011, 5:40:50 PM
to nod...@googlegroups.com
Haha, okay I get it.  But that's not a consistency problem to me.  "There is possible latency with calling pause" is a fact on all streams.  It's just that it rarely manifests with fs streams.  That doesn't mean it will never manifest with fs streams.  Mikeal and I were having a side conversation about this that I think is applicable here.  I'll forward it.

:Marco



Marco Rogers

Mar 1, 2011, 5:45:13 PM
to nodejs
See this thread for a discussion on when an fs stream might also exhibit delayed pause behavior. We could run a test on a network drive and see how it behaves.

:Marco


---------- Forwarded message ----------
From: Mikeal Rogers <mikeal...@gmail.com>
Date: Tue, Mar 1, 2011 at 5:05 PM
Subject: Re: [nodejs] Re: Is Request.pause useless?
To: Marco Rogers <marco....@gmail.com>


This depends on how node implements its streams. Does node make sequential read() calls of a fixed size or does it just try to read as fast as it can?

If you do an async read call, and we're waiting for it to actually read the data off the remote volume, and we pause before it resolves, I'm pretty sure we will still emit. This just doesn't happen often with local volumes because they are relatively fast.

Felix Geisendörfer

Mar 1, 2011, 5:53:52 PM
to nod...@googlegroups.com
> Then we should emit that last data chunk on fs streams to make it consistent, not add additional abstractions to net streams.

Fine with me. I won't have time to do a patch right away, but I think it should be done.

> It's quite simple. pause() and resume() are methods (and events in a pump chain) that correspond to fd signals to stop and resume pulling data. If you want to buffer data, run it through a buffering stream, which we can easily provide.

It's kind of annoying though, because you'll have to clone & hack an HttpRequest object for every request.

--fg 

Mikeal Rogers

Mar 1, 2011, 5:55:42 PM
to nod...@googlegroups.com
Why clone and hack rather than just pumping to a buffered stream?

Marco Rogers

Mar 1, 2011, 5:56:05 PM
to nod...@googlegroups.com
That's what I'm not getting about the other side of this debate.  Why do you need to clone and hack?  


Felix Geisendörfer

Mar 1, 2011, 6:21:12 PM
to nod...@googlegroups.com
On Tuesday, March 1, 2011 11:55:42 PM UTC+1, Mikeal Rogers wrote:
> why clone and hack rather than just pumping to a buffered stream?

Because otherwise I'll have to keep track of two objects now: my original request object, as well as the new bufferedRequest one.

That's because bufferedRequest won't have any properties like headers, url, etc.

--fg

Mikeal Rogers

unread,
Mar 1, 2011, 6:59:37 PM3/1/11
to nod...@googlegroups.com
I guess I'm a little confused here because the workflow for most of my apps is:

- request comes in
- app code decides what we want to do with it
- I pipe the request body to some stream object

If the app code that decides what to do needs IO, I just create the buffered object, pipe to it, and then pipe from it once I have a stream.

I've never really had a situation where mid-way through, or after, processing the entity body I now need to change my logic based on something else in the request object.

Preston Guillory

Mar 1, 2011, 9:55:45 PM
to nod...@googlegroups.com
Mark me down as yet one more person who ran into this "feature," banged my head on the desk, and wrote a BufferableRequest wrapper.  Surely it's a red flag when the implementation conflicts with both the documentation and nearly everyone's natural assumption.


On Tue, Mar 1, 2011 at 4:32 PM, Mikeal Rogers <mikeal...@gmail.com> wrote:
> It's quite simple. pause() and resume() are methods (and events in a pump chain) that correspond to fd signals to stop and resume pulling data. If you want to buffer data, run it through a buffering stream, which we can easily provide.


That's stretching the definition of simple.  This may come as a surprise, but most programmers aren't familiar with the implementation details of file systems and the TCP protocol.

Pop quiz: Streams let you process data as a series of emitted 'data' events.  What do you suppose stream.pause() does?
Answer: Pauses the 'data' events.

If you said, "Issues an advisory signal to the underlying communication layer and potentially continues emitting events for an indeterminate period of time," you are out of touch.

Reminds me of this: http://browsertoolkit.com/fault-tolerance.png

Joran Greef

Mar 2, 2011, 5:43:43 AM
to nodejs
I was wondering, does it not seem a waste to create a whole new
BufferedStream constructor simply to be able to have .pause() work
immediately?


Joran Greef

Mar 2, 2011, 5:55:08 AM
to nodejs
On Mar 2, 12:32 am, Mikeal Rogers <mikeal.rog...@gmail.com> wrote:
> I doubt that in the future anyone will
> need this level of understanding to write most node apps because we're
> solving the abstractions for them right now in this thread.

Solving abstractions on behalf of other people who are not
contributing to this thread and who may or may not even use Node "in
the future" may be building on a tower of assumptions.

That the current pause implementation maps directly to fd signals does
not mean that there is not a problem with the current pause interface
itself, as several people have suggested. For instance, if the current
pause implementation were to be left unchanged, the documentation
would at least need to be updated as to how to use the interface given
the underlying implementation. If the interface were right, that would
not be necessary.

Marco Rogers

Mar 2, 2011, 9:11:01 AM
to nod...@googlegroups.com
I feel like we're running in circles a bit. I'll try to make some clear points for my side of this. And then there's some code at the bottom.

1) I want to be clear that I'm not suggesting we don't address the pause issue. And I don't think Mikeal is either. What I am saying is I would prefer to have some control over how the buffering is done, rather than have it done somewhere in core.  If folks can address that concern, I would let go of the BufferedStream thing.

2) This feels like one of those things where we should try a few different things and gather some real usage data before settling on the right approach to address it.  Several people here have solved this problem themselves and I bet not all approaches are the same.  So I'm not sure how we've all decided that we can put a solution into node right now that would make everyone happy.

3) This also feels like one of those one way features.  We can have things the way they are and write abstractions on top to buffer *when needed*.  But as soon as we buffer by default, it's done, we can't go backwards.  And even if it's configurable, we all know most folks won't bother investigating enough to make the informed decision.

4) I won't argue that the current behavior of pause is non-intuitive (fyi, I remember that I didn't want to call it pause for just that reason).  But the reason I'm arguing against buffering in node by default is that I'm a big fan of "fast by default".  Instead this will be making node slower and more memory intensive by default, and no one but Felix has suggested a solution that allows you to keep the fast behavior if you want (could be wrong, it's a long thread).

Now for something more constructive.  I've created a little script to test some of the things we've been talking about.  I took Mikeal's rough BufferedStream implementation and made it work. Then I set up a server with some endpoints to test various stream scenarios. Sorry it's not as simple as it should be but only because I tend to have too much fun writing code :)


For now I just wanted to get this out there.  I'll post some findings and reactions a bit later.  But one important thing to note is that it was difficult for me to get *any* of these to buffer. And when I did it was only one or two drain calls.

:Marco


Mikeal Rogers

Mar 2, 2011, 2:42:03 PM
to nod...@googlegroups.com
I had a talk with Ryan about this last night at the Joyent event.

He can correct me if I'm wrong but this was the outcome.

1. fs streams should be consistent with net streams and emit that last data event after you pause them.
2. We should work on a good buffered stream object that will eventually make its way into core.

I'm going to create a library and push it to npm today with the bufferedStream, probably stealing Marco's improved (read "actually works") implementation. It will also eventually have some other stream experiments.

Mikeal Rogers

Mar 4, 2011, 10:19:46 PM
to nod...@googlegroups.com
Pushed morestreams. morestreams.BufferedStream.

https://github.com/mikeal/morestreams
http://search.npmjs.org/#/morestreams

Didn't end up going with Marco's code cause it was too big and complicated for my small brain.
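Usage ends up looking something like this (a sketch assuming the constructor cap discussed earlier in the thread; lookupBackend is a hypothetical async routing call):

var http = require('http');
var BufferedStream = require('morestreams').BufferedStream;

http.createServer(function (req, res) {
  var buffered = new BufferedStream(4096);
  req.pipe(buffered); // body chunks queue up here while we do IO

  // Stand-in for whatever async IO decides where the request goes.
  lookupBackend(req.url, function (err, options) {
    var proxy = http.request(options, function (backendRes) {
      backendRes.pipe(res);
    });
    buffered.pipe(proxy);
  });
}).listen(3000);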

Marco Rogers

Mar 6, 2011, 5:04:43 PM
to nod...@googlegroups.com, Mikeal Rogers
Mikeal:

Forked morestreams and updated the test.  The current version doesn't really address the specific issue of buffering data so you can make async calls before piping.  You've got a pull request for that.

I've also got a revamped version of my BufferedStream on this branch.


I killed the PipedState thing and it's now more similar to yours, except it adds a few key things.

- You can go back and forth between buffering and passthrough state. This is important to support the pause/resume scheme of pipe.
- Added flush semantics with flush on resume. In buffer state it buffers everything. When resume is called, the buffer queue is "flushed". It plays back the queue of buffered data in order before going back to passthrough state.

The flush semantics are flawed somehow though.  Either that or the test is flawed because it fails.  Take a look.

:Marco


Marco Rogers

Mar 6, 2011, 7:32:47 PM
to nod...@googlegroups.com

Mikeal Rogers

Mar 7, 2011, 12:46:56 AM
to nod...@googlegroups.com
So, I specifically didn't handle the pause/resume thing for a few reasons, which we should probably talk about.

I think the main use case we came up with, that you want to do some IO in order to figure out where to route a request to, is a low-latency operation; you're talking to a local redis or memcached, hopefully.

For that use case, you shouldn't pause the input stream, and once it's piped, you should never do any more buffering. You have a short amount of time in which you need to do some IO, and you don't want to pause the input stream because it's going to cause at least two roundtrips to the client, which will probably take longer than talking to redis. Once the streams are piped together, you want the fd signals to go back and forth uninterrupted, because the amount of time you would need to buffer for is unknown and the last stream you write to will buffer any left over data anyway; you need to immediately pause the input stream if the output stream can't handle data.

Just in case Redis or Memcached takes longer than expected, or the client upload is absurdly fast, you have the limit parameter that will pause the input stream once a buffering threshold is hit.

I'm not saying there isn't a use case for buffering in between pause/resume but I don't know what it is and I'd like it laid out before we talk about the best way to solve it. If the use case is "you have crazy application code that is causing pause/resume after you pipe to it" that's not a real use case, that's just bad and/or lazy code.

-Mikeal

Marco Rogers

Mar 7, 2011, 9:58:40 AM
to nod...@googlegroups.com
I'm not sure I completely understand all the points you made. But the one that did come across is that a full pause that goes all the way back to the read fd isn't always desirable when piping. I'll buy that.


> I think the main use case we came up with, that you want to do some IO in order to figure out where to route a request to, is a low-latency operation; you're talking to a local redis or memcached, hopefully.

I don't think we can assume this at all. There are lots of reasons people might need to do async operations before piping. And they're not all going to be fast. Yes you can argue that they're doing it wrong (and I would agree). But that doesn't mean this solution shouldn't be robust enough to support them.

> For that use case, you shouldn't pause the input stream, and once it's piped, you should never do any more buffering.

Um, why not? One scenario I'm imagining is the formidable module.  At any point when reading an upload stream, it may need to direct part of the input somewhere different. There might be an async seam in a few places in this pipeline where you want to buffer briefly. I haven't dug into the details of formidable. Just using it for an example (felixge could tell me I'm nuts)

> You have a short amount of time in which you need to do some IO, and you don't want to pause the input stream because it's going to cause at least two roundtrips to the client, which will probably take longer than talking to redis.

This is reasonable.

> Once the streams are piped together, you want the fd signals to go back and forth uninterrupted, because the amount of time you would need to buffer for is unknown and the last stream you write to will buffer any left over data anyway; you need to immediately pause the input stream if the output stream can't handle data.

You're losing me here. How do you immediately pause the input stream?  If you can do that, doesn't it obviate the need for any of this?  Also, what do you mean by "the last stream you write to will buffer any left over data anyway"?  Is this the way it works currently?  I wasn't aware that write streams would buffer.  Are you saying this happens after write() returns false on a writestream with an fd?


> Just in case Redis or Memcached takes longer than expected, or the client upload is absurdly fast, you have the limit parameter that will pause the input stream once a buffering threshold is hit.

Yes, but my point is that this can happen multiple times. But I think I see where we are thinking differently about this.  I'll summarize below.

> I'm not saying there isn't a use case for buffering in between pause/resume but I don't know what it is and I'd like it laid out before we talk about the best way to solve it. If the use case is "you have crazy application code that is causing pause/resume after you pipe to it" that's not a real use case, that's just bad and/or lazy code.

It's not just crazy application code that causes pause/resume.  It's clients that are slow in receiving.  Correct me if I'm thinking of this wrong, but once you start piping, anytime the writestream returns false it'll try to pause. That's how pipe works.  So either the pause will go all the way back to the read client, or the bufferedstream has to preempt that, by buffering again.

It sounds like what you're saying is once you start piping to the fd receiver, you should turn off the bufferedstream. Basically it should be a dumb passthrough. But that doesn't support the idea that if possible, we want to prevent pause from reaching back to the read client. The bufferedstream does this work.  And it might have to do it multiple times if the write receiver is being temperamental.

One thing we can agree on is that the bufferedstream should not immediately forward the pause request.  My thought is that when it gets a pause, it resumes buffering data input.  If it sees resume, it'll replay the data in order before resuming passthrough.  If the limit threshold is reached, then you're probably really in trouble and at that point it forwards the pause to exert backpressure.

I also think it's very possible that we're talking around each other.  It's difficult to discuss these scenarios with words.  What would be really handy is a test harness that would allow us to simulate these scenarios in a controlled way.  I'm going to work on that, unless someone has a better idea.

:Marco

Marco Rogers

Mar 7, 2011, 10:01:15 AM
to nod...@googlegroups.com

Hmmm, I was hoping this would move to a new topic.