File Read Stream -pump-> Gzip Stream -pump-> HTTP response.

What you want to do, ideally, when the HTTP response write returns false is get the File Read stream to pause, but with the current Stream API and pump methods you can't reliably do this.
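Sketched out, the gap being described might look like this; pump() here is just an illustration of the pattern, not a core API:

    // Minimal pump: forward data and throttle the immediate source when the
    // destination reports back pressure.
    function pump(src, dest) {
      src.on('data', function (chunk) {
        if (dest.write(chunk) === false) {
          // This only pauses `src`. When src is the Gzip stream there is no
          // reliable way, with the current API, for the pause to reach the
          // File Read stream that is feeding it.
          src.pause();
        }
      });
      dest.on('drain', function () {
        src.resume();
      });
    }

    // pump(fileReadStream, gzipStream);
    // pump(gzipStream, httpResponse);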
I'm okay with that scenario except for the fact that you've called pause(). What's the point of calling it if you can't expect it to actually do anything?
I think it's misleading and will have people making assumptions that aren't true.
The OP in that original thread already spent a lot of time working with the api assuming that pause() actually means... pause. As soon as it doesn't, we're going to have lots more people having trouble getting consistent behavior. But let's come back to that.

Okay, here are a few questions to see if I'm following you. Assume the setup mentioned previously:

    FileReader -pump1-> GZipper -pump2-> HttpResponse

/* GZipper is a read/write stream. pump1 and pump2 are the pipes connecting these streams. In Mikeal's node-utils implementation they are simple event emitters */

Now assume the HttpResponse has returned false on write.

- GZipper.pause() is called, right? Who calls it? pump2, I'm assuming.
- Does the "pause" event make it all the way back to the file reader? And does that actually pause, as in stop reading from the file?
- Which agent actually does buffering when it's necessary? Is it pump2 or the gzipper?
- When drain is emitted, GZipper.resume() is called, right? And the resume event makes it all the way back to the file reader and restarts it?
This makes sense to me I guess. The important distinction is that pause/resume is for _throttling_ streams when there is a block at a writer. But it doesn't necessarily _stop_ them. Each link in the chain should still expect to get some data events. And the last link before the write block needs to buffer. I'm fine with that. But where does the buffering code live?
If we're not expecting userland modules to write their own every time (e.g. in the GZipper), it needs to be inherited from a node agent right? That's why I asked if the pump mechanism encapsulates the buffering feature. If so then it has to live in pump2 and pump has to be a first class api method in node.
I guess I'm saying the semantics for read streams and write streams should be consistent irrespective of each other. If a stream happens to be both, you can expect the read behavior on one side and the write behavior on the other, nothing fancy. When you have to do special things or make special assumptions based on whether a particular object has both properties, it becomes harder to reason about. And unnecessarily so, IMO.
With that being said, I still have a lot to learn in order to recognize when that kind of straightforward semantics just isn't possible. Mikeal, you and ry have been very helpful in that respect. The questions I'm raising are meant to help clarify rather than cause contention.
What I'm actually suggesting is that pause() attempt to stopRead() on its file handler if it has one, and emit "pause" if it cannot.
Streams are a generic interface description so they don't *require* stopRead() because it might not be applicable. The "pause" event gives Stream implementors something to do when they cannot stopRead().
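A rough sketch of what that proposal might look like; the fd check and stopRead() are taken from the description above and are illustrative only, not existing node API:

    var EventEmitter = require('events').EventEmitter;

    // A stand-in stream type, just enough to carry the proposed pause().
    function MyStream(fd) {
      EventEmitter.call(this);
      this.fd = fd;                // undefined for streams with no file handler
    }
    MyStream.prototype = Object.create(EventEmitter.prototype);

    MyStream.prototype.stopRead = function () {
      // stand-in for the low-level "stop reading from this.fd" call
    };

    // Proposed semantics: actually stop the read when the stream owns a file
    // handler, otherwise emit "pause" so whoever feeds this stream (usually a
    // pump) can pause the real source.
    MyStream.prototype.pause = function () {
      if (this.fd !== undefined) {
        this.stopRead();
      } else {
        this.emit('pause');
      }
    };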
Which assumption? That data events won't be fired after pause() is called?
Yes, one of the things a pump is responsible for is pausing the readable stream when the write() returns false.

FYI, in my stream utils event emitter a "pause" event is emitted *on the pump EventEmitter* and my thinking was that people who set up the pumps could use this to propagate the pause up to the parent stream, but this proved difficult and kind of annoying.
I'm proposing that GZipper just emit "pause" and not attempt to buffer.

Whoever is sending data to the GZipper from a file handler is responsible for handling that event and pausing the file reader stream; this will most likely be a pump.
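Spelled out, the upstream pump's side of that proposal might look like this; the event names follow the proposal above, and none of this is shipped API:

    // pump1 wiring: the GZipper owns no file handler, so it only re-emits
    // "pause"/"resume"; the pump feeding it turns those into real calls on
    // the core read stream.
    function wireUpstreamPump(fileReader, gzipper) {
      fileReader.on('data', function (chunk) { gzipper.write(chunk); });
      gzipper.on('pause',  function () { fileReader.pause();  });
      gzipper.on('resume', function () { fileReader.resume(); });
    }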
Neither. The HTTP Response handles the buffering. It, conveniently, already implements buffering and when the client is available to accept writes again it's in the best position to optimize that write.
> - When drain is emitted, GZipper.resume() is called, right? And the resume event makes it all the way back to the file reader and restarts it?

Correct, since GZipper isn't responsible for a file handler it should just emit a "resume" event, and the pump should be listening for this and resume the file reader.

We *could* simply use the "drain" event again in this case. I'm just a little worried about some case where you might need to differentiate the two, but it could be a better idea to just use "drain".
Buffering already *must* be implemented by any stream that is responsible for writing a file handler, except in special cases like synchronous filesystem operations (for obvious reasons).

Also, it's not implemented yet, but there are great optimizations you can make to fsync() when you're writing sequential data: if you have 8 chunks in your buffer you can batch the write and the fsync().

> If we're not expecting userland modules to write their own every time (e.g. in the GZipper), it needs to be inherited from a node agent right? That's why I asked if the pump mechanism encapsulates the buffering feature. If so then it has to live in pump2 and pump has to be a first class api method in node.

Pumps, for the most part, aren't responsible for any buffering, only writable Streams that have a file handler.

The exception is pumps that pump to multiple sources, like my multiPump, because they only pause the input streams if *all* the writable streams pause; if not, the data is buffered in the pump. But this is a rare case.

> I guess I'm saying the semantics for read streams and write streams should be consistent irrespective of each other. If a stream happens to be both, you can expect the read behavior on one side and the write behavior on the other, nothing fancy. When you have to do special things or make special assumptions based on whether a particular object has both properties, it becomes harder to reason about. And unnecessarily so, IMO.

The problem we're discussing *only* happens when a stream is both readable and writable. If a stream is only readable or only writable it can't be in the middle of two pumps like this.
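As a concrete illustration of that buffering requirement, here is a toy writable with the shape described above; the deferred flush is simulated with setTimeout, and none of these names are real node API:

    var EventEmitter = require('events').EventEmitter;

    // Toy writable that "owns a file handler": it always accepts chunks,
    // buffers what it hasn't flushed yet, and signals back pressure by
    // returning false from write().
    function BufferedWriter() {
      EventEmitter.call(this);
      this.queue = [];
      this.flushing = false;
    }
    BufferedWriter.prototype = Object.create(EventEmitter.prototype);

    BufferedWriter.prototype.write = function (chunk) {
      this.queue.push(chunk);            // always accept and buffer, never drop
      this.scheduleFlush();
      return this.queue.length === 0;    // false asks the pump to pause upstream
    };

    BufferedWriter.prototype.scheduleFlush = function () {
      if (this.flushing) return;
      this.flushing = true;
      var self = this;
      setTimeout(function () {           // stand-in for the async write to the fd
        self.queue.length = 0;           // pretend everything reached the kernel
        self.flushing = false;
        self.emit('drain');              // the pump can now resume the reader
      }, 0);
    };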
Ah, having the buffering in the httpresponse makes this a lot easier to swallow. A few things.

> What I'm actually suggesting is that pause() attempt to stopRead() on its file handler if it has one, and emit "pause" if it cannot.

I would expect that if there is a "pause" event it gets fired every time pause() is called, even if stopRead is also called. If you don't need it, just don't listen for it.
> Streams are a generic interface description so they don't *require* stopRead() because it might not be applicable. The "pause" event gives Stream implementors something to do when they cannot stopRead().

Agreed, except you should always emit pause. There's no harm and it keeps things consistent. But this leaves the question of ordering. Is "pause" emitted before or after stopRead is called? Is stopRead async? And if so, should the "pause" event wait until it's successful? What if it's not?
> Which assumption? That data events won't be fired after pause() is called?

Yeah, that's the assumption that I had and the OP had. It's easy to read pause() as "call this and the stream is stopped; you're safe from events until you resume". I understand now why that's a false assumption, but it's not clear from the spec. I don't think that problem is going to go away. It's just a teaching point, I suppose. I was trying to think of better names that would put people more in the mind of throttling rather than "stop this stream". No luck yet.
> Yes, one of the things a pump is responsible for is pausing the readable stream when the write() returns false. FYI, in my stream utils event emitter a "pause" event is emitted *on the pump EventEmitter* and my thinking was that people who set up the pumps could use this to propagate the pause up to the parent stream, but this proved difficult and kind of annoying.

Right, that's what I meant by "pause makes it back to the file reader": through successive pause events down the chain. That's sensible and keeps things decoupled.

> I'm proposing that GZipper just emit "pause" and not attempt to buffer. Whoever is sending data to the GZipper from a file handler is responsible for handling that event and pausing the file reader stream; this will most likely be a pump.
Yeah, I think we're on the same page with the pause/resume events.

> Neither. The HTTP Response handles the buffering. It, conveniently, already implements buffering and when the client is available to accept writes again it's in the best position to optimize that write.

This was the critical piece I was missing in your explanation. Essentially write() returning false should really only happen on core i/o streams like socket output, file writing, etc. And these should also include code to buffer any data if they can't write, because data events can still come in. These data events will be written in the same order when the write stream is flushed. I like this.

So the things that weren't clear in terms of implementation are these (a sketch follows the two points):

1) If you are an upstream data source and stream.write returns false, you should start the pause chain that will eventually make it back to the core read stream.
2) But you may still get data events from further upstream. You should still accept those and still call stream.write. The write stream should still be able to accept writes, and if it can't actually write to a file descriptor or socket, it should buffer them. Essentially returning false from write() is a signal to throttle the data if possible, but it shouldn't stop the stream.
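Taken together, those two points from the downstream pump's side (pump2 in the earlier diagram) might look roughly like this; the names are illustrative only:

    // pump2: moves data from the GZipper into the HTTP response.
    function pump2(gzipper, httpResponse) {
      gzipper.on('data', function (chunk) {
        // 2) always hand the chunk over; a blocked response buffers it and
        //    returns false, but the data is never refused or dropped.
        if (httpResponse.write(chunk) === false) {
          // 1) start the pause chain that eventually reaches the core read
          //    stream (via the GZipper's "pause" event and pump1).
          gzipper.pause();
        }
      });
      httpResponse.on('drain', function () {
        gzipper.resume();   // the buffered data reached the client; un-throttle
      });
    }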
> - When drain is emitted, GZipper.resume() is called, right? And the resume event makes it all the way back to the file reader and restarts it?
>
> Correct, since GZipper isn't responsible for a file handler it should just emit a "resume" event, and the pump should be listening for this and resume the file reader. We *could* simply use the "drain" event again in this case. I'm just a little worried about some case where you might need to differentiate the two, but it could be a better idea to just use "drain".

Nah, I like sticking with pause/resume. drain is specific to the "blocked" write stream. It's paired with write() == false, which only happens sometimes.

> Buffering already *must* be implemented by any stream that is responsible for writing a file handler, except in special cases like synchronous filesystem operations (for obvious reasons). [...] Pumps, for the most part, aren't responsible for any buffering, only writable Streams that have a file handler. [...] The problem we're discussing *only* happens when a stream is both readable and writable. If a stream is only readable or only writable it can't be in the middle of two pumps like this.

Yeah, I understand all of this now. I think the disconnect was I didn't understand where the buffering functionality lived. The thing is, I don't think these read/write streams will be that uncommon. Processing raw streams before writing them out to wherever will probably happen a lot. That's why I'm concerned about getting the semantics right. Thanks for the clarification.

Ry, do you agree with the consensus that is forming here? I want to take another shot at updating the api to reflect this thinking.
:Marco