Date: Sun, 29 Jul 2012 10:51:58 -0700 (PDT)
From: Bruno Jouhier <bjouh...@gmail.com>
To: nodejs-dev@googlegroups.com
Message-Id: <a05847f8-66dc-422f-ab2a-d5adc82a5b04@googlegroups.com>
In-Reply-To: <0D7F480B-05A3-4272-AEE9-3D8FDD4CD428@gmail.com>
References: <CAPTWwjCEk5PAa619danSn0B1LeOaVnpxXY6Zb5pvk4=BfLVYuA@mail.gmail.com> <CADcwD-HfkPr0ZcBg6EeycbksiXvZ9HNVVzyHH21=zWLJhsdyzw@mail.gmail.com> <DFFB2C11-9FE6-4D4F-BC1F-6F77FEB64E24@gmail.com> <1a399faa-ece8-4cb9-ae11-e709fa698758@googlegroups.com> <CAGkHjAUHpCKWp+KSg03y8x+5hG4BdB7ib6BjmEUPtCncYns9Og@mail.gmail.com>
<0D7F480B-05A3-4272-AEE9-3D8FDD4CD428@gmail.com>
Subject: Re: [node-dev] Stream tweaks proposal
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_474_7171907.1343584318792"
------=_Part_474_7171907.1343584318792
Content-Type: multipart/alternative;
boundary="----=_Part_475_10734916.1343584318792"
------=_Part_475_10734916.1343584318792
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
@tim
The API that I used in this blog post is a simplified version of the API I
implemented in streamline. I simplified it in the blog post because I just
wanted to demo the equivalence between the two styles of API.
The streams module that I am using
(https://github.com/Sage/streamlinejs/blob/master/lib/streams/server/streams.md)
has most of the features that you saw missing:
* an optional "len" parameter in the read call.
* low and high water mark options in the ReadableStream constructor.
The "len" parameter has your "bytes" semantics and I use it exactly the way
you describe (typically to read 4 bytes to get a frame length and then read
N bytes for a frame). I did not implement "maxBytes" semantics because I
did not need it (which does not mean it would not be useful). The thing is
that all the additional bells and whistles can be implemented around the
basic read(cb) call (called readChunk in my module).
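As a rough illustration of that last point, here is a minimal sketch of a "len" read layered on a basic chunk read. It is not the streamline code: makeReader and readFrame are hypothetical names, and readChunk(cb) is assumed to call cb(err, chunk) with a Buffer, or with null at end of stream. The frame layout (4-byte big-endian length, then the body) follows Tim's example.

```javascript
// Hypothetical helper: builds read(len, cb) with "bytes" semantics on top
// of an assumed readChunk(cb) primitive.
function makeReader(readChunk) {
  var pending = [];  // chunks received but not yet consumed
  var buffered = 0;  // total number of bytes held in pending
  var ended = false; // true once readChunk has reported end of stream

  // Removes and returns the first len buffered bytes.
  function take(len) {
    var all = Buffer.concat(pending, buffered);
    var rest = all.subarray(len);
    pending = rest.length ? [rest] : [];
    buffered = rest.length;
    return all.subarray(0, len);
  }

  // Waits until len bytes are available. At end of stream it returns
  // whatever is left, or null if nothing is left.
  function read(len, cb) {
    if (buffered >= len) return cb(null, take(len));
    if (ended) return cb(null, buffered > 0 ? take(buffered) : null);
    readChunk(function (err, chunk) {
      if (err) return cb(err);
      if (chunk === null) {
        ended = true;
      } else {
        pending.push(chunk);
        buffered += chunk.length;
      }
      read(len, cb);
    });
  }
  return read;
}

// Reads one frame: 4 bytes of length, then that many bytes of body.
// Calls cb(null, null) on a clean end of stream.
function readFrame(read, cb) {
  read(4, function (err, header) {
    if (err) return cb(err);
    if (header === null) return cb(null, null);
    if (header.length < 4) return cb(new Error('truncated frame header'));
    var n = header.readUInt32BE(0);
    read(n, function (err, body) {
      if (err) return cb(err);
      if (body === null || body.length < n) {
        return cb(new Error('truncated frame body'));
      }
      cb(null, body);
    });
  });
}
```

The consumer never has to push unused bytes back into the stream: the leftover from each chunk stays inside the reader until the next read asks for it.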
I introduced low and high mark options because I wanted to avoid a
pause/resume dance around every data event when the data arrives faster
than it is consumed. My assumption was that a little queue with high and
low marks would reduce the number of pause/resume calls and improve
performance. Basically trading a bit of space for speed. But I have to
admit that I did not bench it. So, if the pause/resume dance costs very
little this may be overkill.
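To make the idea concrete, here is a minimal sketch of such a queue. It is an illustration under assumptions, not the streamline implementation: bufferedReader is a hypothetical name, the source is assumed to be an old-style stream that emits 'data' and 'end' and has pause() and resume(), only one read is assumed to be pending at a time, and error handling is left out.

```javascript
// Hypothetical wrapper: exposes read(cb) over an event-style source and
// only calls pause()/resume() when the queue crosses the marks.
function bufferedReader(source, lowMark, highMark) {
  var queue = [];     // chunks waiting to be read
  var queued = 0;     // number of bytes held in queue
  var paused = false; // true while we have paused the source
  var ended = false;
  var waiting = null; // callback of the pending read, if any

  source.on('data', function (chunk) {
    if (waiting) {
      // A reader is already waiting: hand the chunk over directly.
      var cb = waiting;
      waiting = null;
      return cb(null, chunk);
    }
    queue.push(chunk);
    queued += chunk.length;
    // Pause only when the queue reaches the high mark.
    if (!paused && queued >= highMark) {
      paused = true;
      source.pause();
    }
  });

  source.on('end', function () {
    ended = true;
    if (waiting) {
      var cb = waiting;
      waiting = null;
      cb(null, null);
    }
  });

  return function read(cb) {
    if (queue.length > 0) {
      var chunk = queue.shift();
      queued -= chunk.length;
      // Resume only once the queue has drained down to the low mark.
      if (paused && queued <= lowMark) {
        paused = false;
        source.resume();
      }
      return cb(null, chunk);
    }
    if (ended) return cb(null, null);
    waiting = cb; // nothing queued yet: wait for the next 'data' or 'end'
  };
}
```

With marks of, say, 2 and 6 bytes, a burst of chunks triggers a single pause() and a single resume() instead of one pair per chunk. Whether that saving matters in practice is exactly what a benchmark would have to show.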
@isaac and mikeal,
This callback proposal may sound very "anti-eventish" and it may give the
impression that I'm sorta trying to eradicate events from node's APIs
(nobody said it but I can see how it could be perceived this way). This is
not the case. I like node's event API and I find it very elegant. But node
gives us two API styles (callbacks and events) and it is not always easy to
choose between the two. Here is the rationale that I use to decide between
them:
My main criterion is CORRELATION. Basically, I start with the assumption
that the API is event-oriented and then I analyze the degree of correlation
between the various events. If the events are highly correlated, I choose
the callback style. If they are loosely correlated, I keep the event
style. Some examples:
* User events (browser side) are very loosely correlated => event style
* Incoming HTTP requests (server side) are also very loosely correlated =>
event style
* Data streams vary. If each data chunk is a complete message which is more
or less independent from other messages, the event style is best. If, on
the other hand, the chunks are correlated (because the whole stream has a
strong internal structure, or because it has been chunked on arbitrary
boundaries that don't match its internal structure), then the callback
style is best.
* Confirmation events (like "connect/error" events that follow a
connection attempt, or a "drain" event that follows a write returning
false) are fully correlated => callback style.
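The "drain" case from the last item can be sketched in a few lines. writeThen is a hypothetical helper, not a node API; it only assumes the documented behaviour of writable streams, where write() returns false when the buffer is full and a 'drain' event follows once it is safe to write again.

```javascript
// Hypothetical helper: turns "write returned false, wait for 'drain'"
// into a single callback.
function writeThen(stream, data, cb) {
  if (stream.write(data)) {
    // The buffer still has room. Call back asynchronously so that the
    // caller sees the same ordering in both branches.
    process.nextTick(cb);
  } else {
    // The buffer is full: the matching 'drain' is the confirmation.
    stream.once('drain', cb);
  }
}
```

The caller no longer has to remember that a given 'drain' belongs to a given write: the pairing is carried by the callback itself.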
Also, the event style API is more powerful than the callback style API as
it supports multiple listeners.
BUT:
* It is very easy to wrap a callback API with an event listener.
* Very often, in the correlated case, there is a "main" consumer which
needs to correlate the events, and auxiliary consumers that don't care that
much about the correlations (log them, feed statistics, etc). A dual API
with callbacks for the main consumer and events for the auxiliary ones
works great.
* Wrapping an event style API with a callback style API is a lot more
difficult.
* Callback style APIs are easier to use when the events are correlated
because you don't need to set up state machines to re-correlate the events.
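The first two points can be illustrated with a small sketch of a dual API. withEvents is a hypothetical name, and read(cb) is assumed to call cb(err, chunk) with chunk === null at end of stream. The main consumer keeps the callback; auxiliary consumers attach listeners.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: wraps a callback-style read(cb) so that auxiliary
// listeners also see everything the main consumer pulls.
function withEvents(read) {
  var emitter = new EventEmitter();

  // Main consumer: same callback contract as the wrapped read.
  emitter.read = function (cb) {
    read(function (err, chunk) {
      // Auxiliary consumers: notified through events. Errors are left to
      // the callback so that no unhandled 'error' event is emitted.
      if (!err) {
        if (chunk === null) emitter.emit('end');
        else emitter.emit('data', chunk);
      }
      cb(err, chunk);
    });
  };
  return emitter;
}
```

A logger or a statistics collector can then call on('data', ...) without interfering with the consumer that drives the reads.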
Given this, I probably favor the callback style a lot more than most node
developers. But this is not a systematic "anti-event" attitude: there is a
rationale behind it and I wanted to share it with you.
Bruno
On Saturday, July 28, 2012 9:14:11 PM UTC+2, Mikeal Rogers wrote:
>
>
> On Jul 28, 2012, at 12:05 PM, Tim Caswell <t...@creationix.com>
> wrote:
>
> > FWIW, I actually like Bruno's proposal. It doesn't cover all the use
> > cases, but it makes backpressure enabled pumps really easy.
> >
> > One use case missing that's easy to add is when consuming a binary
> > protocol, I often only want part of the input. For example, I might
> > want to get the first 4 bytes, decode that as a uint32 length header
> > and then read n more bytes for the body. Without being able to
> > request how many bytes I want, I have to handle putting data back in
> > the stream that I don't need. That's very error prone and tedious.
> > So on the read function, add an optional "maxBytes" or "bytes"
> > parameter. The difference is in the maxBytes case, I want the data as
> > soon as there is anything, even if it's less than the number of bytes
> > I want. In the "bytes" case I want to wait till that many bytes are
> > available. Both are valid for different use cases.
>
> The early stuff I saw included a "length" option.
>
> >
> > Also streams (both readable and writable) need a configurable
> > low-water mark. I don't want to wait till the pipe is empty before I
> > start piping data again. This mark would control how soon writable
> > streams called my write callback and how much readable streams would
> > readahead from their data source before waiting for me to call read.
> > I want to keep it always full. It would be great if this was handled
> > internally in the stream and consumers of the stream simply configured
> > what the mark should be.
>
> I think you're missing how this works. Nobody automatically asks for data
> so watermarks aren't strictly necessary. You ask for data if it's available
> and you read as much as you can handle.
>
> There is no "readahead". If someone stops calling read() then the buffer
> fills and, if it's a TCP stream, it's asked to stop sending data.
>
> Remember that when the "readable" event goes off it's expected that the
> pending data is read in the same event loop cycle.
>
>
>
>
>
------=_Part_475_10734916.1343584318792--
------=_Part_474_7171907.1343584318792--