limitation with new streams


Mark Volkmann

Mar 29, 2013, 12:17:11 PM
to nod...@googlegroups.com
I have a node module in npm called liner. See http://github.com/mvolkmann/node-liner.
It reads from a file or a stream and emits each line by looking for newline characters.

I just pushed a new version today that supports both old-style and new-style streams.

With old-style streams, if there is a blank line in the source I can emit a data event with an empty string.
With new-style streams, it seems I can't do that. Calling push with an empty string doesn't allow listeners to read anything. So for now when I want to push an empty string, I am pushing just a newline character instead.
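
A minimal sketch of that behavior, assuming a current Node.js release. The `collectByteMode` helper is hypothetical and is not part of liner. In a byte-mode Readable the pushed strings become Buffers, the empty one contributes nothing, and adjacent chunks are joined, so a reader cannot recover the blank line:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: push some "lines" into a byte-mode stream,
// then drain whatever read() hands back.
function collectByteMode(lines) {
  const stream = new Readable({ read() {} }); // no-op _read; data is pushed by hand
  for (const line of lines) stream.push(line);
  stream.push(null); // signal end of data

  const chunks = [];
  let chunk;
  while ((chunk = stream.read()) !== null) chunks.push(chunk.toString());
  return chunks;
}

const seen = collectByteMode(['first', '', 'third']);
console.log(seen); // the empty string never shows up as its own chunk
```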

Is there a better workaround for this?

--
R. Mark Volkmann
Object Computing, Inc.

Mark Hahn

Mar 29, 2013, 1:38:32 PM
to nodejs
It seems new streams were rushed out a bit.  As long as I continue to see usability problems almost every day I'm holding off on trying them.


--
--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
 
---
You received this message because you are subscribed to the Google Groups "nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nodejs+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Marco Rogers

Mar 29, 2013, 1:42:56 PM
to nod...@googlegroups.com
I'm not sure why this stopped working. But I'm also not sure why you would expect pushing no data to do something. Don't get me wrong, I understand the argument. I just don't know which behavior really makes the most sense. There's an argument that the new behavior is better.

:Marco

Matt

Mar 29, 2013, 1:50:09 PM
to nod...@googlegroups.com
On Fri, Mar 29, 2013 at 1:42 PM, Marco Rogers <marco....@gmail.com> wrote:
> I'm not sure why this stopped working. But I'm also not sure why you would expect pushing no data to do something. Don't get me wrong, I understand the argument. I just don't know which behavior really makes the most sense. There's an argument that the new behavior is better.

How is an empty string "no data"? 

Marco Rogers

Mar 29, 2013, 1:54:20 PM
to nod...@googlegroups.com
Is that a real question? Let me repeat, I know the arguments. But consider it from a socket standpoint. The empty string is converted to a Buffer. That buffer has zero length and no bytes in it. It's reasonable for the underlying system to determine that there is nothing to do. The question I'm bringing up, and just bringing up mind you because I have no stomach to argue about it, is whether this reasonable assumption belongs at the level of the streams api.

And again, I've done no research whatsoever into this. What I'm talking about may not even be the problem here. Just talking out loud while I avoid doing actual work. Please carry on.

:Marco






--
Marco Rogers
marco....@gmail.com | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond to it.
- Lou Holtz

Mark Hahn

Mar 29, 2013, 1:57:35 PM
to nodejs
> How is an empty string "no data"?

+1  

In general, if I tell something to do something then it should do something.  I hate SW that tries to be clever and override me.



Matt

Mar 29, 2013, 2:10:40 PM
to nod...@googlegroups.com
On Fri, Mar 29, 2013 at 1:54 PM, Marco Rogers <marco....@gmail.com> wrote:
> Is that a real question?

Yes. I'm serious.

> Let me repeat, I know the arguments. But consider it from a socket standpoint. The empty string is converted to a Buffer. That buffer has zero length and no bytes in it.

Only if you're converting it to a C string. If you converted it to a P-string* it would be different.

> It's reasonable for the underlying system to determine that there is nothing to do.

Isaac Schlueter

Mar 29, 2013, 2:12:52 PM
to nodejs
> It seems new streams were rushed out a bit. As long as I continue to see usability problems almost every day I'm holding off on trying them.

Well, such is the nature of platform software development. Most
people won't it until it is marked "stable" and you can't possibly
address use cases without trying them, so any change ends up breaking
some things.

Actually, I'm seeing a lot *fewer* issues with new streams than I
would have expected, considering that they have involved a rewrite of
so much of the JavaScript layer in Node.

The number of problems is more due to the size of Node's community
right now, than any problems with the change itself. Compare it to
the fallout from the close/end event change in 0.8 child processes
(which was a relatively minor change that caused TONS of issues), or
the transition to libuv in 0.6 (zomg), and really, I'm quite proud of
how well it's gone.


Let's focus on the issue:

Your line-buffering module is treating any string of any length as
valid output. Also, it's expecting that strings will not be
concatenated, or otherwise manipulated. A zero-length string is a
valid data point.

This is an objectMode stream on the readable side. What about an
approach like this?

https://gist.github.com/5272525
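
For readers who cannot open the gist, here is a rough sketch of the general idea. It is not the gist's contents and not liner's actual code, and `createLineSplitter` is a made-up name. It assumes a Node.js version whose Transform accepts the `readableObjectMode` option, so bytes go in on the writable side and each line, including an empty one, comes out as a separate item:

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical factory: bytes in, one string per line out.
function createLineSplitter() {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact
  let partial = ''; // text seen after the most recent newline

  return new Transform({
    readableObjectMode: true, // every push() is a discrete item, even ''
    transform(chunk, encoding, callback) {
      const pieces = (partial + decoder.write(chunk)).split('\n');
      partial = pieces.pop(); // last piece has no newline yet
      for (const line of pieces) this.push(line);
      callback();
    },
    flush(callback) {
      partial += decoder.end();
      if (partial.length > 0) this.push(partial); // trailing text without newline
      callback();
    },
  });
}

const splitter = createLineSplitter();
splitter.write('one\n\ntw'); // a blank line, then a line split across writes
splitter.write('o\n');

const got = [];
let line;
while ((line = splitter.read()) !== null) got.push(line);
console.log(got);
```

Because the readable side is in object mode, the blank line survives as `''` rather than being merged into its neighbors.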

Isaac Schlueter

Mar 29, 2013, 2:20:37 PM
to nodejs
Whoops, I accidentally the word.

> Most people won't *try* it until it is marked "stable"

Mark Hahn

Mar 29, 2013, 2:37:02 PM
to nodejs
> Compare it to the fallout from the close/end event change in 0.8 child processes

Yes that bit me.  I had a couple of old modules that broke and I had to fix their code.

Yes, I am guilty of waiting for stability and then trying it.  That makes me a leech.  Considering the schedule pressure I'm under it's not possible for me to test unstable.

On the empty-data question:  What problem is the new behavior trying to solve?  It seems like a random change.

Jake Verbaten

Mar 29, 2013, 4:16:14 PM
to nod...@googlegroups.com
You can put the stream in `objectMode`, and then calling `.push("")` will append the literal value `""` to the internal buffer. So calling `.read()` will return `""`.

Of course in objectMode there is no buffer concatenation so you can't do `read(bytes)`
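
A small sketch of both points, assuming a current Node.js release (the variable names are only for illustration). Each pushed value stays a separate item, and the size passed to `read()` is not treated as a byte count:

```javascript
const { Readable } = require('stream');

const lines = new Readable({ objectMode: true, read() {} }); // no-op _read
lines.push('first');
lines.push('');      // kept as its own item in object mode
lines.push('third');
lines.push(null);    // end of data

const a = lines.read(2); // not "two bytes": one whole item comes back
const b = lines.read();  // the empty string, delivered as data
const c = lines.read();
const d = lines.read();  // null once everything has been consumed

console.log([a, b, c, d]);
```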

Mark Volkmann

Mar 29, 2013, 4:28:03 PM
to nod...@googlegroups.com
That worked. Thanks!

Isaac Schlueter

Mar 29, 2013, 7:29:23 PM
to nodejs
On Fri, Mar 29, 2013 at 11:37 AM, Mark Hahn <ma...@hahnca.com> wrote:
> Yes, I am guilty of waiting for stability and then trying it. That makes me
> a leech. Considering the schedule pressure I'm under it's not possible for
> me to test unstable.

Oh, I think leech is too strong a word. Sorry, it wasn't my intent to
imply that there's anything *wrong* with waiting for it to be stable.
I'm sure there are a lot of people who wait for it to be not just
stable, but at least x.y.6 or something.

The even/odd transition is still valuable. It means "the core team
and bleeding-edge experimenters can't find any more serious bugs, and
it's up to par on performance, so we're ready to commit to this API,
and you should try it out now, even if you don't upgrade for a while".
It's totally fine to wait as long as makes sense for your needs.

That being said, I will probably try to convince you to try out the
unstable builds, because that makes my life a whole lot easier ;)


> On the empty-data question: What problem is the new behavior trying to
> solve? It seems like a random change.

This is a great question!

Basically, the requirement for an "empty push", comes from the need to
have a way to say "I'm not reading any more, but I don't have any more
data, and I won't be getting any unless you try to ask for it again,
so check back when you know I might have more".

It's VERY rare that you'll need to say that. However, in core, we
have to support TLS, which is about the most crazy dance you've ever
imagined. Here's (in broad strokes) how it works:

There's a "pair" of CryptoStreams. One of the two things in the pair
is an EncryptedStream, and the other is a CleartextStream. They are
attached at the hip, so to speak, and each is a duplex.

Whenever you write() into the EncryptedStream, this sends data into
OpenSSL's machinery. Occasionally, this may cause data to pop *out*
of OpenSSL's machinery for the CleartextStream to consume. Likewise,
whenever you write() into the CleartextStream, this sends data into
the other side of the OpenSSL machine, which may cause data to pop out
on the encrypted side.

From far away, this almost looks like it ought to be two Transform
streams. One of them you write crypto in and get clear out, and the
other you write clear in and get crypto out. However, TLS is not so
simple. There's all kinds of handshakes and other back-and-forth that
has to happen on the crypto side, without ever producing any cleartext
data. So, sometimes, you write to the EncryptedStream, and this
creates more data for the readable side of the EncryptedStream, but
*not* any data for the CleartextStream.

So, even if you did it as two Transform streams, you'd still have the
attached-at-the-hip complications.

So, once you've got the pair, you do something like this:

socket.pipe(encrypted);
encrypted.pipe(socket);

and then instead of using the TCP socket for your HTTP stuff, you use
the CleartextStream, which mostly looks just like a net.Socket object.
You write() plain data to it, and plain data comes out from the
readable side, so all is good.

So, why the push('')?

Well, every time we write() into one stream in the pair, we need to
trigger a check to OpenSSL to see if there's any more data for the
other side. We do this by calling stream.read(0), which can kick off
a _read() call. However, the Readable machinery is smart enough to
not call _read() if you're already reading and haven't pushed
anything. (Otherwise, you'd have to be careful to not accidentally
fs.read() the file while there's already a pending read, etc., and the
complications for implementing streams get quite a bit worse.)

Of course, you could get around this by doing
`stream._readableState.reading = false` but that's definitely too much
of a mingling of concerns. Touching those flags in the constructor is
mildly gross, but usually OK. Messing around with them in the midst
of operation is definitely crossing a line, even for core modules that
are good friends.

So, `push('')` was implemented as a way to say "I have zero bytes of
data, and the read is completed, but this is not the EOF. It's YOUR
responsibility to try again later. I'm going to sit here and do
nothing until then." This situation does not exist for sockets,
files, or basically any other stream except TLS and some other bizarre
use-cases.
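
To make that concrete, here is a toy sketch. It is not the TLS code, and the `pending` array is a hypothetical stand-in for data that shows up out of band. It assumes a current Node.js release. The `_read()` implementation answers with `push('')` when it has nothing, and the consumer later nudges the stream with `read(0)`:

```javascript
const { Readable } = require('stream');

const pending = []; // hypothetical stand-in for data produced elsewhere

const source = new Readable({
  read() {
    // push('') completes this read with zero bytes and without signalling EOF,
    // so it is up to the consumer to ask again later.
    this.push(pending.length > 0 ? pending.shift() : '');
  },
});

const before = source.read(); // nothing buffered yet
pending.push('hello');        // data becomes available out of band
source.read(0);               // triggers _read() again without consuming anything
const after = source.read();  // now there is something to read

console.log(before, String(after), source.readableEnded);
```

If `_read()` pushed nothing at all, the stream would still count the first read as pending, and the later `read(0)` would not call `_read()` again.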

Regarding whether or not "" is "no data", it really depends on whether
you're a byte stream or an object stream. In fact, your view of "" is
a great litmus test. If "" is relevant data, then you're an object
stream. That is, even if you don't emit "objects" per se, the nature
of the object is itself relevant; not just the bytes it represents.
From a byte stream's point of view, "" is "zero bytes", and thus "zero
data".

Core streams have to support all the streaming APIs in core. Luckily
for you guys, this means that it'll probably cover all the streaming
APIs that you have as well, but of course, as we've seen, there are
always surprising edge cases that come up when people start using your
code for real things.

Mark Hahn

Mar 30, 2013, 2:42:47 AM
to nodejs
Thanks for the detailed explanation.  Much was over my head but I think the object vs bytes concept sunk in.

And I promise to try and find the time to run some odd versions.

