[nodejs] announcing event-stream


Dominic Tarr

Sep 18, 2011, 6:34:25 AM
to nod...@googlegroups.com
hey everyone!

I think I've leveled up at node recently, and one of the new moves I've gained is how to use streams.

Streams are AWESOME; they are one of the best things about node, yet not many people are using them.

Streams can be used for more than just input and output; streams can also be used for throughput!
That's a great idea, because streams have a naturally scalable API.


https://github.com/dominictarr/event-stream

npm install event-stream

you can do things like this:

es.map(function (data, callback) {
  callback(null, data)
})
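
for instance, here's a minimal sketch of an asynchronous transform that upper-cases whatever flows through it (the `upper` name is just for illustration):

var es = require('event-stream')

// upper-case each chunk; pass errors as the first callback
// argument and the transformed chunk as the second
var upper = es.map(function (data, callback) {
  callback(null, data.toString().toUpperCase())
})

process.openStdin().pipe(upper).pipe(process.stdout)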

this would transform a stream of file names into a stream of file Stats:

es.map(fs.stat)

connect lets you join streams like middleware:
var connected = es.connect(
    process.openStdin(),
    es.split(), // rechunk on '\n'
    es.map(fs.stat),
    es.map(function (d, cb) { cb(null, JSON.stringify(d) + '\n') }),
    process.stdout
)

the advantage this has over using `pipe` is that it returns a single stream composed of a bunch of streams, but one that still acts like a single through stream (a stream that is both readable and writable)

(also, listening for 'error' on `connected` will receive errors from all the composite streams)
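
for example, a tiny sketch (reusing the `connected` stream from the snippet above):

connected.on('error', function (err) {
  // fires for an error emitted by any stream in the chain
  console.error('stream failed:', err)
})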

comments are most appreciated!

cheers, Dominic

Richard Marr

Sep 18, 2011, 10:23:54 AM
to nod...@googlegroups.com
On 18 September 2011 11:34, Dominic Tarr <domini...@gmail.com> wrote:
> connect lets you join streams like middleware:

Like it a lot. The first thing that hit me when looking at this was the TokenStream model in Lucene. I feel an experiment coming on.








--
Richard Marr

Dominic Tarr

Sep 18, 2011, 11:27:29 PM
to nod...@googlegroups.com
yes, that would be a natural application of it.

event-stream has a function that rechunks the stream on newlines, but you could rechunk on word boundaries too, and then just pipe into lucene.
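
a rough sketch of what that might look like, assuming es.split accepts a custom separator (by default it splits on '\n'):

var es = require('event-stream')

var words = es.connect(
    process.openStdin(),
    es.split(/\s+/), // rechunk on word boundaries instead of newlines
    es.map(function (word, cb) { cb(null, word + '\n') }),
    process.stdout
)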

isn't lucene written in java? how do you connect it to node?

Floby

Sep 19, 2011, 3:30:33 AM
to nodejs
this is cool. I like the stream transformation, and it seems easier to write than a bunch of streams you'd pipe together.
I've made a few of these in my day:
https://github.com/Floby/node-json-streams
https://github.com/Floby/node-parser
https://github.com/Floby/node-tokenizer


Dominic Tarr

Sep 19, 2011, 4:02:21 AM
to nod...@googlegroups.com

awesome, those look good.

I didn't see these when I searched on search.npmjs.org... oh, it's only showing one page... okay, posting an issue: https://github.com/isaacs/npmjs.org/issues/36

I'm assembling a list of compatible modules; I'll add yours.

wow! you have a streaming parser! I've been looking for one of those.
how hard would it be to get it to emit objects on the fly?

I'm thinking you would need to pass it a reference to the main array.

for example, couchdb returns

{"total_rows":20,"offset":0,"rows":[...]}

so, if you could say `new Parser('rows')`

and it would stream the contents of the rows array? another approach might be to emit the contents of the bottom-most array...
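
purely hypothetical usage, since no such Parser exists yet (couchResponse stands in for a readable http response):

var Parser = require('json-streams').Parser // assumed export

var parser = new Parser('rows') // the array to stream
parser.on('data', function (row) {
  // each element of "rows" arrives as soon as it has been
  // parsed, before the surrounding object is complete
  console.log(row)
})
couchResponse.pipe(parser)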

what do you think?

Floby

Sep 20, 2011, 8:19:32 AM
to nodejs
I assume you're talking about the json parser. Currently you can only get notified when the whole object is parsed. I know that's not optimal, and I've been meaning to find some way to support finer-grained events. It currently keeps references to parent objects in closures, which is very easy to use.
Modifying it to emit objects on the fly is not really difficult; you probably just have to put an emit() call on this line [1]. What is harder is finding a way to specify which objects you want to be notified of. I've been looking into JSONPath [2], but I feel it's not ready yet.

[1] https://github.com/Floby/node-json-streams/blob/master/lib/ParseStream.js#L99
[2] http://goessner.net/articles/JsonPath/
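
to illustrate the selection problem (this is not the actual ParseStream code, just a sketch of one possible matching rule):

// compare a completed value's path against a JSONPath-like target,
// e.g. path ['rows', 3] matches target ['rows', '*']
function matches (path, target) {
  if (path.length !== target.length) return false
  return path.every(function (seg, i) {
    return target[i] === '*' || target[i] === seg
  })
}

// inside the parser, on value completion, one might then do:
//   if (matches(currentPath, ['rows', '*'])) this.emit('data', value)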


Dominic Tarr

Sep 20, 2011, 9:40:45 AM
to nod...@googlegroups.com
yes.

It would certainly be complicated to stream arbitrary json objects and emit every leaf. If you are retrieving some massive amorphous object, then streaming probably doesn't make sense.

But on the other hand, if you are retrieving a list of possibly amorphous objects, which is probably the 90% case, then streaming would make a lot of sense.

In cases like the github api, or couchdb views, it would work completely fine if you just streamed the elements of the bottom-most array.

(in these cases, an array is either the root, or a child of the root object.)

I am currently working on a Stream interface to couchdb (https://github.com/dominictarr/couch-stream), working on the API first; then optimizations like a streaming parser will be next.
(although it does not yet have actual streaming, using the Stream api does make sense, for example pulling in pages of views while respecting downstream pause events... I plan to attach this to pause/resume events generated by, say, scroll bar position -- streams all the way!)
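
a rough sketch of what respecting downstream pause looks like with the current stream API (`view` is an assumed readable stream of rows, not the actual couch-stream API):

view.on('data', function (row) {
  var ok = process.stdout.write(JSON.stringify(row) + '\n')
  if (!ok) view.pause() // destination buffer is full, stop emitting
})
process.stdout.on('drain', function () {
  view.resume() // downstream caught up, safe to emit again
})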

