Similarity to Storm


Tom Robinson

Nov 30, 2012, 7:44:51 AM
to webp...@googlegroups.com
I realized WebPipes is kind of similar to Storm's model, without the distributedness.

The core abstraction in Storm is the "stream". A stream is an unbounded sequence of tuples. Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way. For example, you may transform a stream of tweets into a stream of trending topics.

The basic primitives Storm provides for doing stream transformations are "spouts" and "bolts". Spouts and bolts have interfaces that you implement to run your application-specific logic.

A spout is a source of streams. For example, a spout may read tuples off of a Kestrel queue and emit them as a stream. Or a spout may connect to the Twitter API and emit a stream of tweets.

A bolt consumes any number of input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything from run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.

https://github.com/nathanmarz/storm/wiki/Tutorial

A spout is kind of like our triggers, and a bolt is like a block.
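
To make the analogy concrete, here is a minimal sketch in plain Python. It is not Storm's API or the node-webpipes API; the names `tweet_spout`, `hashtag_bolt`, and `count_bolt` are made up for illustration, and a "tuple" is assumed to be just a dict of named fields.

```python
from typing import Dict, Iterable, Iterator

# Assumption: a "tuple" is modeled as a dict of named fields.
Record = Dict[str, object]


def tweet_spout(tweets: Iterable[str]) -> Iterator[Record]:
    """Spout (our trigger): a source that emits a stream of tuples."""
    for text in tweets:
        yield {"text": text}


def hashtag_bolt(stream: Iterable[Record]) -> Iterator[Record]:
    """Bolt (our block): consumes tuples, emits zero or more tuples per input."""
    for tup in stream:
        for word in str(tup["text"]).split():
            if word.startswith("#"):
                yield {"hashtag": word.lower()}


def count_bolt(stream: Iterable[Record]) -> Iterator[Record]:
    """Bolt doing a streaming aggregation: a running count per hashtag."""
    counts: Dict[str, int] = {}
    for tup in stream:
        tag = str(tup["hashtag"])
        counts[tag] = counts.get(tag, 0) + 1
        yield {"hashtag": tag, "count": counts[tag]}


# Chaining the stages is the "pipeline"; each stage only sees tuples.
pipeline = count_bolt(hashtag_bolt(tweet_spout(["hi #Storm", "#storm and #pipes"])))
results = list(pipeline)
```

Generators stand in for streams here, so everything runs in one process; none of Storm's distribution or reliability guarantees are modeled.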

The major differences are:

1. Storm is "distributed". You specify how many instances of each spout/bolt you want, and how inputs to those instances should be grouped. This lets Storm spread computation across multiple servers, as in MapReduce.

2. Instead of piping around individual values, you pipe tuples. Each bolt takes a single tuple (which has multiple fields) as input. This simplifies dependency resolution in a pipeline executor, since there's only one input (with multiple fields). However, the output fields of one bolt must match the input fields of the next, so it's harder to have a collection of generic bolts, which is really what we want in WebPipes.
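
A rough sketch of the field-matching constraint from point 2, again in plain Python rather than Storm's API. `Stage` and `mismatches` are hypothetical names, and the assumption is that each stage declares the fields it reads and the fields it emits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of a spout or bolt and its declared fields."""
    name: str
    inputs: Tuple[str, ...]   # fields this stage reads from its input tuple
    outputs: Tuple[str, ...]  # fields this stage emits


def mismatches(chain: List[Stage]) -> List[str]:
    """Report every adjacent pair where a needed input field is not emitted upstream."""
    problems = []
    for upstream, downstream in zip(chain, chain[1:]):
        missing = [f for f in downstream.inputs if f not in upstream.outputs]
        if missing:
            problems.append(
                f"{downstream.name} needs {missing} "
                f"but {upstream.name} emits {list(upstream.outputs)}"
            )
    return problems


tweets = Stage("tweets", inputs=(), outputs=("text",))
split = Stage("split", inputs=("text",), outputs=("word",))
# A "generic" counter written against a field called "token" does not fit
# after "split", which emits "word": this is the reuse problem.
count = Stage("count", inputs=("token",), outputs=("token", "count"))

problems = mismatches([tweets, split, count])
```

Only the `split` to `count` link is flagged here, which is why a generic block would need some kind of field renaming or adapter step before it could be dropped into an arbitrary pipeline.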

Anyway, it would be cool to build Storm topologies with the pipes editor, but maybe that's a separate project.

Another related thought I had is that the node-webpipes API isn't HTTP-specific at all. Maybe the library could expose other transports, like TCP sockets, WebSockets, ZeroMQ, even Storm bolts. Then it's no longer WebPipes, but more like JSONPipes or something.

I'm not proposing anything in particular, I just wanted to throw these ideas out there (but it's 4am, so they may not make any sense).

-tom

Tom Robinson

Nov 30, 2012, 8:41:48 AM
to webp...@googlegroups.com
Actually, now that I think about it, the blocks and triggers on their own are exactly equivalent to bolts and spouts (tuple in, one or more tuples out). It's just the pipelines and pipeline executor that differ from Storm's model, i.e. you could model a Storm topology as WebPipes blocks and triggers; you'd just need to make sure the fields match up, as you do in Storm.

Perhaps we shouldn't put too much effort into spec'ing the pipeline behavior. There could potentially be multiple different types of pipelines / pipeline executors. I think we should just start implementing and see what works and makes sense.

-tom

Jeff Lindsay

Nov 30, 2012, 7:47:08 PM
to Tom Robinson, webp...@googlegroups.com
Actually, I was probably influenced a lot by Storm because I did a whole report on using it at Twilio. I wasn't planning on spec'ing the pipeline executor so much as building something that makes sense and then extracting the behavior expectations (the semantic implications) from it.

But I'll definitely revisit Storm for my own inspiration. I wouldn't say WebPipes is less distributed. It's actually more distributed. High availability (HA) is just not guaranteed because we push that responsibility to the edge, but it's still possible (like keeping any API up).


--
Jeff Lindsay
http://progrium.com

Tom Robinson

Dec 1, 2012, 1:56:47 AM
to webp...@googlegroups.com, Tom Robinson
Yeah, I guess I meant fault tolerant / highly available, not distributed.

The pipeline definition and executor semantics are fairly dependent on each other. I don't think we should even have a formal pipeline definition spec yet.

Tom Robinson

Dec 1, 2012, 4:14:47 PM
to webp...@googlegroups.com
"Streaming joins" discusses a bit how you can "join" multiple inputs to a bolt: https://github.com/nathanmarz/storm/wiki/Common-patterns

But they have more flexibility than what we've been discussing, because the bolts themselves do the joining rather than the pipeline executor. They can do this more easily because everything in Storm is asynchronous, whereas in WebPipes only our triggers are asynchronous; blocks aren't.
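
For illustration, a join done inside the bolt itself might look roughly like the sketch below. This is plain Python, not the code from the Storm wiki page; `join_bolt` is a hypothetical name, the two inputs are assumed to be tagged "left" and "right", and tuples are assumed to be dicts that share an "id" field.

```python
from typing import Dict, Iterable, Iterator, Tuple


def join_bolt(events: Iterable[Tuple[str, dict]], key: str = "id") -> Iterator[dict]:
    """Join two interleaved input streams on a shared key field.

    Each event is (side, tuple) where side is "left" or "right". The bolt
    buffers a tuple until its partner from the other side arrives, then
    emits one merged tuple. The executor never has to know about the join.
    """
    pending: Dict[str, Dict[object, dict]] = {"left": {}, "right": {}}
    for side, tup in events:
        other = "right" if side == "left" else "left"
        k = tup[key]
        if k in pending[other]:
            partner = pending[other].pop(k)
            yield {**partner, **tup}  # merged fields from both sides
        else:
            pending[side][k] = tup    # wait for the other side


events = [
    ("left", {"id": 1, "user": "tom"}),
    ("right", {"id": 2, "lang": "en"}),
    ("right", {"id": 1, "lang": "js"}),
    ("left", {"id": 2, "user": "jeff"}),
]
joined = list(join_bolt(events))
```

The bolt can only do this because inputs may arrive at any time and in any order; the sketch keeps unmatched tuples forever, so a real version would need some expiry policy.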

Jeff Lindsay

Dec 1, 2012, 4:20:32 PM
to Tom Robinson, webp...@googlegroups.com
Mmm... whenever I talked about joining anything it was done by a block, not the executor. And I don't think async matters.