My GSoC proposal for Twitter Storm


Varun Thacker

Apr 6, 2012, 5:52:11 AM
to storm...@googlegroups.com
Hi,

I have been working on my proposal, "Building a higher level abstraction API on top of Storm". I am in touch with Nathan Marz to discuss what the project needs.


I would love suggestions from other contributors and users on how to make my application proposal stronger.

--


Regards,
Varun Thacker
http://www.vthacker.in/

Debojyoti Dutta

Apr 6, 2012, 12:29:57 PM
to storm...@googlegroups.com
Hi Varun

This is an ambitious proposal! It would help to give a concrete but simple use case that consumes these primitives, and then break that use case down into them. Also, you might want to divide the features into must-haves for the end of summer and others that you might finish later or want help on.

Awesome idea though...

debo
--
-Debo~

Debojyoti Dutta

Apr 6, 2012, 12:35:24 PM
to storm...@googlegroups.com
Also, could you clarify the term "pipe assembly", given that the pipes are high-level tuple processing primitives (each/every etc.)? To define a pipe assembly, you would still have to define the pipe topology. So will your layer convert the pipe topology to an underlying Storm topology with the pipe blocks?
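
To make the question concrete, here is a minimal sketch of what I imagine such a conversion could look like. Everything in it is hypothetical: `PipeAssembly`, `each`, `every` and `toTopologyPlan` are names made up for illustration and are not part of Storm, Cascading, or the proposal. It only assumes that a per-tuple `each` step could map to a bolt with a shuffle grouping and a per-group `every` step to a bolt with a fields grouping. The output is a textual plan rather than a real topology, so it runs without Storm on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these classes are not part of Storm or Cascading.
class PipeAssemblySketch {

    /** One high-level step; groupField is null for a per-tuple "each" step. */
    static final class Pipe {
        final String id;
        final String groupField;

        Pipe(String id, String groupField) {
            this.id = id;
            this.groupField = groupField;
        }
    }

    /** A linear chain of pipes fed by one spout: the user-facing "pipe topology". */
    static final class PipeAssembly {
        private final String spoutId;
        private final List<Pipe> pipes = new ArrayList<>();

        PipeAssembly(String spoutId) {
            this.spoutId = spoutId;
        }

        /** Per-tuple step, e.g. a function applied to every incoming tuple. */
        PipeAssembly each(String functionName) {
            pipes.add(new Pipe("each-" + functionName, null));
            return this;
        }

        /** Per-group step, e.g. an aggregator applied per value of groupField. */
        PipeAssembly every(String groupField, String aggregatorName) {
            pipes.add(new Pipe("every-" + aggregatorName, groupField));
            return this;
        }

        /**
         * Lowers the chain to a description of Storm components. A real layer
         * would presumably make TopologyBuilder calls here instead of
         * building strings; that part is an assumption.
         */
        List<String> toTopologyPlan() {
            List<String> plan = new ArrayList<>();
            plan.add("spout " + spoutId);
            String upstream = spoutId;
            for (Pipe pipe : pipes) {
                String grouping = (pipe.groupField == null)
                        ? "shuffle"
                        : "fields(" + pipe.groupField + ")";
                plan.add("bolt " + pipe.id + " <- " + grouping + " from " + upstream);
                upstream = pipe.id;
            }
            return plan;
        }
    }

    /** Word count expressed as a pipe assembly, then lowered to a plan. */
    static List<String> wordCountPlan() {
        return new PipeAssembly("sentences")
                .each("split")
                .every("word", "count")
                .toTopologyPlan();
    }

    public static void main(String[] args) {
        for (String line : wordCountPlan()) {
            System.out.println(line);
        }
    }
}
```

If the layer works roughly like this, the user only ever writes the chain in `wordCountPlan`, and the spout/bolt wiring is derived from it.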

debo
--
-Debo~

Varun Thacker

Apr 6, 2012, 12:36:17 PM
to storm...@googlegroups.com
Hi Debojyoti,

Thanks for the suggestions. The idea was Nathan's though :)

I'll make the changes and update the proposal by tonight. 

James Xu

Apr 6, 2012, 12:36:53 PM
to storm...@googlegroups.com
Yeah, a typical use case would be very helpful for understanding.


Brian Tieman

Apr 6, 2012, 12:26:12 PM
to storm...@googlegroups.com
Varun,
 
Interesting project :)
 
Storm feels very much like a workflow management system to me, something like Kepler[1] or KNIME[2]. You might look there for inspiration, or maybe there's even a way to fit Storm into those systems. It would be very cool to have a GUI representation of a topology that one can build up from a collection of spouts and bolts, similar to the way these systems work.
 
Brian

Varun Thacker

Apr 6, 2012, 12:45:54 PM
to storm...@googlegroups.com
Hi James,

Thanks for the suggestion. I'll update my proposal by adding a use case. 

P. Taylor Goetz

Apr 6, 2012, 12:59:56 PM
to storm...@googlegroups.com
Awesome! Looking forward to it.

I had some discussions about something like this on the mailing list back in (I think) the late November time frame. If I have some time I'll try to dig them up.

This is something I had wanted to work on, but it's a huge undertaking and I just haven't found the time.

Glad to hear someone is picking it up. If there's any way I can help, let me know.

- Taylor

P. Taylor Goetz

Apr 6, 2012, 1:09:52 PM
to storm...@googlegroups.com

Ted Dunning

Apr 6, 2012, 1:09:26 PM
to storm...@googlegroups.com
I would consider your API choices very carefully.  Your proposal looks like Cascading, but I would suggest looking at the Google paper on FlumeJava for an alternative.  The big difference is that FlumeJava provides much tighter integration into the language using type inferencing in some very interesting ways to allow much better expressivity.

In my experiments on a clone of FlumeJava (see https://github.com/tdunning/plume ), I was able to encapsulate all of the environmental considerations into a single object from which all subsequent objects derive.  This allowed a lot of transparent repurposing of code.
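
As a rough illustration of both points (typed collections and a single environment object), here is a minimal sketch. It is not the FlumeJava or Plume API: `Env`, `Coll`, `LocalEnv` and `wordLengths` are hypothetical names invented for this example, and the only backend shown is an in-memory one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: not the FlumeJava or Plume API.
class TypedPipelineSketch {

    /** A typed collection; the compiler checks the element type at every step. */
    interface Coll<T> {
        <R> Coll<R> map(Function<? super T, ? extends R> fn);
        List<T> materialize();
    }

    /** The single root object: every Coll is derived from an Env. */
    interface Env {
        <T> Coll<T> from(List<T> items);
    }

    /** In-memory environment; another Env could target a cluster instead. */
    static final class LocalEnv implements Env {
        @Override
        public <T> Coll<T> from(List<T> items) {
            return new LocalColl<>(new ArrayList<>(items));
        }
    }

    static final class LocalColl<T> implements Coll<T> {
        private final List<T> items;

        LocalColl(List<T> items) {
            this.items = items;
        }

        @Override
        public <R> Coll<R> map(Function<? super T, ? extends R> fn) {
            List<R> out = new ArrayList<>();
            for (T item : items) {
                out.add(fn.apply(item));
            }
            return new LocalColl<>(out);
        }

        @Override
        public List<T> materialize() {
            return new ArrayList<>(items);
        }
    }

    /** Pipeline logic written once against Env; it never names a backend. */
    static List<Integer> wordLengths(Env env, List<String> words) {
        // Mapping String -> Integer is checked at compile time; a step that
        // expected a different element type would not compile.
        return env.from(words).map(String::length).materialize();
    }

    public static void main(String[] args) {
        System.out.println(wordLengths(new LocalEnv(), Arrays.asList("storm", "bolt")));
    }
}
```

Because `wordLengths` only sees the `Env` interface, the same code could be pointed at a different environment (a test harness, a batch job, a streaming topology) without changes, which is the kind of repurposing meant above.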

Another consideration is that quite frankly, FlumeJava is more stylish.  If you want to use your project for academic purposes that might be important to you.


Varun Thacker

Apr 6, 2012, 1:12:00 PM
to storm...@googlegroups.com
Hi,

On Fri, Apr 6, 2012 at 10:29 PM, P. Taylor Goetz <ptg...@gmail.com> wrote:
> Awesome! Looking forward to it.
>
> I had some discussions about something like this on the mailing list back in (I think) the late November time frame. If I have some time I'll try to dig them up.

I'll look them up for reference as well.

> This is something I had wanted to work on, but it's a huge undertaking and I just haven't found the time.
>
> Glad to hear someone is picking it up. If there's any way I can help, let me know.

Sure :) Nathan Marz has already told me it won't be small.

> - Taylor



Varun Thacker

Apr 6, 2012, 1:20:39 PM
to storm...@googlegroups.com
Hi Ted,

On Fri, Apr 6, 2012 at 10:39 PM, Ted Dunning <ted.d...@gmail.com> wrote:
> I would consider your API choices very carefully.  Your proposal looks like Cascading, but I would suggest looking at the Google paper on FlumeJava for an alternative.  The big difference is that FlumeJava provides much tighter integration into the language using type inferencing in some very interesting ways to allow much better expressivity.

I will go through the FlumeJava paper. On the API design, Nathan Marz has already told me that only the parts of Cascading that would be beneficial for real-time processing would make it in.

> In my experiments on a clone of FlumeJava (see https://github.com/tdunning/plume ), I was able to encapsulate all of the environmental considerations into a single object from which all subsequent objects derive.  This allowed a lot of transparent repurposing of code.

Once the project starts and I am designing the API, it would be great if you could also guide me in shaping it.

> Another consideration is that quite frankly, FlumeJava is more stylish.  If you want to use your project for academic purposes that might be important to you.


Louis Wasserman

Apr 6, 2012, 1:21:34 PM
to storm...@googlegroups.com
Flume is pretty cool, and I had definitely thought about doing some experimenting with Storm-style operations on it myself.  (Also, I really do like how it uses the type system and preserves type safety.)

Ted Dunning

Apr 6, 2012, 5:12:04 PM
to storm...@googlegroups.com
Happy to help.

Ted Dunning

Apr 6, 2012, 5:35:02 PM
to storm...@googlegroups.com
Note that Flume != FlumeJava.

Flume is the Hadoop data ingest system.  FlumeJava is the Google-developed Java binding for map-reduce-combine programming.

Louis Wasserman

Apr 6, 2012, 5:40:03 PM
to storm...@googlegroups.com
Sorry, I really did mean FlumeJava.  I've gotten to play with it a little.

hbagchi

Jul 14, 2013, 10:10:11 PM
to storm...@googlegroups.com
In what state is it? 