cascading real-time

Koert Kuipers

Oct 25, 2011, 4:27:19 PM

to cascadi...@googlegroups.com

There is a lot of talk these days about real-time distributed processing (real-time "Hadoop"). We now have S4 and Storm, and I am sure there are more candidates for this role that I am not aware of.

One of the biggest challenges I see is how to write code once and deploy it both in real time and in batch on Hadoop. It strikes me that Cascading, with its pipe and flow metaphors, would be a good platform for this. Has this been discussed before?

Chris K Wensel

Oct 25, 2011, 4:52:46 PM

to cascadi...@googlegroups.com

Cascading 2 has isolated all Hadoop dependencies and now provides both a Hadoop planner and a Local planner.

The Local planner runs everything in memory from local files (using zero Hadoop libraries), or it could work from Taps talking to queues. That would be stream processing without distributing processes (think Esper and similar systems). For example, it would be an easy way to pre-process logs and put them on HDFS or some other data store.
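
To make the "local files or queues" idea concrete, here is a rough sketch in plain Python. It is not Cascading code and none of these names come from the Cascading API: `build_assembly`, `run`, `file_tap` and `queue_tap` are hypothetical stand-ins. It only illustrates how one set of processing steps might be defined once and then fed from either a finite, file-like source or a queue.

```python
import queue
from typing import Callable, Iterable, Iterator, List

# All names below are hypothetical; this is not the Cascading API.
Step = Callable[[Iterator[str]], Iterator[str]]


def build_assembly() -> List[Step]:
    """Define the processing logic once, independent of any source."""
    def drop_blank(lines):
        return (line for line in lines if line.strip())

    def only_errors(lines):
        return (line for line in lines if "ERROR" in line)

    def normalize(lines):
        return (line.strip().lower() for line in lines)

    return [drop_blank, only_errors, normalize]


def run(assembly: List[Step], source: Iterable[str]) -> Iterator[str]:
    """Chain the steps lazily over any iterable source."""
    stream = iter(source)
    for step in assembly:
        stream = step(stream)
    return stream


def file_tap(lines: List[str]) -> Iterator[str]:
    """Batch-style source: a finite list standing in for a local file."""
    return iter(lines)


def queue_tap(q: "queue.Queue", sentinel=None) -> Iterator[str]:
    """Stream-style source: yield queue items until the sentinel arrives."""
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


assembly = build_assembly()
log_lines = ["INFO start", "", " ERROR Disk Full ", "ERROR Timeout"]

# Batch: run the assembly over the whole "file" at once.
batch_result = list(run(assembly, file_tap(log_lines)))

# Stream: run the same assembly over records arriving on a queue.
q = queue.Queue()
for line in log_lines:
    q.put(line)
q.put(None)  # sentinel marks end of stream for this demo only
stream_result = list(run(assembly, queue_tap(q)))
```

Because the steps are lazy generators, the queue-fed run handles each record as it arrives rather than waiting for the whole input, which is the single-process stream processing described above.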

The most immediate use would be for Cascalog and other DSLs on Cascading to run locally against local files very fast, which is very helpful for prototyping queries.

In theory we could layer on Storm, or anything new that pops up on the new Hadoop resource architecture (YARN, I think), by building a new planner.
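
As a loose illustration of what "building a new planner" could mean, the following plain-Python sketch keeps the logical operations fixed and swaps only the execution strategy. The class names `InMemoryPlanner` and `PartitionedPlanner` are invented for this example and do not mirror Cascading's planner classes; the partitioned variant merely stands in for a backend that spreads work across workers.

```python
from typing import Callable, Iterable, List

# Hypothetical names; this does not mirror Cascading's planner classes.
Operation = Callable[[int], int]


class InMemoryPlanner:
    """Runs every operation over the whole input in one pass per operation."""

    def execute(self, operations: List[Operation], data: Iterable[int]) -> List[int]:
        result = list(data)
        for op in operations:
            result = [op(x) for x in result]
        return result


class PartitionedPlanner:
    """Splits the input into partitions and applies the operations to each
    partition separately, standing in for a backend that distributes work."""

    def __init__(self, partitions: int):
        self.partitions = partitions

    def execute(self, operations: List[Operation], data: Iterable[int]) -> List[int]:
        items = list(data)
        # Round-robin split of the input into the requested number of partitions.
        chunks = [items[i::self.partitions] for i in range(self.partitions)]
        out: List[int] = []
        for chunk in chunks:
            for op in operations:
                chunk = [op(x) for x in chunk]
            out.extend(chunk)
        return out


# The logical assembly is written once...
operations = [lambda x: x + 1, lambda x: x * 2]

# ...and handed to whichever planner is available.
local_out = InMemoryPlanner().execute(operations, range(5))
partitioned_out = PartitionedPlanner(partitions=2).execute(operations, range(5))
```

The two planners return the same records, although the partitioned one may return them in a different order, so any comparison of their outputs should ignore ordering.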

A Cascading 2 WIP build is available now if you want to play with it. All regression tests pass on both the Hadoop and Local planners.
http://www.concurrentinc.com/downloads/

chris

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support for Cascading

Nathan Stults

Nov 3, 2011, 9:28:00 PM

to cascading-user

FYI, this idea is being discussed on the Storm group as well:
http://groups.google.com/group/storm-user/browse_thread/thread/b1d8732c5cbf0dfa
