cascading real-time


Koert Kuipers

Oct 25, 2011, 4:27:19 PM
to cascadi...@googlegroups.com
There is a lot of talk these days about real-time distributed processing (real-time "Hadoop"). We now have S4 and Storm, and I am sure there are more candidates for this role that I am not aware of.

One of the biggest challenges I see is how to write code once and be able to deploy it both in real time and in batch on Hadoop. It strikes me that Cascading, with its pipe and flow metaphors, would be a good platform for this. Has this been discussed before?

Chris K Wensel

Oct 25, 2011, 4:52:46 PM
to cascadi...@googlegroups.com
Cascading 2 has isolated all Hadoop dependencies and now provides both a Hadoop and Local planner.

Local runs everything in memory from local files (using zero Hadoop libraries), or it could work from Taps talking to queues. This would be stream processing without distribution of processes (think Esper and similar systems). For example, it would be an easy way to pre-process logs and put them on HDFS or some other data store.

The most immediate use would be for Cascalog and other DSLs on Cascading to run locally against local files very fast, which is very helpful for prototyping queries.

In theory we could layer on Storm, or anything new that pops up on the new Hadoop resource architecture (YARN, I think), by building a new planner.
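To illustrate the "write once, plan for different platforms" idea in this thread, here is a minimal, self-contained sketch. It is not the Cascading API: `WriteOnceSketch`, `Assembly`, `runBatch`, and `runStream` are hypothetical names made up for this example, and the two "runners" only stand in for what a real planner would do. The assumption is simply that a pipe assembly is defined once, with no knowledge of where records come from, and a separate runner decides whether to feed it a finite batch or an open-ended queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

/** Illustrative only: every name here is hypothetical, none comes from Cascading. */
public class WriteOnceSketch {

    /** A platform-neutral pipe assembly: an ordered list of per-record steps. */
    public static final class Assembly {
        // Each step turns one record into zero or more records.
        private final List<Function<String, List<String>>> steps = new ArrayList<>();

        public Assembly filter(Predicate<String> keep) {
            steps.add(r -> keep.test(r)
                    ? Collections.singletonList(r)
                    : Collections.<String>emptyList());
            return this;
        }

        public Assembly map(Function<String, String> fn) {
            steps.add(r -> Collections.singletonList(fn.apply(r)));
            return this;
        }

        /** Pushes one record through every step and hands the results to the sink. */
        public void process(String record, Consumer<String> sink) {
            List<String> current = Collections.singletonList(record);
            for (Function<String, List<String>> step : steps) {
                List<String> next = new ArrayList<>();
                for (String r : current) {
                    next.addAll(step.apply(r));
                }
                current = next;
            }
            current.forEach(sink);
        }
    }

    /** "Batch" runner: reads a finite input and returns all output at once. */
    public static List<String> runBatch(Assembly assembly, List<String> input) {
        List<String> out = new ArrayList<>();
        for (String record : input) {
            assembly.process(record, out::add);
        }
        return out;
    }

    /** "Streaming" runner: consumes a queue record by record until the end marker. */
    public static void runStream(Assembly assembly, BlockingQueue<String> source,
                                 String endMarker, Consumer<String> sink) {
        try {
            String record;
            while (!(record = source.take()).equals(endMarker)) {
                assembly.process(record, sink);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly if interrupted
        }
    }

    public static void main(String[] args) {
        // The assembly is written once, independent of how it will be run.
        Assembly errors = new Assembly()
                .filter(line -> line.startsWith("ERROR "))
                .map(line -> line.substring("ERROR ".length()));

        List<String> lines = Arrays.asList("INFO start", "ERROR disk full", "ERROR timeout");
        System.out.println("batch:  " + runBatch(errors, lines));

        BlockingQueue<String> queue = new LinkedBlockingQueue<>(lines);
        queue.add("<EOF>");
        List<String> streamed = new ArrayList<>();
        runStream(errors, queue, "<EOF>", streamed::add);
        System.out.println("stream: " + streamed);
    }
}
```

Both runners push records through the same `Assembly` object, so the filtering and mapping logic lives in one place. A real planner would also have to deal with grouping, joins, and distribution, which this sketch deliberately leaves out.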

The Cascading 2 WIP is available now if you want to play with it. All regression tests work on both the Hadoop and Local planners.
http://www.concurrentinc.com/downloads/

chris

> --
> You received this message because you are subscribed to the Google Groups "cascading-user" group.
> To post to this group, send email to cascadi...@googlegroups.com.
> To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support for Cascading

Nathan Stults

Nov 3, 2011, 9:28:00 PM
to cascading-user
FYI, this idea is being discussed on the Storm group as well:
http://groups.google.com/group/storm-user/browse_thread/thread/b1d8732c5cbf0dfa

