Local runs everything in memory from local files (using zero Hadoop libraries), or it could work from Taps talking to queues. That would be stream processing without distributing processes (think Esper and similar systems). For example, it would be an easy way to pre-process logs and put them on HDFS or some other data store.
The most immediate use would be for Cascalog and other DSLs on Cascading to run locally against local files, very fast. That is very helpful for prototyping queries.
In theory we could layer on Storm, or anything new that pops up on the new Hadoop resource architecture (YARN, I think), by building a new planner.
The Cascading 2 WIP is available now if you want to play with it. All regression tests work on both the Hadoop and local planners.
http://www.concurrentinc.com/downloads/
chris
On Oct 25, 2011, at 1:27 PM, Koert Kuipers wrote:
> There is a lot of talk these days about realtime distributed processing (realtime "hadoop"). We now have S4, Storm, and I am sure more candidates for this role that I am not aware of.
>
> One of the biggest challenges I see is how to write code once and be able to deploy it both in real time and in batch on Hadoop. It strikes me that Cascading, with its pipe and flow metaphors, would be a good platform for this. Has this been discussed before?
--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com
-- Concurrent, Inc. offers mentoring, support for Cascading