Using Cascading Local vs. Cascading on Hadoop Local Mode


Kunal Lahiri

Aug 21, 2015, 3:59:37 PM
to cascading-user
Hi,

I started using Cascading a short while ago and began by building flows with local taps. But when I want to run the jobs on the cluster, I have to use Hfs taps. So is it a good idea to always use Hfs taps and run on Hadoop in local mode? I have Hadoop 2.6 configured on my local Windows machine as well.

This way I can test all the code on my machine and, once it is done, deploy it to the cluster without changing the code or adding any extra libraries just for local mode.

Regards,
Kunal
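If you do go the Hadoop-local-mode route, local execution is normally selected through the Hadoop job configuration rather than in the flow code itself. The sketch below is a minimal illustration, assuming Hadoop 2.x property names (`mapreduce.framework.name` and `fs.defaultFS`); the class and method names are hypothetical, and the call into Cascading is only described in a comment so the example stays dependency-free.

```java
import java.util.Properties;

// Hypothetical helper: job properties that keep a Hadoop-based Cascading flow
// in Hadoop local mode (Hadoop 2.x property names assumed).
public class HadoopLocalModeProps {

    // Builds the overrides for local mode: MapReduce runs in a single local JVM
    // and Hfs paths resolve against the local file system.
    static Properties localModeProperties() {
        Properties props = new Properties();
        // Use the local job runner instead of submitting to YARN.
        props.setProperty("mapreduce.framework.name", "local");
        // Resolve paths against the local file system instead of HDFS.
        props.setProperty("fs.defaultFS", "file:///");
        return props;
    }

    public static void main(String[] args) {
        Properties props = localModeProperties();
        // In a real flow these would be handed to the Hadoop flow connector
        // (e.g. new HadoopFlowConnector(props)); omitted here to avoid the dependency.
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

On a cluster you would leave these overrides out and let the cluster's own configuration apply, so the flow code is unchanged; the trade-off, as the reply notes, is that Hadoop local mode is much slower than Cascading local mode.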

Ken Krugler

Aug 21, 2015, 4:58:40 PM
to cascadi...@googlegroups.com
Hi Kunal,

Cascading's local mode is (much) faster than Hadoop local mode, which is why we typically run most of our tests using it.


But this obviously creates the issue of how to switch to using Hadoop taps and other classes when you want to run on Hadoop, either locally or on a cluster.

There is built-in support in Cascading for abstracting away the platform in tests; see, e.g., HadoopPlatform.

This is used for running tests against multiple platforms, but we wanted something that would let us run the same workflow tool code either in Cascading local mode (via a -test parameter) or on Hadoop.

So we added what we needed to cascading.utils; see BasePlatform, among other classes.

-- Ken


--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
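The platform-switching approach described above can be sketched roughly as follows. This is a hypothetical, dependency-free illustration of the pattern, not the cascading.utils BasePlatform API: the names `Platform`, `LocalPlatform`, `HadoopPlatformSketch`, and the returned tap descriptions are made up for the example. In real code each platform object would create actual Cascading taps (for example `FileTap` for Cascading local mode and `Hfs` for Hadoop) along with the matching flow connector.

```java
import java.util.Arrays;

// Hypothetical sketch of a platform abstraction selected by a -test flag.
public class PlatformSwitch {

    // What the workflow code needs from a platform: a name and a way to build taps.
    interface Platform {
        String name();
        // Real code would return a Cascading Tap; a String keeps this self-contained.
        String makeSourceTap(String path);
    }

    // Stands in for Cascading local mode (would build FileTap-style taps).
    static final class LocalPlatform implements Platform {
        public String name() { return "cascading-local"; }
        public String makeSourceTap(String path) { return "FileTap(" + path + ")"; }
    }

    // Stands in for Hadoop, local or cluster (would build Hfs taps).
    static final class HadoopPlatformSketch implements Platform {
        public String name() { return "hadoop"; }
        public String makeSourceTap(String path) { return "Hfs(" + path + ")"; }
    }

    // The same workflow tool picks its platform from the command line.
    static Platform choosePlatform(String[] args) {
        boolean test = Arrays.asList(args).contains("-test");
        return test ? new LocalPlatform() : new HadoopPlatformSketch();
    }

    // Workflow code talks only to the Platform interface, never to a concrete tap class.
    static String buildWorkflow(Platform platform, String inputPath) {
        return platform.name() + ": " + platform.makeSourceTap(inputPath);
    }

    public static void main(String[] args) {
        System.out.println(buildWorkflow(choosePlatform(args), "input/data.txt"));
    }
}
```

Because the workflow code depends only on the interface, switching between fast Cascading-local test runs and Hadoop runs is a command-line choice rather than a code change.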
