How to run Cascalog/JCascalog as a standalone application

sourab...@corp.247customer.com

May 9, 2013, 11:00:03 AM

to cascal...@googlegroups.com

Hi

Currently I build an uberjar that includes all the dependencies of my JCascalog application and run it with "hadoop jar ..". But I was under the impression that a JCascalog application can also run as a standalone application if core-site.xml, mapred-site.xml and hdfs-site.xml are on the classpath. However, I have never been able to connect to the remote namenode or jobtracker directly that way.

mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value><ip>:<port></value>
</property>

core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://<ip>:<port></value>
</property>

Please let me know if I am missing something here. Since I am using these XML files, do I still need to add job-conf.clj?
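One thing worth ruling out first is whether the three site files are actually visible on the classpath of the running JVM, since Hadoop looks its site files up as classpath resources. The following is only a minimal diagnostic sketch using the plain JDK; the class name HadoopConfigCheck and the method locate are made up for illustration and are not part of Hadoop or Cascalog.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Diagnostic sketch (hypothetical helper, not a Hadoop or Cascalog class):
// reports where, if anywhere, each site file is found on the classpath.
final class HadoopConfigCheck {

    // File names taken from the question above.
    static final String[] SITE_FILES = {
        "core-site.xml", "mapred-site.xml", "hdfs-site.xml"
    };

    // Returns file name -> URL it would be loaded from, or null if not found.
    static Map<String, URL> locate(String... names) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : names) {
            found.put(name, loader.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, URL> e : locate(SITE_FILES).entrySet()) {
            String where = e.getValue() == null
                    ? "NOT on classpath"
                    : e.getValue().toString();
            System.out.println(e.getKey() + " -> " + where);
        }
    }
}
```

If a file is reported as missing, or is picked up from an unexpected JAR or directory, the application is probably falling back to default (local) settings rather than the remote cluster addresses.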

Thanks
Sourabh

vmar...@gmail.com

May 10, 2013, 1:16:13 AM

to cascal...@googlegroups.com

I don't know if I understood you correctly, but maybe this will help you (I was planning to write a blog post about this).

I just jumped into Big Data/Hadoop land, and the thing that bothers me is that 95% of the examples on the web show MR jobs being submitted to Hadoop via a shell command (hadoop jar). I consider that a bit limiting, because I want the freedom to decide where I submit my jobs from (from a Hadoop or non-Hadoop machine, remotely from my web app running in Tomcat, remotely from my IDE on Windows...). Being able to submit a job from your IDE is especially useful, because that gives you the fastest development cycle. The problem is that Hadoop jobs require their code JARs to be deployed, which means dealing with a deployment package for the "job driver" app, and that conflicts with how the app is deployed when it runs inside an IDE or on some other deployment platform.

So currently I ended up using Gradle to put all third-party JARs needed by my MR jobs (such as the Cascalog JARs and others) into a special "configuration", so I can copy them to an external directory before running the application. This directory also contains the JAR with the user-defined classes needed by the jobs (such as your MR functions):

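configurations {
    // Third-party JARs that must be shipped with the MR jobs
    mapreduce {
        description = 'Map reduce jobs dependencies'
    }
    // Everything in "mapreduce" is also available at compile time
    compile {
        extendsFrom mapreduce
    }
}

// Copies the project JAR and all "mapreduce" dependencies into ./mapreducelib
task prepareMapReduceLibs(type: Sync, dependsOn: jar) {
    from jar.outputs.files
    from configurations.mapreduce.files
    into 'mapreducelib'
}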

I don't want the Hadoop JARs inside this directory, so I exclude them (unfortunately a bit clumsy, since I did it on a per-dependency basis), for example:

mapreduce ("cascalog:cascalog-core:${cascalogVersion}") {
    exclude group: "org.apache.hadoop", module: "hadoop-core"
}

Of course, the Hadoop dependency is included in the "compile" Gradle configuration.

Now you can use this directory ("mapreducelib" in the example above) to copy these JARs to HDFS automatically when your application starts, and then add them to the distributed cache so the jobs can use them. I wrote a utility class, JobHelper, to encapsulate that code; here is just how it is used:

String hdfsJarsDir = "/myjobs/mylibs";
JobHelper.copyLocalJarsToHdfs("./mapreducelib", hdfsJarsDir, configuration);
JobHelper.addHdfsJarsToDistributedCache(hdfsJarsDir, configuration);
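JobHelper itself is not shown in this post. As a rough illustration of the first step such a helper presumably needs (finding the JARs in the local directory before uploading them), here is a sketch that uses only the JDK. The class name LocalJars and its list method are hypothetical, and the actual copy to HDFS and the distributed cache registration are deliberately left out, since they depend on the Hadoop version in use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not the JobHelper from the post: collects the local
// JAR files that would later be uploaded to HDFS.
final class LocalJars {

    // Lists the *.jar files directly inside localDir, sorted by path.
    // Throws an IOException (e.g. NoSuchFileException) if the directory
    // cannot be read.
    static List<Path> list(String localDir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(Paths.get(localDir), "*.jar")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    jars.add(p);
                }
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Directory name taken from the Gradle task above.
        String dir = args.length > 0 ? args[0] : "./mapreducelib";
        for (Path jar : list(dir)) {
            System.out.println(jar);
        }
    }
}
```

Each returned path would then be copied to the HDFS directory and registered for the job, which is the part that JobHelper encapsulates.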

JCascalog requires configuration properties to be set via a Map, so you have to convert your Configuration into a Map, like this:

private static void configureCascalog(Configuration configuration) {
    Map<String, String> map = convertConfigurationToMap(configuration);
    System.out.println("Configuring Cascalog with properties: " + map);
    Api.setApplicationConf(map);
}

// Hadoop's Configuration is iterable over its key/value entries
private static Map<String, String> convertConfigurationToMap(Configuration configuration) {
    Map<String, String> map = new HashMap<String, String>();
    for (Map.Entry<String, String> configurationEntry : configuration) {
        map.put(configurationEntry.getKey(), configurationEntry.getValue());
    }
    return map;
}

Regards,

Vjeran
