Issues running the pipeline on hadoop2.4.0

Samudra Banerjee

Apr 23, 2014, 8:20:51 PM
to dkpro-big...@googlegroups.com
Hi Experts,

I hit the following exception when I try to run the pipeline on a hadoop-2.4.0 cluster (I recently upgraded it).

java.io.FileNotFoundException: File does not exist: hdfs://cpu02.nbl.cewit.stonybrook.edu:54137/home/sabanerjee/de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop-run/de.tudarmstadt.ukp.dkpro.bigdata.examples-0.1.1-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at de.tudarmstadt.ukp.dkpro.bigdata.hadoop.DkproHadoopDriver.run(DkproHadoopDriver.java:336)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop.main(UimaPipelineOnHadoop.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


It looks like the job is looking for the libraries on HDFS instead of on the local file system. This used to work with hadoop-1.0.3 and with hadoop-2.2.0 (single node). Any thoughts?

Regards,
Samudra

Hans-Peter Zorn

May 1, 2014, 6:49:25 AM
to dkpro-big...@googlegroups.com
Hi Samudra,

unfortunately I haven't had the chance to test it against 2.4.0. Are you using local mode or pseudo-distributed mode? It is the distributed cache that uses HDFS to distribute the dependencies to the worker nodes: although the jars end up on the local file system of each node, the cache first copies them to HDFS. But the path in your exception looks like a local path with hdfs:// simply prepended!? I can't say where that comes from. In any case, this is not dkpro-specific and should happen with all jobs that use the -libjars option.
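The path in the stack trace does indeed look like a local staging path that was qualified against fs.defaultFS. As a rough illustration, here is a minimal sketch of that qualification effect (hypothetical code mimicking what org.apache.hadoop.fs.Path#makeQualified does to a scheme-less path; it is not Hadoop's actual implementation, and the paths are shortened stand-ins for the ones in the exception):

```java
// Hypothetical sketch: a path without a scheme inherits the scheme and
// authority of fs.defaultFS, which is how a local staging path can turn
// into an hdfs:// URI that was never actually written to the cluster.
public class PathQualifySketch {

    static String qualify(String defaultFs, String path) {
        if (path.contains("://")) {
            return path; // already fully qualified: left as-is
        }
        return defaultFs + path; // local-looking path becomes an HDFS URI
    }

    public static void main(String[] args) {
        // Values modeled on the exception above; names are illustrative.
        String defaultFs = "hdfs://cpu02.nbl.cewit.stonybrook.edu:54137";
        String stagingJar = "/home/sabanerjee/pipeline-run/pipeline.jar";

        // Without an explicit scheme, the client resolves the staging path
        // against the default file system, then fails with
        // FileNotFoundException because the jar was never copied there.
        System.out.println(qualify(defaultFs, stagingJar));

        // With an explicit file:// scheme the path is left untouched.
        System.out.println(qualify(defaultFs, "file://" + stagingJar));
    }
}
```

If something along these lines is what changed in 2.4.0, passing the jar path with an explicit file:// scheme (or staging the jar on HDFS by hand) might be worth trying as a workaround.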

Best,
-hp