Issues running the pipeline on Hadoop 2.4.0

Samudra Banerjee

Apr 23, 2014, 8:20:51 PM
to dkpro-big...@googlegroups.com
Hi Experts,

I hit the following exception when I try to run the pipeline on a hadoop-2.4.0 cluster (I recently upgraded it).

java.io.FileNotFoundException: File does not exist: hdfs://cpu02.nbl.cewit.stonybrook.edu:54137/home/sabanerjee/de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop-run/de.tudarmstadt.ukp.dkpro.bigdata.examples-0.1.1-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at de.tudarmstadt.ukp.dkpro.bigdata.hadoop.DkproHadoopDriver.run(DkproHadoopDriver.java:336)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop.main(UimaPipelineOnHadoop.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


It looks like the job searches for the libraries in HDFS instead of on the local file system. This used to work with hadoop-1.0.3 and hadoop-2.2.0 (single node). Any thoughts?
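
As an illustration of one way this can happen (the namenode address and jar path below are made up, and this is only a sketch, not taken from the thread): Hadoop qualifies a Path that has no scheme against fs.defaultFS, so on a cluster whose default file system is HDFS, a plain local path is looked up on HDFS rather than on the local disk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: how a scheme-less path is resolved against fs.defaultFS.
    public class PathQualification {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            // No scheme: the path is qualified against the default (HDFS) file system.
            Path bare = new Path("/home/user/myjob.jar");
            System.out.println(FileSystem.get(conf).makeQualified(bare));
            // -> hdfs://namenode.example.com:8020/home/user/myjob.jar

            // Explicit file:// scheme: the path stays on the local file system.
            Path local = new Path("file:///home/user/myjob.jar");
            System.out.println(local.getFileSystem(conf).makeQualified(local));
            // -> file:/home/user/myjob.jar
        }
    }

If that is what happens here, registering the jar with an explicit file:// URI is one thing to try, though the thread does not confirm that this is the cause.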

Regards,
Samudra

Hans-Peter Zorn

May 1, 2014, 6:49:25 AM
to dkpro-big...@googlegroups.com
Hi Samudra,

unfortunately I haven't had the chance to test it against 2.4.0 yet. Are you using local mode or pseudo-distributed mode? It is the distributed cache that uses HDFS to distribute the dependencies to the worker nodes: even though the jars end up on the local file system of each node, the cache first copies them to HDFS. But the path in your trace looks like a local path that was simply prefixed with hdfs://, and I can't say where that comes from. In any case, this is not DKPro-specific and should happen with all jobs that use the -libjars option.
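
A simplified sketch of the client-side check that throws here (this is not the actual ClientDistributedCacheManager code, just the idea, and the helper name is made up): before submission, every distributed-cache entry is stat'ed on whatever file system its URI resolves to, so an entry that has been qualified as hdfs://... must already exist on HDFS. "tmpjars" is the configuration key that -libjars populates.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Simplified sketch, not the real ClientDistributedCacheManager logic.
    public class CacheEntryCheck {
        // Stat every -libjars entry ("tmpjars") on the file system its URI resolves to.
        static void checkLibJars(Configuration conf) throws Exception {
            for (String entry : conf.getStrings("tmpjars", new String[0])) {
                Path p = new Path(entry);
                FileSystem fs = p.getFileSystem(conf); // hdfs:// entry -> DistributedFileSystem
                fs.getFileStatus(p);                   // throws FileNotFoundException if missing
            }
        }
    }

Running a check like this in the driver before job submission is one way to see which cache entries end up resolving to HDFS paths.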

Best,
-hp