Issues running the pipeline on hadoop2.4.0

Samudra Banerjee

Apr 23, 2014, 8:20:51 PM
to dkpro-big...@googlegroups.com
Hi Experts,

I hit the following exception when I try to run the pipeline on a hadoop-2.4.0 cluster (I recently upgraded it).

java.io.FileNotFoundException: File does not exist: hdfs://cpu02.nbl.cewit.stonybrook.edu:54137/home/sabanerjee/de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop-run/de.tudarmstadt.ukp.dkpro.bigdata.examples-0.1.1-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at de.tudarmstadt.ukp.dkpro.bigdata.hadoop.DkproHadoopDriver.run(DkproHadoopDriver.java:336)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at de.tudarmstadt.ukp.dkpro.bigdata.examples.UimaPipelineOnHadoop.main(UimaPipelineOnHadoop.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


It looks like the job is looking for the libraries on HDFS instead of on the local file system. This used to work with hadoop-1.0.3 and with hadoop-2.2.0 (single node). Any thoughts?

Regards,
Samudra

Hans-Peter Zorn

May 1, 2014, 6:49:25 AM
to dkpro-big...@googlegroups.com
Hi Samudra,

unfortunately I haven't had the chance to test it against 2.4.0. Are you using local mode or pseudo-distributed mode? It is the distributed cache that uses HDFS to distribute the dependencies to the worker nodes: although the jars end up on the local file system of each node, the cache first copies them to HDFS. But the path in your exception looks like a local path with hdfs:// simply prepended!? I can't say where that comes from. In any case, this is not dkpro-specific and should happen with all jobs that use the -libjars option.
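The path in the stack trace does indeed look like a local staging path that was qualified against fs.defaultFS. As a rough illustration, here is a minimal sketch of that qualification effect (hypothetical code mimicking what org.apache.hadoop.fs.Path#makeQualified does to a scheme-less path; it is not Hadoop's actual implementation, and the paths are shortened stand-ins for the ones in the exception):

```java
// Hypothetical sketch: a path without a scheme inherits the scheme and
// authority of fs.defaultFS, which is how a local staging path can turn
// into an hdfs:// URI that was never actually written to the cluster.
public class PathQualifySketch {

    static String qualify(String defaultFs, String path) {
        if (path.contains("://")) {
            return path; // already fully qualified: left as-is
        }
        return defaultFs + path; // local-looking path becomes an HDFS URI
    }

    public static void main(String[] args) {
        // Values modeled on the exception above; names are illustrative.
        String defaultFs = "hdfs://cpu02.nbl.cewit.stonybrook.edu:54137";
        String stagingJar = "/home/sabanerjee/pipeline-run/pipeline.jar";

        // Without an explicit scheme, the client resolves the staging path
        // against the default file system, then fails with
        // FileNotFoundException because the jar was never copied there.
        System.out.println(qualify(defaultFs, stagingJar));

        // With an explicit file:// scheme the path is left untouched.
        System.out.println(qualify(defaultFs, "file://" + stagingJar));
    }
}
```

If something along these lines is what changed in 2.4.0, passing the jar path with an explicit file:// scheme (or staging the jar on HDFS by hand) might be worth trying as a workaround.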

Best,
-hp