Exception while running sample code, UimaPipelineOnHadoop

Samudra Banerjee

unread,

Feb 19, 2014, 7:23:53 PM2/19/14

to dkpro-big...@googlegroups.com

Hi Experts,

I started with running the example, "UimaPipelineOnHadoop" and was having some trouble. My understanding of how this works using dkpro-bigdata is as follows (correct me if I am wrong. I really want to understand this stuff :) ):

You specify a path on your file system from where the CollectionReader reader loads the txt files. These files are converted into a sequence file and then dumped into HDFS as a sequence file in the location specified by the first argument in args. The second argument specifies the location where the job output (that of the reducer) will be stored.

Am I right?

My code is in https://github.com/sam8dec/hadoop-annotator/blob/master/src/main/java/edu/sunysb/cs/dsl/lydia2/annotatorhadoop/UimaPipelineOnHadoop.java

Now the problem is when I run this code on hadoop using the following command,

hadoop jar <project_jar_file>.jar edu.sunysb.cs.dsl.lydia2.annotatorhadoop.UimaPipelineOnHadoop /user/sabanerjee/annotatorhadoop/ /user/sabanerjee/annotatorhadoop/output

I get the following exception:

Feb 19, 2014 6:35:06 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(410)

SEVERE: Exception occurred

org.apache.uima.analysis_engine.AnalysisEngineProcessException

at de.tudarmstadt.ukp.dkpro.bigdata.io.hadoop.CASWritableSequenceFileWriter.process(CASWritableSequenceFileWriter.java:144)

at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)

at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)

at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)

at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)

at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:224)

at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:145)

at de.tudarmstadt.ukp.dkpro.bigdata.hadoop.DkproHadoopDriver.run(DkproHadoopDriver.java:158)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at edu.sunysb.cs.dsl.lydia2.annotatorhadoop.UimaPipelineOnHadoop.main(UimaPipelineOnHadoop.java:80)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.main(RunJar.java:160)job.setOutputValueClass(BinCasWritable.class);

Caused by: java.io.IOException: wrong value class: de.tudarmstadt.ukp.dkpro.bigdata.io.hadoop.BinCasWithTypeSystemWritable is not class de.tudarmstadt.ukp.dkpro.bigdata.io.hadoop.BinCasWritable

at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1177)

at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1039)

at de.tudarmstadt.ukp.dkpro.bigdata.io.hadoop.CASWritableSequenceFileWriter.process(CASWritableSequenceFileWriter.java:139)

... 14 more