Problem in Mapper Output


Samudra Banerjee

Mar 18, 2014, 1:53:22 AM
to dkpro-big...@googlegroups.com
I am trying to run an annotation job over a set of documents. When I read the sequence files written by the mapper, I cannot find the annotations on the CAS objects. Debugging further, I found that the "this.engine.process()" line works as expected: text dumps of the CASes taken right after that line executes show the expected annotations. However, something seems to go wrong when the CAS object is collected. Am I missing anything?


The DkproMapper class with some debugging lines is at


Thanks,
Samudra

Samudra Banerjee

Mar 18, 2014, 12:12:51 PM
to dkpro-big...@googlegroups.com
I notice that even when I comment out the line output.collect(outkey, outValue); I still get the part-yyyyy sequence files in the output folder, and they contain unannotated CAS objects. So it looks like this line is not having any effect. I think there is something I am not understanding correctly. I am new to Hadoop, so maybe I am missing something. The number of reducers is set to 0.

Samudra Banerjee

Mar 19, 2014, 12:58:44 AM
to dkpro-big...@googlegroups.com
It seems to work now, but I don't think I changed anything. Maybe I was doing something wrong. I cannot reproduce the issue anymore. Thanks, and apologies for this!

Samudra