Inputreader option usage

francois kawala

Apr 1, 2012, 11:46:25 AM
to dumbo...@googlegroups.com
Hi all, 

After spending some time trying to figure it out, I still can't manage to use a custom inputreader. My intention was to use the following Mahout class as the inputreader: org.apache.mahout.classifier.bayes.XmlInputFormat, in order to process XML Wikipedia dumps (as proposed in the following thread:
https://groups.google.com/forum/#!msg/mrjob/4J-Kdw3AXMI/WIBlzSSLGxcJ)

But according to my tests, the input seems to be processed as raw text, without org.apache.mahout.classifier.bayes.XmlInputFormat being used at all.

Maybe I am totally missing the point of the org.apache.mahout.classifier.bayes.XmlInputFormat class?

Would you have any clue to help me? It would be very much appreciated =)
François.

Klaas Bosteels

Apr 2, 2012, 4:07:46 AM
to dumbo...@googlegroups.com
Input readers and input formats are very different things. It seems like you want to use a custom input format, which should just be a matter of using the -inputformat org.apache.mahout.classifier.bayes.XmlInputFormat option and sending the Mahout jar along with your dumbo job via the -libjar option.
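
For anyone finding this thread later, here is a rough sketch of how those two options could be put on a dumbo command line, written as a small Python helper that only builds the argument list. The script name, paths, and jar location are hypothetical placeholders, and the helper name dumbo_args is made up for illustration; only the -inputformat and -libjar options and the class name come from the message above.

```python
# Sketch only: builds the argument list for a "dumbo start" invocation.
# All paths and file names passed in are placeholders (assumptions).

XML_INPUT_FORMAT = "org.apache.mahout.classifier.bayes.XmlInputFormat"


def dumbo_args(script, input_path, output_path, hadoop_home, mahout_jar):
    """Return the argv list for a dumbo job using a custom input format."""
    return [
        "dumbo", "start", script,
        "-hadoop", hadoop_home,
        "-input", input_path,
        "-output", output_path,
        # Custom input format class, as suggested in the reply above.
        "-inputformat", XML_INPUT_FORMAT,
        # Ship the jar containing that class along with the job.
        "-libjar", mahout_jar,
    ]


if __name__ == "__main__":
    # Hypothetical values; print the command instead of running it.
    print(" ".join(dumbo_args(
        "wiki.py", "dumps/enwiki.xml", "out/wiki",
        "/usr/lib/hadoop", "mahout-core.jar",
    )))
```

The list could be handed to subprocess.call once the placeholder values are replaced with real ones.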

-K


francois kawala

Apr 2, 2012, 6:25:17 PM
to dumbo...@googlegroups.com
Hi, 

Thanks for the tip. I've managed to use org.apache.mahout.classifier.bayes.XmlInputFormat as I intended. The trick was to derive it from FileInputFormat and to pass the xmlinput.start / xmlinput.end parameters through jobconf.
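
A minimal sketch of what passing those two parameters could look like, again as a Python helper that only builds command-line arguments. The tag values <page> and </page> are an assumption about the Wikipedia dump layout, the helper name xml_jobconf_args and the script name wiki.py are hypothetical, and it assumes the parameters are given as -jobconf name=value pairs.

```python
# Sketch only: builds "-jobconf name=value" pairs for the two XML parameters.
# The default tag values are an assumption about the dump format.


def xml_jobconf_args(start_tag="<page>", end_tag="</page>"):
    """Return argv entries that set xmlinput.start and xmlinput.end."""
    return [
        "-jobconf", "xmlinput.start=" + start_tag,
        "-jobconf", "xmlinput.end=" + end_tag,
    ]


if __name__ == "__main__":
    # Hypothetical job script name; the remaining dumbo options are omitted.
    args = ["dumbo", "start", "wiki.py"] + xml_jobconf_args()
    print(args)
```

Passing the list straight to subprocess avoids shell quoting problems with the < and > characters; on a hand-typed command line the values would need to be quoted.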