mongodb-hadoop

senior7515

May 3, 2012, 6:21:11 PM

to mongodb-user

I got Hadoop 1.0.2 working with mongo-hadoop on Ubuntu 12.04 LTS.

I have a question about the wordcount program:

https://github.com/mongodb/mongo-hadoop/blob/master/examples/wordcount_split_test/src/main/java/com/mongodb/hadoop/examples/wordcount/split/WordCountSplitTest.java#L119

job.setCombinerClass( IntSumReducer.class );
job.setReducerClass( IntSumReducer.class );

job.setOutputKeyClass( Text.class );
job.setOutputValueClass( IntWritable.class );

job.setInputFormatClass( MongoInputFormat.class );
job.setOutputFormatClass( MongoOutputFormat.class );

1. The input class is never set explicitly. I assume the input class in
Java is inferred from the query object (DBObject). Is that right?
2. Is it customary to pass a custom class, as in
setOutputValueClass( MyCustomClass.class ), or is it more idiomatic to
use Hadoop's data types?

Mike O'Brien

May 4, 2012, 5:15:37 PM

to mongodb-user

1. When using the mongo-hadoop connector, the input class is always
DBObject. This is handled by the input format class (see
MongoInputFormat.java), which takes the raw data and parses it into
documents that are sent to the mapper.

2. From what I've seen, it's more idiomatic to use Hadoop data types
for output value classes, since these are optimized for serialization
over the network.
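
To make the serialization point concrete, here is a rough sketch using only
the JDK. CountValue and WritableSketch are hypothetical names, not part of
Hadoop or mongo-hadoop. CountValue just mimics the two-method shape of
Hadoop's Writable interface (write(DataOutput) and readFields(DataInput))
so that the compact encoding can be compared with default Java
serialization. Treat it as an illustration of the idea, not as how
IntWritable is actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Hadoop-style value type. It follows the
// same contract as Hadoop's Writable without depending on Hadoop.
class CountValue implements Serializable {
    private int count;

    CountValue() {}                          // no-arg constructor, needed to read a value back
    CountValue(int count) { this.count = count; }

    int get() { return count; }

    // Write only the raw field: 4 bytes, no class metadata.
    void write(DataOutput out) throws IOException { out.writeInt(count); }

    // Restore the field from the same 4 bytes.
    void readFields(DataInput in) throws IOException { count = in.readInt(); }
}

class WritableSketch {
    // Encode a value the Writable way.
    static byte[] writableBytes(CountValue v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            v.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Encode the same value with default Java serialization, for comparison.
    static byte[] javaSerializedBytes(CountValue v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode a Writable-style byte array into a fresh instance.
    static CountValue readBack(byte[] bytes) {
        CountValue v = new CountValue();
        try {
            v.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return v;
    }
}
```

The Writable-style encoding carries only the field data, while default Java
serialization also writes a stream header and a class descriptor. If you do
need a custom output value class, the usual route is to have it implement
Hadoop's Writable interface with the same two methods.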
