HIVE: how to set mongo mappers count?

Ярослав Литвинов

Oct 14, 2015, 6:05:55 PM
to mongodb-user
We are trying to query our MongoDB collection from Hive, as described in the example.

It works with relatively small collections,
but when we issue queries against our biggest collection, they always fail at stage Map 1 with these errors:
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: Java heap space

How can we configure mongo-hadoop to increase the number of map tasks so that queries like these return results?
select count(*) from mongodb;
select * from mongodb where id=1;

We also tried setting the memory option, with no effect:
set mapred.child.java.opts="-Xmx12g -XX:+UseConcMarkSweepGC";

Hive 1.2.1.2.3.0.0-2557
mongo-hadoop 1.3.3 / mongo-hadoop 1.5.0


Luke Lovett

Oct 15, 2015, 2:31:48 PM
to mongodb-user
The number of map tasks is determined by the number of input splits. You can try changing which splitter implementation is used by setting the property "mongo.splitter.class". See https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference#mongosplitterclass.
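
As a rough sketch of what that could look like in a Hive session (untested here; the splitter choice and the split size value are assumptions for illustration, and "mongodb" is the table name from the original post). As far as I understand the Configuration Reference, mongo.input.split_size is given in megabytes with a default of 8, so a smaller value should yield more splits and therefore more map tasks:

```sql
-- Assumption: the collection is on a standalone server or replica set, not a sharded cluster.
set mongo.splitter.class=com.mongodb.hadoop.splitter.StandaloneMongoSplitter;
-- Assumption: 4 MB is an illustrative value; smaller splits should mean more mappers.
set mongo.input.split_size=4;
select count(*) from mongodb;
```

Whether the mapper count actually changes can be checked in the job summary Hive prints when the query starts.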

The following StackOverflow post also seems relevant to your situation: http://stackoverflow.com/questions/22870565/setting-mapred-child-java-opts-in-hive-script-results-in-mr-job-getting-killed

Basically, it seems that Hive expects no double quotes around the Java options for mapred.child.java.opts. You might also try setting mapreduce.map.java.opts and mapreduce.reduce.java.opts, which have superseded mapred.child.java.opts (the latter is a deprecated option).
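
A minimal sketch of those settings without quotes, reusing the JVM options from the original post. The two memory.mb lines are an assumption on my part: on YARN the container generally has to be larger than the heap it hosts, and 14336 MB is only an illustrative figure that would need to fit your cluster's limits:

```sql
-- No quotes around the value; options copied from the original post.
set mapreduce.map.java.opts=-Xmx12g -XX:+UseConcMarkSweepGC;
set mapreduce.reduce.java.opts=-Xmx12g -XX:+UseConcMarkSweepGC;
-- Assumption: container size must exceed the heap; 14336 MB is illustrative only.
set mapreduce.map.memory.mb=14336;
set mapreduce.reduce.memory.mb=14336;
```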

Ярослав Литвинов

Oct 15, 2015, 4:11:56 PM
to mongodb-user
Thanks, it was a helpful reply!

We tried different splitter classes, and the following one definitely works for us, though the number of mappers is still the same:
> set mongo.splitter.class=com.mongodb.hadoop.splitter.MultiMongoCollectionSplitter;
> set mongo.input.split_size=512;
> SELECT * FROM mongodb;

We also found this method useful; it lets us process the collection in chunks by restricting each query to a range of _id values:
> SET mongo.input.query={ "$and": [ {"_id": {"$gte":1}}, {"_id": {"$lt":2000}} ] };
> SELECT * FROM mongodb;
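
For illustration, the next chunk would then be read by moving the range forward. This is only a sketch: it assumes numeric _id values and a chunk boundary of 2000 taken from the query above, so the right bounds depend on the collection:

```sql
-- Assumption: _id is numeric; this covers the next range after [1, 2000).
SET mongo.input.query={ "$and": [ {"_id": {"$gte":2000}}, {"_id": {"$lt":4000}} ] };
SELECT * FROM mongodb;
```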