MongoDB MapReduce can't handle large datasets (BSON Size error)

Abhishek Shrivastava

Jul 5, 2014, 3:20:46 AM
to mongod...@googlegroups.com
Hi folks,

I'm trying to run a map-reduce job on a single-node MongoDB server, but I'm getting frustrated because after 10-12 hours it always fails with this exception:

{ "errmsg" : "exception: BSONObj size: 16800905 (0x895C0001) is invalid. Size must be between 0 and 16793600(16MB) First element: ...... }", "code" : 10334, "ok" : 0 }

My source collection has 15 million documents, and my mapper fires about 3 emits for each document, so there are roughly 45 million emit calls in total. A single emit key can aggregate up to about 3-4 million values, and there are about 70,000 emit keys. Do these numbers look too big for Mongo to handle?
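
To put a rough number on the 3-4 million values per key: the sketch below estimates how large a single BSON array would be if every value for one key were held in it at once. It assumes each emitted value is an 8-byte double (the post does not say what is emitted), and the name bsonDoubleArraySize is made up for this example. The 16793600 limit is copied from the error message above. Whether MongoDB ever materializes such an array is exactly what this thread is asking, so treat this as arithmetic, not as a description of the server's internals.

```javascript
// Limit in bytes, copied from the error message quoted above.
const MAX_BSON_SIZE_FROM_ERROR = 16793600;

// Estimate the encoded size of a BSON array holding n doubles.
// Per the BSON spec, an array is stored as a document whose keys are the
// decimal indices "0", "1", "2", ...
//   document overhead: 4-byte length prefix + 1 terminating NUL
//   each element:      1 type byte + index digits + 1 NUL + 8-byte double
// ASSUMPTION: every value is a double; real emitted values may be larger.
function bsonDoubleArraySize(n) {
  let size = 4 + 1;
  let start = 0;
  let digits = 1;
  while (start < n) {
    // Indices in [start, end) all have exactly `digits` decimal digits.
    const end = Math.min(n, Math.pow(10, digits));
    size += (end - start) * (1 + digits + 1 + 8);
    start = end;
    digits += 1;
  }
  return size;
}

// Compare the low and high ends of the per-key estimate against the limit.
for (const n of [3000000, 4000000]) {
  const bytes = bsonDoubleArraySize(n);
  console.log(n, "values:", bytes, "bytes, over limit:", bytes > MAX_BSON_SIZE_FROM_ERROR);
}
```

Even with the smallest plausible value type, an array of a few million elements would not fit in one document, so any step that holds all of a key's values in a single BSON object would hit this limit.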

BTW, when I reduced the number of emits fired per document to just 1, the job finished successfully in 8 hours. So I'm assuming this error has something to do with the large number of emits.

Any help would be greatly appreciated, friends. Thanks!

Ali hallaji

Jul 7, 2014, 4:22:47 AM
to mongod...@googlegroups.com
Set allowDiskUse to true and test again.

Abhishek Shrivastava

Jul 8, 2014, 11:22:49 AM
to mongod...@googlegroups.com
As far as I know, allowDiskUse is for the aggregate() function, not mapReduce(). I'm not facing a memory crunch; I'm hitting MongoDB's internal BSON size limit. After the mapper is done, and before going to the reducer, it seems Mongo internally aggregates each key and its values into a single BSON document, which inherently has a limit of 16 MB. This means that if the value array is too large (i.e. lots of emits), it won't fit into a single BSON document.
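
The grouping step described above can be pictured with a small in-memory emulation. This is only a sketch of the idea, not MongoDB's implementation: groupEmits, mapTags, and the sample documents are hypothetical, and emit is passed in as an argument here, whereas in the mongo shell it is a global available inside the map function.

```javascript
// Hypothetical stand-in for the "collect values per key" step.
// Runs mapFn once per document and gathers every emitted value
// into an array under its key.
function groupEmits(docs, mapFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    // As in MongoDB, the map function sees the current document as `this`.
    mapFn.call(doc, emit);
  }
  return groups;
}

// Hypothetical mapper that fires 3 emits per document, like the job above.
function mapTags(emit) {
  for (const tag of this.tags) emit(tag, 1);
}

const sampleDocs = [
  { tags: ["a", "b", "c"] },
  { tags: ["a", "b", "d"] },
];

const groups = groupEmits(sampleDocs, mapTags);
for (const [key, values] of groups) {
  console.log(key, values.length);
}
```

The array stored under each key grows by one entry per emit, so a key that receives millions of emits ends up with millions of entries unless they are reduced along the way.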

Abhishek Shrivastava

Jul 8, 2014, 11:51:03 AM
to mongod...@googlegroups.com
This is very similar to the issue these guys faced: https://github.com/variety/variety/issues/6 . But they had to get rid of map-reduce completely to avoid it. Apparently, the issue is that Mongo can't handle it when TOO MANY values are grouped against a single key and passed internally from the mapper to the reducer.

Ali hallaji

Jul 8, 2014, 12:13:41 PM
to mongod...@googlegroups.com

Yes, of course.
Are you familiar with GridFS?
Maybe it can help with your problem.

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/j4tWfP8nA5Q/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/9a4b5667-7baa-4185-9a39-e3ab656fa53d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Asya Kamsky

Jul 8, 2014, 2:33:05 PM
to mongodb-user
The issue is that the 16MB document limit applies to everything - documents you store, documents MapReduce tries to generate, documents aggregation tries to return, etc.

What exactly is the reduce code doing in your MapReduce job?  Are you accumulating some sort of array of values?   Because it sounds like you are exceeding maximum document size.

Asya
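
To illustrate the question about accumulating arrays: the sketch below contrasts a reduce function that collects every value into an array with one that keeps only a running count. MongoDB's documentation notes that reduce can be invoked more than once for the same key, with earlier results fed back in as values, so its output must have the same shape as the emitted values. reduceInBatches is a hypothetical stand-in for that behavior, and all names and sample data here are assumptions made for the example, not the original poster's code.

```javascript
// Reduce that accumulates: its result grows with the number of emits.
function reduceCollect(key, values) {
  return { items: values.reduce((acc, v) => acc.concat(v.items), []) };
}

// Reduce that summarizes: its result stays the same size regardless of input.
function reduceCount(key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
}

// Hypothetical driver: reduces values in batches and feeds the partial
// results back into reduceFn until a single result remains.
function reduceInBatches(key, values, reduceFn, batchSize) {
  if (batchSize < 2) throw new Error("batchSize must be at least 2");
  let pending = values;
  while (pending.length > 1) {
    const next = [];
    for (let i = 0; i < pending.length; i += batchSize) {
      next.push(reduceFn(key, pending.slice(i, i + batchSize)));
    }
    pending = next;
  }
  return pending[0];
}

const emitCount = 1000;
const collectEmits = Array.from({ length: emitCount }, (_, i) => ({ items: [i] }));
const countEmits = Array.from({ length: emitCount }, () => ({ count: 1 }));

const collected = reduceInBatches("someKey", collectEmits, reduceCollect, 100);
const counted = reduceInBatches("someKey", countEmits, reduceCount, 100);

// Serialized length is used here only as a rough proxy for document size.
console.log("collect:", collected.items.length, "items,", JSON.stringify(collected).length, "chars");
console.log("count:", counted.count, "total,", JSON.stringify(counted).length, "chars");
```

If the reduce in the failing job looks like reduceCollect, its output for a key with millions of emits would be a candidate for exceeding the document size limit; a reduce shaped like reduceCount would not grow that way.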


