How to handle the document size error when a document exceeds 16MB on insert


thris...@deepcompute.com

Jan 4, 2018, 4:25:39 PM
to mongodb-user


Hi guys,


Can anyone please suggest how to handle the "document size exceeds 16MB" error when inserting a document into a collection in MongoDB? I found some solutions, such as GridFS. GridFS can handle this problem, but I need a solution without using GridFS. Is there any way to make the document smaller or to split it into subdocuments? If yes, how can we achieve that? Ref Stack Overflow link: https://stackoverflow.com/questions/48093636/how-to-handle-document-size-exceeds-16mb-error-while-inserting-a-document-into-t


from pymongo import MongoClient

conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]

# the size of record is 23MB

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],           # list of strings
    "tokens_missing": [],   # list of strings
    "token_mapping": {}     # dict of transformed tokens
}

# insert() is deprecated in PyMongo 3.x; insert_one() is the modern
# equivalent, but both are subject to the same 16 MB BSON limit.
db_collection.insert(record, check_keys=False)


I got the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 megabytes.


  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
  DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.

Kevin Adistambha

Jan 7, 2018, 7:58:48 PM
to mongodb-user

Hi

The quick answer is no: you cannot get around the 16 MB BSON document size limit. If you hit it, you will need to explore alternatives such as GridFS or a different schema design for your documents.
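
For completeness, here is a minimal GridFS sketch in PyMongo. The connection string, database name, filename, and payload below are assumptions for illustration, not taken from your setup:

import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]
fs = gridfs.GridFS(db)

# GridFS stores the payload in 255 kB chunks, so the 16 MB BSON
# document limit does not apply to the stored data itself.
file_id = fs.put(b"very large payload here", filename="tokens.json")

# Retrieve the payload later by its id.
data = fs.get(file_id).read()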

I would start by asking a series of questions to determine the focus of your design, such as:

  1. You have fields called tokens, tokens_missing, and token_mapping. I imagine these fields are very large individually, and putting all three into one document pushes it to >16 MB. Is it possible to split this document across three collections instead? (A rough sketch follows this list.)

  2. What is your application’s access pattern? Which fields do you need to access all the time? Which fields do you access less often? You can split the document into different collections based on those patterns.

  3. Bear in mind the need to index the documents, since MongoDB’s performance is highly tied to good indexes that support your queries. Note that you cannot index two array fields in a single compound index. There is more information in Multikey Indexes.

  4. If you need to combine all the related data in a query, MongoDB 3.2 and newer provides the $lookup aggregation stage, which is similar to SQL’s left outer join (see the aggregation sketch below).
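
To illustrate point 1, here is a rough sketch of such a split in PyMongo. The collection names (samples, tokens, tokens_missing), the sample_id link field, and the token values are assumptions based on your example, not a prescribed schema:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]

# Small metadata document; comfortably under the 16 MB limit.
meta = {"name": "drugs", "collection_id": 23, "timestamp": 1515065002}
meta_id = db.samples.insert_one(meta).inserted_id

# Each token becomes its own small document instead of one huge array.
db.tokens.insert_many(
    [{"sample_id": meta_id, "token": t} for t in ["aspirin", "ibuprofen"]])
db.tokens_missing.insert_many(
    [{"sample_id": meta_id, "token": t} for t in ["codeine"]])

# One index per collection; a single compound index could not cover
# two array fields anyway (multikey index restriction).
db.tokens.create_index("sample_id")
db.tokens_missing.create_index("sample_id")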

Unlike SQL’s normalized schema design, MongoDB’s schema design is driven by your application’s access patterns. If you hit the 16 MB limit, it usually means the design is not optimal: such large documents are detrimental to performance, difficult to update, and so on. Typically, it’s better to have many small documents than a few gigantic ones.
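
And for point 4, a sketch of rejoining those collections at query time; it reuses the assumed collection names from the previous sketch and requires MongoDB 3.2 or newer:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]

# $lookup pulls matching token documents into each sample,
# similar to a SQL left outer join.
pipeline = [
    {"$match": {"collection_id": 23}},
    {"$lookup": {
        "from": "tokens",
        "localField": "_id",
        "foreignField": "sample_id",
        "as": "tokens",
    }},
]
for doc in db.samples.aggregate(pipeline):
    print(doc["name"], len(doc["tokens"]))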

More examples can be found in Data Model Design and Data Model Examples and Patterns.

Best regards
Kevin
