Does it make sense to minify a JSON object before saving it to reduce space usage?


Michael Soza

Jul 27, 2016, 8:33:56 PM
to mongodb-user, ps2c...@gmail.com
I would like to know whether minifying a JSON object to reduce space usage in MongoDB makes sense. I ask because I'm confused about the BSON transformation step: I read somewhere that some kind of compression is applied, so two objects with the same structure but different field sizes may consume similar disk space.

In my own tests I do see differences. For example, by creating an object with gigantic fields I could trigger an error for attempting to save an object greater than 16 MB.
If I save an object with the same structure but smaller fields, no error is produced.
I think that is a sign that the object is not minified, so minifying could help reduce space usage, but I would like to hear from people with more experience.



Thanks,
Michael


Amar

Aug 5, 2016, 3:21:03 AM
to mongodb-user, ps2c...@gmail.com

Hi Michael,

The 16 MB BSON document size limit applies to the total size of a BSON document (including field names and values) in memory, where it is uncompressed. WiredTiger provides compression on disk, but this doesn't apply to BSON documents in memory. In fact, one of the reasons for this limit is to prevent a single document from using an excessive amount of RAM.

Minifying usually means removing characters that are not part of the data structure, such as spaces, indentation, newlines, and comments. However, none of these are recorded in BSON, since BSON stores only the structure of the document, the field names, and the data associated with the fields.
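To illustrate, here is a small sketch in plain JavaScript (not MongoDB-specific). The sample document and its values are made up for the example. It shows that a pretty-printed JSON text and its minified form parse to the same fields and values, which is all a driver would have available to encode as BSON:

```javascript
// Two JSON texts for the same (made-up) data: one pretty-printed, one minified.
const pretty = `{
    "name": "example",
    "tags": [ "a", "b" ]
}`;
const minified = '{"name":"example","tags":["a","b"]}';

// Parsing discards whitespace; only field names and values survive.
const fromPretty = JSON.parse(pretty);
const fromMinified = JSON.parse(minified);

// Re-serializing both objects the same way gives identical output,
// even though the original texts differ in length.
const sameData = JSON.stringify(fromPretty) === JSON.stringify(fromMinified);
console.log(pretty.length > minified.length, sameData);
```

The two texts differ in length, but the parsed objects are indistinguishable, so whitespace minification alone should not change what gets stored.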

On the other hand, if by minifying you mean shortening field names while keeping the document structure, then this could reduce the total size of the document. For example:

> db.col.insert({usingalongfieldname:1})
WriteResult({ "nInserted" : 1 })
> db.col.stats()
{
    "ns" : "test.col",
    "count" : 1,
    "size" : 51,
    "avgObjSize" : 51,
  (...)

> db.col2.insert({field:1})
WriteResult({ "nInserted" : 1 })
> db.col2.stats()
{
    "ns" : "test.col2",
    "count" : 1,
    "size" : 37,
    "avgObjSize" : 37,
  (...)

Here the second document is 14 bytes smaller than the first, because its field name is 14 characters shorter.
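The sizes above can be reproduced by hand from the BSON layout. The following is a rough sketch in plain JavaScript, not a MongoDB API. It assumes the shell stored the value 1 as an 8-byte double and added a 12-byte ObjectId under _id. The helper names cstringSize and bsonSizeWithOneDouble are hypothetical and exist only for this illustration:

```javascript
// Bytes needed for a BSON cstring: the UTF-8 bytes of the name plus a trailing NUL.
function cstringSize(name) {
  return new TextEncoder().encode(name).length + 1;
}

// Size of a document with an ObjectId _id and one double-valued field.
// Assumption: the inserted value 1 is stored as a double (8 bytes).
function bsonSizeWithOneDouble(fieldName) {
  const header = 4;                                    // int32 total document length
  const idElement = 1 + cstringSize("_id") + 12;       // type byte + name + ObjectId
  const fieldElement = 1 + cstringSize(fieldName) + 8; // type byte + name + double
  const terminator = 1;                                // trailing 0x00 byte
  return header + idElement + fieldElement + terminator;
}

const longSize = bsonSizeWithOneDouble("usingalongfieldname");
const shortSize = bsonSizeWithOneDouble("field");
console.log(longSize, shortSize, longSize - shortSize); // prints: 51 37 14
```

Under these assumptions, every character removed from a field name saves one byte per document, which matches the 14-byte difference in the stats output.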

Regards,

Amar

