Storage of PDF files in MongoDB taking double the expected amount of space


Sebastian Rivera

Jan 6, 2017, 5:28:24 PM
to mongodb-user
Hi,

A couple of months ago I launched a web application for a customer, using MongoDB for the first time. One of the collections can have documents that each store a single .pdf file in binary format. The .pdf file stored in a document is always around 7 MB. To my surprise, the documents that contain a .pdf file seem to take up double the expected space. As it stands, the collection size is 70 GB with only 3900 documents containing a 7 MB .pdf file. How can that be? Can anyone shed some light on this?

Iľja Pelech

Jan 9, 2017, 2:46:35 AM
to mongodb-user
Hi, 

Just some quick thoughts on the topic...

The overhead may depend on various factors:
  • the way the binary data is stored: if the file ends up as a base64-encoded string rather than as native BSON binary data, that adds significant overhead (a 7 MB PDF takes roughly 9.3 MB, so up to about 10 MB, as a base64 string)
  • storage engine: 
    • MMAPv1 in pre-3.[not sure which minor] adds quite a lot of padding to documents.
    • WiredTiger, on the other hand, does not use any padding and can even compress the data.
  • the way you determine the size of the collection:
    • does the size also include indexes, padding, or other possible overhead?
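
To put rough numbers on the base64 point, here is a minimal sketch in Python (standard library only). It uses the figures from the question (3900 documents, 7 MB each, 70 GB observed) and measures the base64 expansion on a random 3 MiB sample; the function and variable names are made up for illustration, and 1 GB is taken as 1000 MB. It only shows how much of the gap base64 alone could explain, under the assumption that the files are in fact stored as base64 strings.

```python
import base64
import os


def base64_overhead(raw_size_bytes: int) -> float:
    """Return encoded size / raw size for a random payload of the given size."""
    raw = os.urandom(raw_size_bytes)
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# Figures quoted in the original question.
docs_with_pdf = 3900
pdf_size_mb = 7
observed_collection_gb = 70

# Expected size if the PDFs were stored as raw binary (1 GB = 1000 MB here).
raw_total_gb = docs_with_pdf * pdf_size_mb / 1000

# Base64 turns every 3 input bytes into 4 output characters.
ratio = base64_overhead(3 * 1024 * 1024)  # 3 MiB sample, a multiple of 3 bytes
base64_total_gb = raw_total_gb * ratio

print(f"raw binary total:    {raw_total_gb:.1f} GB")
print(f"base64 ratio:        {ratio:.3f}")
print(f"base64 string total: {base64_total_gb:.1f} GB")
print(f"observed collection: {observed_collection_gb} GB")
```

Under these assumptions base64 alone does not account for the full 70 GB, so the storage engine and the way the size is measured are worth checking as well. In the mongo shell, `db.collection.stats()` reports `size`, `storageSize` and `totalIndexSize` separately, which helps to tell these apart.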