Storage of MongoDB PDF files taking double the expected amount
Sebastian Rivera
Jan 6, 2017, 5:28:24 PM
to mongodb-user
Hi,
A couple of months ago I launched a web application for a customer, using MongoDB for the first time. One of the collections can have documents that store a single .pdf file in binary format. The .pdf file stored in a document is always around 7 MB. To my surprise, the documents that contain .pdf files seem to be taking up double the space. As it stands, the collection size is 70 GB with only 3900 documents containing the 7 MB .pdf file. How can that be? Can anyone shed some light on this?
Iľja Pelech
Jan 9, 2017, 2:46:35 AM
to mongodb-user
Hi,
Just some quick thoughts on the topic...
The overhead may depend on several factors:
- How the binary data is stored: if the file is saved as a base64-encoded string rather than as raw binary, that adds significant overhead (a 7 MB PDF takes up to 10 MB as a base64 string).
- The storage engine:
  - MMAPv1 in pre-3.[not sure which minor] adds quite a lot of padding to documents.
  - WiredTiger, on the other hand, does not use any padding and even lets you compress the data.
- How you determine the size of the collection: does the figure also include indexes, padding, or other possible overhead?
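The base64 overhead mentioned above can be checked with a little arithmetic. The following is only a rough sketch in Python, assuming the file really is stored as a padded base64 string and that "7 MB" means 7 MiB; the helper name `base64_size` and the constants are made up for illustration and come from the figures in the question, not from any MongoDB API.

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_bytes bytes.

    Base64 emits 4 output characters for every 3 input bytes,
    rounding the input up to a whole group of 3.
    """
    return 4 * ((raw_bytes + 2) // 3)


# Sanity check of the formula against the standard library encoder.
sample = bytes(3001)
assert len(base64.b64encode(sample)) == base64_size(len(sample))

# Assumed figures, taken from the question: ~7 MiB per PDF, 3900 documents.
PDF_BYTES = 7 * 1024 * 1024
DOC_COUNT = 3900
GIB = 1024 ** 3

encoded_bytes = base64_size(PDF_BYTES)
raw_total = DOC_COUNT * PDF_BYTES
encoded_total = DOC_COUNT * encoded_bytes

print(f"one PDF, raw:     {PDF_BYTES / 1024 ** 2:.2f} MiB")
print(f"one PDF, base64:  {encoded_bytes / 1024 ** 2:.2f} MiB")
print(f"all PDFs, raw:    {raw_total / GIB:.2f} GiB")
print(f"all PDFs, base64: {encoded_total / GIB:.2f} GiB")
```

Under these assumptions base64 adds about one third to each file, which puts the encoded total at roughly 35.5 GiB for 3900 documents. That is noticeably more than the raw data but still well short of 70 GB, so the storage engine and the way the size is measured are worth checking too.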