Storing and retrieving PDF files using MongoDB

Raja Devarakonda

Aug 25, 2014, 1:06:20 PM
to mongod...@googlegroups.com
Hello All:
I wanted to know if I can store and retrieve PDF files using MongoDB. I have millions of files to store and retrieve quickly and efficiently. Is MongoDB a good choice for this?
Please give me a few MongoDB features and pointers to study for implementing this use case.

Thanks

Tugdual Grall

Aug 26, 2014, 1:50:25 AM
to mongod...@googlegroups.com
Hello Raja,

The short answer is yes: you can do it with MongoDB, and it is a good fit.

As you know, you have two, maybe three, ways of handling "files" with MongoDB; the right approach will depend on your use case (and the size of the document):

1. Store the files directly in the document
As you know, you can store anything you want in a JSON/BSON document: you just need to store the bytes, and your PDF will be part of the document.
You just need to be careful about the document size limit of 16 MB.

You can add metadata for the file in the same JSON document, so the file and its metadata are stored in the same place.

For example, in Java you will just store a byte[] in an attribute; look at this test:
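
Separately from the linked test, here is a minimal sketch of approach 1 that uses only the Java standard library, so it is not the driver's API. It checks the file against the 16 MB document limit and builds a map with the bytes and some metadata side by side. The class name, the field names and the 64 KB headroom reserved for metadata are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: a plain Map stands in for the BSON document
// that you would hand to a MongoDB driver.
final class EmbeddedPdfSketch {
    // MongoDB's maximum BSON document size is 16 MB.
    static final int MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;
    // Assumption: room left for field names and metadata in the same document.
    static final int METADATA_HEADROOM_BYTES = 64 * 1024;

    // True when a file of this size can be embedded in a single document.
    static boolean fitsInSingleDocument(long pdfSizeBytes) {
        return pdfSizeBytes >= 0
                && pdfSizeBytes <= MAX_BSON_DOCUMENT_BYTES - METADATA_HEADROOM_BYTES;
    }

    // Builds a document-like map holding the PDF bytes next to its metadata.
    static Map<String, Object> toDocument(String filename, byte[] pdfBytes) {
        if (!fitsInSingleDocument(pdfBytes.length)) {
            throw new IllegalArgumentException(
                    "File too large to embed; consider GridFS: " + filename);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("filename", filename);               // hypothetical field names
        doc.put("contentType", "application/pdf");
        doc.put("length", pdfBytes.length);
        doc.put("data", pdfBytes);                   // the raw PDF bytes
        return doc;
    }
}
```

With a real driver, the map would correspond to the document you insert into a collection, and the byte[] would be stored as BSON binary data.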


2. Use GridFS
GridFS allows you to store files of "any size" in MongoDB. The driver divides the file you are storing into chunks and stores them as smaller documents in MongoDB; when you read the file, the chunks are put back together into a single file. With this approach you are not bound by the 16 MB document size limit.
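
To make the chunking idea concrete, here is a conceptual sketch in plain Java. It is not the GridFS API, only an illustration of splitting a file into fixed-size pieces and putting them back together in order. The class and method names are hypothetical; the 255 KB value is the default chunk size documented for GridFS.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: mimics how a file is cut into chunks and
// reassembled. A real application would let the driver's GridFS support do this.
final class ChunkingSketch {
    // Default GridFS chunk size (255 KB).
    static final int DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024;

    // Splits the file into consecutive chunks; the list index plays the
    // role of the chunk sequence number.
    static List<byte[]> split(byte[] file, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < file.length; ) {
            int length = Math.min(chunkSize, file.length - offset);
            chunks.add(Arrays.copyOfRange(file, offset, offset + length));
            offset += length;
        }
        return chunks;
    }

    // Concatenates the chunks in order to rebuild the original file.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Only the last chunk can be shorter than the chunk size, and every chunk is far below the 16 MB document limit, which is why the total file size no longer matters.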

In this case, if you want to add metadata, you create a JSON document that holds all the attributes and a reference to the GridFS file.

You can find information about this here: http://docs.mongodb.org/manual/core/gridfs/

and in this Java test:


3. Create a reference to external storage
This one is not directly a "MongoDB" use case, but I think it is important to mention it. You can obviously store the files in some dedicated storage and use MongoDB just for the metadata, with a reference to the file. To take a simplistic example, suppose you want to create a video application: you can store the videos on YouTube and reference them from the document that holds all your application metadata.
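
As a small illustration of this third approach, the sketch below builds a metadata-only document that points at a file kept elsewhere. It uses a plain Java map rather than any driver class, and the field names and example URL are hypothetical.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: the file lives in external storage and the
// document keeps just the metadata plus a reference to it.
final class ExternalReferenceSketch {
    static Map<String, Object> metadataFor(String title, URI location, long sizeBytes) {
        if (!location.isAbsolute()) {
            throw new IllegalArgumentException("location must be an absolute URI");
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", title);                    // hypothetical field names
        doc.put("contentType", "application/pdf");
        doc.put("sizeBytes", sizeBytes);
        doc.put("location", location.toString());   // reference, not the bytes
        return doc;
    }
}
```

A call such as metadataFor("Q3 report", URI.create("https://files.example.com/q3.pdf"), 120000L) would produce a small document that is cheap to index and query, while the PDF itself is served by the external storage.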

So, to come back to your use case: you can use approach 1 or 2, and the choice will depend on the size of the files and how you access them. If you can give us more information about your application, people may have a stronger opinion on the best approach.

If you are looking for people doing this, you can look at this presentation from MongoDB World: http://www.mongodb.com/presentations/translational-medicine-platform-sanofi-0

Regards,
Tug
@tgrall