Re: [mongodb-user] insertion of pdf files in mongodb

Bernie Hackett

Oct 4, 2012, 10:52:17 AM
to mongod...@googlegroups.com
You want the GridFS API in PyMongo:

http://api.mongodb.org/python/current/api/gridfs/index.html
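A minimal sketch of the GridFS approach for a one-script bulk load. It assumes a recent PyMongo (3.x or later) with a reachable mongod. The directory layout and the helper names `iter_pdf_paths` / `load_pdfs` are hypothetical. GridFS splits each file into chunks stored in `fs.chunks`, with one descriptor document per file in `fs.files`, so individual PDFs larger than the 16 MB document limit are not a problem.

```python
from pathlib import Path


def iter_pdf_paths(directory):
    """Yield every *.pdf file under `directory` (case-insensitive suffix), sorted."""
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def load_pdfs(db, directory):
    """Stream each PDF into GridFS and return the list of new file ids.

    `db` is assumed to be a pymongo Database. gridfs ships with PyMongo; it is
    imported here so iter_pdf_paths can be used without PyMongo installed.
    """
    import gridfs

    fs = gridfs.GridFS(db)  # default "fs" bucket: fs.files + fs.chunks
    ids = []
    for path in iter_pdf_paths(directory):
        with path.open("rb") as f:
            # put() reads the file object one chunk at a time, so a large PDF
            # is never held in memory whole; keyword args land in fs.files.
            ids.append(fs.put(f, filename=path.name, content_type="application/pdf"))
    return ids
```

Usage might look like `load_pdfs(MongoClient()["pdfs"], "./pdf_dir")`, where the database name and path are placeholders.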

On Wed, Oct 3, 2012 at 11:55 PM, Rahul Kumar Sharma
<rahulshar...@gmail.com> wrote:
> Hi,
> I am currently working on MongoDB and using the Python driver. I want to
> insert a 10 GB collection of PDF files using a single script written in
> Python. Please suggest a method. Thanks in advance...
>
> Thanks
> Rahul
>
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

मैं एक भारतीय बेवकूफ हूँ

Oct 5, 2012, 1:31:36 AM
to mongod...@googlegroups.com
How many options do you need?

Rahul Kumar Sharma wrote:
> Hello Bernie,
> Thanks for the reply. Using the GridFS API, I am able
> to insert my PDF files into MongoDB. Can you please tell me whether this is
> the only way to do such an insertion, or is there some other, more
> efficient method?

Russell Smith

Oct 5, 2012, 1:32:41 AM
to mongod...@googlegroups.com
May I ask why you need / want them in the DB?

Russ
Rainforest | +1-650-919-3216 | rainforestqa.com

On Oct 4, 2012, at 10:29 PM, Rahul Kumar Sharma <rahulshar...@gmail.com> wrote:

> Hello Bernie,
> Thanks for the reply. Using the GridFS API, I am able to insert my PDF files into MongoDB. Can you please tell me whether this is the only way to do such an insertion, or is there some other, more efficient method?

Sam Millman

Oct 5, 2012, 3:14:45 AM
to mongod...@googlegroups.com
As Russell hints, the most efficient (best) way to store them depends on how you need to use them.

Do you need to search their contents, or just store them for later? What are their minimum and maximum sizes? If they are always smaller than 16 MB, you might find that storing each one directly within a single document is more efficient.
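To illustrate the single-document option, here is a hedged sketch that embeds a PDF directly in one document when it fits comfortably under MongoDB's 16 MB BSON limit, and otherwise falls back to GridFS. It assumes PyMongo 3.x or later on Python 3, where `bytes` values are stored as BSON binary. The collection name `pdfs`, the 1 MB headroom, and the helper names are illustrative choices, not anything from the thread.

```python
from pathlib import Path

# MongoDB's maximum BSON document size is 16 MB (16 * 1024 * 1024 bytes).
MAX_BSON_BYTES = 16 * 1024 * 1024
# Hypothetical safety margin for the other fields and BSON overhead.
HEADROOM_BYTES = 1024 * 1024


def fits_in_document(size_bytes, headroom=HEADROOM_BYTES):
    """Return True if a file of this size can be embedded in a single document."""
    return size_bytes <= MAX_BSON_BYTES - headroom


def store_pdf(db, path):
    """Embed small PDFs in db.pdfs and send larger ones to GridFS; return the _id."""
    path = Path(path)
    size = path.stat().st_size
    if fits_in_document(size):
        doc = {
            "filename": path.name,
            "content_type": "application/pdf",
            "length": size,
            "data": path.read_bytes(),  # bytes become BSON binary under Python 3
        }
        return db.pdfs.insert_one(doc).inserted_id
    import gridfs  # ships with PyMongo; imported lazily so fits_in_document needs no PyMongo
    with path.open("rb") as f:
        return gridfs.GridFS(db).put(f, filename=path.name, content_type="application/pdf")
```

Embedding keeps each file and its fields in one read. GridFS trades that for support of arbitrarily large files.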

Russell Smith

Oct 5, 2012, 3:23:02 AM
to mongod...@googlegroups.com
Yep;

I would suggest evaluating why you want to store them in a db in the first place - there can be valid reasons for doing it, though I believe the best one is reducing complexity. However, this comes at some expense, though it might be worth it for you.

The major advantages to doing this are:

- your files will be copied to each of your replicas
- your files will be backed up with your db
- they're easy to associate with other data and to add metadata to
- you don't have to replicate / manage where files are (if you have them stored statically and have multiple web servers, you would)
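As a sketch of the "easy to associate with other data and add metadata to" point: GridFS's `put()` stores extra keyword arguments as fields of the file's `fs.files` document, so application data can live next to the file and be queried later. This assumes a recent PyMongo (3.x or later, for `GridFS.find`). The `owner` field and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def pdf_metadata(path, owner):
    """Build a hypothetical metadata dict: owner, file size and SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 64 KB blocks so large files are not read into memory at once.
        for block in iter(lambda: f.read(64 * 1024), b""):
            digest.update(block)
    return {
        "owner": owner,
        "size_bytes": Path(path).stat().st_size,
        "sha256": digest.hexdigest(),
    }


def put_with_metadata(db, path, owner):
    """Store the file in GridFS with a `metadata` sub-document; return its id."""
    import gridfs  # ships with PyMongo; imported lazily so pdf_metadata needs no PyMongo
    with open(path, "rb") as f:
        return gridfs.GridFS(db).put(
            f, filename=Path(path).name, metadata=pdf_metadata(path, owner)
        )


def filenames_for_owner(db, owner):
    """Query fs.files on a metadata field and return the matching filenames."""
    import gridfs
    return [out.filename for out in gridfs.GridFS(db).find({"metadata.owner": owner})]
```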

The major disadvantages are:

- replica resync will take longer, as it has 10+ GB more to sync
- backups, unless you exclude this data, will take longer, as they have 10+ GB more data
- wasted storage / RAM on content that doesn't change
- it's much slower than serving the content directly from Apache (etc.) as static content

Unless you're really trying to keep complexity down or have some other special use case, I would normally suggest storing files like this on Amazon S3 or an equivalent service.

HTH

Russ

Russell Bateman

Oct 5, 2012, 9:58:32 AM
to mongod...@googlegroups.com
Very lucid coverage of this. Thanks, Russell.

Rameswar Parameswar

Aug 22, 2013, 3:57:23 PM
to mongod...@googlegroups.com
What is the normal method of inserting small PDF documents (< 1 MB) into MongoDB using Java?