Storing Images in the GridFS


Devin Dixon

Jan 9, 2012, 10:25:20 AM
to mongod...@googlegroups.com
I am debating whether to store images in Mongo GridFS or on a cloud file system. I am leaning towards the cloud for a few reasons. The language being used is PHP on an Nginx server.

1. Storing images in the GridFS increases the size of the database. Therefore more of the database has to be in memory and I will spend more time/money managing the servers when it comes to things like sharding.

2. Retrieving the image from GridFS takes longer than from the cloud because I have to:
a) query the image using the id
b) read the image into memory
c) use a PHP header to display the image
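
As a rough illustration of steps a) to c), here is a minimal sketch in Python (not the PHP used on the site). It does not talk to MongoDB: the `files` and `chunks` dictionaries are hypothetical in-memory stand-ins for the way GridFS keeps file metadata and fixed-size chunks in separate collections, and `put_image`, `serve_image`, and the tiny `CHUNK_SIZE` are names and values invented for the example. It also shows that step b) can hand the body over chunk by chunk instead of holding the whole image in memory.

```python
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow

# Hypothetical in-memory stand-ins for a file-metadata store and a chunk store.
files: Dict[str, dict] = {}
chunks: Dict[Tuple[str, int], bytes] = {}


def put_image(image_id: str, data: bytes, content_type: str) -> None:
    """Store metadata once and the payload as fixed-size numbered chunks."""
    files[image_id] = {"length": len(data), "content_type": content_type}
    for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunks[(image_id, n)] = data[start:start + CHUNK_SIZE]


def serve_image(image_id: str) -> Tuple[List[Tuple[str, str]], Iterator[bytes]]:
    """Steps a) to c): look up by id, read the bytes, build the response headers."""
    meta = files[image_id]                       # a) query by id
    n_chunks = -(-meta["length"] // CHUNK_SIZE)  # ceiling division
    # b) read chunk by chunk (lazily) rather than loading everything at once
    body = (chunks[(image_id, n)] for n in range(n_chunks))
    headers = [                                  # c) headers the app sends itself
        ("Content-Type", meta["content_type"]),
        ("Content-Length", str(meta["length"])),
    ]
    return headers, body


put_image("product-42", b"not-a-real-png", "image/png")
headers, body = serve_image("product-42")
print(headers)
print(b"".join(body))
```

With a direct cloud URL, none of this application code runs per request, which is the difference point 2 is describing.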

The cloud would be better because it's a URL pointing directly to the image in the cloud.

Do those reasons sound valid, or should I be going in a different direction with my thinking?

Octavian Covalschi

Jan 9, 2012, 10:53:33 AM
to mongod...@googlegroups.com
On Mon, Jan 9, 2012 at 9:25 AM, Devin Dixon <ddi...@ephare.com> wrote:
> I am debating whether to store images in Mongo GridFS or on a cloud file system. I am leaning towards the cloud for a few reasons. The language being used is PHP on an Nginx server.
>
> 1. Storing images in the GridFS increases the size of the database. Therefore more of the database has to be in memory and I will spend more time/money managing the servers when it comes to things like sharding.

Does it? It's not the actual data sitting in memory, it's caching (http://www.mongodb.org/display/DOCS/Caching), if I understand the problem properly...

> 2. Retrieving the image from GridFS takes longer than from the cloud because I have to:
>        a) query the image using the id
>        b) read the image into memory
>        c) use a PHP header to display the image

You could implement a simple caching system and put nginx in front of it...

> The cloud would be better because it's a URL pointing directly to the image in the cloud.
>
> Do those reasons sound valid, or should I be going in a different direction with my thinking?



A few reasons I decided to use GridFS:
- easier to replicate (just add a node to the replica set)
- easier to maintain?! (no need for complex servers with expensive RAID, just RAM)
- everything is in one place, so backups are easy again...

Now there are a few trade-offs, imho. If you need to make a copy of stored images, you'll have to pull them from MongoDB and insert them again, which shouldn't be a problem unless the images are huge, like GBs...

 
Just my 0.02...



BlackMage

Jan 9, 2012, 11:08:19 AM
to mongodb-user
I read the article and it brings up one of my points, memory.

- MongoDB can use all free memory on the server for cache space
automatically without any configuration of a cache size.

Even though it's cached, it's cached in virtual memory on the server,
which can increase server cost and affect performance. Maybe I am missing the
point, but it still seems better to store in a cloud and cache the
image in the client's browser.


Sam Millman

Jan 9, 2012, 11:12:26 AM
to mongod...@googlegroups.com
"1. Storing images in the GridFS increases the size of the database. Therefore more of the database has to be in memory and I will spend more time/money managing the servers when it comes to things like sharding."

The OS handles LRU eviction, so the entire dataset does not need to be loaded into memory to serve your images; only those that are requested consistently stay loaded.
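
To make the LRU idea concrete, here is a minimal Python sketch of a least-recently-used cache. It is only an analogy: the real paging policy belongs to the operating system and is more involved, and the `LRUCache` class, its `capacity`, and the `load` callback are hypothetical names made up for the example.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Keeps at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity: int, load: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.load = load          # called on a miss, e.g. a read from disk
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used
        return value


# Room for two "images"; "a" is requested often, so it stays cached.
cache = LRUCache(capacity=2, load=lambda key: key.encode())
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key)
print(list(cache.items), cache.misses)
```

The point of the analogy is that frequently requested images stay resident while rarely requested ones are dropped and re-read on demand.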

As for spending time and money, it depends on the rate at which the images in the DB will grow: a high volume of uploads means off-the-scale spending and upkeep (at which point you may want to think about dedicated space hosting).


"2. Retrieving the image from GridFS takes longer than cloud because I have to
       a) Query the image using the id
       b) read the image into memory
       c) use a php header to display the image"

Indeed, a CDN-based file system will be faster and more robust, but it will cost more than the DB space required to house the images.

The maintenance part is debatable. A pre-sized CDN/cloud rack will require no maintenance from you since you do not run it, whereas GridFS requires you to build and maintain it yourself; however, if the CDN/cloud rack is one you run yourself, then GridFS is less maintenance.


"Do those reasons sound valid, or should I be going in a different direction with my thinking?"

It is true that getting the image directly from a URL requires no pre- or post-processing, unlike GridFS, which will require pre and post server queries to either a) an app server or b) a DB server.

My advice really depends on your scenario. How busy is your site? How many uploads a day can you expect? And so on.

Sam Millman

Jan 9, 2012, 11:17:28 AM
to mongod...@googlegroups.com
"Maybe I am missing the
point, but it still seems better to store in a cloud and cache the
image in the client's browser."

That is done by the browser, and many browsers can cache a dynamic URL nowadays. If they expect an image from that URL and have it cached, the browser will still cache the image and load it from cache even though the URL points to a .php script instead of a .png. This only works if the URL matches exactly, as can be demonstrated by adding a timestamp to AJAX calls to stop the browser from caching certain outputs from your server.
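
One common way to let a browser reuse its cached copy of a dynamically served image is a conditional request with an `ETag`. The following is a minimal Python sketch of that idea, not the PHP from the thread: the `respond` function and the one-day `max-age` are assumptions chosen for the example, and real `If-None-Match` handling (lists of tags, weak validators) is more involved than this exact string comparison.

```python
import hashlib
from typing import Dict, List, Tuple

Response = Tuple[int, List[Tuple[str, str]], bytes]


def respond(image: bytes, request_headers: Dict[str, str]) -> Response:
    """Return 304 with no body when the client's cached copy is still current."""
    # Derive a validator from the image bytes, so it changes when the image does.
    etag = '"' + hashlib.sha256(image).hexdigest()[:16] + '"'
    headers = [("ETag", etag), ("Cache-Control", "public, max-age=86400")]
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""   # browser keeps using its cached copy
    return 200, headers + [("Content-Type", "image/png")], image


image = b"not-a-real-png"
status, headers, body = respond(image, {})                 # first visit
etag = dict(headers)["ETag"]
status_again, _, body_again = respond(image, {"If-None-Match": etag})
print(status, status_again, len(body_again))
```

The script still runs on a revalidation request, but it can skip sending the image bytes, and within the `max-age` window the browser does not need to ask at all.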

BlackMage

Jan 9, 2012, 11:18:32 AM
to mongodb-user
OK, after reading your post: the site I am dealing with is an
e-commerce site. The site allows users to upload multiple images per
product, so the number of images being stored can be constantly
increasing. Uploads a day can vary anywhere from 10 to a little over
100. Taking this into consideration, would GridFS still be the better
choice?

BlackMage

Jan 9, 2012, 11:24:22 AM
to mongodb-user
Sorry, I meant to say newly uploaded products can be anywhere from 10
to 100 a day. Product images vary between 1 and 5 per product. So new
images every day range from around 20 to 300. And then you have older
images from products still being sold.

Sam Millman

Jan 9, 2012, 11:43:11 AM
to mongod...@googlegroups.com
Hmmmmm, the answer is not an exact science tbh.

Tbh I have to admit that on my own e-commerce site, whose images are only uploaded by the company and its employees, we decided not to use GridFS.

We actually use a normal Rackspace Cloud server to house the images for the time being.

We did look into GridFS, but we calculated it would cost more money and would not be as fast. To avoid spending CPU on resizing images with gd2, we store our images in their different sizes on upload (which means every image has about 5 copies of itself in different sizes).

We did come to the conclusion GridFS was not totally perfect for us.

In theory you have an upload pace that should be fine for GridFS, but I can't completely foresee how many requests you would have and how caching would react (remember, Mongo is no CDN). Since you have at least one foot in GridFS territory, I would say test it out: implement a simple image requester, try it in your environment to see what it does, forecast the cost-to-scale ratio, and see if it matches your company's budget.
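
A back-of-the-envelope forecast of storage growth could look like the Python sketch below. The 20 to 300 new images a day and the roughly 5 stored sizes per image come from this thread; the 200 KiB average file size is purely a hypothetical placeholder, and `yearly_growth_gib` is a name made up for the example.

```python
def yearly_growth_gib(images_per_day: int, variants: int, avg_kib: float) -> float:
    """Estimated storage added per year, in GiB."""
    kib_per_year = images_per_day * variants * avg_kib * 365
    return kib_per_year / (1024 * 1024)


AVG_KIB = 200  # hypothetical average file size; measure your own images

for per_day in (20, 300):      # low and high estimates from the thread
    for variants in (1, 5):    # originals only vs. five pre-resized copies
        growth = yearly_growth_gib(per_day, variants, AVG_KIB)
        print(f"{per_day} images/day x {variants} sizes: {growth:.1f} GiB/year")
```

Plugging in measured file sizes and the storage price of each option gives a first rough cost comparison; request volume and bandwidth would still need to be estimated separately.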

As I said before, I don't see the environment being any more complex to run than a physical storage server, unless you were to go for a pre-built CDN type of solution, but that might actually cost more money than the shards required to house your images in GridFS.

So I would say keep GridFS in mind and test it out, but don't consider it the absolute solution just yet; you're in no man's land atm.