Replicas and Sharding or a Distributed Filesystem?


EdemilsonLima

Jun 11, 2011, 10:27:57 AM
to mongodb-user
Which would be better: using MongoDB replicas with a sharding scheme,
or running it without replicas or sharding on top of a distributed
file system?

I think that FhGFS (http://www.fhgfs.com) is a good option since it
is free, fault tolerant, parallel, distributed, POSIX compliant, and
reportedly even faster than Lustre.

I have read many articles about the numerous advantages of using
embedded binaries or GridFS for storing files. My only concern is
how fast the application can output files from the database, compared
with having them served by the web server from a filesystem. One
option is to put a web cache like Varnish in front. Maybe the
performance would then be about the same? What about nginx with the
GridFS plugin (https://github.com/mdirolf/nginx-gridfs)?

One important thing is that with a distributed file system I could
have only one copy of the application files, accessed by all nodes in
the cluster. Another option would be a script that deploys those
files from an SVN repository to all nodes. What is your opinion on
that? Isn't it better to have redundancy for these too?

Also, the session files could be shared by the application across all
nodes. Or is it better to put session information in Memcached? By the
way, is it possible to configure PHP to store session data in a
MongoDB collection?

Eliot Horowitz

Jun 13, 2011, 12:27:26 AM
to mongod...@googlegroups.com
A lot of this depends on personal preference and other parts of the system.

For example, if you're already using MongoDB for other pieces of your system, then using it for files makes sense, as you have fewer types of components to manage.

If it's purely a MongoDB vs. distributed file system choice, I think your best bet is to experiment with each and see which you like better.






EdemilsonLima

Jun 13, 2011, 9:05:47 PM
to mongodb-user
I was thinking about this. Maybe running MongoDB on top of a
distributed file system will not work if we want multiple daemons
over the same database, because of file locking. The best feature of
MongoDB is running parallel queries over many nodes. We can't give
this up! :)

About storing files in MongoDB: I am not sure about using it for
images, since it is much faster for Apache to load images directly
from the file system than to run a script to fetch each image. Think
about a gallery page, for example, and the overhead it could cause
(well, we still have options like a cache or the nginx plugin for
GridFS). For other types of files (e.g. downloadable files) I think
it is much better to store them in the database.

About moving the session to MongoDB: I saw some performance tests at
https://github.com/mostlygeek/MongoSession showing that MongoDB
performs very fast. This is a plus for NoSQL over SQL, since it is
well known that MySQL is terribly slow for PHP sessions. By the way,
a distributed file system can be fast for storing PHP session files.
This option could take the session overhead off the database. But I
read somewhere that the same does not apply to a shared NFS volume,
since it performs very poorly and doesn't handle concurrency well.
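
On the earlier question of keeping PHP sessions in a collection: PHP
lets an application register its own session callbacks (open, close,
read, write, destroy, gc) through session_set_save_handler, so a
database-backed store is possible in principle. The sketch below shows
only the shape of such a store, in Python for illustration. The
SessionStore class and its dict "collection" are hypothetical
stand-ins, not the MongoSession project's code; a real version would
issue upserts and deletes against MongoDB instead.

```python
import time


class SessionStore:
    """Session storage with the read/write/destroy/gc shape of a PHP save handler."""

    def __init__(self, now=time.time):
        self._docs = {}   # stand-in for a collection: session id -> document
        self._now = now   # injectable clock so expiry can be exercised in tests

    def write(self, sid: str, data: str) -> None:
        # Upsert: one document per session id, stamped with its last write time.
        self._docs[sid] = {"data": data, "updated": self._now()}

    def read(self, sid: str) -> str:
        # An unknown session reads as an empty string.
        doc = self._docs.get(sid)
        return doc["data"] if doc else ""

    def destroy(self, sid: str) -> None:
        self._docs.pop(sid, None)

    def gc(self, max_lifetime: float) -> int:
        """Drop sessions idle longer than max_lifetime seconds; return how many."""
        cutoff = self._now() - max_lifetime
        stale = [sid for sid, doc in self._docs.items() if doc["updated"] < cutoff]
        for sid in stale:
            del self._docs[sid]
        return len(stale)


store = SessionStore()
store.write("abc", "user_id=42")
print(store.read("abc"))
```

With a database behind it, every node in the cluster reads and writes
the same session documents, which is the sharing asked about above.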

Andrew Armstrong

Jun 13, 2011, 10:09:15 PM
to mongodb-user
A problem with using a distributed file system (instead of just using
built-in mongo sharding/replication) is that you can only have one
physical server serving requests (concurrent access to the same data
files isn't allowed).

That means you will eventually reach the limit of what a single
server can handle (requests per second, etc.), at which point there
is nothing you can do about it.

Using sharding/replication fixes this problem by letting you have
independent machines and isolating points of failure.
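
To make the "independent machines" point concrete, here is a
deliberately simplified sketch of range-based routing on a shard key.
The split points, shard names, and the route function are made up for
illustration; this is not MongoDB's actual routing code, only the
general idea that each key range is owned by exactly one server.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key. They define three ranges:
# [min, "h"), ["h", "p"), ["p", max), each owned by one shard.
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard-a", "shard-b", "shard-c"]


def route(shard_key: str) -> str:
    """Return the shard that owns the range containing shard_key."""
    return SHARDS[bisect_right(SPLIT_POINTS, shard_key)]


for key in ["alice", "henry", "zoe"]:
    print(key, "->", route(key))
```

Because each range lives on a different machine, requests for
different keys do not compete for the same server or the same files.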


Min

Jun 13, 2011, 11:19:34 PM
to mongod...@googlegroups.com
I use GridFS heavily as distributed file storage. It works excellently.
But that doesn't mean it is a silver bullet for distributed file storage.

IMHO, storing files in GridFS and accessing them through multiple web servers could be a good approach. Also, to boost performance, an HTTP cache could avoid hitting MongoDB on every request.
I've also tested nginx-gridfs; it's an inspiring project. I don't know how big the files your service needs to support are. For small files it will work very well, but it would perform poorly for bigger files.
Nginx is an asynchronous server, and nginx-gridfs uses the blocking MongoDB C library. Unless it is rewritten on top of the nginx upstream mechanism or something similar, it might have concurrency issues at higher request rates.

Thanks


EdemilsonLima

Jun 15, 2011, 7:44:12 AM
to mongodb-user
But at http://www.fhgfs.com/wiki/wikka.php?wakka=SystemArchitecture it
says:

"FhGFS clients have direct access to the storage servers and
communicate with multiple servers simultaneously, giving your
applications truly parallel access to the file data."

I think this is possible because of a distributed metadata system.


EdemilsonLima

Jun 15, 2011, 7:51:30 AM
to mongodb-user
I think GridFS is great as file storage, especially for small files
like images.
My only concern is getting files out of it quickly without
overloading the server.
Imagine a gallery page with, for example, 40 images.
Each image will be a request to a script on the server, like:

<img src="getimage.php?id=123456" />

The script must open a connection to MongoDB and output the binary
stream. Now multiply this by the number of concurrent users.

How much RAM will be used in this case?
How fast is that compared to images served directly from the
filesystem by a web server?
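
One factor in the RAM question is whether the script loads a whole
file before sending it or streams it piece by piece. GridFS stores
each file as a sequence of chunk documents, so a handler can, in
principle, send one chunk at a time. The sketch below shows that
pattern against an in-memory list; CHUNKS and stream_file are
hypothetical stand-ins, and actual memory use will depend on the
driver and on how the script is written.

```python
from typing import Iterator

# Hypothetical stand-in for a GridFS chunks collection: one document per
# chunk, keyed by the owning file's id and a sequence number n.
CHUNKS = [
    {"files_id": 123456, "n": 1, "data": b"world"},
    {"files_id": 123456, "n": 0, "data": b"hello "},
    {"files_id": 999, "n": 0, "data": b"other"},
]


def stream_file(files_id: int) -> Iterator[bytes]:
    """Yield a file's chunks in sequence order, one at a time."""
    # With a real database the filter and sort would be done by the query.
    own = (c for c in CHUNKS if c["files_id"] == files_id)
    for chunk in sorted(own, key=lambda c: c["n"]):
        yield chunk["data"]


sent = 0
for piece in stream_file(123456):
    sent += len(piece)  # a real handler would write piece to the response here
print(sent)  # prints 11
```

Streaming this way keeps the data held per request close to one chunk
rather than one whole file, but it does not remove the cost of running
a script and a database query for every image.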

