incremental backups for gridfs files


Fabrizio Regini

Aug 17, 2010, 8:49:21 AM8/17/10
to mongo...@googlegroups.com
Hi,
In all our previous applications, my company stored user files (uploaded images and documents) in a directory on the filesystem, which was backed up daily with incremental backups (BackupPC). This way we are able to keep a history of each website's files going back up to 6 months.

Storing files in GridFS, I can no longer do this. Mongodump produces one big file per collection, so I can't save just the differences between uploaded images.

Has anyone encountered the same problem? Or does anyone have an idea on how to solve this?


John Nunemaker

Aug 17, 2010, 12:21:56 PM8/17/10
to mongo...@googlegroups.com
I believe mongo recommends replication. I'd set up a slave or two and replicate everything to them. Then you can have as many backups as you want. You can even set up a slave with a delay or whatnot.





--
You received this message because you are subscribed to the Google
Groups "MongoMapper" group.
For more options, visit this group at
http://groups.google.com/group/mongomapper?hl=en

Fabrizio Regini

Aug 17, 2010, 1:51:01 PM8/17/10
to mongo...@googlegroups.com
I thought about that, but replication does not save you from removing a collection by mistake. Similarly, delayed replication does not save you from dropping some data and recognizing the issue the day after. It's better for us to keep snapshots of the data.

Records in collections do not take up much space, so it's ok to keep full copies of database dumps. The problem comes when dumping fs.chunks. 

I found a solution: dump the whole database except the fs.chunks collection. The fs.chunks collection is dumped separately, recreating the files on the filesystem, so they can be backed up as usual. Restore seems to work fine. We save a lot of bandwidth this way, and keep BackupPC working as intended.

I don't want to go off-topic; I'll post the script on github.

Nathan Stults

Aug 17, 2010, 1:56:46 PM8/17/10
to mongo...@googlegroups.com

Please do send the link along when you have it worked out, I would love to see your solution.

Fabrizio Regini

Aug 18, 2010, 5:24:38 PM8/18/10
to mongo...@googlegroups.com
This is the monster I created to get GridFS files backed up as if they were on the filesystem:


One thing I wanted was to keep the script in one file, and to make it work with as few dependencies as possible.

This is what it does with the dump command:

1 - Dump all collections except fs.chunks as usual, with mongodump
2 - Gzip all BSON files
3 - Create a directory for GridFS files
4 - Dump GridFS files to the filesystem via Ruby, skipping files whose id+checksum did not change
5 - Remove dumped files that are no longer in the database
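The decision in steps 4 and 5 can be sketched in pure Ruby. This is just an illustration with hypothetical names (plan_incremental_dump is not from the actual script); it assumes each file is dumped under a name derived from its id and checksum, so an unchanged file keeps its name and BackupPC sees no difference:

```ruby
# Given the fs.files documents currently in GridFS and the filenames already
# dumped to disk, decide what to write, what to leave alone, and what to
# delete. Files are named "<id>-<md5>", so a changed file gets a new name.
def plan_incremental_dump(fs_files, dumped_names)
  wanted = fs_files.map { |doc| "#{doc[:_id]}-#{doc[:md5]}" }
  {
    :write  => wanted - dumped_names,   # new or changed files to dump
    :keep   => wanted & dumped_names,   # unchanged files, skip rewriting
    :delete => dumped_names - wanted    # files removed from GridFS
  }
end
```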

On restore:

1 - Unzip the BSON files
2 - Restore the database from the BSON files
3 - Iterate over the fs.files collection, read each dumped file from the filesystem and restore it into GridFS

The operations above can be performed on a subset of databases. 
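Step 3 of the restore can be sketched like this. The names here are hypothetical, not taken from the actual script, and the grid writer is pluggable so the logic runs without a server; in the real script the writer would be the GridFS API of the old mongo gem:

```ruby
require 'stringio'

# Restore dumped files into GridFS. `dumped` maps filenames ("<id>-<md5>",
# the naming assumed by the dump step) to file contents; `grid` is anything
# responding to put(io, opts).
def restore_gridfs_files(dumped, grid)
  dumped.each do |name, data|
    id, md5 = name.split('-', 2)
    grid.put(StringIO.new(data), :_id => id, :md5 => md5)
  end
end

# Stand-in grid writer that records what would be stored.
class RecordingGrid
  attr_reader :stored

  def initialize
    @stored = []
  end

  def put(io, opts)
    @stored << opts.merge(:bytes => io.read.bytesize)
  end
end
```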

I learned a couple of things working on this script: 

1 - What :snapshot => true is for. I ended up in an endless loop on restore: the cursor was being fed new records on every iteration.
2 - Authentication in MongoDB is painful. You have to add the admin user to each and every database in order to make this script work for the whole instance when --auth is on.
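The endless-loop hazard in point 1 can be illustrated without a driver at all. Iterating a live collection while the restore inserts into it never terminates; iterating a fixed snapshot does. (In the old Ruby driver the fix is passing :snapshot => true to the find call; arrays here just stand in for cursors.)

```ruby
live = [1, 2, 3]
snapshot = live.dup   # fixed view of the data, like a :snapshot cursor
restored = []

# Iterating `live` directly here would never finish, because each pass
# appends a new element. Iterating the snapshot terminates after 3 docs.
snapshot.each do |doc|
  restored << doc
  live << doc + 100   # the restore writes back into the live collection
end
```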

Enjoy! 

hamin

Aug 19, 2010, 2:47:07 PM8/19/10
to MongoMapper
Fabrizio, would your incremental backup script be a good solution even
if you weren't using gridfs? It looks like a good incremental mongo
dump solution to me. But perhaps there is a better way to do that if
you're not dealing with storing files in gridfs?

Fabrizio Regini

Aug 19, 2010, 3:11:31 PM8/19/10
to mongo...@googlegroups.com
I'm not sure I understand what you mean. If you are not storing files in GridFS, that script is useless. If your database consists only of collection documents (no images or attachments), it's much better to just use the mongodump and mongorestore tools and save zipped versions of the whole database. I can't imagine saving each collection document on the filesystem just to let rsync move the differences. It sounds crazy to me with the tiny amount of data I'm dealing with right now, but it's technically possible. The lack of join tables in MongoDB makes the operation much easier than on relational databases.

hamin

Aug 20, 2010, 2:00:09 PM8/20/10
to MongoMapper
makes sense :)