Mongoexport with version control for backups


Peter T

unread,
Apr 4, 2010, 4:49:15 AM4/4/10
to mongodb-user
Hi there,

(New Mongodb user here).

I'm interested in keeping a production database under version control
for development and backup purposes. The database itself will probably
be somewhere on the order of 10 to 30 GB, so the space used for
versioned backups has the potential to become significant quickly.

My first thought was to script a mongodump, then keep the dump under
version control. My problem with that is that, since the dumps are in
binary format, I'd imagine most VCSs would lose much of their
efficiency without the ability to compute proper deltas (?).

The obvious alternative is to script a mongoexport of all collections,
and then keep those under version control. That'll definitely maximize
the delta efficiency, but I'm wondering how idiomatic it is for
Mongodb to use the export utility in this way.

Is this a reasonable plan? Is the export utility designed to handle
large, hot exports like this? Are there any other obvious problems
with this method that I'm missing? Or maybe just a better suggestion
for version control of a Mongo database?


Would really appreciate any feedback! Thanks.

- Peter

Eliot Horowitz

unread,
Apr 4, 2010, 9:33:29 AM4/4/10
to mongod...@googlegroups.com
mongoexport -> version control should work fine.
What's the end goal, just being able to re-create a snapshot at any time?
You could also use a snapshotting filesystem (lvm or something) and
use fsync+lock.
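Roughly like this (the LVM volume names are placeholders, and the
unlock helper name varies with shell version - older shells use
db.$cmd.sys.unlock.findOne() instead):

```shell
# 1. Flush dirty pages to disk and block further writes.
mongo admin --eval 'db.runCommand({ fsync: 1, lock: 1 })'

# 2. Snapshot the volume holding the data files
#    (volume group / logical volume names are placeholders).
lvcreate --snapshot --size 1G --name mongo-snap /dev/vg0/mongo-data

# 3. Release the write lock so normal traffic resumes.
mongo admin --eval 'db.fsyncUnlock()'
```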


Peter T

unread,
Apr 4, 2010, 10:13:03 AM4/4/10
to mongodb-user
Hi Eliot,

Thanks for the quick response.

> What's the end goal, just being able to re-create a snapshot at any time?

Basically, yes. The software architecture will be changing rapidly, so
it'll be helpful to be able to pair database snapshots with software
releases for development/testing. Actually, the schema-less nature of
Mongo was one of the reasons it was attractive for the project.

> You could also use a snapshotting filesystem (lvm or something) and
> use fsync+lock.

That's probably overkill for my purposes but I'll look into it to be
sure: thanks for the suggestion.


Cheers!

(And apologies for the double post)

GVP

unread,
Apr 5, 2010, 12:44:03 AM4/5/10
to mongodb-user
Hey @Peter in your first post you made it clear that space was a
concern. But you also asked about "version control" on a structure
that has basically no DDL statements (except ensureIndex).

So the big question here is "what are you versioning?".

It sounds like you really just want a DB backup at a point in time,
but you've also stated that you can't simply do that b/c space is a
concern.

So you really have the following options:
1. Build a dev DB and "snapshot" that DB. It should take less space
and cover your requirements, assuming that you have a testing
framework.
2. Hope that SVN can diff the backup file with a really small update.
3. Change the outputs of your builds. At the end of every release
build, create a VM of your setup. In some ways this is less efficient
than SVN diffs, b/c it takes more space. OTOH it's great for the type
of "versioning" that you want.

Based on your description, you really just want "the world as of build
X". And that is best accomplished by outputting a whole VM for each
released build. Typically this just goes on a drive and you run a
compression tool on the build overnight. With a good compression tool,
like 7-zip, Mongo should shrink well and you'll have a good version of
builds. It may take more space, but it should also solve some problems.

Peter T

unread,
Apr 5, 2010, 7:46:53 AM4/5/10
to mongodb-user
Hi GVP,

> Hey @Peter in your first post you made it clear that space was a
> concern. But you also asked about "version control" on a structure
> that has basically no DDL statements (except ensureIndex).

My point was that the space used by the number of snapshots I'm going
to be needing (possibly dozens a day) would become excessive without a
decent ability to delta between snapshots. That could be a worry if
using a VCS to track binary Mongodb dumps.

> So the big question here is "what are you versioning?".

I'd be "versioning" the database exports/snapshots - i.e. using a VCS
simply as a convenient way of producing deltas between snapshots and
keeping them organised and easily accessible.

This way I can develop against various experimental source+db branch
pairs at once, and eventually merge changes if/when desired. This
becomes important since each experimental database branch will be
accepting live user input during development. (Sorry if that's a bit
verbose - I'm on my way out and am struggling to phrase this clearly
^^).

Eliot basically addressed the question of whether this is possible or
not by stating that the export utility should be up to the task. If it
is, I'm quite happy with the simple versioning approach. Everything
you described would do the job also, of course: I'm just partial to
the use of VCS where it's possible.

I hope this makes sense?

Ankur

unread,
Apr 5, 2010, 11:01:48 AM4/5/10
to mongodb-user
I was intrigued by this so I tried a mongoexport of our data this
weekend. It took about 70 minutes, versus only 10 minutes for a
mongodump, so I don't think mongoexport is practical for us.

For our purposes I will be keeping 7 days of backups (Sun-Sat) using
Mongodump and a cron script.
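
Something like this sketch (paths and schedule are made up), using the
day of the week as the directory name so each night's dump overwrites
the one from a week earlier:

```shell
#!/bin/sh
# Nightly mongodump rotated over 7 weekday-named directories.
# Backup root, host, and port are placeholders.
BACKUP_ROOT=/var/backups/mongo
DAY=$(date +%a)            # Sun, Mon, ... Sat
DEST="$BACKUP_ROOT/$DAY"

rm -rf "$DEST"             # drop last week's dump for this weekday
mkdir -p "$DEST"
mongodump --host localhost --port 27017 --out "$DEST"

# Example crontab entry to run it at 02:30 every night:
# 30 2 * * * /usr/local/bin/mongo-backup.sh
```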

Versioning-wise, I don't think even the best VCS will be up to the
task of keeping small diffs of 10 GB dump files (ours, at 14 million
records, were at least that big).

Ankur

Peter T

unread,
Apr 5, 2010, 4:00:26 PM4/5/10
to mongodb-user
Hi Ankur,

> It took about 70 minutes, versus only 10 minutes for a mongodump,
> so I didn't think mongoexport was practical for us.

Hmm, yes - that's the kind of problem I was hoping mongoexport
wouldn't have :)

How large was this database? Do you have any idea whether exporting is
somehow more computationally expensive than dumping? Or is it maybe
just the extra I/O, given that the exports are probably larger in
uncompressed form than the dumps? I'm guessing it'll also make a
difference if you're writing the exports to the same drive being read
from...


> Versioning-wise, I don't think even the best VCS will be up to the
> task of keeping small diffs of 10 GB dump files (ours, at 14 million
> records, were at least that big).

Actually, I don't expect the VCS would be a problem, so long as the
dumps/exports are plaintext. In my (limited) experience, Git can
handle pretty large repositories as long as individual files don't get
too large (and plaintext files can always be split easily enough).
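
For instance, splitting an export into fixed-size chunks is a
one-liner with split(1); since mongoexport output is one document per
line, a line-count split never cuts a document in half (the demo file
below stands in for a real export, where you'd use a much larger chunk
size like -l 500000):

```shell
# Stand-in for a real export: one JSON document per line.
printf 'doc1\ndoc2\ndoc3\ndoc4\n' > users.json

# Split into 2-line chunks: users.json.part-aa, users.json.part-ab
split -l 2 users.json users.json.part-

# Reassembly is just concatenation, since the split is line-aligned.
cat users.json.part-* > users.json.rebuilt
cmp users.json users.json.rebuilt && echo "round-trip OK"
```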

I've heard that Perforce is also quite capable with larger repos, but
I've never used it so can't say.

Eliot Horowitz

unread,
Apr 5, 2010, 4:01:58 PM4/5/10
to mongod...@googlegroups.com
Exporting to JSON is more expensive than BSON since it has to
transform the data, so it just depends on your data size and which
factor you care more about.
