Backup and restore on a sharded cluster with replica sets

Reinaldo

Jan 26, 2011, 5:20:17 PM

to mongodb-user

After much reading I am not 100% clear on how to go about this.

One of our environments hosts a number of databases for different
purposes, and some of these databases will grow beyond 1TB in size
(multi-tenant environment). We found a pretty good tool to back up these
big data files from the filesystem; it is fast to back up and fast to
"restore".

What is the proper restore process?

I am thinking that just putting the restored files on a replica set
member and making that member the master will not work, because I also
need to make the other members consistent, and as far as I know there is
no way to re-sync just one database?

Do I need to restore (replace the data files on the drive) on the
primary? That would require downtime.

All of this gets more complicated if the database has sharded
collections, as I assume the backups from all shards must be from the
same point in time?

How does the oplog complicate this? The oplog covers all databases, but
I am restoring just one of them.

We are looking for a solution with no downtime for the cluster, even
while one of the databases is being restored.

Ideally it is not something like: restore on a slave, make it master, and
then resync all other members from scratch for all databases, as that
will take too long on a multi-TB database, although I can see
something like that working.

Eliot Horowitz

Jan 26, 2011, 10:28:32 PM

to mongod...@googlegroups.com

Take a look at: http://www.mongodb.org/display/DOCS/Backing+Up+Sharded+Cluster

Reinaldo

Jan 26, 2011, 11:40:47 PM

to mongodb-user

Based on that page, restoring a single collection or a single database
requires downtime on the whole cluster. Is that correct? I need to
find a solution where we can do this without full downtime, i.e. in a
multi-tenant environment, the ability to restore a single database or
maybe a single collection without having to stop all servers.

On Jan 26, 7:28 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Take a look at: http://www.mongodb.org/display/DOCS/Backing+Up+Sharded+Cluster
>

Eliot Horowitz

Jan 26, 2011, 11:45:56 PM

to mongod...@googlegroups.com

That's for a full backup/recovery.

Can you describe exactly what you want to be able to do?

Reinaldo

Jan 27, 2011, 12:32:29 AM

to mongodb-user

Setup:

- Mongo cluster composed of 3 shards, each one a replica set with 3
members
- 150 databases
- Each database with anywhere between 4 to 50 collections
- Some of those databases will run into the TB size, not all of them,
but some.

We are looking for a solution to back up/restore a single one of those
databases without causing downtime to the other 149.
A solution that allows restoring a single collection of a single DB
would also be good.

mongoexport/mongodump/mongorestore/mongoimport will take a pretty long
time for a 1TB DB, based on our current estimates and compared to
backing up the file system, so we are trying to figure out the best way
to do this.

On Jan 26, 8:45 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> That's for a full backup/recovery.
>
> Can you describe exactly what you want to be able to do?
>

Eliot Horowitz

Jan 27, 2011, 12:41:21 AM

to mongod...@googlegroups.com

Doing that efficiently is tricky.
You can do fsync+lock on a slave to get a snapshot.
There is no way to recover a single db without some downtime.
The best way would be to shut down all the nodes in the set and move
the old files for that db into place.
You should ensure all the slaves are caught up first.
I haven't actually tried it, so you should test thoroughly.
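
The fsync+lock step mentioned above could be sketched roughly as follows. This is an illustration, not a tested procedure: it assumes `client` is a pymongo `MongoClient` connected directly to the slave (not through mongos), and that the server supports the `fsyncUnlock` command, which servers from the era of this thread did not (they used a different unlock mechanism). The helper name `fsync_locked` is made up for this example.

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(client):
    """Flush data to disk and block writes while a snapshot is taken.

    `client` is assumed to behave like a pymongo.MongoClient connected
    directly to the member being snapshotted.
    """
    # Flush pending writes and lock the member against further writes.
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step fails.
        # Assumption: the server supports the fsyncUnlock command.
        client.admin.command("fsyncUnlock")
```

The snapshot or file copy would go inside the `with fsync_locked(client):` block, so the lock is released even if the copy raises.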

GVP

Jan 27, 2011, 3:57:24 AM

to mongodb-user

@Reinaldo:

I don't want to override Eliot completely here, but I'm really stuck
on this line:

> We are looking for a solution to backup/restore a single one of those
> databases without causing downtime to the other 149

How did you end up with just one of the 149 DBs being down?

The typical strategy for backing up a cluster + replica set is to take
down some slaves and copy data.

So you simply turn off mongod on a replica (preferably a slave) within
each cluster. Once it's down, you take a backup by copying/zipping all
of the DB files (dbname.*). When the copy is done, you restart the
mongod on those boxes and let them catch up.
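
A minimal sketch of the "copy all of the DB files (dbname.*)" step, assuming mongod on that member is already stopped and that all databases share a single data directory (no per-database subdirectories). The function name and paths are hypothetical.

```python
import glob
import os
import shutil


def copy_db_files(data_dir, db_name, backup_dir):
    """Copy every data file named <db_name>.* into backup_dir.

    Assumes the files are not changing while the copy runs (mongod is
    stopped on this member). Returns the copied file names.
    """
    os.makedirs(backup_dir, exist_ok=True)
    # Escape both parts so odd characters are not treated as wildcards;
    # the literal "." keeps "cust1" from matching "cust10.0".
    pattern = os.path.join(glob.escape(data_dir), glob.escape(db_name) + ".*")
    copied = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            shutil.copy2(path, backup_dir)  # copy2 also keeps timestamps
            copied.append(os.path.basename(path))
    return copied
```

For example, `copy_db_files("/data/db", "customer42", "/backups/customer42")` would pick up `customer42.ns`, `customer42.0`, `customer42.1` and so on, and leave the other databases' files alone.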

So how do you do a restore with "no" (minimal) down-time?

Well, for starters, no restore is going to be "fast". If you have to
bring in a whole backup you're talking about a serious outage. That's
why we strongly suggest replica sets. Trying to write Terabytes of
data to a hard drive is going to be slow.

This goes back to my original questions:
1. How did you lose just one DB? Did you take it down on purpose?
2. In the same measure, if you just lost a 1TB DB, how do you plan to
"quickly" write that data back to the drive without completely
destroying the machine?

#2 is really important b/c it's not a MongoDB problem, it would
basically affect any DB you had in there. You can't just dump a
massive amount of data on a drive without affecting the performance of
the machine on which you're dumping.
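
To put a rough number on that, here is a back-of-the-envelope estimate. The 100 MB/s sustained write rate is a made-up figure for illustration, not a measurement from anyone's hardware.

```python
def restore_hours(size_bytes, write_bytes_per_sec):
    """Hours needed to write size_bytes at a sustained write rate."""
    if write_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return size_bytes / write_bytes_per_sec / 3600


# Hypothetical figures: a 1 TB database and 100 MB/s of sustained writes.
one_tb = 10**12
hours = restore_hours(one_tb, 100 * 10**6)
print(f"{hours:.1f} hours")  # prints "2.8 hours"
```

And that assumes the disk does nothing else for the whole time, which is not the case on a box that is also serving the other databases.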

I have some "hacks" in mind here, but it's really going to depend on
the answers to those two questions.

Reinaldo

Jan 27, 2011, 12:40:34 PM

to mongodb-user

My comments below.

On Jan 27, 12:57 am, GVP <gate...@gmail.com> wrote:
> @Reinaldo:
>
> I don't want to override Eliot completely here, but I'm really stuck
> on this line:
>
> > We are looking for a solution to backup/restore a single one of those
> > databases without causing downtime to the other 149
>
> How did you end up with just one of the 149 DBs being down?
>

In my case each one of those is a different customer, and the typical
case is that a customer does something wrong that corrupts their
data ("I really did not mean to remove all entries from that
collection ..."), so they ask us to restore a backup from a day ago,
for example.

The other case is one I ran into last week while running 1.6.3: one
database got corrupted. It seems that a mix of group/drop/insert/remove
in a highly concurrent environment caused corruption in one database
(http://groups.google.com/group/mongodb-user/browse_thread/thread/69dfde6b64628928/b01da353333f2a1f?);
everything else was just fine.

> The typical strategy for backing up a cluster + replica set is to take
> down some slaves and copy data.
> So you simply turn off mongod on a replica (preferably a slave) within
> each cluster. Once it's down, you take a backup by copying/zipping all
> of the DB files (dbname.*). When the copy is done, you restart the
> mongod on those boxes and let them catch-up.

We do exactly that. I think the tool is backupedge; it can do
incremental and full backups from the filesystem, and so far it has been
pretty fast (I don't know the internal details).

> So how do you do a restore with "no" (minimal down-time)?
>
> Well, for starters, no restore is going to be "fast". If you have to
> bring in a whole backup you're talking about a serious outage. That's
> why we strongly suggest replica sets. Trying to write Terabytes of
> data to a hard drive is going to be slow.
>
>
> This goes back to my original questions
> 1. How did you lose just one DB? Did you take it down on purpose?

Answered above

> 2. In the same measure, if you just lost a 1TB DB, how do you plan to
> "quickly" write that data back to the drive without completely
> destroying the machine?

backupedge seems to be able to do that fairly fast. I don't have
numbers right now, but it is definitely faster than combinations of
ssh/rsync/zip/tar/gz/ftp etc.

> #2 is really important b/c it's not a MongoDB problem, it would
> basically affect any DB you had in there. You can't just dump a
> massive amount of data on a drive without affecting the performance of
> the machine on which you're dumping.

Agreed, but I keep thinking about something like:
- disable slave_ok
- recover the files on a slave for example, so not a huge deal that is
affecting that member.
- make that the primary
- make the rest of the replica members re-sync that one database only.
- enable slave_ok

> I have some "hacks" in mind here, but it's really going to depend on
> the answers to those two questions.

I'd like to hear about the "hacks"

GVP

Jan 27, 2011, 5:17:48 PM

to mongodb-user

@Reinaldo: I just noticed the extension to your e-mail address, it's a
small world. I remember your company in the "YouPlusPlus" days.

I think I have a better idea of what you're looking to do :)

> Agree, but I keep thinking if something like:
> - disable slave_ok
> - recover the files on a slave for example, so not a huge deal that is
> affecting that member.
> - make that the primary
> - make the rest of the replica members re-sync that one database only.
> - enable slave_ok

I actually like this plan, it was my original thought. The only
difficulty I could find was step #2. I would just be worried that
"recovering" the files could cause that machine to simply be hammered.
If you have three shards, at least you get to spread the writes out.
But if you're trying to write in lots of data manually, you're
probably going to max out the disk I/O on those three boxes.

I'm mostly just worried about co-ordination in that case. You
definitely have a lot of potential spots where something could go
sideways.

If you're looking for "roll-back" functionality, I'm seeing a couple
of other options. (not official 10gen responses, just my best guesses)

Option #1: --slavedelay
---------
If you're worried about having a roll-back window, you can simply put
a --slavedelay on one of the servers.

Here's an article detailing exactly that (Kenny Gorman from
Shutterfly):
http://www.kennygorman.com/wordpress/?p=699

In some ways, this reduces capacity (if you're using this to write),
but it also provides you an emergency buffer.
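
Since this thread is about replica sets rather than master/slave, the delay would be set in the replica set member configuration instead of with the --slavedelay flag. A rough sketch of what such a member document looks like, built as a Python dict: the field name `slaveDelay` is the one used by replica set configs of this era (newer releases use a different name), a delayed member needs priority 0 so it can never become primary, and the host name below is made up.

```python
def delayed_member(member_id, host, delay_seconds):
    """Build a replica set member document for a delayed member."""
    if delay_seconds < 0:
        raise ValueError("delay must not be negative")
    return {
        "_id": member_id,
        "host": host,
        # A delayed member must never be elected primary.
        "priority": 0,
        # How far behind the primary this member stays, in seconds.
        "slaveDelay": delay_seconds,
    }


# Hypothetical host; one day of roll-back window.
member = delayed_member(2, "shard1-delayed.example.com:27017", 24 * 60 * 60)
```

The delay is your roll-back window: with the one-day value above, a bad delete has to be noticed within a day, before the delayed member applies it too.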

This solution is nice because you don't actually lose the data on all
nodes. In theory this means that you don't have to wait for a giant
copy operation.

In theory, you can also minimize data loss by replaying the oplog up
to the point of failure.
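
Because the oplog covers all databases, replaying it for one customer means filtering by namespace first. A minimal sketch of that selection step, under simplifying assumptions: entries are plain dicts with the usual `ts`, `op`, `ns` and `o` fields, and `ts` is represented as a (seconds, increment) tuple rather than a real BSON timestamp. Actually applying the selected operations is not shown.

```python
def ops_to_replay(oplog_entries, db_name, stop_before_ts):
    """Select one database's operations that precede the bad operation.

    oplog_entries: iterable of dicts with "ts" and "ns" keys.
    db_name: database being restored.
    stop_before_ts: timestamp of the first operation NOT to replay.
    """
    # Namespaces look like "<db>.<collection>"; the trailing dot keeps
    # "cust1" from also matching "cust10".
    prefix = db_name + "."
    selected = [
        entry
        for entry in oplog_entries
        if entry["ns"].startswith(prefix) and entry["ts"] < stop_before_ts
    ]
    # Replay order matters, so return the entries in timestamp order.
    return sorted(selected, key=lambda entry: entry["ts"])
```

The hard part is finding `stop_before_ts`, i.e. the timestamp of the accidental remove, which somebody has to dig out of the oplog by hand.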

Option #2: drop / re-import for small sets
----------
Obviously, this depends on the size of the DB and your available
throughput.

But if you have a small DB, the lowest down-time may simply be a full
restore. If the DB is only a couple of Gigs, this is probably much
quicker than playing around trying to get the data back in. It's low
complexity and it works within the current system.

This is probably best for the "non-transactional" data. Stuff that
gets entered once and then changed rarely.

Config DB:
----------
The only big caveat I have with any of these methods is fixing the
config DBs. If you use option #1 or your idea, you'll need to ensure
that the config DB is somehow still correct.

Right now, I don't know of an easy, automated way to do this. I
suspect that it involves some manual juggling. (selective restore from
backup)
