Thanks, you're right.
I've added a comment at
http://jira.mongodb.org/browse/SERVER-1998
outlining a possible solution to cluster backups. Thoughts?
---
The problem as I understand it now is that taking a backup of Shard #1
and restoring its backup an hour later (if it were to crash) would not
be good enough:
a) Chunks originally on this shard (and in the backup) may not exist
on the shard anymore (balancer moved them to another shard)
b) Chunks now allocated to this shard were never in its previous backup,
because they were moved to this shard afterwards. Their data is, however,
available in a different shard's backup (even though that shard no longer
owns these chunks)
Thinking out loud about this: what about a backup system where:
a) You ask the mongo cluster to perform a backup system wide. It's an
operation that happens for the entire cluster.
b) Each shard writes its local backup to a specific directory (or
uploads to S3, whatever) independently in parallel
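A rough sketch of what that backup step could look like, purely to illustrate
the idea. Everything here is hypothetical: `backup_shard`, `cluster_backup`,
the manifest layout and the `/backup` root are names made up for this example,
not anything mongod or the config servers provide today. The point is that the
cluster records who owned which chunk at backup time, next to where each
shard's files went.

```python
from concurrent.futures import ThreadPoolExecutor


def backup_shard(shard, backup_root):
    # Hypothetical placeholder: a real version would copy the shard's data
    # files to this directory (or upload them to S3). Here it only reports
    # where the files would go.
    return f"{backup_root}/{shard}/"


def cluster_backup(chunk_owners, backup_root="/backup"):
    """Back up every shard in parallel and return a manifest.

    chunk_owners maps chunk id -> owning shard at backup time
    (assumed to come from the config servers).
    """
    shards = sorted(set(chunk_owners.values()))
    with ThreadPoolExecutor() as pool:
        # Each shard writes its own backup independently of the others.
        paths = list(pool.map(lambda s: backup_shard(s, backup_root), shards))
    return {
        "locations": dict(zip(shards, paths)),
        "chunk_owners": dict(chunk_owners),
    }


manifest = cluster_backup({"c1": "shard001", "c2": "shard002"})
```

The manifest is the piece a later restore would need: without the chunk
ownership as of the backup, nobody can tell which shard's files hold a given
chunk.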
Now assume that, after this backup is taken, Shard #1 catches fire and
you need to restore it after buying new servers:
a) You ask the mongo cluster to restore Shard #1
b) Mongod Shard #1 primary (for example) asks the config servers where
Shard #1's backup data should be, based on the latest cluster backup
available
c) Config servers tell Shard #1 primary:
* You need to download /backup/shard001/ data files (majority of your
shard's data is here)
* You need to also then grab a few chunks from /backup/shard002/
because between the time the last backup was taken, and when Shard #1
caught fire, you had some more chunks allocated to you that aren't in
your original Shard #1 backup (but are available in shard002's backup
files - the original chunk owner)
* You need to ignore restoring chunks XXX from your backup files
because they were since given a new owner on a different shard, and so
you don't need to try restoring them as you don't own them anymore
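The three instructions above boil down to comparing chunk ownership at backup
time with chunk ownership now. A minimal sketch of that comparison, assuming
the config servers can supply both maps (chunk id -> shard); `restore_plan`
and the chunk ids are hypothetical names used only for illustration:

```python
def restore_plan(shard, owners_at_backup, owners_now):
    """Work out what `shard` must restore, fetch elsewhere, or ignore.

    Both arguments map chunk id -> owning shard; the first reflects the
    last cluster backup, the second the current cluster state.
    """
    plan = {
        "restore_from_own_backup": [],   # still ours, and in our backup
        "fetch_from_other_backup": {},   # ours now, but in another shard's backup
        "skip": [],                      # in our backup, but no longer ours
    }
    for chunk, owner_then in sorted(owners_at_backup.items()):
        owner_now = owners_now.get(chunk)
        if owner_then == shard and owner_now == shard:
            plan["restore_from_own_backup"].append(chunk)
        elif owner_then == shard:
            plan["skip"].append(chunk)
        elif owner_now == shard:
            # Group by the shard whose backup actually holds the chunk.
            plan["fetch_from_other_backup"].setdefault(owner_then, []).append(chunk)
    return plan


at_backup = {"c1": "shard001", "c2": "shard001", "c3": "shard002", "c4": "shard002"}
now = {"c1": "shard001", "c2": "shard002", "c3": "shard001", "c4": "shard002"}
plan = restore_plan("shard001", at_backup, now)
# plan["restore_from_own_backup"] == ["c1"]
# plan["fetch_from_other_backup"] == {"shard002": ["c3"]}
# plan["skip"] == ["c2"]
```

This only covers chunks that existed, with the same id, when the backup was
taken. Chunks created or split afterwards, and any writes made since the
backup, are not handled by this sketch.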
Basically, let the mongo cluster be backup-aware, so that it knows how to
restore data even if chunks have moved around since the backup.
You just need to make sure enough backup space is available.
---
Regards
- Andrew