Fault tolerance?

Korkman

Apr 4, 2011, 2:39:54 PM

to bup-list

Hi,

it might be a bit early to ask this, but how fault tolerant is bup? We
had a power outage yesterday during backups (one that lasted much too
long; the battery wouldn't last), and for a thrill I tried verifying
one of the affected backups. Errors were reported immediately about a
zero-sized .idx file. Removing it stopped the errors, but I fear the
whole backup set is now destroyed. I'm running a restore right now and
it hasn't aborted yet, so I'll go on with the verification. Is it in vain?

It should be noted that unexpected power outages bring out the worst in
filesystems and databases, and so can be the very reason a restore from
backup is needed in the first place. I will work around the problem
by rotating two sets of backups for now. Bup reduces backup volume so
dramatically (really, the rolling checksum thing pays off) that keeping
multiple versions is cheap enough.

But maybe you could consider atomic file operations or a write-ahead
rollback journal in future development to make it rock-solid?

Greetings,

Pierre Beck

Brandon Smith

Apr 4, 2011, 2:44:01 PM

to Korkman, bup-list

By bup's design, any new backup is additive only. Therefore, all of your
previous snapshots are completely valid even though the in-progress one
is not. For that matter, if you're doing a large incremental backup
(more than one .pack worth), that backup's progress up to the end of the
last complete pack is completely valid even if there is a fault.

--Brandon

Avery Pennarun

Apr 4, 2011, 3:02:07 PM

to Brandon Smith, Korkman, bup-list

On Mon, Apr 4, 2011 at 2:44 PM, Brandon Smith <fre...@reardencode.com> wrote:
> By bup's design, any new backup is additive only. Therefore, all of your
> previous snapshots are completely valid even though the in-progress one
> is not. For that matter, if you're doing a large incremental backup
> (more than one .pack worth), that backup's progress up to the end of the
> last complete pack is completely valid even if there is a fault.

Exactly.

There are some situations I can think of that might cause minor
(recoverable) problems. For example, if the system crashes before all
the files are synced, we might end up with pack #7 being slightly
corrupt but pack #8 being valid. Then deleting pack #7 would result
in missing objects that you might not immediately discover.

Also, if a .pack file is invalid, bup might not notice it and might
keep using the .idx anyway, which would result in certain missing
objects not being backed up as they should be. I haven't tested that
very carefully.

Someday, we should improve 'bup fsck' to be able to check stuff like
that and throw away bad or mismatched packs, just to be safe.
However, this situation is pretty contrived and should be very rare,
even in the case of a crash.
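
The kind of check described here can be sketched outside bup. The
following is a minimal illustration, not how 'bup fsck' works: a
hypothetical helper, find_suspect_packs, scans a pack directory and
flags zero-size files and .pack/.idx files whose partner is missing.
It looks only at names and sizes, so a pack that is truncated but
non-empty would pass unnoticed.

```python
import os


def find_suspect_packs(pack_dir):
    """Return (problem, filename) pairs for pack/index files that look wrong.

    Hypothetical helper, not part of bup. It inspects only file names and
    sizes and never opens or validates pack contents.
    """
    names = set(os.listdir(pack_dir))
    problems = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext not in (".pack", ".idx"):
            continue
        # A zero-size file is what a crash before the data reached disk can leave.
        if os.path.getsize(os.path.join(pack_dir, name)) == 0:
            problems.append(("zero-size", name))
        # Every .pack should have an .idx with the same stem, and vice versa.
        partner = stem + (".idx" if ext == ".pack" else ".pack")
        if partner not in names:
            problems.append(("missing-partner", name))
    return problems
```

Pointing it at the objects/pack directory of a repository gives a list
to review by hand; it deletes nothing.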

Maybe we should fsync() more frequently, like after writing a .pack or
.idx file. I really hate fsync() though, since its performance is so
terrible on ext3. (fsync() on any file ends up being basically the
same as a full sync() of the entire filesystem. Barf.)
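
For illustration, the usual POSIX pattern for making a file appear
atomically is: write to a temporary name, fsync() the data, rename it
into place, then fsync() the directory. The sketch below is not bup's
code; write_file_durably and the ".tmp" suffix are made up for the
example, and it assumes a POSIX system where a directory can be opened
and synced.

```python
import os


def write_file_durably(path, data):
    """Write bytes to path so a crash leaves the old file or the complete new one.

    Hypothetical helper, not bup's implementation. Assumes POSIX semantics:
    rename within one directory is atomic and directories can be fsync()ed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # assumed temporary naming scheme
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the file data to disk
    os.replace(tmp_path, path)  # atomically rename over the final name
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The performance cost mentioned above still applies: every call issues
two fsync() calls, one for the file and one for the directory.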

If you're worried about another form of corruption, i.e. silent loss
of data in .pack files due to hard drive sector errors, you should
consider using 'bup fsck -g'.

Have fun,

Avery

Korkman

Apr 5, 2011, 7:37:05 AM

to bup-list

So I can simply delete the last files created during a crashed backup
(.pack, .idx and .midx) and be good?

Brandon Smith

Apr 5, 2011, 10:51:34 AM

to Korkman, bup-list

No need -- during writing, bup writes to temporary filenames. You can
remove some hidden '.' files that were created, though.

Korkman

Apr 5, 2011, 11:51:45 AM

to bup-list

From what I understand, a temporary .idx file was moved to objects/pack
but not written to disk yet, because there's no fsync in between.
Therefore, 0-byte size. That's an easy case, though. I'm really
getting paranoid about my .pack files not being written completely in
crash situations. bup fsck found an invalid pack file from a crash a
week earlier ("git verify failed"), so that backup set is most likely
invalid from that point on, isn't it?

I think a very simple and safe procedure is deleting all .idx, .pack
and .midx files created at and after that day?
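
As a dry run for that idea, the candidate files can be listed before
anything is removed. This is a minimal sketch: pack_files_since is a
hypothetical helper, and it assumes each file's modification time
reflects when the backup wrote it, which may not hold if files were
copied or regenerated later.

```python
import os


def pack_files_since(pack_dir, cutoff_timestamp):
    """List .pack/.idx/.midx files modified at or after cutoff_timestamp.

    Hypothetical helper, not part of bup. It only reports names and
    deletes nothing. cutoff_timestamp is seconds since the epoch.
    """
    selected = []
    for name in sorted(os.listdir(pack_dir)):
        if not name.endswith((".pack", ".idx", ".midx")):
            continue
        if os.path.getmtime(os.path.join(pack_dir, name)) >= cutoff_timestamp:
            selected.append(name)
    return selected
```

A cutoff can be built with, for example,
datetime.datetime(2011, 3, 29).timestamp() from the standard library.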

Avery Pennarun

Apr 5, 2011, 12:03:44 PM

to Korkman, bup-list

On Tue, Apr 5, 2011 at 11:51 AM, Korkman <goo...@pierre-beck.de> wrote:
> From what I understand, a temporary .idx file was moved to objects/pack
> but not written to disk yet, because there's no fsync in between.
> Therefore, 0-byte size. That's an easy case, though. I'm really
> getting paranoid about my .pack files not being written completely in
> crash situations. bup fsck found an invalid pack file from a crash a
> week earlier ("git verify failed"), so that backup set is most likely
> invalid from that point on, isn't it?
>
> I think a very simple and safe procedure is deleting all .idx, .pack
> and .midx files created at and after that day?

Just watch out for one thing: the git commit that your branch points
at. It may be that the final commit of a particular backup is valid
right now, but if you delete old (valid) packs, it might become
invalid and you might need to re-point the branch at a commit that
still exists.

So basically:

cd ~/.bup
git rev-list name-of-my-branch >my-commits
...delete some packs...
git log name-of-my-branch
# if it fails, try 'git log' on each of the commits in my-commits in turn.
# let's call the first one that works $TOPCOMMIT
# (-D forces the delete even if git considers the branch unmerged)
git branch -D name-of-my-branch
git branch name-of-my-branch $TOPCOMMIT

Then you should be okay.

So basically, deleting the packs is fine, but you want to be careful
not to lose track of the part of history you *didn't* delete.
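
The "try each commit in turn" step can be scripted. The sketch below is
one possible way to do it, not a bup feature: both function names are
hypothetical, and the check uses 'git rev-list --quiet', which walks
only the commit history. Adding '--objects' would make git walk the
trees as well, at the cost of a slower check.

```python
import subprocess


def history_is_walkable(commit, git_dir):
    """Return True if git can walk the commit history starting at commit.

    Hypothetical helper. Only commits are checked, not trees or file data.
    """
    result = subprocess.run(
        ["git", "--git-dir=" + git_dir, "rev-list", "--quiet", commit],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_valid_commit(commits, is_valid):
    """Return the first commit for which is_valid(commit) is true, else None.

    commits should be in the order 'git rev-list' printed them (newest
    first), so the result is the newest commit that still checks out.
    """
    for commit in commits:
        if is_valid(commit):
            return commit
    return None
```

With my-commits read into a list of hashes, calling
first_valid_commit(commits, lambda c: history_is_walkable(c, git_dir))
with git_dir set to the full path of the bup repository returns the
value to use as $TOPCOMMIT, or None if no commit survives.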

We could probably put more work into bup fsck, autorecovery, and so
on. But since it's generally *possible* (although a bit of work like
the above) to recover from problems, it hasn't been a big deal.