it might be a bit early to ask this, but how fault tolerant is bup? We
had a power outage yesterday during backups (one that lasted much too
long, battery wouldn't last) and for a thrill I tried verifying one of
the affected backups. Instantly errors were reported about a zero-
sized .idx file. Removing it stopped the errors, but I fear the whole
backup set is now destroyed. I'm running a restore right now and it
didn't abort yet, so I'll go on with the verification. Is it in vain?
It should be noted that unexpected power outages bring up the worst in
filesystems and databases, thus potentially leading to the need to
restore from backup in the first place. I will work around the problem
by rotating two sets of backups for now. Bup reduces backup volume so
dramatically (really, the rolling checksum thing pays off), so keeping
multiple versions is cheap enough.
But maybe you can think of some atomic file operations / write-ahead
rollback journal in further development to make it rock-solid?
By bup's design, any new backup is additive only. Therefore all of your previous snapshots are completely valid even though the in-progress one is not. For that matter, if you're doing a large incremental backup (more than one .pack worth), that backup's progress up to the end of the last complete pack is completely valid even if there is a fault.
--Brandon
On 2011-04-04 (Mon) at 11:39:54 -0700, Korkman wrote:
> it might be a bit early to ask this, but how fault tolerant is bup? We
> had a power outage yesterday during backups (one that lasted much too
> long, battery wouldn't last) and for a thrill I tried verifying one of
> the affected backups. Instantly errors were reported about a zero-
> sized .idx file. Removing it stopped the errors, but I fear the whole
> backup set is now destroyed. I'm running a restore right now and it
> didn't abort yet, so I'll go on with the verification. Is it in vain?
> It should be noted that unexpected power outages bring up the worst in
> filesystems and databases, thus potentially leading to the need to
> restore from backup in the first place. I will work around the problem
> by rotating two sets of backups for now. Bup reduces backup volume so
> dramatically (really, the rolling checksum thing pays off), so keeping
> multiple versions is cheap enough.
> But maybe you can think of some atomic file operations / write-ahead
> rollback journal in further development to make it rock-solid?
On Mon, Apr 4, 2011 at 2:44 PM, Brandon Smith <free...@reardencode.com> wrote:
> By bup's design, any new backup is additive only. Therefore all of your
> previous snapshots are completely valid even though the in progress one
> is not. For that matter, if you're doing a large incremental backup
> (more than one .pack worth) that backup's progress up to the end of the
> last complete pack is completely valid even if there is a fault.
Exactly.
There are some situations I can think of that might cause minor (recoverable) problems. For example, if the system crashes before all the files are synced, we might end up with pack #7 being slightly corrupt but pack #8 being valid. Then deleting pack #7 would result in missing objects that you might not immediately discover.
Also, if a .pack file is invalid, bup might not notice it and might keep using the .idx anyway, which would result in certain missing objects not being backed up as they should be. I haven't tested that very carefully.
Someday, we should improve 'bup fsck' to be able to check stuff like that and throw away bad or mismatched packs, just to be safe. However, this situation is pretty contrived and should be very rare, even in the case of a crash.
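As a rough illustration of the kind of check described above, here is a minimal Python sketch that flags zero-sized files and .pack/.idx files whose partner is missing. It is not part of bup or 'bup fsck'; the function name find_suspect_packs is made up for this example, and it only looks at file names and sizes, not at pack contents.

```python
import os

def find_suspect_packs(pack_dir):
    """Return (name, reason) for .pack/.idx files that look damaged.

    Hypothetical helper: only checks for empty files and for a missing
    .pack/.idx partner. It does not validate the data inside a pack.
    """
    names = set(os.listdir(pack_dir))
    suspects = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext not in (".pack", ".idx"):
            continue  # skip .midx files, temp files, and anything else
        partner = stem + (".idx" if ext == ".pack" else ".pack")
        if os.path.getsize(os.path.join(pack_dir, name)) == 0:
            suspects.append((name, "zero-sized"))
        elif partner not in names:
            suspects.append((name, "missing " + partner))
    return suspects
```

Pointing it at the objects/pack directory of a repository (for example ~/.bup/objects/pack) would list candidates to inspect by hand. A clean result does not prove the packs are intact.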
Maybe we should fsync() more frequently, like after writing a .pack or .idx file. I really hate fsync() though, since its performance is so terrible on ext3. (fsync() on any file ends up being basically the same as a full sync() of the entire filesystem. Barf.)
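For reference, the usual POSIX pattern for this is: write to a temporary name, fsync() the file, rename it into place, then fsync() the directory. The sketch below shows that generic pattern in Python. It is not bup's actual code, and the function name write_file_durably and the ".tmp" suffix are assumptions for illustration.

```python
import os

def write_file_durably(final_path, data):
    """Write data so final_path is never visible half-written (POSIX sketch)."""
    tmp_path = final_path + ".tmp"  # hypothetical temp-name convention
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to put the data on disk
    # Atomic on POSIX: readers see either the old file or the complete new one.
    os.replace(tmp_path, final_path)
    # Sync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(final_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this ordering, a crash should leave either no file under the final name or a complete one, never a zero-sized one. The cost is the fsync() calls, which is exactly the performance concern on ext3.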
If you're worried about another form of corruption, i.e. silent loss of data in .pack files due to hard drive sector errors, you should consider using 'bup fsck -g'.
From what I understand, a temporary .idx file was moved to objects/
pack but not written to disk yet, because there's no fsync in between.
Therefore, 0-byte size. That's an easy case, though. I'm really
getting paranoid about my .pack files not being written completely in
crash situations. bup fsck found an invalid pack file from a crash a
week earlier ("git verify failed"), so that backup set is most likely
invalid from that point on, isn't it?
I think a very simple and safe procedure is deleting all .idx, .pack
and .midx files created at and after that day?
On 5 Apr., 16:51, Brandon Smith <free...@reardencode.com> wrote:
On Tue, Apr 5, 2011 at 11:51 AM, Korkman <goo...@pierre-beck.de> wrote:
> From what I understand, a temporary .idx file was moved to objects/
> pack but not written to disk yet, because there's no fsync in between.
> Therefore, 0-byte size. That's an easy case, though. I'm really
> getting paranoid about my .pack files not being written completely in
> crash situations. bup fsck found an invalid pack file from a crash a
> week earlier ("git verify failed"), so that backup set is most likely
> invalid from that point on, isn't it?
> I think a very simple and safe procedure is deleting all .idx, .pack
> and .midx files created at and after that day?
Just watch out for one thing: the git commit that your branch points at. It may be that the final commit of a particular backup is valid right now, but if you delete old (valid) packs, it might become invalid and you might need to re-point the branch at a commit that still exists.
So basically:
    cd ~/.bup
    git rev-list name-of-my-branch >my-commits
    ...delete some packs...
    git log name-of-my-branch
    # if it fails, try each of the commits in my-commits in turn.
    # let's call the first one for which git log works $TOPCOMMIT
    git branch -d name-of-my-branch
    git branch name-of-my-branch $TOPCOMMIT
Then you should be okay.
So basically, deleting the packs is fine, but you want to be careful not to lose track of the part of history you *didn't* delete.
We could probably put more work into bup fsck, autorecovery, and so on. But since it's generally *possible* (although a bit of work like the above) to recover from problems, it hasn't been a big deal.
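The "try each commit in turn" step of the procedure above could be automated along these lines. This is only a sketch under assumptions: the function names are invented, and it uses 'git rev-list --objects' as the check instead of 'git log', on the assumption that walking every reachable object catches missing trees and blobs as well as missing commits. It checks that objects exist, not that their contents are correct, and it may be slow on a large repository.

```python
import subprocess

def commit_is_intact(commit, git_dir):
    """True if git can walk every object reachable from commit.

    Assumption: a non-zero exit status from 'git rev-list --objects'
    means something reachable from the commit is missing.
    """
    result = subprocess.run(
        ["git", "--git-dir", git_dir, "rev-list", "--objects", commit],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_intact_commit(commits, is_intact):
    """Return the first commit accepted by is_intact, or None if none is."""
    for commit in commits:
        if is_intact(commit):
            return commit
    return None
```

Since 'git rev-list' prints the newest commit first, feeding it the lines of my-commits in file order yields the most recent surviving backup, e.g. first_intact_commit(lines, lambda c: commit_is_intact(c, git_dir)), which is then the $TOPCOMMIT to re-point the branch at.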