Nico <nmend...@gmail.com> writes:
> What are the consequences of a corrupted packfile? Does the corruption
> affect a single backup date?
Because bup deduplicates, each packfile only contains the fragments
(determined by the rolling checksum) that a given "bup save" needed and
that didn't already exist in some other packfile in the repository at
the time of that save.
That does mean that (assuming you haven't run "bup gc") any problems
with that packfile should only affect saves made from the time it was
created onward. The further effect of the bad packfile is that future
saves assume that none of the objects bup thinks are in that packfile
(everything listed in the corresponding .idx) need to be saved again.
Now if the problem is that the bad packfile has missing objects, say it
was truncated, then I'd expect a future restore of any save that
depended on that packfile to crash when it tried to fetch one of the
missing objects. Alternatively, the objects might all be present but
have corrupted contents, in which case the restore might work, but the
results couldn't be entirely trusted.
> At the very least, I would like to know what was lost, and then be
> able to "fix" the repo (so that it passes the verification step, and i
> can save recovery information once again).
With the current tools, there are a number of options, depending on how
far down the rabbit hole you want to go. I'll give an outline, and can
try to help with further detail, depending on how you feel you might
want to proceed.
In places where I refer to questionable.{pack,idx} below, I'm assuming
(and suggest) you work from a copy of the original
pack-4e15073a62edb135ce5183fdf42d80734af5d90d.idx etc., so we don't
accidentally make things worse. And you'll usually want both those
files, side-by-side in the same dir.
Also, I'd strongly recommend working on a copy of your repo, if you can
afford the space.
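For example, here's a minimal sketch of making that working copy. The
function name and the scratch directory are just placeholders of mine,
and it assumes the pack lives in the usual objects/pack/ dir of the
repo:

```shell
# Copy a suspect pack and its idx into a scratch dir as
# questionable.{pack,idx}, leaving the originals untouched.
# Assumes the layout REPO/objects/pack/pack-HASH.{pack,idx}.
copy_questionable() {
    repo=$1 pack_hash=$2 scratch=$3
    src="$repo/objects/pack/pack-$pack_hash"
    mkdir -p "$scratch" || return 1
    for ext in pack idx; do
        # -p keeps the timestamps, which may help when dating the pack
        cp -p "$src.$ext" "$scratch/questionable.$ext" || return 1
    done
}

# Usage (hypothetical scratch path):
#   copy_questionable BUP_REPO 4e15073a62edb135ce5183fdf42d80734af5d90d /tmp/scratch
```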
1) Examining the packfile and index
You can look at the contents of the broken pack/idx with various git
tools like this:
    git show-index < questionable.idx
That will show all the hashes that the index claims are in the packfile,
which should be at least a subset (if not all) of the hashes that could
be missing or corrupted.
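If you want that list in a form that's easy to compare against other
output later, something like this should do. It's only a sketch: the
function name and the output file name are made up, and it relies on
show-index printing the hash as the second field of each line:

```shell
# Print just the object hashes from `git show-index` output, sorted.
# show-index prints one object per line as: OFFSET HASH [(CRC)]
idx_hashes() {
    awk '{ print $2 }' | sort
}

# Usage:
#   git show-index < questionable.idx | idx_hashes > idx-hashes.txt
```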
This should show a lot more information, and might give us some clue
about what's actually wrong:
    git verify-pack -v questionable.idx
Another possibility:
    git index-pack -v -o dummy.idx questionable.pack
Of course you can delete the dummy.idx afterward -- just want to see
what the process says about the pack.
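If verify-pack gets far enough to list objects, it can be handy to see
what kinds of objects are at stake (blobs only, or trees and commits
too). A rough sketch, with a made-up function name, that counts the
object lines by type and ignores the summary lines:

```shell
# Summarize `git verify-pack -v` output: count objects by type.
# Object lines look like: HASH TYPE SIZE SIZE_IN_PACK OFFSET [DEPTH BASE]
pack_type_counts() {
    awk '$2 == "commit" || $2 == "tree" || $2 == "blob" || $2 == "tag" {
             n[$2]++
         }
         END { for (t in n) print t, n[t] }' | sort
}

# Usage:
#   git verify-pack -v questionable.idx | pack_type_counts
```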
2) Examining the repo
You might want to try this first, but it's going to be a lot more
expensive than the commands above (i.e. it has to traverse *all* the
data in the repository):
    git --git-dir=BUP_REPO fsck --name-objects
This might tell us more or less just what you wanted to know with
respect to which saves are broken.
I believe you can also pass final "ref" arguments to limit the check to
particular saves, i.e. you can give it branch names or individual save
commit hashes, which you can see via "bup ... ls --hash ...".
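To narrow the fsck output down to what matters here, you could pull out
the "missing" lines and check them against the hashes listed in the
questionable idx. This is just a sketch: the function and file names
are placeholders, and it assumes fsck reports missing objects as
"missing TYPE HASH", optionally followed by a name:

```shell
# Pull the hashes of missing objects out of `git fsck` output.
# Lines of interest look like: missing TYPE HASH [(NAME)]
missing_hashes() {
    awk '$1 == "missing" { print $3 }' | sort -u
}

# Print hashes that are both missing and listed in the questionable idx.
# Both files must be sorted, one hash per line.
missing_from_pack() {
    comm -12 "$1" "$2"
}

# Usage:
#   git --git-dir=BUP_REPO fsck --name-objects | missing_hashes > missing.txt
#   git show-index < questionable.idx | awk '{ print $2 }' | sort > idx.txt
#   missing_from_pack missing.txt idx.txt
```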
3) Rolling back
One way to repair the repository would be to "roll back" all of your
save branches to just before that packfile's creation, then delete that
packfile and idx, and clobber the midx and bloom files.
To roll back, we'd need to figure out which save to roll back to, and
then use whatever git tools we like to reset the branch, e.g.
    git --git-dir=BUP_REPO branch -f SAVE_BRANCH_NAME HASH_OF_SAVE_BEFORE_TROUBLE
You can see the save hashes via "bup ... ls --hash".
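Since a forced branch move is easy to get wrong, here's a cautious
sketch of the reset. The function name and the DRY_RUN variable are my
own inventions; with DRY_RUN=1 it only prints the command, and otherwise
it reports where the branch used to point before moving it, so you can
undo the move by hand:

```shell
# Roll a save branch back to an earlier save.
# With DRY_RUN=1, only print the git command that would be run.
rollback_branch() {
    repo=$1 branch=$2 target=$3
    if [ -n "${DRY_RUN:-}" ]; then
        echo "git --git-dir=$repo branch -f $branch $target"
        return 0
    fi
    # Note the old tip on stderr in case the rollback needs undoing.
    old=$(git --git-dir="$repo" rev-parse --verify "refs/heads/$branch") \
        || return 1
    echo "$branch was at $old" >&2
    git --git-dir="$repo" branch -f "$branch" "$target"
}

# Usage:
#   DRY_RUN=1 rollback_branch BUP_REPO SAVE_BRANCH_NAME HASH_OF_SAVE_BEFORE_TROUBLE
```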
Then remove the packfile and index.
Next, while "bup gc" knows how to clobber midx and bloom, we don't yet
have top-level --clear commands, so instead you can do this:
    rm REPO/objects/pack/midx-*
    bup -d REPO bloom --force
    bup -d REPO midx -a
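If you'd like to check what that's going to touch before running it,
the same three steps can be wrapped up like this. It's only a sketch:
the function name and DRY_RUN variable are placeholders, and with
DRY_RUN=1 the commands are printed rather than run:

```shell
# Remove the midx files by hand, then let bup rebuild bloom and midx.
# With DRY_RUN=1, print the commands instead of running them.
rebuild_midx_bloom() {
    repo=$1
    run=${DRY_RUN:+echo}
    $run rm -f "$repo"/objects/pack/midx-* || return 1
    $run bup -d "$repo" bloom --force || return 1
    $run bup -d "$repo" midx -a
}

# Usage:
#   DRY_RUN=1 rebuild_midx_bloom REPO    # just show what would happen
#   rebuild_midx_bloom REPO
```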
Then just start making new saves again. You'd lose access to any saves
after that point, but it'd be "quick".
Afterward, all the other, newer packfiles would still be there, which
might speed the next save, and eventually, you could use bup rm/gc or
prune-older to drop any vestigial bits that were no longer needed.
I'd also suggest making sure "git fsck" (as in step 2) is happy with the
repo (aside from dangling blobs) at this point.
4) "Fixing" the repo more precisely than rolling back (riskier)
If the directories you're saving haven't changed a lot since the broken
packfile was created, then you might be able to move it (and its idx)
out of the objects/ dir, delete all the .midx files in the repo, run
"bup ... bloom --force", and then run new saves.
If those succeed, they may reintroduce some of the missing fragments
(objects) so that more of the older saves that were broken by the bad
packfile are now complete.
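For the "move it out" step, a sketch along these lines keeps the bad
pack around for later inspection instead of deleting it. The function
name and the quarantine dir are hypothetical, and it assumes the pack
sits in objects/pack/ inside the repo:

```shell
# Move a pack and its idx out of the repo into a quarantine dir, so
# they can still be examined (or put back) later.
quarantine_pack() {
    repo=$1 pack_hash=$2 dest=$3
    mkdir -p "$dest" || return 1
    for ext in pack idx; do
        mv "$repo/objects/pack/pack-$pack_hash.$ext" "$dest/" || return 1
    done
}

# Usage (hypothetical quarantine path):
#   quarantine_pack REPO 4e15073a62edb135ce5183fdf42d80734af5d90d /tmp/quarantine
```

After that you'd still need to delete the .midx files and run the bloom
command mentioned above before making new saves.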
Basically, we'd have to find some way of re-saving all of the hashes
that a git-fsck still reports missing, for any saves you want to keep.
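One way to track progress is to save the list of missing hashes before
and after the new saves and compare the two. A small sketch, assuming
each list is a sorted file with one hash per line (the function and file
names are made up):

```shell
# Given two sorted lists of missing hashes (before and after the new
# saves), report how many came back and how many are still missing.
compare_missing() {
    before=$1 after=$2
    echo "recovered: $(comm -23 "$before" "$after" | wc -l | tr -d ' ')"
    echo "still missing: $(comm -12 "$before" "$after" | wc -l | tr -d ' ')"
}

# Each list could be produced with something like:
#   git --git-dir=BUP_REPO fsck | awk '$1 == "missing" { print $3 }' \
#       | sort -u > before.txt
# Usage:
#   compare_missing before.txt after.txt
```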
I'd also suggest making sure "git fsck" (as in step 2) is happy with the
repo (aside from dangling blobs) in the end.
5) Cleaning up
If you manage to get your repository into a state you're happy with,
even though there are still some broken saves, you can remove those via
"bup rm", so they won't affect future commands. This will likely just leave
some "dangling blobs" in your repository, but those are harmless, and
will nearly always exist if you've ever used "bup gc", since it doesn't
try to be precise by default.
Hope this helps.