Important: bup gc (up through at least 0.33.4) may cause data loss

Rob Browning

Oct 16, 2024, 1:33:33 PM
to bup-...@googlegroups.com

What's wrong

- We've discovered a bup gc problem that can eventually cause the
repository to end up with "incomplete" subtrees after a gc.
Directories could be missing files, or files could be missing parts
(given that bup's deduplication also creates trees from files). The
gc itself does not (immediately) create the problem. It can just
set the stage for future additions to the repository (via bup save,
bup get, etc.) to be broken.

Are you affected?

- If you've never run gc, then as far as we know right now, no.
Otherwise, you can test saves by attempting to restore them, which
will fail at some point if you're affected, or you can test a save
without restoring it by "joining" (bup-join(1)) the save to
/dev/null. See "Testing a save via join" below for details.

Recommendations

- Don't run gc again until we fix it.

- If you might be affected, clear all your indexes, i.e. via "bup
index --clear" if you only use the default index. That will ensure
new saves won't be broken. (Starting a new repository will of
course also ensure new saves are fine.)

- Delete any index-cache midx files in your repositories, for example:

    find ~/.bup/index-cache -name "*.midx" -exec rm '{}' +

The remote ~/.bup/index-cache is relevant for commands like "bup on
REMOTE save ...", and currently, "-d" changes the index-cache
location on the client side (e.g. for save -r).

- Test important saves, perhaps via restore or join (as described
below).

- If you have different repositories with the same or related data,
keep them. It's possible to restore missing data from any other
repository that still has it, and we intend to make that easier
soon.
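
As a rough sketch of the midx cleanup recommended above, the following
shell function first prints the files it is about to delete and then
runs the same find command. The function name remove_midx_files is
made up for illustration, and the default directory is only the
~/.bup/index-cache location mentioned above; pass a different path if
your index-cache lives elsewhere (for example because of "-d").

```shell
# Hypothetical helper: list, then delete, the midx files under an
# index-cache directory. Defaults to ~/.bup/index-cache.
remove_midx_files() {
    cache_dir=${1:-"$HOME/.bup/index-cache"}
    if [ ! -d "$cache_dir" ]; then
        echo "no such directory: $cache_dir" >&2
        return 1
    fi
    # Show what is about to be removed.
    find "$cache_dir" -name '*.midx' -print
    # Same deletion as the find command in the recommendation above.
    find "$cache_dir" -name '*.midx' -exec rm '{}' +
}
```

Run it once with no argument on the client, and again on any remote
host whose index-cache is used by "bup on REMOTE save ...".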

Our current recovery plan

- Make it possible to scan the entire repository for damage, and
report it (in progress, mostly settled) so you'll know exactly
what's wrong.

- Fix bup gc (in progress, mostly settled).

- Provide a way to repair any affected saves by pruning them,
replacing missing references with placeholders so that the
repository will no longer be structurally broken (in progress).

- Provide a way to repopulate missing data from another repository
that still has it (in progress, mostly settled).

- We plan to fix 0.33.x first and make a 0.33.5 release, and then fix
the main branch.

Testing a save via join

- Run "bup ls -s BRANCH" on the relevant branch and note the HASH
corresponding to the save.

- Run "bup join HASH > /dev/null". If that succeeds, then that save
shouldn't be affected, but make sure not to run gc on the repository
again until we fix it.

- If "bup ls" won't work, you can still run "git --git-dir REPOSITORY
log" to find the save's "commit HASH", and then give "HASH:" to join
instead.
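
If a branch has many saves, the steps above could be wrapped in a
loop along these lines. This is only a sketch: the function name
check_branch_saves is invented here, and it assumes that "bup ls -s
BRANCH" prints one "HASH NAME" pair per line and that the branch
listing includes a "latest" entry duplicating the newest save. Check
the output of "bup ls -s" on your version before relying on it.

```shell
# Hypothetical helper: join every save on a branch to /dev/null and
# report which ones fail. Returns 0 if all saves joined, 1 if any
# save failed, 2 if the branch could not be listed.
check_branch_saves() {
    branch=$1
    failed=0
    listing=$(bup ls -s "$branch") || return 2
    while read -r hash name; do
        [ -n "$hash" ] || continue
        # Assumption: "latest" just points at the newest save, so skip it.
        [ "$name" = latest ] && continue
        if bup join "$hash" > /dev/null < /dev/null; then
            echo "ok      $name ($hash)"
        else
            echo "BROKEN  $name ($hash)"
            failed=1
        fi
    done <<EOF
$listing
EOF
    return "$failed"
}
```

For example, "check_branch_saves BRANCH" prints one line per save.
Joining reads every object in a save, so this can take a long time
on a large repository.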

Further details

- This may move later, but Johannes has provided an excellent
overview:
https://gist.github.com/jmberg/e0d98a944172380b050dae5d0b05e582

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Johannes Berg

Oct 22, 2024, 1:55:21 PM
to bup-...@googlegroups.com
In case anyone's paying attention :)

Turns out there's another, unrelated issue, in how 'bup get' works right
now, specifically with the order it transfers objects in.

The problem can only occur when 'bup get' is aborted. I've added a more
complete explanation here:

https://gist.github.com/jmberg/e0d98a944172380b050dae5d0b05e582#bug-3


> What's wrong
>
> - We've discovered a bup gc problem that can eventually cause the
> repository to end up with "incomplete" subtrees after a gc.

The effect of an aborted 'bup get' is really the same: you can end up
with incomplete subtrees.

> Are you affected?
>
> - If you've never run gc, then as far as we know right now, no.

But you might be affected by this bug if you've ever run 'bup get' and
it was aborted in the middle. If the abort led to the issue, then
annoyingly, due to the way it works now, 'bup get' won't be able to fix
it even if you run the aborted command again.

> Recommendations

> [...]

> - If you have different repositories with the same or related data,
> keep them. It's possible to restore missing data from any other
> repository that still has it, and we intend to make that easier
> soon.

This still stands I guess, don't delete the source :)

If you notice missing objects you can probably in the interim also use
git to transfer them from elsewhere, e.g. using 'git hash-object -w'.
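
A minimal sketch of what that could look like for a single object,
assuming you already know the hash of the missing object and have
another repository that still contains it. The function name
copy_object is made up, and this has not been checked against real
bup repositories; it only uses plain 'git cat-file' and
'git hash-object -w'.

```shell
# Hypothetical helper: copy one object from a repository that still
# has it into one that lacks it.
# Usage: copy_object GOOD_REPO BROKEN_REPO HASH
copy_object() {
    src=$1
    dst=$2
    hash=$3
    # Ask the source repository what kind of object this is
    # (blob, tree, or commit).
    obj_type=$(git --git-dir "$src" cat-file -t "$hash") || return 1
    # Stream the raw object content across and write it into the
    # destination; hash-object prints the hash it stored.
    git --git-dir "$src" cat-file "$obj_type" "$hash" |
        git --git-dir "$dst" hash-object -w -t "$obj_type" --stdin
}
```

The hash printed at the end should match the one you asked for. Note
that this copies only the one object, not anything it refers to, so
a missing tree may need its children copied as well.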

Pure _fixes_ for these issues are pretty straight-forward, but putting a
broken repository into a good state again (as good as it can be) isn't.

johannes

Rob Browning

Nov 10, 2024, 3:29:02 PM
to Johannes Berg, bup-...@googlegroups.com
Johannes Berg <joha...@sipsolutions.net> writes:

> https://gist.github.com/jmberg/e0d98a944172380b050dae5d0b05e582#bug-3

Here's a collected and augmented version of Johannes' explanations:

https://bup.github.io/issue/missing-objects.html

That's generated from Markdown in the source repo (or will be once we post
our pending 0.33.x fixes for review and they're eventually merged).
(I think we're nearly there.)

> This still stands I guess, don't delete the source :)
>
> If you notice missing objects you can probably in the interim also use
> git to transfer them from elsewhere, e.g. using 'git hash-object -w'.
>
> Pure _fixes_ for these issues are pretty straight-forward, but putting a
> broken repository into a good state again (as good as it can be) isn't.

Indeed -- that should be possible soon, but not right away. (Johannes
and I have started working on it, but have decided not to let it hold
up the 0.33.5 release.)

Thanks