[PATCH 3/6] fsck: report lingering pack-related files

Rob Browning

Apr 16, 2026, 8:02:37 PM
to bup-...@googlegroups.com
When asked to scan all packfiles (i.e. when no pack arguments are
given), notice and report any pack-related files with no corresponding
pack-HASH.pack. Earlier versions of bup-gc could leave such files
behind (e.g. par2 files) when removing a pack.

For now, define pack-related to mean a file name consisting of pack-
followed by 40 hex digits, optionally followed by a dot and anything
else.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---

Pushed to main.
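
For readers who want to see which names the "pack-related" definition
covers, here is a minimal standalone sketch. It reuses the two regular
expressions from the patch, but stray_pack_related() is a hypothetical
helper written for illustration only: it takes plain lists of paths and
returns the strays, whereas the patch globs objects/pack/pack-* in the
repository and logs each one.

```python
import re

# The same two patterns the patch uses.
PACK_RX = re.compile(br'(?:^|/)pack-([a-f0-9]{40})\.pack$')
PACK_RELATED_RX = re.compile(br'(?:^|/)pack-([a-f0-9]{40})(?:\..*)?$')


def stray_pack_related(pack_paths, candidate_paths):
    """Return candidates that look pack-related but have no matching .pack.

    Hypothetical helper for illustration; not part of bup.
    """
    # Collect the 40-hex ids of the pack files that actually exist.
    known = set()
    for path in pack_paths:
        m = PACK_RX.search(path)
        if m:
            known.add(m.group(1))
    # Anything pack-related whose id is not among them is a stray.
    strays = []
    for path in candidate_paths:
        m = PACK_RELATED_RX.search(path)
        if m and m.group(1) not in known:
            strays.append(path)
    return strays


if __name__ == '__main__':
    live = b'a' * 40  # id of a pack that still exists
    gone = b'b' * 40  # id of a pack that has been removed
    packs = [b'objects/pack/pack-' + live + b'.pack']
    candidates = [
        b'objects/pack/pack-' + live + b'.pack',
        b'objects/pack/pack-' + live + b'.idx',
        b'objects/pack/pack-' + gone,
        b'objects/pack/pack-' + gone + b'.',
        b'objects/pack/pack-' + gone + b'.par2',
        b'objects/pack/pack-' + gone + b'x',  # ignored: no dot after the id
        b'objects/pack/pack-tmp',             # ignored: no 40-hex id
    ]
    for path in stray_pack_related(packs, candidates):
        print(path.decode())
```

Only the three names built from the removed pack's id (bare, with a
trailing dot, and with .par2) are reported; files belonging to the live
pack and names that do not fit the pattern are skipped.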

Documentation/bup-fsck.1.md | 6 ++
lib/bup/cmd/fsck.py | 30 +++++++--
note/main.md | 6 ++
test/ext/test-fsck | 128 ++++++++++++++++++++++--------------
4 files changed, 115 insertions(+), 55 deletions(-)

diff --git a/Documentation/bup-fsck.1.md b/Documentation/bup-fsck.1.md
index 5dd4fe79..9e47b6dc 100644
--- a/Documentation/bup-fsck.1.md
+++ b/Documentation/bup-fsck.1.md
@@ -45,6 +45,12 @@ need to carefully consider redundancy (such as using RAID
for multi-disk redundancy, or making off-site backups for
site redundancy).

+When asked to examine all packfiles (i.e. when no *packfile*s are
+specified), fsck will report any files that appear to be related to a
+pack file that no longer exists. Previous versions of `bup gc` can
+cause this to happen because they did not remove all of the related
+files when removing a pack file.
+
# OPTIONS

-r, \--repair
diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 14963129..496edb37 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -4,7 +4,7 @@ from os.path import join
from shutil import copy2, rmtree
from subprocess import DEVNULL, PIPE, run
from tempfile import mkdtemp
-import errno, glob, os, sys
+import errno, glob, os, re, sys

from bup import options, git
from bup.compat import argv_bytes
@@ -14,13 +14,32 @@ from bup.helpers \
from bup.io import byte_stream, path_msg


-par2_ok = 0
opt = None

def debug(s):
if opt.verbose > 1:
log(s)

+def report_stray_pack_related_files(repo, pack_paths):
+    # For now only look for stray files we might have created (SHA1-related)
+    pack_rx = re.compile(br'(?:^|/)pack-([a-f0-9]{40})\.pack$')
+    pack_oidxs = set()
+    for path in pack_paths:
+        m = pack_rx.search(path)
+        if not m:
+            continue
+        pack_oidxs.add(m.group(1))
+
+    # Pack-related is currently pack-40HEX or pack-40HEX.*
+    pack_related_rx = re.compile(br'(?:^|/)pack-([a-f0-9]{40})(?:\..*)?$')
+    for path in glob.glob(repo + b'/objects/pack/pack-*'):
+        m = pack_related_rx.search(path)
+        if not m or m.group(1) in pack_oidxs:
+            continue
+        log(f'No pack file for {path_msg(os.path.basename(path))}\n')
+
+par2_ok = 0
+
def par2_setup():
global par2_ok
try:
@@ -331,12 +350,13 @@ def main(argv):
for stem in pack_stems:
if not stem.endswith(b'.pack'):
o.fatal(f'packfile argument {path_msg(stem)} must end with .pack')
+ pack_stems = [x[:-5] for x in pack_stems]
else:
debug('fsck: No filenames given: checking all packs.\n')
git.check_repo_or_die()
- pack_stems = glob.glob(git.repo(b'objects/pack/*.pack'))
-
- pack_stems = [x[:-5] for x in pack_stems]
+ pack_files = glob.glob(git.repo(b'objects/pack/*.pack'))
+ report_stray_pack_related_files(git.repo(), pack_files)
+ pack_stems = [x[:-5] for x in pack_files]

sys.stdout.flush()
out = byte_stream(sys.stdout)
diff --git a/note/main.md b/note/main.md
index 047bab15..a19176b5 100644
--- a/note/main.md
+++ b/note/main.md
@@ -51,6 +51,12 @@ May require attention
that the `.idx` files can be (and now are) trivially regenerated
from the packfiles during `--repair` when needed.

+* `bup gc` now removes all pack file related files
+ (e.g. `pack-HASH.*`) when removing a packfile. Previously it did
+ not, leaving behind, for example, `bup-fsck` generated recovery
+ files. In addition, `bup fsck` will report lingering files when
+ asked to scan the entire repository.
+
* Some prior exit statuses of 1 have been changed to a different
non-zero value. `bup` is migrating away from exiting with status 1
for anything other than "false". This is used by commands like
diff --git a/test/ext/test-fsck b/test/ext/test-fsck
index c290759a..179a0ab5 100755
--- a/test/ext/test-fsck
+++ b/test/ext/test-fsck
@@ -1,5 +1,6 @@
#!/usr/bin/env bash
. ./wvtest-bup.sh || exit $?
+. ./test/lib/btl.sh || exit $?

set -o pipefail

@@ -67,58 +68,85 @@ if ! bup fsck --par2-ok; then
set +x
WVPASSNE 0 "$rc"
WVPASSNE 1 "$rc"
-else
- bup fsck --quick -rvv -j9;
- WVPASSEQ 1 $?
-
- git_idxs=("$BUP_DIR"/objects/pack/pack-*.idx)
- some_idx="${git_idxs[0]}"
- some_idx="${some_idx%.idx}.par2"
-
- vols=("$BUP_DIR"/objects/pack/pack-*.vol*.par2)
- some_vol="${vols[0]}"
-
- WVPASS cp -p "$some_idx" some-pack.par2
- WVPASS cp -p "$some_vol" some-pack.vol.par2
-
- WVSTART 'fsck rejects empty par2 index files'
- WVPASS echo -n > "$some_idx"
- WVFAIL bup fsck -v
- WVPASS test -e "$some_idx" -a ! -s "$some_idx"
- WVFAIL bup fsck -vr
- WVPASS test -e "$some_idx" -a ! -s "$some_idx"
- WVFAIL bup fsck -vg
- WVPASS test -e "$some_idx" -a ! -s "$some_idx"
- WVPASS cp -p some-pack.par2 "$some_idx"
-
- WVSTART 'fsck rejects empty par2 vol files'
- WVPASS echo -n > "$some_vol"
- WVFAIL bup fsck -v
- WVPASS test -e "$some_vol" -a ! -s "$some_vol"
- WVFAIL bup fsck -vr
- WVPASS test -e "$some_vol" -a ! -s "$some_vol"
- WVFAIL bup fsck -vg
- WVPASS test -e "$some_vol" -a ! -s "$some_vol"
- WVPASS cp -p some-pack.vol.par2 "$some_vol"
-
- # This must do "too much" damage. Currently par2 is invoked with
- # -c200, which should allow up to 200 damaged "blocks", but since
- # we don't specify the block size, it's dynamically computed.
- # Even if we did specify a size, the actual size appears to be
- # affected by the input file sizes, and the specific behavior
- # doesn't appear to be documented/promised -- see par2
- # comandline.cpp. Also worth noting that bup damage's output is
- # currently probabilistic, so it might not actually damage any
- # given byte. For now, just try to overdo it -- randomly change
- # (or not 1/256th of the time) 600 evenly spaced bytes in each
- # pack file.
- WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n600 -s1 --equal -S0
- WVFAIL bup fsck
-
- WVEXPRC '[!01]' bup fsck -rvv # too many errors to be repairable
- WVEXPRC '[!01]' bup fsck -r # too many errors to be repairable
+ WVPASS cd "$top"
+ WVPASS rm -rf "$tmpdir"
+ exit 0
fi

+bup fsck --quick -rvv -j9
+WVPASSEQ 1 $?
+
+git_idxs=("$BUP_DIR"/objects/pack/pack-*.idx)
+some_idx="${git_idxs[0]}"
+some_idx="${some_idx%.idx}.par2"
+
+vols=("$BUP_DIR"/objects/pack/pack-*.vol*.par2)
+some_vol="${vols[0]}"
+
+WVPASS cp -p "$some_idx" some-pack.par2
+WVPASS cp -p "$some_vol" some-pack.vol.par2
+
+WVSTART 'fsck rejects empty par2 index files'
+WVPASS echo -n > "$some_idx"
+WVFAIL bup fsck -v
+WVPASS test -e "$some_idx" -a ! -s "$some_idx"
+WVFAIL bup fsck -vr
+WVPASS test -e "$some_idx" -a ! -s "$some_idx"
+WVFAIL bup fsck -vg
+WVPASS test -e "$some_idx" -a ! -s "$some_idx"
+WVPASS cp -p some-pack.par2 "$some_idx"
+
+WVSTART 'fsck rejects empty par2 vol files'
+WVPASS echo -n > "$some_vol"
+WVFAIL bup fsck -v
+WVPASS test -e "$some_vol" -a ! -s "$some_vol"
+WVFAIL bup fsck -vr
+WVPASS test -e "$some_vol" -a ! -s "$some_vol"
+WVFAIL bup fsck -vg
+WVPASS test -e "$some_vol" -a ! -s "$some_vol"
+WVPASS cp -p some-pack.vol.par2 "$some_vol"
+
+# This must do "too much" damage. Currently par2 is invoked with
+# -c200, which should allow up to 200 damaged "blocks", but since
+# we don't specify the block size, it's dynamically computed.
+# Even if we did specify a size, the actual size appears to be
+# affected by the input file sizes, and the specific behavior
+# doesn't appear to be documented/promised -- see par2
+# comandline.cpp. Also worth noting that bup damage's output is
+# currently probabilistic, so it might not actually damage any
+# given byte. For now, just try to overdo it -- randomly change
+# (or not 1/256th of the time) 600 evenly spaced bytes in each
+# pack file.
+WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n600 -s1 --equal -S0
+WVFAIL bup fsck
+
+WVEXPRC '[!01]' bup fsck -rvv # too many errors to be repairable
+WVEXPRC '[!01]' bup fsck -r # too many errors to be repairable
+
+
+WVSTART "fsck detects orphaned pack-related files"
+WVPASS rm -rf "$BUP_DIR"
+WVPASS bup init
+WVPASS bup index src
+WVPASS bup save -n lib --strip "$(pwd)" src/var/lib
+WVPASS bup save -n doc --strip "$(pwd)" src/var/doc
+packs=($(cd bup/objects/pack/ && ls -rt *.pack))
+doc_pack="${packs[1]}"
+pack_stem="${doc_pack%.pack}"
+WVPASS bup rm --unsafe doc
+WVPASS test -e bup/objects/pack/"$doc_pack"
+WVPASS bup gc --unsafe
+WVPASS test ! -e bup/objects/pack/"$doc_pack"
+WVPASS touch bup/objects/pack/"$pack_stem"
+WVPASS touch bup/objects/pack/"$pack_stem."
+WVPASS touch bup/objects/pack/"$pack_stem.lingering"
+WVPASS err-to log bup fsck
+WVPASS grep -E '^No pack file for ' log | WVPASS sort > lingering
+WVPASSEQ "$(< lingering)" \
+"No pack file for $pack_stem
+No pack file for $pack_stem.
+No pack file for $pack_stem.lingering"
+

WVPASS cd "$top"
WVPASS rm -rf "$tmpdir"
--
2.47.3
