[PATCH 0/6] Continue fsck improvements on main

0 views
Skip to first unread message

Rob Browning

unread,
Jun 24, 2024, 7:12:51 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
I cherry-picked all the recent fsck improvements from 0.33.x to main,
and these changes continue that work.

Proposed for main.

Rob Browning (5):
fsck: insist on par2 for --generate and --repair
fsck: have --quick check the .idx checksum too
fsck: rework the per-packfile logic in do_pack
fsck: allow par2 progress info on tty if verbose > 3
fsck: stop generating recovery info for .idx files

Documentation/bup-fsck.1.md | 9 +-
lib/bup/cmd/fsck.py | 202 +++++++++++++++++++-----------------
note/main.md | 23 ++++
test/ext/test-fsck | 10 ++
4 files changed, 144 insertions(+), 100 deletions(-)
create mode 100644 note/main.md

--
2.43.0

Rob Browning

unread,
Jun 24, 2024, 7:12:52 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
Signed-off-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index e7600b0b7..4d191cdc0 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -60,8 +60,10 @@ def par2(action, args, verb_floor=0, cwd=None):
if _par2_parallel is None:
_par2_parallel = is_par2_parallel()
cmd = [b'par2', action]
- if opt.verbose >= verb_floor and not istty2:
+ if opt.verbose == verb_floor and not istty2:
cmd.append(b'-q')
+ elif opt.verbose > verb_floor and istty2:
+ pass
else:
cmd.append(b'-qq')
if _par2_parallel:
--
2.43.0

Rob Browning

unread,
Jun 24, 2024, 7:12:52 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
This is a change in behavior. Previously we'd just print "skipped"
when asked to --generate and par2 wasn't available.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 2 ++
note/main.md | 14 ++++++++++++++
2 files changed, 16 insertions(+)
create mode 100644 note/main.md

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 36258ee97..0e03b2be6 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -244,6 +244,8 @@ def main(argv):
sys.exit(EXIT_TRUE if par2_ok else EXIT_FALSE)
if opt.disable_par2:
par2_ok = 0
+ if not par2_ok and (opt.generate or opt.repair):
+ log(f'error: cannot --generate or --repair without par2\n')

if extra:
extra = [argv_bytes(x) for x in extra]
diff --git a/note/main.md b/note/main.md
new file mode 100644
index 000000000..e3e58cf6d
--- /dev/null
+++ b/note/main.md
@@ -0,0 +1,14 @@
+Notable changes in main (incomplete)
+====================================
+
+May require attention
+---------------------
+
+* `bup fsck --generate` and `--repair` now exit immediately with an
+ error when a suitable par2 isn't available. Previously `--generate`
+ would just print "skipped".
+
+Thanks to (at least)
+====================
+
+...
--
2.43.0

Rob Browning

unread,
Jun 24, 2024, 7:12:52 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
Previously it only checked packfiles.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 48 ++++++++++++++++++++++-----------------------
note/main.md | 3 +++
test/ext/test-fsck | 10 +++++++++-
3 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 0e03b2be6..5918477b3 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -2,7 +2,6 @@
from shutil import rmtree
from subprocess import DEVNULL, PIPE, run
from tempfile import mkdtemp
-from binascii import hexlify
from os.path import join
import glob, os, sys

@@ -146,29 +145,28 @@ def par2_verify(base):
def par2_repair(base):
return par2(b'repair', [b'--', base], verb_floor=2)

-def quick_verify(base):
- f = open(base + b'.pack', 'rb')
- f.seek(-20, 2)
- wantsum = f.read(20)
- assert(len(wantsum) == 20)
- f.seek(0)
- sum = Sha1()
- for b in chunkyreader(f, os.fstat(f.fileno()).st_size - 20):
- sum.update(b)
- if sum.digest() != wantsum:
- raise ValueError('expected %r, got %r' % (hexlify(wantsum),
- sum.hexdigest()))
-
-
-def git_verify(base):
- if not opt.quick:
- return run([b'git', b'verify-pack', b'--', base]).returncode
- try:
- quick_verify(base)
- except Exception as e:
- log('error: %s\n' % e)
- return 1
- return 0
+
+def trailing_and_actual_checksum(path):
+ with open(path, 'rb') as f:
+ f.seek(-20, 2)
+ trailing = f.read(20)
+ assert len(trailing) == 20
+ f.seek(0)
+ actual = Sha1()
+ for b in chunkyreader(f, os.fstat(f.fileno()).st_size - 20):
+ actual.update(b)
+ return trailing, actual.digest()
+
+
+def git_verify(stem, *, quick=False):
+ if not quick:
+ return run([b'git', b'verify-pack', b'--', stem]).returncode
+ for path in (stem + b'.idx', stem + b'.pack'):
+ exp, act = trailing_and_actual_checksum(path)
+ if exp != act:
+ log(f'error: expected {exp.hex()}, got {act.hex()}\n')
+ return EXIT_FAILURE
+ return EXIT_SUCCESS


def do_pack(base, last, par2_exists, out):
@@ -196,7 +194,7 @@ def do_pack(base, last, par2_exists, out):
else:
action_result = b'ok'
elif not opt.generate or (par2_ok and not par2_exists):
- gresult = git_verify(base)
+ gresult = git_verify(base, quick=opt.quick)
if gresult != 0:
action_result = b'failed'
log('%s git verify: failed (%d)\n' % (last, gresult))
diff --git a/note/main.md b/note/main.md
index e3e58cf6d..2b364d7ea 100644
--- a/note/main.md
+++ b/note/main.md
@@ -4,6 +4,9 @@ Notable changes in main (incomplete)
May require attention
---------------------

+* `bup fsck --quick` now checks the packfile indexes too. Previously
+ it only checked the packfiles.
+
* `bup fsck --generate` and `--repair` now exit immediately with an
error when a suitable par2 isn't available. Previously `--generate`
would just print "skipped".
diff --git a/test/ext/test-fsck b/test/ext/test-fsck
index 94428b274..7d3a59520 100755
--- a/test/ext/test-fsck
+++ b/test/ext/test-fsck
@@ -34,10 +34,18 @@ else
fi
WVPASS bup fsck -g
WVPASS bup fsck -r
+
+WVPASS chmod u+w "$BUP_DIR"/objects/pack/*.idx
+WVPASS bup damage "$BUP_DIR"/objects/pack/*.idx -n10 -s1 -S0
+WVFAIL bup fsck --quick
+WVFAIL bup fsck --quick --disable-par2
+
+WVFAIL bup fsck -r # should repair, but returns nonzero
+
WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n10 -s1 -S0
WVFAIL bup fsck --quick
WVFAIL bup fsck --quick --disable-par2
-WVPASS chmod u+w "$BUP_DIR"/objects/pack/*.idx
+
WVPASS bup damage "$BUP_DIR"/objects/pack/*.idx -n10 -s1 -S0
WVFAIL bup fsck --quick -j4
WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n10 -s1024 --percent 0.4 -S0
--
2.43.0

Rob Browning

unread,
Jun 24, 2024, 7:12:53 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
Perhaps easier to follow, and now always runs both git and par2
verifications when possible since git verification is checking the
internal correctnes of the packfile, while par2 is just making sure
that whatever it was given during generate hasn't changed. If git
verify and par2 verify disagree, just exit with an error for now.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 95 ++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 48 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 5918477b3..e7600b0b7 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -169,53 +169,52 @@ def git_verify(stem, *, quick=False):
return EXIT_SUCCESS


-def do_pack(base, last, par2_exists, out):
- code = 0
- if par2_ok and par2_exists and (opt.repair or not opt.generate):
- vresult = par2_verify(base)
- if vresult != 0:
- if opt.repair:
- rresult = par2_repair(base)
- if rresult != 0:
- action_result = b'failed'
- log('%s par2 repair: failed (%d)\n' % (last, rresult))
- code = rresult
- else:
- action_result = b'repaired'
- log('%s par2 repair: succeeded (0)\n' % last)
- # FIXME: for this to be useful, we need to define
- # the semantics, e.g. what's promised when we have
- # this and a competing error from another pack?
- code = 100
- else:
- action_result = b'failed'
- log('%s par2 verify: failed (%d)\n' % (last, vresult))
- code = vresult
- else:
- action_result = b'ok'
- elif not opt.generate or (par2_ok and not par2_exists):
- gresult = git_verify(base, quick=opt.quick)
- if gresult != 0:
- action_result = b'failed'
- log('%s git verify: failed (%d)\n' % (last, gresult))
- code = gresult
- else:
- if par2_ok and opt.generate:
- presult = par2_generate(base)
- if presult != 0:
- action_result = b'failed'
- log('%s par2 create: failed (%d)\n' % (last, presult))
- code = presult
- else:
- action_result = b'generated'
- else:
- action_result = b'ok'
+def attempt_repair(stem, base, out, *, verbose=False):
+ rc = par2_repair(stem)
+ if rc:
+ log(f'{path_msg(base)} par2 repair: failed ({rc})\n')
+ if verbose: out.write(base + b' failed\n')
else:
- assert(opt.generate and (not par2_ok or par2_exists))
- action_result = b'exists' if par2_exists else b'skipped'
- if opt.verbose:
- out.write(last + b' ' + action_result + b'\n')
- return code
+ log(f'{path_msg(base)} par2 repair: succeeded (0)\n')
+ if verbose: out.write(base + b' repaired\n')
+ rc = 100
+ return rc
+
+
+def do_pack(stem, par2_exists, out):
+ # Repair and generate are treated as optional actions, requested
+ # in addition to verification. Always validate the git checksums,
+ # and then run par2 validation if possible. The latter only tells
+ # us that the pack/idx files haven't changed since the par2
+ # generation.
+ base = os.path.basename(stem)
+ rc = git_verify(stem, quick=opt.quick)
+ if rc:
+ log(f'{path_msg(base)} git verify: failed ({rc})\n')
+ if opt.repair:
+ return attempt_repair(stem, base, out, verbose=opt.verbose)
+ if opt.verbose: out.write(base + b' failed\n')
+ return rc
+ if par2_ok and par2_exists:
+ rc = par2_verify(stem)
+ if rc:
+ log(f'{path_msg(base)} git verified, par2 failed ({rc}); bad par2 data?\n')
+ if opt.verbose: out.write(base + b' failed\n')
+ return rc
+ if opt.generate:
+ if par2_exists:
+ if opt.verbose: out.write(base + b' exists\n')
+ return EXIT_TRUE
+ rc = par2_generate(stem)
+ if rc:
+ log(f'{path_msg(base)} par2 create: failed ({rc})\n')
+ if opt.verbose: out.write(base + b' failed\n')
+ return rc
+ if opt.verbose: out.write(base + b' generated\n')
+ return EXIT_TRUE
+
+ if opt.verbose: out.write(base + b' ok\n')
+ return EXIT_TRUE


optspec = """
@@ -285,7 +284,7 @@ def main(argv):

if not opt.jobs:
assert par2_status != False
- nc = do_pack(stem, base, par2_status, out)
+ nc = do_pack(stem, par2_status, out)
# FIXME: is first wins what we really want (cf. repair's 100)
code = code or nc
count += 1
@@ -303,7 +302,7 @@ def main(argv):
else: # child
try:
assert par2_status != False
- sys.exit(do_pack(stem, base, par2_status, out))
+ sys.exit(do_pack(stem, par2_status, out))
except Exception as e:
log('exception: %r\n' % e)
sys.exit(99)
--
2.43.0

Rob Browning

unread,
Jun 24, 2024, 7:12:53 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
Only generate recovery information for the packfiles so that all of
the recovery blocks and the recovery process only includes the data we
can't trivially recreate.

Automatically recreate the .idx files during --repair when needed.

Require any requested paths to end in .pack.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
Documentation/bup-fsck.1.md | 9 +++--
lib/bup/cmd/fsck.py | 65 ++++++++++++++++++++-----------------
note/main.md | 6 ++++
test/ext/test-fsck | 4 ++-
4 files changed, 51 insertions(+), 33 deletions(-)

diff --git a/Documentation/bup-fsck.1.md b/Documentation/bup-fsck.1.md
index aa00f0bec..cf8cc6ab3 100644
--- a/Documentation/bup-fsck.1.md
+++ b/Documentation/bup-fsck.1.md
@@ -9,12 +9,15 @@ bup-fsck - verify or repair a bup repository
# SYNOPSIS

bup fsck [-r] [-g] [-v] [\--quick] [-j *jobs*] [\--par2-ok]
-[\--disable-par2] [filenames...]
+[\--disable-par2] [packfile...]

# DESCRIPTION

-`bup fsck` is a tool for validating bup repositories in the
-same way that `git fsck` validates git repositories.
+`bup fsck` validates bup repositories much the way `git fsck`
+validates git repositories. When *packfile*s (which must end in
+.pack) are specified, pack-related operations are limited to those
+files, otherwise all packfiles in the current repository are
+considered.

It can also generate and/or use "recovery blocks" using the
`par2`(1) tool (if you have it installed). This allows you
diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 4d191cdc0..9bafeb72c 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -77,11 +77,9 @@ def par2_generate(stem):
# files when interrupted by C-c.
# cf. https://github.com/Parchive/par2cmdline/issues/84
with temp_dir(dir=parent, prefix=(base + b'-bup-tmp-')) as tmpdir:
- idx = base + b'.idx'
pack = base + b'.pack'
- os.symlink(join(b'..', idx), join(tmpdir, idx))
os.symlink(join(b'..', pack), join(tmpdir, pack))
- rc = par2(b'create', [b'-n1', b'-c200', b'--', base, pack, idx],
+ rc = par2(b'create', [b'-n1', b'-c200', b'--', base, pack],
verb_floor=2, cwd=tmpdir)
if rc == 0:
# Currently, there should only be two files, the par2
@@ -89,16 +87,16 @@ def par2_generate(stem):
# defensive for the generation (keep whatever's produced).
p2_idx = base + b'.par2'
p2_vol = base + b'.vol000+200.par2'
- expected = frozenset((idx, pack, p2_idx, p2_vol))
+ expected = frozenset((pack, p2_idx, p2_vol))
for tmp in os.listdir(tmpdir):
if tmp not in expected:
log(f'Unexpected par2 file (please report) {path_msg(tmp)}\n')
- if tmp in (p2_idx, idx, pack):
+ if tmp in (p2_idx, pack):
continue
os.rename(join(tmpdir, tmp), join(parent, tmp))
# Let this indicate success
os.rename(join(tmpdir, p2_idx), join(parent, p2_idx))
- expected = frozenset((idx, pack))
+ expected = frozenset([pack])
remaining = frozenset(os.listdir(tmpdir))
assert expected == remaining
return rc
@@ -176,19 +174,36 @@ def attempt_repair(stem, base, out, *, verbose=False):
if rc:
log(f'{path_msg(base)} par2 repair: failed ({rc})\n')
if verbose: out.write(base + b' failed\n')
- else:
- log(f'{path_msg(base)} par2 repair: succeeded (0)\n')
- if verbose: out.write(base + b' repaired\n')
- rc = 100
- return rc
+ return rc if rc != 100 else EXIT_FAILURE
+ log(f'{path_msg(base)} par2 repair: succeeded, checking .idx\n')
+ try:
+ exp, act = trailing_and_actual_checksum(stem + b'.idx')
+ idx_ok = exp == act
+ if not idx_ok:
+ os.unlink(stem + b'.idx')
+ except FileNotFoundError:
+ idx_ok = False
+ if not idx_ok:
+ cmd = (b'git', b'index-pack', b'--no-rev-index', stem + b'.pack')
+ idx_rc = run(cmd).returncode
+ if idx_rc:
+ log(f'{path_msg(base)} index-pack failed\n')
+ if verbose: out.write(base + b' failed\n')
+ return idx_rc if idx_rc != 100 else EXIT_FAILURE
+ log(f'{path_msg(base)} index-pack succeeded\n')
+ if verbose: out.write(base + b' repaired\n')
+ return 100


def do_pack(stem, par2_exists, out):
# Repair and generate are treated as optional actions, requested
# in addition to verification. Always validate the git checksums,
# and then run par2 validation if possible. The latter only tells
- # us that the pack/idx files haven't changed since the par2
- # generation.
+ # us that the pack file hasn't changed since the par2 generation.
+ #
+ # When making changes here, keep in mind that bup used to include
+ # .idx files in the par2 recovery info, and we want to maintain
+ # backward-compatibility.
base = os.path.basename(stem)
rc = git_verify(stem, quick=opt.quick)
if rc:
@@ -220,7 +235,7 @@ def do_pack(stem, par2_exists, out):


optspec = """
-bup fsck [options...] [filenames...]
+bup fsck [options...] [packfile...]
--
r,repair attempt to repair errors using par2 (dangerous!)
g,generate generate auto-repair information using par2
@@ -247,24 +262,16 @@ def main(argv):
log(f'error: cannot --generate or --repair without par2\n')

if extra:
- extra = [argv_bytes(x) for x in extra]
+ pack_stems = [argv_bytes(x) for x in extra]
+ for stem in pack_stems:
+ if not stem.endswith(b'.pack'):
+ o.fatal(f'packfile argument {path_msg(stem)} must end with .pack')
else:
debug('fsck: No filenames given: checking all packs.\n')
git.check_repo_or_die()
- extra = glob.glob(git.repo(b'objects/pack/*.pack'))
-
- pack_stems = []
- for name in extra:
- if name.endswith(b'.pack'):
- pack_stems.append(name[:-5])
- elif name.endswith(b'.idx'):
- pack_stems.append(name[:-4])
- elif name.endswith(b'.par2'):
- pack_stems.append(name[:-5])
- elif os.path.exists(name + b'.pack'):
- pack_stems.append(name)
- else:
- raise Exception('%r is not a pack file!' % name)
+ pack_stems = glob.glob(git.repo(b'objects/pack/*.pack'))
+
+ pack_stems = [x[:-5] for x in pack_stems]

sys.stdout.flush()
out = byte_stream(sys.stdout)
diff --git a/note/main.md b/note/main.md
index 2b364d7ea..f7666dc76 100644
--- a/note/main.md
+++ b/note/main.md
@@ -11,6 +11,12 @@ May require attention
error when a suitable par2 isn't available. Previously `--generate`
would just print "skipped".

+* `bup fsck` now only accepts `*.pack` arguments, and it no longer
+ generates recovery information for the `*.idx` files so that all of
+ the recovery blocks will protect the packfile. This is safer given
+ that the `.idx` files can be (and now are) trivially regenerated
+ from the packfiles during `--repair` when needed.
+
Thanks to (at least)
====================

diff --git a/test/ext/test-fsck b/test/ext/test-fsck
index 7d3a59520..c92d72409 100755
--- a/test/ext/test-fsck
+++ b/test/ext/test-fsck
@@ -40,12 +40,14 @@ WVPASS bup damage "$BUP_DIR"/objects/pack/*.idx -n10 -s1 -S0
WVFAIL bup fsck --quick
WVFAIL bup fsck --quick --disable-par2

-WVFAIL bup fsck -r # should repair, but returns nonzero
+# should repair, but returns nonzero, also recreates .idx read-only
+WVFAIL bup fsck -r

WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n10 -s1 -S0
WVFAIL bup fsck --quick
WVFAIL bup fsck --quick --disable-par2

+WVPASS chmod u+w "$BUP_DIR"/objects/pack/*.idx

Rob Browning

unread,
Jun 24, 2024, 7:19:29 PM (8 days ago) Jun 24
to bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> I cherry-picked all the recent fsck improvements from 0.33.x to main,
> and these changes continue that work.
>
> Proposed for main.

Oh, and yes, still plan to add some tests to the last commit, but I
wanted to go ahead and see if there were any concerns overall.

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
Reply all
Reply to author
Forward
0 new messages