[PATCH v2 2/5] fsck: run par2 generate in a sandbox to avoid empty files

0 views
Skip to first unread message

Rob Browning

unread,
Jun 19, 2024, 3:16:43 PM (14 days ago) Jun 19
to bup-...@googlegroups.com
If par2 generate is interrupted, it may leave empty *.par2 files, and
not just the top-level PATH.par2 index. To avoid that possiblity, run
par2 in a temporary directory and then move the recovery files to the
right place if par2 succeeds.

fsck currently checks whether the top-level index is empty, but
doesn't check any of the other vol files.

Thanks to Johannes Berg for reporting the problem.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 39 ++++++++++++++++++++++++++++++---------
note/0.33.x.md | 6 ++++++
2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 16a2f2178..f0aac3583 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -1,14 +1,14 @@

-from __future__ import absolute_import, print_function
from shutil import rmtree
from subprocess import PIPE
from tempfile import mkdtemp
from binascii import hexlify
+from os.path import join
import glob, os, subprocess, sys

from bup import options, git
from bup.compat import argv_bytes
-from bup.helpers import Sha1, chunkyreader, istty2, log, progress
+from bup.helpers import Sha1, chunkyreader, istty2, log, progress, temp_dir
from bup.io import byte_stream


@@ -20,14 +20,14 @@ def debug(s):
if opt.verbose > 1:
log(s)

-def run(argv):
+def run(argv, *, cwd=None):
# at least in python 2.5, using "stdout=2" or "stdout=sys.stderr" below
# doesn't actually work, because subprocess closes fd #2 right before
# execing for some reason. So we work around it by duplicating the fd
# first.
fd = os.dup(2) # copy stderr
try:
- p = subprocess.Popen(argv, stdout=fd, close_fds=False)
+ p = subprocess.Popen(argv, stdout=fd, close_fds=False, cwd=cwd)
return p.wait()
finally:
os.close(fd)
@@ -70,7 +70,7 @@ def is_par2_parallel():

_par2_parallel = None

-def par2(action, args, verb_floor=0):
+def par2(action, args, verb_floor=0, cwd=None):
global _par2_parallel
if _par2_parallel is None:
_par2_parallel = is_par2_parallel()
@@ -82,12 +82,33 @@ def par2(action, args, verb_floor=0):
if _par2_parallel:
cmd.append(b'-t1')
cmd.extend(args)
- return run(cmd)
+ return run(cmd, cwd=cwd)

def par2_generate(base):
- return par2(b'create',
- [b'-n1', b'-c200', b'--', base, base + b'.pack', base + b'.idx'],
- verb_floor=2)
+ parent, name = os.path.split(base)
+ # Work in a temp_dir because par2 was observed creating empty
+ # files when interrupted by C-c.
+ # cf. https://github.com/Parchive/par2cmdline/issues/84
+ with temp_dir(dir=parent, prefix=(name + b'-bup-tmp-')) as tmpdir:
+ idx_name = name + b'.idx'
+ pack_name = name + b'.pack'
+ os.symlink(join(b'../', idx_name), join(tmpdir, idx_name))
+ os.symlink(join(b'../', pack_name), join(tmpdir, pack_name))
+ rc = par2(b'create',
+ [b'-n1', b'-c200', b'--', name, pack_name, idx_name],
+ verb_floor=2, cwd=tmpdir)
+ if rc == 0:
+ p2_idx = name + b'.par2'
+ for tmp in os.listdir(tmpdir):
+ if tmp in (p2_idx, idx_name, pack_name):
+ continue
+ os.rename(join(tmpdir, tmp), join(parent, tmp))
+ # Let this indicate success
+ os.rename(join(tmpdir, p2_idx), join(parent, p2_idx))
+ expected = frozenset((idx_name, pack_name))
+ remaining = frozenset(os.listdir(tmpdir))
+ assert expected == remaining
+ return rc

def par2_verify(base):
return par2(b'verify', [b'--', base], verb_floor=3)
diff --git a/note/0.33.x.md b/note/0.33.x.md
index 32d739697..1fbd153fa 100644
--- a/note/0.33.x.md
+++ b/note/0.33.x.md
@@ -4,6 +4,12 @@ Notable changes in main since 0.33.3 (incomplete)
May require attention
---------------------

+* The `par2` command (invoked by `bup fsck -g`) may generate empty
+ recovery files if interrupted (say via C-c). To mitigate this, bup
+ now runs `par2` in a temporary directory, and only moves the
+ recovery files into place if the generation succeeds. See also
+ https://github.com/Parchive/par2cmdline/issues/84
+
* Previously, any `bup on REMOTE ...` commands that attempted to read
from standard input (for example `bup on HOST split < something` or
`bup on HOST split --git-ids ...`) would read nothing instead of the
--
2.43.0

Rob Browning

unread,
Jun 19, 2024, 3:16:43 PM (14 days ago) Jun 19
to bup-...@googlegroups.com
This version is simpler, avoiding any entanglement with the do_pack
logic, and it may or may not be exactly what we'll want in main. The
last three patches have changed from v1, and I'm still likely to add
tests.

See the commit messages and note changes for additional details.

Proposed for 0.33.x

Rob Browning (5):
fsck: don't require a repo when pack paths are provided
fsck: run par2 generate in a sandbox to avoid empty files
fsck: add and begin to use standardized EXIT_* codes
fsck: compute pack stems outside do_pack loop
fsck: ensure all requested pack-*par2 files are not empty

lib/bup/cmd/fsck.py | 106 +++++++++++++++++++++++++++++++-------------
lib/bup/helpers.py | 12 ++++-
note/0.33.x.md | 13 +++++-
3 files changed, 97 insertions(+), 34 deletions(-)

--
2.43.0

Rob Browning

unread,
Jun 19, 2024, 3:16:44 PM (14 days ago) Jun 19
to bup-...@googlegroups.com
This will allow us to re-use them.

Use stem instead of base in some cases to refer to the
/some/where/pack-HASH packfile "stems" since base is potentially
misleading, suggesting "basename", which we also need, e.g. this also
allows us to change "last" (which is the basename) to base.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 2339fb903..1309543d2 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -222,34 +222,37 @@ def main(argv):
git.check_repo_or_die()
extra = glob.glob(git.repo(b'objects/pack/*.pack'))

- sys.stdout.flush()
- out = byte_stream(sys.stdout)
- code = 0
- count = 0
- outstanding = {}
+ pack_stems = []
for name in extra:
if name.endswith(b'.pack'):
- base = name[:-5]
+ pack_stems.append(name[:-5])
elif name.endswith(b'.idx'):
- base = name[:-4]
+ pack_stems.append(name[:-4])
elif name.endswith(b'.par2'):
- base = name[:-5]
+ pack_stems.append(name[:-5])
elif os.path.exists(name + b'.pack'):
- base = name
+ pack_stems.append(name)
else:
raise Exception('%r is not a pack file!' % name)
- (dir,last) = os.path.split(base)
- par2_exists = os.path.exists(base + b'.par2')
- if par2_exists and os.stat(base + b'.par2').st_size == 0:
+
+ sys.stdout.flush()
+ out = byte_stream(sys.stdout)
+ code = 0
+ count = 0
+ outstanding = {}
+ for stem in pack_stems:
+ base = os.path.basename(stem)
+ par2_exists = os.path.exists(stem + b'.par2')
+ if par2_exists and os.stat(stem + b'.par2').st_size == 0:
par2_exists = 0
sys.stdout.flush() # Not sure we still need this, but it'll flush out too
debug('fsck: checking %r (%s)\n'
- % (last, par2_ok and par2_exists and 'par2' or 'git'))
+ % (base, par2_ok and par2_exists and 'par2' or 'git'))
if not opt.verbose:
progress('fsck (%d/%d)\r' % (count, len(extra)))

if not opt.jobs:
- nc = do_pack(base, last, par2_exists, out)
+ nc = do_pack(stem, base, par2_exists, out)
code = code or nc
count += 1
else:
@@ -265,7 +268,7 @@ def main(argv):
outstanding[pid] = 1
else: # child
try:
- sys.exit(do_pack(base, last, par2_exists, out))
+ sys.exit(do_pack(stem, base, par2_exists, out))
except Exception as e:
log('exception: %r\n' % e)
sys.exit(99)
--
2.43.0

Rob Browning

unread,
Jun 22, 2024, 5:23:29 PM (11 days ago) Jun 22
to bup-...@googlegroups.com
The main differences as compared to v2 are some additional tests, and
that this version checks for our new full set of potential par2 file
problems both up-front, and before fsck exits. Previously it only
looked at the existence of the par2 index file up-front (e.g. it
wouldn't notice a missing vol file, or a vol file with no index).

The secondary check just before we exit is intended to catch any
(likely rare) problems with the moves out of the par2 generate
tempdir.

See the commit messages and the note changes for additional details.

Proposed for 0.33.x

*** BLURB HERE ***

Rob Browning (6):
test-fsck: lower check -j99 to -j9
fsck: don't require a repo when pack paths are provided
fsck: run par2 generate in a sandbox to avoid empty files
fsck: add and begin to use standardized EXIT_* codes
fsck: compute pack stems outside do_pack loop
fsck: report errors for all mismatched or empty pack-*.par2 files

lib/bup/cmd/fsck.py | 138 +++++++++++++++++++++++++++++++++-----------
lib/bup/helpers.py | 12 +++-
note/0.33.x.md | 14 ++++-
test/ext/test-fsck | 31 +++++++++-
4 files changed, 159 insertions(+), 36 deletions(-)

--
2.43.0

Rob Browning

unread,
Jun 22, 2024, 5:23:29 PM (11 days ago) Jun 22
to bup-...@googlegroups.com
cf. 24208d7a7cd30b9b5e2838a96b79c36e3f746b1d

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
test/ext/test-fsck | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/test/ext/test-fsck b/test/ext/test-fsck
index 2f6f4c4e4..73e734d9c 100755
--- a/test/ext/test-fsck
+++ b/test/ext/test-fsck
@@ -42,7 +42,10 @@ WVPASS bup damage "$BUP_DIR"/objects/pack/*.idx -n10 -s1 -S0
WVFAIL bup fsck --quick -j4
WVPASS bup damage "$BUP_DIR"/objects/pack/*.pack -n10 -s1024 --percent 0.4 -S0
WVFAIL bup fsck --quick
-WVFAIL bup fsck --quick -rvv -j99 # fails because repairs were needed
+
+# Fails because repairs were needed or we don't have a suitable par2
+WVFAIL bup fsck --quick -rvv -j9
+
if bup fsck --par2-ok; then
WVPASS bup fsck -r # ok because of repairs from last time

--
2.43.0

Rob Browning

unread,
Jun 22, 2024, 5:23:29 PM (11 days ago) Jun 22
to bup-...@googlegroups.com
If par2 generate is interrupted, it may leave empty *.par2 files, and
not just the top-level PATH.par2 index. To avoid that possiblity, run
par2 in a temporary directory and then move the recovery files to the
right place if par2 succeeds.

fsck currently checks whether the top-level index is empty, but
doesn't check any of the other vol files.

Thanks to Johannes Berg for reporting the problem.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 39 ++++++++++++++++++++++++++++++---------
note/0.33.x.md | 6 ++++++
2 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 16a2f2178..f0aac3583 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py

Rob Browning

unread,
Jun 22, 2024, 5:23:29 PM (11 days ago) Jun 22
to bup-...@googlegroups.com
Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 9 ++++-----
lib/bup/helpers.py | 12 +++++++++++-
2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index f0aac3583..2339fb903 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py
@@ -8,7 +8,9 @@ import glob, os, subprocess, sys

from bup import options, git
from bup.compat import argv_bytes
-from bup.helpers import Sha1, chunkyreader, istty2, log, progress, temp_dir
+from bup.helpers \
+ import (EXIT_FALSE, EXIT_TRUE,
+ Sha1, chunkyreader, istty2, log, progress, temp_dir)
from bup.io import byte_stream


@@ -209,10 +211,7 @@ def main(argv):

par2_setup()
if opt.par2_ok:
- if par2_ok:
- sys.exit(0) # 'true' in sh
- else:
- sys.exit(1)
+ sys.exit(EXIT_TRUE if par2_ok else EXIT_FALSE)
if opt.disable_par2:
par2_ok = 0

diff --git a/lib/bup/helpers.py b/lib/bup/helpers.py
index 8c9338432..881e8dffa 100644
--- a/lib/bup/helpers.py
+++ b/lib/bup/helpers.py
@@ -1,6 +1,5 @@
"""Helper functions and classes for bup."""

-from __future__ import absolute_import, division
from collections import namedtuple
from contextlib import ExitStack
from ctypes import sizeof, c_void_p
@@ -21,6 +20,17 @@ from bup.io import byte_stream, path_msg
from bup.options import _tty_width as tty_width


+# EXIT_TRUE (just an alias) and EXIT_FALSE are intended for cases like
+# POSIX grep or test, or bup's own "fsck --par2-ok", where the command
+# is asking a question with a yes or no answer. Eventually all
+# commands should avoid exiting with 1 for errors.
+
+EXIT_SUCCESS = 0
+EXIT_TRUE = 0
+EXIT_FALSE = 1
+EXIT_FAILURE = 2
+
+
buglvl = int(os.environ.get('BUP_DEBUG', 0))


--
2.43.0

Rob Browning

unread,
Jun 22, 2024, 5:23:29 PM (11 days ago) Jun 22
to bup-...@googlegroups.com
This will allow us to re-use them.

Use stem instead of base in some cases to refer to the
/some/where/pack-HASH packfile "stems" since base is potentially
misleading, suggesting "basename", which we also need, e.g. this also
allows us to change "last" (which is the basename) to base.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
lib/bup/cmd/fsck.py | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/lib/bup/cmd/fsck.py b/lib/bup/cmd/fsck.py
index 2339fb903..1309543d2 100644
--- a/lib/bup/cmd/fsck.py
+++ b/lib/bup/cmd/fsck.py

Rob Browning

unread,
Jun 23, 2024, 4:59:26 PM (10 days ago) Jun 23
to bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> The main differences as compared to v2 are some additional tests, and
> that this version checks for our new full set of potential par2 file
> problems both up-front, and before fsck exits. Previously it only
> looked at the existence of the par2 index file up-front (e.g. it
> wouldn't notice a missing vol file, or a vol file with no index).
>
> The secondary check just before we exit is intended to catch any
> (likely rare) problems with the moves out of the par2 generate
> tempdir.
>
> See the commit messages and the note changes for additional details.

Pushed a slightly adjusted version to 0.33.x. I'll need to rework the
related changes I'd proposed for main to reflect where we ended up.

Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
Reply all
Reply to author
Forward
0 new messages