Restore failed with "EOFError: encountered EOF while reading vuint"


Anton Eliasson

Nov 4, 2014, 4:27:26 PM
to bup-...@googlegroups.com
Hi!
In an attempt to save some space on my backup drive I am merging a few
bup repositories so that bup can deduplicate away some data. I've
written some shell script magic to list all backups in a branch in a bup
repo and for each backup: restore it to a temporary directory, index it
and then save it to a different repo. This has worked well for 24 out of
25 backups in a branch, but for one of them the restore fails after
restoring about 15 % of the expected number of files:

$ ~/src/bup/bup -d /Athena/Backup/legacybup restore
avalon-anton/2014-06-05-140711
Traceback (most recent call last):
File "/home/anton/src/bup/cmd/bup-restore", line 340, in <module>
do_root(target, opt.sparse, owner_map)
File "/home/anton/src/bup/cmd/bup-restore", line 228, in do_root
do_node(n, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 278, in do_node
m = metadata.Metadata.read(meta_stream)
File "/home/anton/src/bup/lib/bup/metadata.py", line 753, in read
tag = vint.read_vuint(port)
File "/home/anton/src/bup/lib/bup/vint.py", line 30, in read_vuint
raise EOFError('encountered EOF while reading vuint');
EOFError: encountered EOF while reading vuint
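
For reference, the merge loop amounts to roughly this sketch (illustrative only -- my actual script isn't included here, so the helper names and the use of Python's subprocess module are made up, and you'd probably also want something like bup save's --date option to preserve the original save times):

```python
import subprocess
import tempfile

def migrate_cmds(src_repo, dst_repo, branch, save, workdir):
    # One restore/index/save round trip for a single save.
    # Hypothetical helper; flag spellings follow the bup man pages.
    return [
        ['bup', '-d', src_repo, 'restore', '-C', workdir,
         '%s/%s/.' % (branch, save)],
        ['bup', '-d', dst_repo, 'index', workdir],
        ['bup', '-d', dst_repo, 'save', '-n', branch, workdir],
    ]

def migrate_branch(src_repo, dst_repo, branch):
    saves = subprocess.check_output(
        ['bup', '-d', src_repo, 'ls', branch]).decode().split()
    for save in saves:
        if save == 'latest':  # alias for the newest save; skip it
            continue
        with tempfile.TemporaryDirectory() as workdir:
            for cmd in migrate_cmds(src_repo, dst_repo, branch, save, workdir):
                subprocess.check_call(cmd)
```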

I get the same error with bup 0.25 from Ubuntu 14.04 and with the latest
git version (commit a17037508c) built from source.

Running restore with -v:

[...]
home/anton/.mixxx/
home/anton/.mixxx/.git/
home/anton/.mixxx/.git/annex/
home/anton/.mixxx/.git/annex/creds/
home/anton/.mixxx/.git/annex/journal/
home/anton/.mixxx/.git/annex/misctmp/
home/anton/.mixxx/.git/annex/misctmp/xmppgit/
home/anton/.mixxx/.git/annex/objects/
home/anton/.mixxx/.git/annex/objects/1K/
home/anton/.mixxx/.git/annex/objects/1K/2W/
home/anton/.mixxx/.git/annex/objects/1K/2W/SHA256E-s58552--d35436c1a34d261501f598e6207340aa5d6259e434634bab50f315541491ae71.log/
home/anton/.mixxx/.git/annex/objects/54/
home/anton/.mixxx/.git/annex/objects/54/f1/
home/anton/.mixxx/.git/annex/objects/54/f1/SHA256E-s2893--91493feb8532c8d147f8594af6279888468473b67636f60dbd200e57f5f0ea73.cfg/
home/anton/.mixxx/.git/annex/objects/GG/
home/anton/.mixxx/.git/annex/objects/GG/4J/
home/anton/.mixxx/.git/annex/objects/GG/4J/SHA256E-s37744--bfcc93268b6614a214d2666a828b2659700512139c03d370c5c2348a4b5b2fce.log/
home/anton/.mixxx/.git/annex/objects/Gw/
home/anton/.mixxx/.git/annex/objects/Gw/Kp/
home/anton/.mixxx/.git/annex/objects/Gw/Kp/SHA256E-s8238080--beca524be913da1979e4ee0c62bd040a3d748cbee35dc31833a8d7fe9bc39ee8/
home/anton/.mixxx/.git/annex/objects/Kx/
home/anton/.mixxx/.git/annex/objects/Kx/qG/
home/anton/.mixxx/.git/annex/objects/Kx/qG/SHA256E-s2894--183c4f6c8bd98b4ff116c375e5dbc8a5009f9276d976bcb4f7164573dc9caa2d.cfg/
home/anton/.mixxx/.git/annex/objects/qz/
home/anton/.mixxx/.git/annex/objects/qz/94/
home/anton/.mixxx/.git/annex/objects/qz/94/SHA256E-s8237056--510c16db224130575c9bec27223e214ec04b8ef019bcd312ef13a7ddc62e0c85/
home/anton/.mixxx/.git/annex/objects/vQ/
home/anton/.mixxx/.git/annex/objects/vQ/QQ/
home/anton/.mixxx/.git/annex/objects/vQ/QQ/SHA256E-s210--f9cd9f199bf20836f8867797711998755835179a7dd89dab06a7671e7ad851c8.xml/
home/anton/.mixxx/.git/annex/ssh/
Traceback (most recent call last):
[...]
$ ls -l 2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh
totalt 0
-rw-r--r-- 1 anton anton 0 maj 1 2014 08db033c943f7ad4fd81dad98e5d98f2.lock
-rw-r--r-- 1 anton anton 0 apr 11 2014 anton@hephaestus

In the next backup, that same directory looks like this:

totalt 0
-rw-r--r-- 1 anton anton 0 maj 1 2014 08db033c943f7ad4fd81dad98e5d98f2.lock
srw------- 1 anton anton 0 jun 13 22:58 anton@hephaestus
-rw-r--r-- 1 anton anton 0 apr 11 2014 an...@hephaestus.lock

So anton@hephaestus should probably be a socket but for some reason
isn't. How can I debug this further? The data isn't particularly
interesting to me but I'm curious why it fails.

The backup that fails is number 15 out of 25 in this branch. The other
backups seem to restore fine. List of branches and a list of backups in
the affected branch:

$ ~/src/bup/bup -d /Athena/Backup/legacybup ls
avalon-anton
balder-anton
riven-anton
$ ~/src/bup/bup -d /Athena/Backup/legacybup ls avalon-anton
2013-11-23-233247
2013-11-25-193931
2013-12-02-215046
2013-12-21-163301
2014-01-07-002019
2014-01-20-221015
2014-01-29-204109
2014-02-03-222521
2014-02-11-210520
2014-02-15-235944
2014-03-22-112622
2014-03-25-195641
2014-04-05-124420
2014-04-23-225708
2014-06-05-140711
2014-06-13-230027
2014-06-22-154438
2014-06-28-150359
2014-07-10-221827
2014-07-17-211003
2014-08-01-124029
2014-08-08-162221
2014-08-18-221508
2014-09-02-182827
2014-09-11-214801
latest

bup fsck does not find any problems. The total size of the repo is 83
GB. Each backup is about 40 GB and consists of ~150 000 mixed-size files
(a home directory). All backups in this branch were created on Arch
Linux and I am restoring them on Ubuntu 14.04. Both the filesystem that
was being backed up and the one I'm restoring to are ext4.

--
Best Regards,
Anton Eliasson

Rob Browning

Nov 4, 2014, 5:35:57 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> In an attempt to save some space on my backup drive I am merging a few
> bup repositories so that bup can deduplicate away some data. I've
> written some shell script magic to list all backups in a branch in a bup
> repo and for each backup: restore it to a temporary directory, index it
> and then save it to a different repo. This has worked well for 24 out of
> 25 backups in a branch, but for one of them the restore fails after

(Aside: make sure you use --numeric-owner. And of course, if you want to
test either "bup get" or "bup gc" (to ideally produce the same result
more efficiently), I won't try to stop you (see the experimental
patches to the list, or the relevant tmp/ branches). Although with
"bup gc", if you need to kill individual saves (rather than whole
branches), I think you'll also have to call git filter-branch manually
first. But be careful -- only work on a copy of your repo, and don't
trust the result (yet) without a lot of vetting.)

> $ ~/src/bup/bup -d /Athena/Backup/legacybup restore
> avalon-anton/2014-06-05-140711
> Traceback (most recent call last):
> File "/home/anton/src/bup/cmd/bup-restore", line 340, in <module>
> do_root(target, opt.sparse, owner_map)
> File "/home/anton/src/bup/cmd/bup-restore", line 228, in do_root
> do_node(n, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
> do_node(top, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
> do_node(top, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
> do_node(top, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
> do_node(top, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
> do_node(top, sub, sparse, owner_map, meta = m)
> File "/home/anton/src/bup/cmd/bup-restore", line 278, in do_node
> m = metadata.Metadata.read(meta_stream)
> File "/home/anton/src/bup/lib/bup/metadata.py", line 753, in read
> tag = vint.read_vuint(port)
> File "/home/anton/src/bup/lib/bup/vint.py", line 30, in read_vuint
> raise EOFError('encountered EOF while reading vuint');
> EOFError: encountered EOF while reading vuint

It looks like bup is trying to read the current path's metadata from the relevant
.bupm file, and (as indicated) it's hitting EOF. If you can reproduce
this easily, then it might be useful to try multiple -v options,
i.e. -vvvv and maybe even BUP_DEBUG=11.
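
For reference, read_vuint consumes a variable-length unsigned integer one byte at a time and raises as soon as the stream runs out mid-number. Schematically (this is the generic base-128 varint pattern, not a verbatim copy of bup's vint.py, whose byte layout may differ):

```python
import io

def write_vuint(port, x):
    # Emit 7 data bits per byte, least-significant group first; the high
    # bit marks "more bytes follow".
    assert x >= 0
    while True:
        seven = x & 0x7f
        x >>= 7
        if x:
            port.write(bytes([seven | 0x80]))
        else:
            port.write(bytes([seven]))
            return

def read_vuint(port):
    result, shift = 0, 0
    while True:
        b = port.read(1)
        if not b:                    # stream ended mid-integer
            raise EOFError('encountered EOF while reading vuint')
        result |= (b[0] & 0x7f) << shift
        if not b[0] & 0x80:          # continuation bit clear: done
            return result
        shift += 7
```

So the exception above just means the metadata stream ended where another record (or the rest of one) was expected.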

If I recall correctly, the extra -v options will make restore print each
path (not just the dirs), so we can see which file it's failing on.

Hope this helps
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Anton Eliasson

Nov 5, 2014, 3:56:23 AM
to bup-...@googlegroups.com
On 2014-11-04 23:35, Rob Browning wrote:
> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> In an attempt to save some space on my backup drive I am merging a few
>> bup repositories so that bup can deduplicate away some data. I've
>> written some shell script magic to list all backups in a branch in a bup
>> repo and for each backup: restore it to a temporary directory, index it
>> and then save it to a different repo. This has worked well for 24 out of
>> 25 backups in a branch, but for one of them the restore fails after
> (Aside: make sure you use --numeric-owner. And of course, if you want to
> test either "bup get" or "bup gc" (to ideally produce the same result
> more efficiently), I won't try to stop you (see the experimental
> patches to the list, or the relevant tmp/ branches). Although with
> "bup gc", if you need to kill individual saves (rather than whole
> branches), I think you'll also have to call git filter-branch manually
> first. But be careful -- only work on a copy of your repo, and don't
> trust the result (yet) without a lot of vetting.)
Thanks. I might try bup get for the other branches.
It looks like this. next(1) returned 0, which is probably an error,
judging by the preceding output:

$ BUP_DEBUG=11 ~/src/bup/bup -d /Athena/Backup/legacybup restore -vvvv
avalon-anton/2014-06-05-140711
[...]
home/anton/.mixxx/.git/annex/objects/qz/
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/qz/94/
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/qz/94/SHA256E-s8237056--510c16db224130575c9bec27223e214ec04b8ef019bcd312ef13a7ddc62e0c85/
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/qz/94/SHA256E-s8237056--510c16db224130575c9bec27223e214ec04b8ef019bcd312ef13a7ddc62e0c85/SHA256E-s8237056--510c16db224130575c9bec27223e214ec04b8ef019bcd312ef13a7ddc62e0c85.map
<<<<File.size() is calculating (for 'SHA256E-s8237056--510c16db224130575c9bec27223e214ec04b8ef019bcd312ef13a7ddc62e0c85.map')...
<<<<File.size() done.
next(65536) returned 0
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/vQ/
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/vQ/QQ/
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(50) returned 50
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/vQ/QQ/SHA256E-s210--f9cd9f199bf20836f8867797711998755835179a7dd89dab06a7671e7ad851c8.xml/
next(1) returned 1
next(1) returned 1
next(50) returned 50
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/objects/vQ/QQ/SHA256E-s210--f9cd9f199bf20836f8867797711998755835179a7dd89dab06a7671e7ad851c8.xml/SHA256E-s210--f9cd9f199bf20836f8867797711998755835179a7dd89dab06a7671e7ad851c8.xml
<<<<File.size() is calculating (for 'SHA256E-s210--f9cd9f199bf20836f8867797711998755835179a7dd89dab06a7671e7ad851c8.xml')...
<<<<File.size() done.
next(65536) returned 210
next(65536) returned 0
next(1) returned 1
next(1) returned 1
next(50) returned 50
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/schedulestate
<<<<File.size() is calculating (for 'schedulestate')...
<<<<File.size() done.
next(65536) returned 118
next(65536) returned 0
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/sentinal
<<<<File.size() is calculating (for 'sentinal')...
<<<<File.size() done.
next(65536) returned 0
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/sentinal.cache
<<<<File.size() is calculating (for 'sentinal.cache')...
<<<<File.size() done.
next(65536) returned 19
next(65536) returned 0
<<<<File.size() is calculating (for '.bupm')...
<<<<File.size() done.
next(1) returned 1
next(1) returned 1
next(49) returned 49
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/ssh/
next(1) returned 1
next(1) returned 1
next(50) returned 50
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/ssh/08db033c943f7ad4fd81dad98e5d98f2.lock
<<<<File.size() is calculating (for '08db033c943f7ad4fd81dad98e5d98f2.lock')...
<<<<File.size() done.
next(65536) returned 0
next(1) returned 1
next(1) returned 1
next(47) returned 47
next(1) returned 1
next(1) returned 1
next(3) returned 3
next(1) returned 1
home/anton/.mixxx/.git/annex/ssh/anton@hephaestus
<<<<File.size() is calculating (for 'anton@hephaestus')...
<<<<File.size() done.
next(65536) returned 0
next(1) returned 0
Traceback (most recent call last):
File "/home/anton/src/bup/cmd/bup-restore", line 340, in <module>
do_root(target, opt.sparse, owner_map)
File "/home/anton/src/bup/cmd/bup-restore", line 228, in do_root
do_node(n, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 279, in do_node
do_node(top, sub, sparse, owner_map, meta = m)
File "/home/anton/src/bup/cmd/bup-restore", line 278, in do_node
m = metadata.Metadata.read(meta_stream)
File "/home/anton/src/bup/lib/bup/metadata.py", line 753, in read
tag = vint.read_vuint(port)
File "/home/anton/src/bup/lib/bup/vint.py", line 30, in read_vuint
raise EOFError('encountered EOF while reading vuint');
EOFError: encountered EOF while reading vuint


Rob Browning

Nov 5, 2014, 12:40:19 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> home/anton/.mixxx/.git/annex/ssh/anton@hephaestus
> <<<<File.size() is calculating (for 'anton@hephaestus')...
> <<<<File.size() done.
> next(65536) returned 0
> next(1) returned 0
> Traceback (most recent call last):

So it's choking on the "anton@hephaestus" socket -- one possibility is
that the .bupm file (which stores all the metadata for a dir and its
contents) ended up truncated or corrupted somehow.

If you want to poke around further, try this (from memory):

export BUP_DIR=/Athena/Backup/legacybup

(Of course, unset BUP_DIR later, assuming it isn't normally set.)

To see the hashes of all of the (non-hidden) paths in ssh:

bup ls -s /avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh

To retrieve the .bupm for ssh (stored as a hidden path), which should
include the metadata record that restore was trying to read:

bup cat-file --bupm \
/avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh \
> tmp-ssh.bupm

To see the decoded metadata:

bup meta -tvf tmp-ssh.bupm # Add more v's for more detail.

Compare that output to the paths in the listing produced by ls above.
The .bupm entries should correspond to the ls entries, except that there
should be an extra initial .bupm entry for the ssh dir itself
(i.e. conceptually for ".").

Now try (roughly) the same metadata read that failed for restore:

bup cat-file --meta \
/avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh/anton@hephaestus \
> tmp-anton.meta

And if that doesn't fail, see what's in there like this:

bup meta -tvf tmp-anton.meta # Add more v's for more detail.

Thanks

Anton Eliasson

Nov 10, 2014, 3:04:44 PM
to bup-...@googlegroups.com
On 2014-11-05 18:40, Rob Browning wrote:
> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> home/anton/.mixxx/.git/annex/ssh/anton@hephaestus
>> <<<<File.size() is calculating (for 'anton@hephaestus')...
>> <<<<File.size() done.
>> next(65536) returned 0
>> next(1) returned 0
>> Traceback (most recent call last):
> So it's choking on the "anton@hephaestus" socket -- one possibility is
> that the .bupm file (which stores all the metadata for a dir and its
> contents) ended up truncated or corrupted somehow.
>
> If you want to poke around further, try this (from memory):
>
> export BUP_DIR=/Athena/Backup/legacybup
>
> (Of course, unset BUP_DIR later, assuming it isn't normally set.)
>
> To see the hashes of all of the (non-hidden) paths in ssh:
>
> bup ls -s /avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 08db033c943f7ad4fd81dad98e5d98f2.lock
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 anton@hephaestus
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 an...@hephaestus.lock

>
> To retrieve the .bupm for ssh (stored as a hidden path), which should
> include the metadata record that restore was trying to read:
>
> bup cat-file --bupm \
> /avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh \
> > tmp-ssh.bupm
>
> To see the decoded metadata:
>
> bup meta -tvf tmp-ssh.bupm # Add more v's for more detail.
drwxr-xr-x anton/anton - 2014-06-05 14:12
-rw-r--r-- anton/anton - 2014-05-01 01:11
-rw-r--r-- anton/anton - 2014-04-11 21:02

With two or more -v:

path:
mode: 040755 (drwxr-xr-x)
rdev: 0
uid: 1000
gid: 1000
user: anton
group: anton
atime: 1397242947.082876181
mtime: 1401970360.551795131
ctime: 1401970360.551795131
linux-attr: 0x80000

path:
mode: 0100644 (-rw-r--r--)
rdev: 0
uid: 1000
gid: 1000
user: anton
group: anton
atime: 1398899488.537507149
mtime: 1398899488.537507149
ctime: 1398899488.537507149
linux-attr: 0x80000

path:
mode: 0100644 (-rw-r--r--)
rdev: 0
uid: 1000
gid: 1000
user: anton
group: anton
atime: 1397242947.082876181
mtime: 1397242947.082876181
ctime: 1397242947.082876181
linux-attr: 0x80000

Looks like one file is missing from the bupm, and judging from the
timestamps I believe it is the socket anton@hephaestus. The other two
files seem to be the stale lockfiles
08db033c943f7ad4fd81dad98e5d98f2.lock and an...@hephaestus.lock.
>
> Compare that output to the paths in the listing produced by ls above.
> The .bupm entries should correspond to the ls entries, except that there
> should be an extra initial .bupm entry for the ssh dir itself
> (i.e. conceptually for ".").
>
> Now try (roughly) the same metadata read that failed for restore:
>
> bup cat-file --meta \
> /avalon-anton/2014-06-05-140711/home/anton/.mixxx/.git/annex/ssh/anton@hephaestus \
> > tmp-anton.meta
It did fail:

Traceback (most recent call last):
File "/usr/lib/bup/cmd/bup-cat-file", line 52, in <module>
sys.stdout.write(n.metadata().encode())
File "/usr/lib/bup/bup/vfs.py", line 308, in metadata
self.parent._populate_metadata()
File "/usr/lib/bup/bup/vfs.py", line 411, in _populate_metadata
sub._metadata = metadata.Metadata.read(meta_stream)
File "/usr/lib/bup/bup/metadata.py", line 715, in read
tag = vint.read_vuint(port)
File "/usr/lib/bup/bup/vint.py", line 30, in read_vuint
raise EOFError('encountered EOF while reading vuint');
EOFError: encountered EOF while reading vuint

>
> And if that doesn't fail, see what's in there like this:
>
> bup meta -tvf tmp-anton.meta # Add more v's for more detail.
>
> Thanks
Thank you!

Rob Browning

Nov 10, 2014, 5:11:48 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> Looks like one file is missing from the bupm, and judging from the
> timestamps I believe it is the socket anton@hephaestus. The other two
> files seem to be the stale lockfiles
> 08db033c943f7ad4fd81dad98e5d98f2.lock and an...@hephaestus.lock.

OK, so that's at least a partial answer to "what?", but not "why?".

I assume that as far as you know the original save completed without any
errors (i.e. no exceptions, and a zero exit status)?

Anton Eliasson

Nov 11, 2014, 7:07:05 AM
to bup-...@googlegroups.com
On 2014-11-10 23:11, Rob Browning wrote:
> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> Looks like one file is missing from the bupm, and judging from the
>> timestamps I believe it is the socket anton@hephaestus. The other two
>> files seem to be the stale lockfiles
>> 08db033c943f7ad4fd81dad98e5d98f2.lock and an...@hephaestus.lock.
> OK, so that's at least a partial answer to "what?", but not "why?".
>
> I assume that as far as you know the original save completed without any
> errors (i.e. no exceptions, and a zero exit status)?
>
> Thanks
Yes. If it had failed with an exception I would probably have restarted
the backup, but this repo contains no other backups from that day. I
usually don't check bup's exit status, because it returns nonzero values
for things I consider warnings (IIRC for instance when a file has
disappeared between the index and the save, which is common in a home
directory while programs are running).

Rob Browning

Nov 11, 2014, 10:07:22 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> Yes. If it had failed with an exception I would probably have restarted
> the backup, but this repo contains no other backups from that day.

> I usually don't check bup's exit status, because it returns nonzero
> values for things I consider warnings (IIRC for instance when a file
> has disappeared between the index and the save, which is common in a
> home directory while programs are running).

Right -- I consider this a significant problem, though not with your
actions but with bup.

I imagine we need some combination of clearly documented exit status
values that would allow you to distinguish the important cases, and/or
command line options that allow you to suppress errors for certain
conditions.

Having to routinely ignore the exit status of "bup save" is bad.

For now, if there's a way you can use snapshots, that should allow you
to stop ignoring errors (and is advisable anyway), but I know that may
not be feasible.

In any case, I'm wondering how bup was able to write a truncated .bupm
without halting. Do you know if the rest of the tree after that (in
drecurse order -- roughly reverse alphabetical) is fine? i.e. can you
see dirs "after" that via "bup ls"?

Anton Eliasson

Nov 13, 2014, 4:54:35 AM
to bup-...@googlegroups.com
On 2014-11-12 04:07, Rob Browning wrote:
> For now, if there's a way you can use snapshots, that should allow you
> to stop ignoring errors (and is advisable anyway), but I know that may
> not be feasible.
You mean taking a snapshot of the source filesystem and backing up the
snapshot instead of the live filesystem? Yes, that would probably be a
good idea, but it would require a filesystem like Btrfs or ZFS, and I
don't currently use either of them for my home directory.
>
> In any case, I'm wondering how bup was able to write a truncated .bupm
> without halting. Do you know if the rest of the tree after that (in
> drecurse order -- roughly reverse alphabetical) is fine? i.e. can you
> see dirs "after" that via "bup ls"?
>
Yes, it looks fine. bup ls shows directories both before and
after /home/anton/.mixxx in alphabetical order, and I can also restore
directories both before and after.

Rob Browning

Nov 13, 2014, 11:59:53 AM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> You mean taking a snapshot of the source filesystem and backing up the
> snapshot instead of the live filesystem? Yes, that would probably be a
> good idea, but it would require a filesystem like Btrfs or ZFS, and I
> don't currently use either of them for my home directory.

Or LVM (which is what I use), i.e. ext4 on LVM, etc.

But if you go that route and are backing up the whole system, you'll
likely want to take some additional steps, e.g.:

# stop-all-relevant-services
# create-snapshot-via-lvmcreate
# restart-services
# index/save-snapshot
# possibly-generate-par2-data-via-fsck
# possibly-spot-test-the-save
# remove-snapshot

The spot testing could be via a partial restore and t/compare-trees, for
example, and while I have my own infrastructure for this process, I
believe it's also the sort of thing that bup-cron intends to address.

> Yes, it looks fine. bup ls shows directories both before and
> after /home/anton/.mixxx in alphabetical order, and I can also restore
> directories both before and after.

So somehow there's a mismatch between what ended up in the .bupm and
what ended up in the git tree for the dir.

Just to double check, was this the entire content of the relevant dir:

08db033c943f7ad4fd81dad98e5d98f2.lock
anton@hephaestus
an...@hephaestus.lock

If so, then it looks like the missing entry is at the end of the .bupm
(i.e. there are no paths after it in the tree), and that suggests
save-cmd.py's either dropping the .bupm entry for the final path from
the pending metadata list for the dir in its "metalists" stack, or it's
never adding the entry in the first place.
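
A toy model of that failure mode (this is not bup's real .bupm encoding -- just length-prefixed records standing in for metadata entries, with made-up file names): if the writer emits one record fewer than the tree has entries, the reader hits EOF exactly on the final path, which matches the traceback.

```python
import io

def write_vuint(port, x):
    # Generic base-128 varint, illustrative only (not bup's exact format).
    while True:
        seven = x & 0x7f
        x >>= 7
        if x:
            port.write(bytes([seven | 0x80]))
        else:
            port.write(bytes([seven]))
            return

def read_vuint(port):
    result, shift = 0, 0
    while True:
        b = port.read(1)
        if not b:
            raise EOFError('encountered EOF while reading vuint')
        result |= (b[0] & 0x7f) << shift
        if not b[0] & 0x80:
            return result
        shift += 7

# Three tree entries, but the writer drops the last metadata record --
# the situation described above.  Names are hypothetical.
tree_entries = [b'stale.lock', b'anton-socket', b'other.lock']
bupm = io.BytesIO()
for name in tree_entries[:-1]:          # final entry never written
    payload = b'metadata-for-' + name
    write_vuint(bupm, len(payload))
    bupm.write(payload)

bupm.seek(0)
restored, error = [], None
for name in tree_entries:
    try:
        n = read_vuint(bupm)
        restored.append(bupm.read(n))
    except EOFError as e:               # raised on the third, missing record
        error = e
        break
```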

Thanks

Anton Eliasson

Nov 19, 2014, 3:48:32 PM
to Rob Browning, bup-...@googlegroups.com
Hi again!
Sorry for my late response.

On 2014-11-13 17:59, Rob Browning wrote:
> Anton Eliasson <de...@antoneliasson.se> writes:
>> Yes, it looks like it should. bup ls shows directories both before and
>> after /home/anton/.mixxx in alphabetical order and I can also restore
>> directories both before and after.
> So somehow there's a mismatch between what ended up in the .bupm and
> what ended up in the git tree for the dir.
>
> Just to double check, was this the entire content of the relevant dir:
>
> 08db033c943f7ad4fd81dad98e5d98f2.lock
> anton@hephaestus
> an...@hephaestus.lock
I think so. That's what bup ls says. I'm not sure what git-annex uses
the socket and the lock files for. Judging from my backups and current
home directory it appears git-annex deleted the socket and stopped
creating new ones around September 25 this year.

I found more weird stuff:
* Restoring only 08db033c943f7ad4fd81dad98e5d98f2.lock and
anton@hephaestus finishes without errors but both are empty normal files
and anton@hephaestus should be a socket. mtimes are restored.
* Restoring only an...@hephaestus.lock fails.
* bup cat-file --meta fails for all three files.
> If so, then it looks like the missing entry is at the end of the .bupm
> (i.e. there are no paths after it in the tree), and that suggests
> save-cmd.py's either dropping the .bupm entry for the final path from
> the pending metadata list for the dir in its "metalists" stack, or it's
> never adding the entry in the first place.
>
> Thanks

--
Best Regards,
Anton Eliasson

Rob Browning

Nov 19, 2014, 4:04:00 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> Sorry for my late response.

No problem.

> I think so. That's what bup ls says. I'm not sure what git-annex uses
> the socket and the lock files for. Judging from my backups and current
> home directory it appears git-annex deleted the socket and stopped
> creating new ones around September 25 this year.
>
> I found more weird stuff:
> * Restoring only 08db033c943f7ad4fd81dad98e5d98f2.lock and
> anton@hephaestus finishes without errors but both are empty normal files
> and anton@hephaestus should be a socket. mtimes are restored.

If I read your earlier data correctly, that's not surprising since the
.bupm entries were (-rw-r--r--) for them. The next question is how did
a socket end up with a plain file's mode?

> * Restoring only an...@hephaestus.lock fails.
> * bup cat-file --meta fails for all three files.

Right, all of this sounds consistent with a truncated .bupm.

By the way -- did you already mention (assuming you know) what version
of bup save was used to generate this tree?

Anton Eliasson

Nov 19, 2014, 4:19:03 PM
to Rob Browning, bup-...@googlegroups.com
On 2014-11-19 22:03, Rob Browning wrote:
> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> Sorry for my late response.
> No problem.
>
>> I think so. That's what bup ls says. I'm not sure what git-annex uses
>> the socket and the lock files for. Judging from my backups and current
>> home directory it appears git-annex deleted the socket and stopped
>> creating new ones around September 25 this year.
>>
>> I found more weird stuff:
>> * Restoring only 08db033c943f7ad4fd81dad98e5d98f2.lock and
>> anton@hephaestus finishes without errors but both are empty normal files
>> and anton@hephaestus should be a socket. mtimes are restored.
> If I read your earlier data correctly, that's not surprising since the
> .bupm entries were (-rw-r--r--) for them. The next question is how did
> a socket end up with a plain file's mode?
>
>> * Restoring only an...@hephaestus.lock fails.
>> * bup cat-file --meta fails for all three files.
> Right, all of this sounds consistent with a truncated .bupm.
>
> By the way -- did you already mention (assuming you know) what version
> of bup save was used to generate this tree?
>
> Thanks
The client likely used git tag 0.25-rc5 built from source. The remote
used the prebuilt 0.25-1 from Debian wheezy-backports.

Rob Browning

Nov 19, 2014, 4:28:44 PM
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> The client likely used git tag 0.25-rc5 built from source. The remote
> used the prebuilt 0.25-1 from Debian wheezy-backports.

OK, and I suppose if that dir's still lying around, it might be
interesting to see if you can reproduce the problem with a new repo via
BUP_DIR or "-d" (i.e. just index/save/restore that dir or something).

I'm guessing not, but it'd be worth a try (maybe we'll get lucky).

Nix

Nov 19, 2014, 5:55:16 PM
to Rob Browning, Anton Eliasson, bup-...@googlegroups.com
On 13 Nov 2014, Rob Browning outgrape:

> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> You mean taking a snapshot of the source filesystem and backing up the
>> snapshot instead of the live filesystem? Yes, that would probably be a
>> good idea, but it would require a filesystem like Btrfs or ZFS and I am
>> currently don't use any of them for my home directory.
>
> Or LVM (which is what I use), i.e. ext4 on LVM, etc.

Warning: LVM snapshots of the filesystem containing /etc/lvm, /sbin/lvm
or any libraries that it uses (generally this is /) are prone to
deadlocking. LVM attempts to avoid this, but cannot succeed in the
general case. The deadlock is fatal: all accesses to / fail forever
until reboot.

This is generally not what you want in a transparently-operating backup
system :)

--
NULL && (void)

Rob Browning

Nov 19, 2014, 5:58:58 PM11/19/14
to Nix, Anton Eliasson, bup-...@googlegroups.com
Nix <n...@esperi.org.uk> writes:

> Warning: LVM snapshots of the filesystem containing /etc/lvm, /sbin/lvm
> or any libraries that it uses (generally this is /) are prone to
> deadlocking. LVM attempts to avoid this, but cannot succeed in the
> general case. The deadlock is fatal: all accesses to / fail forever
> until reboot.
>
> This is generally not what you want in a transparently-operating backup
> system :)

Hmm, interesting -- as yet, no trouble on any machines here.

Nix

Nov 19, 2014, 6:16:37 PM11/19/14
to Rob Browning, Anton Eliasson, bup-...@googlegroups.com
On 19 Nov 2014, Rob Browning said:

> Nix <n...@esperi.org.uk> writes:
>
>> Warning: LVM snapshots of the filesystem containing /etc/lvm, /sbin/lvm
>> or any libraries that it uses (generally this is /) are prone to
>> deadlocking. LVM attempts to avoid this, but cannot succeed in the
>> general case. The deadlock is fatal: all accesses to / fail forever
>> until reboot.
>>
>> This is generally not what you want in a transparently-operating backup
>> system :)
>
> Hmm, interesting -- as yet, no trouble on any machines here.

Yeah, you need to be unlucky and have stuff swapped out at just the
wrong moment, so LVM tries to swap it in, or otherwise access the
suspended device -- and then you're dead.

Rare these days, perhaps.

--
NULL && (void)

Rob Browning

Nov 19, 2014, 8:12:19 PM11/19/14
to Nix, Anton Eliasson, bup-...@googlegroups.com
Nix <n...@esperi.org.uk> writes:

> Warning: LVM snapshots of the filesystem containing /etc/lvm, /sbin/lvm
> or any libraries that it uses (generally this is /) are prone to
> deadlocking. LVM attempts to avoid this, but cannot succeed in the
> general case. The deadlock is fatal: all accesses to / fail forever
> until reboot.

Where's that from, by the way?

Nix

Nov 20, 2014, 7:27:54 AM11/20/14
to Rob Browning, Anton Eliasson, bup-...@googlegroups.com
On 20 Nov 2014, Rob Browning told this:

> Nix <n...@esperi.org.uk> writes:
>
>> Warning: LVM snapshots of the filesystem containing /etc/lvm, /sbin/lvm
>> or any libraries that it uses (generally this is /) are prone to
>> deadlocking. LVM attempts to avoid this, but cannot succeed in the
>> general case. The deadlock is fatal: all accesses to / fail forever
>> until reboot.
>
> Where's that from, by the way?

I ran into it many years ago and then I asked Alasdair why it happened
over a drink. :)

Recovery is easy enough -- reboot and you're fine. Big-red-switch
reboot.

--
NULL && (void)

Anton Eliasson

Nov 20, 2014, 5:06:26 PM11/20/14
to Rob Browning, bup-...@googlegroups.com
Den 2014-11-19 22:28, Rob Browning skrev:
> Anton Eliasson <de...@antoneliasson.se> writes:
>
>> The client likely used git tag 0.25-rc5 built from source. The remote
>> used the prebuilt 0.25-1 from Debian wheezy-backports.
> OK, and I suppose if that dir's still lying around, it might be
> interesting to see if you can reproduce the problem with a new repo via
> BUP_DIR or "-d" (i.e. just index/save/restore that dir or something).
>
> I'm guessing not, but it'd be worth a try (maybe we'll get lucky).
>
> Thanks
Just tried it with the same build on another computer, and there were no
issues saving and then restoring. I don't think we'll ever find out why
this happened, but perhaps bup fsck could be improved to detect this
kind of trouble in the repo. Or bup restore could be made more robust so
that it skips files that are broken in the backup (with loud warnings of
course) and restores the rest.

It's not something I'm willing to spend time on however, and I can't
really say that it is an important issue either, since there was no data
loss that mattered in this case (the other backups in the repo seem to
be okay).

Rob Browning

Nov 20, 2014, 5:22:05 PM11/20/14
to Anton Eliasson, bup-...@googlegroups.com
Anton Eliasson <de...@antoneliasson.se> writes:

> Just tried it with the same build on another computer, and there were no
> issues saving and then restoring. I don't think we'll ever find out why
> this happened, but perhaps bup fsck could be improved to detect this
> kind of trouble in the repo. Or bup restore could be made more robust so
> that it skips files that are broken in the backup (with loud warnings of
> course) and restores the rest.
>
> It's not something I'm willing to spend time on however, and I can't
> really say that it is an important issue either, since there was no data
> loss that mattered in this case (the other backups in the repo seem to
> be okay).

Understood, and thanks for the help investigating.

Bjarki Ágúst Guðmundsson

Dec 7, 2020, 5:33:16 PM12/7/20
to bup-list
I apologize for the necroposting, but I just ran into the exact same error while doing a full restore of a backup from 2015. The interesting thing - and the reason I decided to post - is that my corrupted .bupm file was located in a directory called /home/bjarki/.ssh/sockets. Since Anton's error was also in a directory containing socket files, I think it's likely that socket files are somehow the cause of this. Perhaps something goes wrong when bup tries to back up an (active?) socket file?

Then again this backup was taken 5 years ago, and this may very well have been fixed in bup since then. In any case my fix was simply to --ignore-rx that directory during restore, since I didn't care about its contents.

Best,
Bjarki.

Johannes Berg

Dec 8, 2020, 3:06:03 AM12/8/20
to Bjarki Ágúst Guðmundsson, bup-list
On Mon, 2020-12-07 at 14:33 -0800, Bjarki Ágúst Guðmundsson wrote:
> I apologize for the necroposting, but I just ran into the exact same
> error while doing a full restore of a backup from 2015. The
> interesting thing - and the reason I decided to post - is that my
> corrupted .bupm file was located in a directory called
> /home/bjarki/.ssh/sockets. Since Anton's error was also in a directory
> containing socket files, I think it's likely that socket files are
> somehow the cause of this. Perhaps something goes wrong when bup tries
> to back up an (active?) socket file?
>
> Then again this backup was taken 5 years ago, and this may very well
> have been fixed in bup since then. In any case my fix was simply to --
> ignore-rx that directory during restore, since I didn't care about its
> contents.

Would you be willing to (privately) share the .bupm file?

johannes

Rob Browning

Dec 8, 2020, 7:11:39 PM12/8/20
to Johannes Berg, Bjarki Ágúst Guðmundsson, bup-list
Johannes Berg <joha...@sipsolutions.net> writes:

> Would you be willing to (privately) share the .bupm file?

Also worth noting that nothing in there is compressed so a

$ strings some.bupm

might help check that there's nothing sensitive in it -- only thing I
can imagine *might* be xattrs in some situations (or path names, but
those shouldn't be in a .bupm generated by save).

Bjarki Ágúst Guðmundsson

Dec 8, 2020, 8:36:46 PM12/8/20
to Rob Browning, Johannes Berg, bup-list
$ git ls-tree <hash1> home/bjarki/.ssh/sockets/
100644 blob <hash2>    home/bjarki/.ssh/sockets/.bupm
100644 blob <hash3>    home/bjarki/.ssh/sockets/bjarki@<hostname>
$ git cat-file blob <hash2>

The output of this command is attached.

Best,
Bjarki.
[attachment: bupm]

Johannes Berg

Dec 26, 2020, 5:27:05 PM12/26/20
to Bjarki Ágúst Guðmundsson, Rob Browning, bup-list
Hi Bjarki,

Sorry for the delay!

> The output of this command is attached.

This puzzled me for a while until I tested a similar scenario myself ...

Indeed your file, shown by 'bup meta', shows:

$ bup meta -t -f /tmp/bupm -vvvvq
path:
mode: 40755 (drwxr-xr-x)
rdev: 0
uid: 1000
gid: 1000
user: bjarki
group: bjarki
atime: 1435334308.790318547
mtime: 1435334482.156992389
ctime: 1435334482.156992389
linux-attr: 80000


Which seems ... fine? Except, look again! That's _directory_ metadata,
and it's executable, which a socket never would/should be.
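The mismatch is easy to check with Python's standard stat module; a small illustration (not bup code) using the two mode values from these dumps:

```python
import stat

# Mode recorded for the entry in the corrupted .bupm (octal 40755):
dir_mode = 0o40755
# Mode a Unix socket would normally carry (octal 140600, as in the repro):
sock_mode = 0o140600

# The file-type bits say "directory", not "socket":
assert stat.S_ISDIR(dir_mode) and not stat.S_ISSOCK(dir_mode)
assert stat.S_ISSOCK(sock_mode) and not stat.S_ISDIR(sock_mode)

print(stat.filemode(dir_mode))   # drwxr-xr-x
print(stat.filemode(sock_mode))  # srw-------
```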

Now, doing the same for a similar backup I made, with a single socket in
a folder, I see:


$ bup meta -t -f /tmp/bupm-repro -vvvvq
path:
mode: 40700 (drwx------)
rdev: 0
uid: 1000
gid: 1000
user: johannes
group: johannes
atime: 1609020690.002712175
mtime: 1608968089.597887290
ctime: 1608968089.597887290
linux-xattr: security.selinux -> unconfined_u:object_r:user_tmp_t:s0

path:
mode: 140600 (srw-------)
rdev: 0
uid: 1000
gid: 1000
user: johannes
group: johannes
atime: 1608968089.597887290
mtime: 1608968089.597887290
ctime: 1608968089.597887290
linux-xattr: security.selinux -> unconfined_u:object_r:user_tmp_t:s0



Aha! There's a second entry for the socket.


So I can only guess that the old version of bup was not storing metadata
for sockets correctly, and thus the entry there is simply _missing_,
leading to the error (which is exactly what we'd expect now that we know
the data is missing.)
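For reference, here's a simplified sketch of why a missing entry surfaces as exactly this EOFError. It assumes the vuint encoding bup uses (7 data bits per byte, most significant group first, high bit set meaning another byte follows); the real reader lives in lib/bup/vint.py:

```python
import io

def read_vuint(port):
    """Read a variable-width unsigned int: 7 data bits per byte,
    high bit set means more bytes follow (simplified sketch)."""
    value = 0
    while True:
        b = port.read(1)
        if not b:
            # A missing .bupm entry truncates the stream, so the next
            # record's tag read comes up empty -- the reported error.
            raise EOFError('encountered EOF while reading vuint')
        value = (value << 7) | (b[0] & 0x7f)
        if not (b[0] & 0x80):
            return value

print(read_vuint(io.BytesIO(b'\x81\x00')))  # two-byte vuint -> 128
try:
    read_vuint(io.BytesIO(b''))  # truncated stream: same EOFError
except EOFError as e:
    print(e)
```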

I don't see anything obvious in the commit log since then, but I'm not
very familiar with bup's history.

Could be nice to try to detect this and *at least* print a nicer error
message, I guess?

Thinking out loud, it might make sense to consider these kinds of
metadata corruption cases more generally - I recently found and fixed a
few errors along these lines, see e.g. commit 47891d8951a9. Who knows,
something like that might even have happened in your case - the socket
went away and its metadata couldn't be read, or something? After all,
the existence of a socket would probably only be stored from the index
into the backup, or so ...

Anyway, given that, it seems it might make sense to log an error along
the lines of

metadata for directory <name> is corrupted and was ignored

during restore but continue with the file contents, and add an option to
do this also during a special (very deep and expensive) fsck.

johannes
