Bup failing on remote save (stack trace)

32 views
Skip to first unread message

Norman Ramsey

unread,
Jun 15, 2021, 10:15:53 AM6/15/21
to bup-list
A bup command that had been working now fails.  It's a pretty hard fail with a stack trace. I'm wondering if using `bup prune-older` could have broken something.  In any case, I'm wondering if the current bup directory can be saved---my `.bup` is 13GB and creating a new one is time-consuming.

I'd love any help with diagnosis of the fault, repair of the broken repo, or a workaround for any of the issues.





Both local and remote bup should be version 0.32, but for some reason the local `bup --version` reports as "0.32+".  The latest commit is 3b052c63c93a95193447945296e7ec5db97f6b6e so I'm not sure what's up there.

The current failure is shown by running a shell script with -x, which shows the commands used, as follows (can't figure out how to attach a file in google groups):

--------------------------------------------------
+ HOST=homedog
+ PATH=/usr/local/bin:/usr/bin:/home/nr/.opam/default/bin:/home/nr/cs/105/server/sbin:/home/nr/cs/105/server/bin:/home/nr/cs/105/server/courseware/bin:/home/nr/bin:/home/nr/machine/amd64-linux/bin:/home/nr/.cabal/bin:/home/nr/bin:/home/nr/machine/amd64-linux/bin:/usr/bin/mh:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/nr/net/pclu/exe:/home/nr/.local/bin:/sbin:/usr/sbin:/usr/share/java:/usr/local/racket/bin:/etc:/usr/local/etc:/usr/bin/X11:/usr/local/sbin:/home/nr/cs/105/server/BUILD-Linux-x86_64:.:/home/nr/bin/tests:/usr/share/java:/home/nr/cs/150vm/bin
+ export PATH
+ BUP_DIR=/home/nr/.bup
+ export BUP_DIR
+ bup index /home/nr/Contains-Local-Mail
+ bup save '--strip-path=/home/nr' -n mail-backup-homedog -r homework: -q /home/nr/Contains-Local-Mail/Mail
Traceback (most recent call last):
  File "/usr/local/lib/bup/cmd/../bup/metadata.py", line 843, in read
    tag = vint.read_vuint(port)
  File "/usr/local/lib/bup/cmd/../bup/vint.py", line 38, in read_vuint
    raise EOFError('encountered EOF while reading vuint')
EOFError: encountered EOF while reading vuint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/bup/cmd/bup-save", line 412, in <module>
    meta = msr.metadata_at(ent.meta_ofs)
  File "/usr/local/lib/bup/cmd/../bup/index.py", line 66, in metadata_at
    return metadata.Metadata.read(self._file)
  File "/usr/local/lib/bup/cmd/../bup/metadata.py", line 845, in read
    raise Exception("EOF while reading Metadata")
Exception: EOF while reading Metadata
Traceback (most recent call last):
  File "/usr/sup/bup-0.32/lib/bup/cmd/bup-server", line 320, in <module>
    cmd(conn, rest)
  File "/usr/sup/bup-0.32/lib/bup/cmd/bup-server", line 112, in receive_objects_v2
    raise Exception('object read: expected length header, got EOF\n')
Exception: object read: expected length header, got EOF
-----------------------------------------------------




An earlier failure that might be diagnostic occurs in a cron job:

----------------------------------------------------------
+ hostname
+ HOST=homedog
+ PATH=/usr/local/bin:/usr/bin:/usr/bin:/bin
+ export PATH
+ BUP_DIR=/home/nr/.bup
+ export BUP_DIR
+ bup index /home/nr/Contains-Local-Mail
+ bup save '--strip-path=/home/nr' -n mail-backup-homedog -r homework: -q /home/nr/Contains-Local-Mail/Mail
warning: index pack-2ba0f704ac646aa3391b93a94da7706f0bb59387.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-66f7296c188647ed0b4d2cfef1b35fa4c7563bab.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-dfa4982ba98c624e1e200ff30f6b657222e2253c.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-0c6637b2e67448629867f04800fe74c8744ef46d.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-d85cd431215f8aa995d9f7d2da7dfa712bc84faf.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-730da91bd130e169543db514ea077c5558c95eea.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-fef781531bcf85a29f85a0f38d7ea9a30ab33343.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-356a9ffcaef7d44d8e322a5bd4627135f5b49632.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-7328cdd438ed337ffc92dfe72410390b4130124a.idx missing
  used by midx-77c0e830765410d5224ac543e8807965d6c6d236.midx
warning: index pack-28e65507ba7ac0c162fb08569cff2ff4038812dd.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-7086c0851248a012e094906de6a32396bbf1c07c.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-cd8fbeacda1575e506bbcd7388413c4e43210a8d.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-2ba0f704ac646aa3391b93a94da7706f0bb59387.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-66f7296c188647ed0b4d2cfef1b35fa4c7563bab.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-dfa4982ba98c624e1e200ff30f6b657222e2253c.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-0c6637b2e67448629867f04800fe74c8744ef46d.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-d85cd431215f8aa995d9f7d2da7dfa712bc84faf.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-730da91bd130e169543db514ea077c5558c95eea.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-fef781531bcf85a29f85a0f38d7ea9a30ab33343.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-356a9ffcaef7d44d8e322a5bd4627135f5b49632.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
warning: index pack-7328cdd438ed337ffc92dfe72410390b4130124a.idx missing
  used by midx-7258e564317bf52286360728aacc866a96c4519d.midx
Traceback (most recent call last):
  File "/usr/local/lib/bup/cmd/../bup/metadata.py", line 843, in read
    tag = vint.read_vuint(port)
  File "/usr/local/lib/bup/cmd/../bup/vint.py", line 38, in read_vuint
    raise EOFError('encountered EOF while reading vuint')
EOFError: encountered EOF while reading vuint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/bup/cmd/bup-save", line 412, in <module>
    meta = msr.metadata_at(ent.meta_ofs)
  File "/usr/local/lib/bup/cmd/../bup/index.py", line 66, in metadata_at
    return metadata.Metadata.read(self._file)
  File "/usr/local/lib/bup/cmd/../bup/metadata.py", line 845, in read
    raise Exception("EOF while reading Metadata")
Exception: EOF while reading Metadata
Traceback (most recent call last):
  File "/usr/sup/bup-0.32/lib/bup/cmd/bup-server", line 320, in <module>
    cmd(conn, rest)
  File "/usr/sup/bup-0.32/lib/bup/cmd/bup-server", line 112, in receive_objects_v2
    raise Exception('object read: expected length header, got EOF\n')
Exception: object read: expected length header, got EOF
---------------------------------------------------------------------------------------------------

Johannes Berg

unread,
Jun 16, 2021, 3:14:34 AM6/16/21
to Norman Ramsey, bup-list
On Tue, 2021-06-15 at 07:15 -0700, Norman Ramsey wrote:
> A bup command that had been working now fails.  It's a pretty hard
> fail with a stack trace. I'm wondering if using `bup prune-older`
> could have broken something.  In any case, I'm wondering if the
> current bup directory can be saved---my `.bup` is 13GB and creating a
> new one is time-consuming.
>
> I'd love any help with diagnosis of the fault, repair of the broken
> repo, or a workaround for any of the issues.

Just FYI, we're thinking about this issue. Don't delete anything yet :)

It's not clear to me after looking a few minutes, so I need to look at
it with a bit more time.

johannes

Rob Browning

unread,
Jun 16, 2021, 8:09:07 PM6/16/21
to Johannes Berg, Norman Ramsey, bup-list
Johannes Berg <joha...@sipsolutions.net> writes:

> Just FYI, we're thinking about this issue. Don't delete anything yet :)
>
> It's not clear to me after looking a few minutes, so I need to look at
> it with a bit more time.

Agreed, and I suspect that your repo will turn out to be salvagable one
way or another, but won't know until we can diagnose the problem a bit
better.

If you have the space, I'd suggest making a copy of the repo.

And I wonder if there's a chance this is an issue with the cache. I
think we might have seen that before, and if so, then you may be able to
"bup index --clear" and "fix" the problem. Doing that shouldn't hurt
anything, but will cause the next index run to take a lot longer.

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Norman Ramsey

unread,
Jun 17, 2021, 4:56:50 PM6/17/21
to bup-list
Over the last two days, the problem has resolved.  I have `bup index`, `bup save`, and `bup prune-older` running in cron jobs, so perhaps one of these contributed.

I will keep an eye on things, and the moment the issue recurs, I will shut down the cron jobs and make copies of the (local) index and (remote) repo.

Norman Ramsey

unread,
Jun 30, 2021, 1:03:29 PM6/30/21
to bup-list
A heads-up: this issue is back.  I have taken snapshots of the BUP_DIR on both the local machine (for `index`) and the remote (for `save`).  I await advice.

Johannes Berg

unread,
Jul 1, 2021, 3:48:39 PM7/1/21
to Norman Ramsey, bup-list
On Wed, 2021-06-30 at 10:03 -0700, Norman Ramsey wrote:
> A heads-up: this issue is back.  I have taken snapshots of the BUP_DIR
> on both the local machine (for `index`) and the remote (for `save`).  I
> await advice.


Well, it does look like your index is corrupt, somehow.

In the stack trace we got (which is off by two - do you have local any
modifications that would also explain the "+" on the version number?),
we see

File "/usr/local/lib/bup/cmd/bup-save", line 412, in <module>
meta = msr.metadata_at(ent.meta_ofs)
File "/usr/local/lib/bup/cmd/../bup/index.py", line 66, in metadata_at
return metadata.Metadata.read(self._file)
File "/usr/local/lib/bup/cmd/../bup/metadata.py", line 845, in read
raise Exception("EOF while reading Metadata")


The code in bup-save is this (note the last line - quoted above - is
actually line 414 in 0.32, so not sure if you have changes in your
package or so):

# it's not a directory
if hashvalid:
id = ent.sha
git_name = git.mangle_name(file, ent.mode, ent.gitmode)
git_info = (ent.gitmode, git_name, id)
shalists[-1].append(git_info)
sort_key = git.shalist_item_sort_key((ent.mode, file, id))
meta = msr.metadata_at(ent.meta_ofs)

msr here is:

msr = index.MetaStoreReader(indexfile + b'.meta')

so either your "bupindex.meta" or "bupindex" files are corrupt
(bupindex.meta could be bad directly, or bupindex could point to a bad
"meta_ofs").


If you have a lot of time on your hands for the next backup, you could
just delete your entire index ("bup index --clear"). This would force a
re-hash/re-validation of *all* your files though, i.e. read all files
from disk again. It obviously won't store them again, but it's still
expensive if you have a decently sized dataset.

You could also do "bup index --check", but I don't think it actually
checks metadata. That could be implemented though, without that much
trouble, and I guess in theory we could even have a "--repair-meta" mode
that checks metadata, and if it's bad it would throw away that
particular index entry only. However, chances are some other things are
also corrupted, so clearing the index and spending the time on the next
round is probably wise anyway.

However, the bigger question is how this happened, but I really don't
know.

johannes

Rob Browning

unread,
Jul 3, 2021, 2:30:54 PM7/3/21
to Johannes Berg, Norman Ramsey, bup-list
Johannes Berg <joha...@sipsolutions.net> writes:

> You could also do "bup index --check", but I don't think it actually
> checks metadata. That could be implemented though, without that much
> trouble, and I guess in theory we could even have a "--repair-meta" mode
> that checks metadata, and if it's bad it would throw away that
> particular index entry only. However, chances are some other things are
> also corrupted, so clearing the index and spending the time on the next
> round is probably wise anyway.
>
> However, the bigger question is how this happened, but I really don't
> know.

I suspect there are any number of possibilities at the moment, since the
index itself is mutable (and not transactional in any way, afaik; it
just changes via direct mmapped writes), and then we added the
completely independent metadata cache. So particularly if index or save
crashes, I assume we have the potential for trouble.

I have some unfinished work here that would propose switching the index
to sqlite to avoid the problem entirely, but haven't gotten back to it
in a while.

For now, agreed. If you can live with the cost of traversing all the
data again, "index --clear" should side-step the problem.

Davide Guerri

unread,
Aug 8, 2021, 3:31:45 PM8/8/21
to bup-list
FWIW, I ran into this a few days ago. 
Glad I found this thread as `index --clear` worked for me.

Rob Browning

unread,
Aug 11, 2021, 8:37:33 PM8/11/21
to Davide Guerri, bup-list
Davide Guerri <davide...@gmail.com> writes:

> FWIW, I ran into this a few days ago.
> Glad I found this thread as `index --clear` worked for me.

That's good, at least, and for what it's worth, fixing this is on the
list, just not near the top yet.
Reply all
Reply to author
Forward
0 new messages