bup fsck -g fails due to existing files


Karl-Philipp Richter

Aug 7, 2020, 3:52:33 AM
to bup-list
Hi,
During `bup fsck -g` I see several messages in the form of

```
Could not create "/mnt/diskstation/backup/bup_backup/objects/pack/pack-c13f3f8800a857bf56c23afa7ec97f9c9bce3ba6.par2": File already exists.
pack-c13f3f8800a857bf56c23afa7ec97f9c9bce3ba6 par2 create: failed (6)
```

with different hashes. This then results in about 100 errors in the form of

```
error: inflate: data stream error (incorrect data check)
fatal: pack has bad object at offset 534116960: inflate returned -3
pack-cb5543a5bca733bfa874aef8d15eebae80b53d6c git verify: failed (1)
```

with different hashes as well.

Deleting the .par2 file doesn't fix the issue.

Should I consider my backup damaged? And thus delete and recreate it? Or is there a way to fix this?

Best regards,
Kalle

Johannes Berg

Aug 10, 2020, 3:41:22 PM
to Karl-Philipp Richter, bup-list
Hi,

> Hi,
> During `bup fsck -g` I see several messages in the form of
>
> ```
> Could not create "/mnt/diskstation/backup/bup_backup/objects/pack/pack-c13f3f8800a857bf56c23afa7ec97f9c9bce3ba6.par2": File already exists.
> pack-c13f3f8800a857bf56c23afa7ec97f9c9bce3ba6 par2 create: failed (6)
> ```
>
> with different hashes.

Hmm. That's weird, not sure why that happens. But it should be
reasonably harmless.

> This then results in about 100 errors in the form of
> ```
> error: inflate: data stream error (incorrect data check)
> fatal: pack has bad object at offset 534116960: inflate returned -3
> pack-cb5543a5bca733bfa874aef8d15eebae80b53d6c git verify: failed (1)
> ```
>
> with different hashes as well.

This is problematic.

> Deleting the file .par2 file doesn't fix the issue.

Obviously. I hope you didn't actually *delete* them but merely moved
them out of the way, because the whole idea of the par2 files is that
you can recover from errors like that in your backup.

> Should I consider my backup damaged? And thus delete and recreate it?
> Or is there a way to fix this?


If you still have the par2 files for the pack files in question, then
I'd probably start by doing something like

par2verify pack-cb5543a5bca733bfa874aef8d15eebae80b53d6c.pack.par2

(not sure about the exact filename) to see what par2 has to say about
the data there. If git says it's bad though chances are that it really
_is_ bad and you'd want to "par2repair" (or even "bup fsck --repair")
it. After making copies I guess :)


If you actually deleted the par2 files then I'm afraid you just deleted
the chance you had to actually recover from those errors. You could
figure out which objects were supposed to be at that location (from
git's "fatal: pack has bad object at offset 534116960" message) from the
corresponding idx, but then with that information you'd have to figure
out where exactly that object is used in the backup, etc. It's
_possible_ but not really easy, and I don't think there are currently
any tools for it. You could also try to re-hash all of your files to see
if an object with this sha1 shows up... again, no tools.
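The "figure out which object was at that offset" step could be sketched like this: a minimal reader for version-2 pack `.idx` files (an illustration only; it ignores the large-offset table, and the layout assumed is fanout table, sorted sha1s, crc32s, then 4-byte offsets):

```python
import struct

def parse_idx_v2(data):
    """Parse a git pack .idx (version 2) given as bytes and return a
    list of (pack_offset, sha1_hex) pairs sorted by offset."""
    assert data[:4] == b'\xfftOc', 'not a v2 idx'
    version, = struct.unpack('>I', data[4:8])
    assert version == 2
    fanout = struct.unpack('>256I', data[8:8 + 1024])
    n = fanout[255]                      # total object count
    sha_ofs = 8 + 1024                   # sorted 20-byte sha1s
    crc_ofs = sha_ofs + 20 * n           # crc32 per object
    off_ofs = crc_ofs + 4 * n            # 4-byte pack offsets
    entries = []
    for i in range(n):
        sha = data[sha_ofs + 20 * i: sha_ofs + 20 * (i + 1)].hex()
        off, = struct.unpack('>I', data[off_ofs + 4 * i: off_ofs + 4 * (i + 1)])
        assert not off & 0x80000000, 'large-offset table not handled in this sketch'
        entries.append((off, sha))
    entries.sort()
    return entries

def object_at(entries, bad_offset):
    """Return the sha1 of the object whose entry starts at the largest
    offset <= bad_offset, i.e. the object containing that offset."""
    candidate = None
    for off, sha in entries:
        if off <= bad_offset:
            candidate = sha
        else:
            break
    return candidate
```

Feeding git's reported offset to `object_at` names the damaged object; the hard part (finding where that object is referenced in the backup) still has no tooling, as noted above.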


Unless you can repair it using the par2 data, I'd definitely recommend
you stop backing up into this repository, as bup will think the broken
object(s) are still there and will not save them again. This might make
even your new backups bad.


If you have enough space, keep this backup repo around and make a new
one. If you don't have enough space, you could make a new repo and move
(since you don't have space to copy) the packs that are still intact
according to git/par2 into it. These will just stick around as-is, and
if you really really need to restore some old files, you'd still be able
to do that, subject to whatever holes might be poked into them by the
bad objects.

Obviously, if you really don't care about the old backups after you've
made a new one (provided you had enough space) you can also just throw
the old repo away ...

johannes

Karl-Philipp Richter

Aug 12, 2020, 4:14:26 PM
to bup-list
Hi johannes,
thanks for your input. A broken backup and having to make a new one are not good things, but now I can make an informed decision on whether to attempt rescue/repair (unlikely) or not.

-Kalle

Karl-Philipp Richter

Nov 6, 2021, 10:42:50 AM
to bup-list
Hi,
After more than a year, I finally gave bup a try again. I created a new backup with 0.32, which worked great; however, the initial index and save into an empty repository produced a dozen messages of the following form during a `bup fsck -g` run started immediately after a successful save:

```
error: inflate: data stream error (incorrect data check)
fatal: Paket hat ein ungültiges Objekt bei Versatz 141031134: Dekomprimierung gab -3 zurück
b'pack-31167292d4f9f2719eff287202efcbd563e89ecd' git verify: failed (1)
```
as well as
```
fatal: sha1 file '/mnt/diskstation/backup/bup_backup/objects/pack/pack-c2eae9a526ed17c3ed9025c61d0690bad43234bb.idx' validation error
b'pack-c2eae9a526ed17c3ed9025c61d0690bad43234bb' git verify: failed (1)
```
I'm pretty sure the German message is `fatal: pack has bad object at offset [...]: inflate returned -3`, but I can verify that if someone considers it useful.

This is the result after saving into a fresh repository. Afaiu the corruption already starts with the first backup. This is pretty alarming from my point of view. What are your insights on this?

I'm backing up approx. 2TB from an Ubuntu 21.10 machine. The bup repository is located on a cifs mount with options `rw,relatime,vers=3.1.1,cache=loose,username=...,uid=...,noforceuid,gid=...,noforcegid,addr=192.168.[...],file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,mfsymlinks,fsc,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,user=...`. I obfuscated some information. I need to use `fsc`; otherwise "counting bloom" gets slower and slower over time, with a save taking several weeks for the 2TB rather than a day with cachefilesd. I doubt that it's a source of corruption.

I can start a new post, if you think it's a completely new issue. For me it's similar enough to continue here.

-Kalle

Rob Browning

Nov 7, 2021, 12:52:06 PM
to Karl-Philipp Richter, bup-list
Karl-Philipp Richter <kric...@posteo.de> writes:

> This is the result after saving into a fresh repository. Afaiu the
> corruption is already starting from the first backup. This is pretty
> alarming from my point of view. What are your insights on this?

Agreed. Not good.

Just for reference, did you mention the bup version, i.e. 0.32 or
something else?

> I'm backing up approx. 2TB from a Ubuntu 21.10.The bup repository is
> located on a cifs mount with options
> `rw,relatime,vers=3.1.1,cache=loose,username=...,uid=...,noforceuid,gid=...,noforcegid,addr=192.168.[...],file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,mfsymlinks,fsc,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,user=...`.
> I obfuscated some information. I need to use `fsc` otherwise counting bloom
> gets slower and slower over time resulting in save taking several weeks for
> the 2TB rather than a day with cachefilesd.

What's fsc?

I don't know if it's possible, but one option if you have the hardware
might be to try the same backup to a local filesystem (perhaps a usb drive or
something), or to a remote say via ssh instead -- i.e. to try to isolate
some of the variables.

For what it's worth, I believe we've had various reports of trouble with
cifs mounts over the years:

https://groups.google.com/g/bup-list/search?q=cifs

For example, this thread suggested potential resource limit issues,
which might not be surprising given bup's current behavior, i.e. all the
mmapping, keeping "all" the files open during some operations, etc.:

https://groups.google.com/g/bup-list/c/HJmIxaE1yGw/m/0JX6utND4z8J

and there's this more recent message:

https://groups.google.com/g/bup-list/c/yh1xqzL6Znk/m/MDcg5r7PBAAJ

Another thread suggests that we might see better behavior if we more
aggressively close mmaps. I actually have a patch set I've been working
on lately, which I plan to post soon (and merge if it seems OK), that
overhauls all of our class instance cleanup, i.e. it replaces *all*
reliance on __del__ with context management.
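As a rough illustration of that direction (hypothetical code, not the actual patch set), a mapping wrapper with context management closes the map deterministically when the with-block exits, instead of waiting for garbage collection:

```python
import mmap

class MappedFile:
    """Sketch of a context-managed read-only mapping: resources are
    released at the end of the with-block rather than in __del__."""

    def __init__(self, path):
        self._f = open(path, 'rb')
        self.map = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Deterministic cleanup: close the map, then the file.
        self.map.close()
        self._f.close()
        return False
```

With many packs open, this kind of prompt close keeps the number of live mappings bounded, which is exactly what a resource-limited cifs setup would want.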

(And not directly relevant, but given the way bup currently works, I'd
expect operations over ssh like save -r, "bup get", and I'm guessing
also rsync (where feasible) to be notably more efficient than anything
involving a network filesystem most of the time -- i.e. given all the
mmapping, etc.)

Hope this helps, and if it is cifs, happy to try to help figure out an
alternate arrangement, if there's one that's feasible for you.

It might also be the case that allowing streaming operations as an
alternative to mmap would help here, and broaden our support, and I have
a tree here where I nearly finished that, but it's been a good while and
I haven't gotten back to it yet.

Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Nix

Nov 9, 2021, 12:56:05 PM
to Rob Browning, Karl-Philipp Richter, bup-list
On 7 Nov 2021, Rob Browning outgrape:

> Karl-Philipp Richter <kric...@posteo.de> writes:
>> I'm backing up approx. 2TB from a Ubuntu 21.10.The bup repository is
>> located on a cifs mount with options
>> `rw,relatime,vers=3.1.1,cache=loose,username=...,uid=...,noforceuid,gid=...,noforcegid,addr=192.168.[...],file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,mfsymlinks,fsc,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,user=...`.
>> I obfuscated some information. I need to use `fsc` otherwise counting bloom
>> gets slower and slower over time resulting in save taking several weeks for
>> the 2TB rather than a day with cachefilesd.
>
> What's fsc?

It's an option specific to a few network filesystems (CIFS among them:
also NFS and Ceph) that uses fscache to cache remote files on local
storage. See Documentation/filesystems/caching/fscache.rst in a nearby
Linux kernel tree.

(fsc is generally not worth it unless you have a machine with relatively
limited RAM and a networked filesystem supporting leases, since
otherwise you either satisfy most requests from RAM anyway or have to
keep going back across the network to check if the content is
outdated -- but CIFS would be one where it makes a major difference.)

> I don't know if it's possible, but one option if you have the hardware
> might be to try the same backup to a local filesystem (perhaps a usb drive or
> something), or to a remote say via ssh instead -- i.e. to try to isolate
> some of the variables.

In effect fsc is doing a large part of what using a local fs would do.
It's still worth a try though.

--
NULL && (void)

Karl-Philipp Richter

Nov 17, 2021, 5:59:43 AM
to Rob Browning, bup-list

Hi,

Am 07.11.21 um 18:52 schrieb Rob Browning:
> Just for reference, did you mention the bup version, i.e. 0.32 or
> something else?

I'm experiencing this using bup 0.32.

> Hope this helps, and if it is cifs, happy to try to help figure out an
> alternate arrangement, if there's one that's feasible for you.
>
> It might also be the case that allowing streaming operations as an
> alternative to mmap would help here, and broaden our support, and I have
> a tree here where I nearly finished that, but it's been a good while and
> I haven't gotten back to it yet.

I'm looking forward to the streaming operations. I made another attempt
with nfs (with mount options async,hard,proto=tcp) and verified with
`bup fsck -g --jobs 8` - no errors.

I noticed that using cachefilesd (by adding the mount option fsc) causes
the bup python3 process to be killed by the Linux OOM killer (I'm using
Ubuntu 21.10 with 5.13.0-21-generic). I didn't need fsc for nfs, though;
for cifs the performance was much lower without fsc, and the backup of
2TB was only feasible with fsc enabled.

I assume that the way I'm using cifs, or rather the way I think I have
to use it for adequate performance with bup (protocol 4.x and mount
options `user=%s,rw,gid=1000,uid=1000,mfsymlinks,fsc,cache=loose`), is
wrong or buggy.

Thanks for your advice and support

-Kalle

Johannes Berg

Nov 28, 2021, 11:40:13 AM
to Karl-Philipp Richter, bup-list
Hi,

> ```
> error: inflate: data stream error (incorrect data check)
> fatal: Paket hat ein ungültiges Objekt bei Versatz 141031134:
> Dekomprimierung gab -3 zurück
> b'pack-31167292d4f9f2719eff287202efcbd563e89ecd' git verify: failed
> (1)
> ```
> as well as
> ```
> fatal: sha1 file
> '/mnt/diskstation/backup/bup_backup/objects/pack/pack-
> c2eae9a526ed17c3ed9025c61d0690bad43234bb.idx' validation error
> b'pack-c2eae9a526ed17c3ed9025c61d0690bad43234bb' git verify: failed
> (1)
> ```
> I'm pretty sure the German messages is `fatal: pack has bad object at
> offset [...]: inflate returned -3`, but I can verify if someone
> considers the verification useful.
>
> This is the result after saving into a fresh repository. Afaiu the
> corruption is already starting from the first backup. This is pretty
> alarming from my point of view. What are your insights on this?

That looks pretty bad, agree.

> I'm backing up approx. 2TB from a Ubuntu 21.10.The bup repository is
> located on a cifs mount with options
> `rw,relatime,vers=3.1.1,cache=loose,username=...,uid=...,noforceuid,gi
> d=...,noforcegid,addr=192.168.[...],file_mode=0755,dir_mode=0755,soft,
> nounix,serverino,mapposix,mfsymlinks,fsc,rsize=4194304,wsize=4194304,b
> size=1048576,echo_interval=60,actimeo=1,user=...`.

Does anything change if you use cache=strict?

I see reports on the web about some antivirus or something causing file
corruption on the server, anything like that running on the server?

Also, what are you running as the server in the first place?

> I obfuscated some information. I need to use `fsc` otherwise counting
> bloom gets slower and slower over time resulting in save taking
> several weeks for the 2TB rather than a day with cachefilesd. I doubt
> that it's a source of corruption.

According to the man page that's only used for read-only files, so I'd
agree it's not the source of the corruption.

The pack writing code doesn't even do anything really strange (like e.g.
mixing mmap and read/write), it just streams to a file, rewinds to
calculate the sha1, and then rewinds again to write the result into the
header ...
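The write pattern described above might be sketched roughly like this (a simplified illustration, not bup's actual code; the header layout and the exact fix-up step are assumptions for the sketch): stream sequentially, rewind to patch the header, rewind again to hash everything, and append the sha1 trailer.

```python
import hashlib
import os
import struct

def write_pack_like(objects, path):
    """Hypothetical simplification of the pack write pattern:
    plain sequential writes, a header fix-up, and a sha1 trailer."""
    with open(path, 'w+b') as f:
        # 'PACK', version 2, object count patched later.
        f.write(b'PACK' + struct.pack('>II', 2, 0))
        for obj in objects:
            f.write(obj)                          # plain streaming writes
        f.seek(8)
        f.write(struct.pack('>I', len(objects)))  # fix up the count
        f.seek(0)
        sha = hashlib.sha1(f.read()).digest()     # hash header + objects
        f.seek(0, os.SEEK_END)
        f.write(sha)                              # 20-byte trailer
    return sha.hex()
```

Nothing here mixes mmap with read/write, which is why corruption in the middle of the file is so puzzling.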

But offset 141031134 isn't even near the beginning or end of the file,
it's just in some random place? And that object is corrupt in itself,
not just the sha1 of the entire file.

If you just take an arbitrary ~1GiB file and copy it to the mounted
share, does it also experience corruption? I'd hope not, but just
wondering what bup could be doing that's special in the middle of the
file - the beginning and end _are_ treated specially, but not the
middle?
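A quick way to run that experiment (a simple sketch; for a real test, use a file of ~1GiB of random data and point `dst_dir` at the cifs mount):

```python
import hashlib
import os
import shutil

def sha1_of(path, bufsize=1 << 20):
    """Hash a file in chunks so large files don't need to fit in RAM."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(bufsize), b''):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src, dst_dir):
    """Copy src into dst_dir (e.g. the mounted share) and compare
    sha1s to detect corruption introduced by the transport."""
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    return sha1_of(src) == sha1_of(dst)
```

If a plain copy also comes back with a different hash, the problem is below bup entirely; if it doesn't, something about bup's access pattern is triggering it.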

johannes