bup save crashes on afpfs

29 views
Skip to first unread message

michelegu...@gmail.com

unread,
Nov 1, 2016, 12:11:02 PM11/1/16
to bup-list
I'm trying to backup a machine to a shared drive on an old time capsule.
I have previously been using CIFS/SMB to mount the drive but when the repository grew I started having the "too many open files" problem so I tried connecting with AFP instead.
So now I mount the share using afpfs-ng and fuse. That problem is now gone but it has been replaced by yet another crash.
This one only seems to happen on afpfs as setting BUP_DIR to a path in the local drive works fine.
Basically I index the directory I want to backup, run "bup save" and as soon as the progress reaches 100% bup crashes with this traceback:

Traceback (most recent call last):                           
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/cmd/bup-save", line 460, in <module>
    w.close()  # must close before we can update the ref
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py", line 790, in close
    return self._end(run_midx=run_midx)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py", line 768, in _end
    obj_list_sha = self._write_pack_idx_v2(self.filename + '.idx', idx, packbin)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py", line 812, in _write_pack_idx_v2
    idx_map.close()
AttributeError: 'str' object has no attribute 'close'

My guess is that idx_map is the empty string, returned by this function:

def _mmap_do(f, sz, flags, prot, close):
   
if not sz:
        st
= os.fstat(f.fileno())
        sz
= st.st_size
   
if not sz:
       
# trying to open a zero-length map gives an error, but an empty
       
# string has all the same behaviour of a zero-length map, ie. it has
       
# no elements :)
       
return ''
    map
= mmap.mmap(f.fileno(), sz, flags, prot)
   
if close:
        f
.close()  # map will persist beyond file close
   
return map

So it seems the assumption wasn't entirely correct.



Rob Browning

unread,
Nov 4, 2016, 12:15:01 AM11/4/16
to michelegu...@gmail.com, bup-list
michelegu...@gmail.com writes:

> I'm trying to backup a machine to a shared drive on an old time capsule.
> I have previously been using CIFS/SMB to mount the drive but when the
> repository grew I started having the "too many open files" problem so I
> tried connecting with AFP instead.

Remind me, what's that problem, or rather is it bup's or somewhere else?

> So it seems the assumption wasn't entirely correct.

Agreed, though it might have been "correct" at the time. Bup originally
tended to just leave cleanup to the garbage collector (via object
finalizers), but we've been changing the code to handle resources like
files more explicitly, and I think that may be related to this crash.

However, if the crash indicates it tried to mmap a 0 length index, when
the index_len of the truncation above in the code couldn't have been
zero, that seems odd. I'm not sure how that could have happened without
truncate throwing an exception (say if the drive was full).

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

michelegu...@gmail.com

unread,
Nov 4, 2016, 3:30:58 PM11/4/16
to bup-list, michelegu...@gmail.com
> Remind me, what's that problem, or rather is it bup's or somewhere else?
It's not bup's fault for sure; I think it's either a limitation in samba or one of its dependencies.
There are so many simultaneously open files samba seems to handle and after a few months
of backups the number of git objects probably reaches this limit.
What happens is bup crashes while reading or writing to the repository, rendering the backup
inaccessible (unless you copy all the data locally). This is the traceback:

fatal: failed to read object 10b110bd8bf13685dffd292e771b951eae641e35: Too many open files

Traceback (most recent call last):
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/cmd/bup-ls", line 12, in <module>
    ret = ls.do_ls(sys.argv[1:], top, default='/', spec_prefix='bup ')
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/ls.py", line 108, in do_ls
    n = pwd.try_resolve(path)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 280, in try_resolve
    n = self.lresolve(path)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 269, in lresolve
    return start._lresolve(parts)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 246, in _lresolve
    return self.sub(first)._lresolve(rest)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 246, in _lresolve
    return self.sub(first)._lresolve(rest)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 388, in _lresolve
    return self.dereference()._lresolve(parts)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 380, in dereference
    stay_inside_fs=True)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 269, in lresolve
    return start._lresolve(parts)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 244, in _lresolve
    return self.parent._lresolve(rest)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 246, in _lresolve
    return self.sub(first)._lresolve(rest)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 246, in _lresolve
    return self.sub(first)._lresolve(rest)
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 211, in sub
    self._mksubs()
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/vfs.py", line 487, in _mksubs
    for (date, commit) in revs:
  File "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py", line 916, in rev_list
    raise GitError, 'git rev-list returned error %d' % rv
bup.git.GitError: git rev-list returned error 128

I say for sure because directly issuing git commands in BUP_DIR produces the same error, like:

git gc
fatal: failed to read object 10b110bd8bf13685dffd292e771b951eae641e35: Too many open files 
error: failed to run repack
 
Regarding the crash, note I was just a guessing: all I did was following the trace until I found something
that matched the error. But it's also possible afpfs it's doing something strange.
What should I do to test further?

Rob Browning

unread,
Nov 5, 2016, 3:56:32 PM11/5/16
to michelegu...@gmail.com, bup-list
michelegu...@gmail.com writes:

> I say for sure because directly issuing git commands in BUP_DIR produces
> the same error, like:
>
> git gc
>
> fatal: failed to read object 10b110bd8bf13685dffd292e771b951eae641e35: Too
>> many open files

Ahh, bup may be relying on git in the bup failure as well, but
regardless, I wonder if you might just have a ton of small files. What
do you see from:

find "$BUP_DIR" | wc -l
find "$BUP_DIR"/refs | wc -l
find "$BUP_DIR"/objects | wc -l

I think it's possible to accumulate a lot of small files, and maybe
that's what's causing the trouble when git (or bup) tries to
open/mmap/whatever them all.

Oh, and what does "ulimit -a" say about open files from the environment
where you're running bup (and do the other layers (afpfs or whatever)
have a similar parameter)?

> Regarding the crash, note I was just a guessing: all I did was following
> the trace until I found something
> that matched the error.

Well, whether or not it's directly relevant to your issue, you may have
found an issue we'll want to address.

Thanks

michelegu...@gmail.com

unread,
Nov 7, 2016, 6:10:21 PM11/7/16
to bup-list, michelegu...@gmail.com
>What do you see from:
>  find "$BUP_DIR" | wc -l
>  find "$BUP_DIR"/refs | wc -l
>  find "$BUP_DIR"/objects | wc -l

The counts are respectively 247, 3, 112.
Also `git count-objects` reports

warning: garbage found: ./objects/pack/bup.bloom
warning: garbage found: ./objects/pack/midx-69ff463fb7697a0a9c37649939ce2596293dcb42.midx
warning: garbage found: ./objects/pack/midx-fa5c129350a0eabeec5b9faae03a930c2ed15a21.midx
count: 0
size: 0 bytes
in-pack: 9391515
packs: 89
size-pack: 61.53 GiB
prune-packable: 0
garbage: 3
size-garbage: 230.85 MiB

 
and this is `ulimit -a`:

Maximum size of core files created                           (kB, -c) 0
Maximum size of a process’s data segment                     (kB, -d) unlimited
Maximum size of files created by the shell                   (kB, -f) unlimited
Maximum size that may be locked into memory                  (kB, -l) 64
Maximum resident set size                                    (kB, -m) unlimited
Maximum number of open file descriptors                          (-n) 1024
Maximum stack size                                           (kB, -s) 8192
Maximum amount of cpu time in seconds                   (seconds, -t) unlimited
Maximum number of processes available to a single user           (-u) 31205
Maximum amount of virtual memory available to the shell      (kB, -v) unlimited

 >and do the other layers (afpfs or whatever) have a similar parameter?
I don't know much about fuse but I think it should be 1024 as well.
I have once tried to raise it to the maximum allowed by my system (4096 IIRC) but it still crashed
at about the same time. Also it has not happened (yet) with afpfs so this is probably not the problem.

Rob Browning

unread,
Nov 10, 2016, 1:42:02 AM11/10/16
to michelegu...@gmail.com, bup-list
michelegu...@gmail.com writes:

> Traceback (most recent call last):
>> File
>> "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/cmd/bup-save",
>> line 460, in <module>
>> w.close() # must close before we can update the ref
>> File
>> "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py",
>> line 790, in close
>> return self._end(run_midx=run_midx)
>> File
>> "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py",
>> line 768, in _end
>> obj_list_sha = self._write_pack_idx_v2(self.filename + '.idx', idx,
>> packbin)
>> File
>> "/nix/store/20dz2927kl675zsfhrfc46qi4525chk1-bup-0.28.1/lib/bup/bup/git.py",
>> line 812, in _write_pack_idx_v2
>> idx_map.close()
>> AttributeError: 'str' object has no attribute 'close'

OK, so the file counts and limits you posted elsewhere don't seem likely
to be trouble.

If you wanted to investigate further, and you can reproduce the crash, I
suppose you could try adding some print statements and see what they
produce, e.g. perhaps a bit higher in the function something like:

print >> sys.stderr, 'index-len:', index_len

To make sure index_len isn't zero, though it's hard to see how it could
be given:

index_len = 8 + (4 * 256) + (28 * self.count) + (8 * ofs64_count)

And then maybe some statements in mmap_readwrite in git.py to see why/if
it really is returning a string in this situation.
Reply all
Reply to author
Forward
0 new messages