lib.bup.vfs.Node.fullname assertion error

150 views
Skip to first unread message

Dickon Reed

unread,
Mar 26, 2011, 3:48:00 AM3/26/11
to bup-...@googlegroups.com
I was doing a "bup restore" (with bup 1fe7783) and hit the assertion in vfs.Node.fullname, which I have pasted below. What I think is happening (and I'm new to the code base so don't believe me) is that the top level directory node in my restore has the same name as a directory node 8 levels deep, so the __cmp__ method invoked by the assertion correctly indicate that names match. The comment "would be the empty string; too weird" is confusing.

I'm experimenting but I thought I would mention this here in case anyone wants to explain or I get distracted before I have solved this.

 def fullname(self, stop_at=None):
        """Get this file's full path."""
        assert(self != stop_at) # would be the empty string; too weird
        if self.parent and self.parent != stop_at:
            return os.path.join(self.parent.fullname(stop_at=stop_at),
                                self.name)
        else:
            return self.name




Gabriel Filion

unread,
Mar 27, 2011, 12:13:36 AM3/27/11
to bup-...@googlegroups.com, Dickon Reed
Hi,

On 11-03-26 03:48 AM, Dickon Reed wrote:
> I was doing a "bup restore" (with bup 1fe7783) and hit the assertion in
> vfs.Node.fullname, which I have pasted below. What I think is happening
> (and I'm new to the code base so don't believe me) is that the top level
> directory node in my restore has the same name as a directory node 8
> levels deep, so the __cmp__ method invoked by the assertion correctly
> indicate that names match.

Hmm, if this is true then it's kinda bad.. can you easily replicate this
in a simpler repository?


> I'm experimenting but I thought I would mention this here in case anyone
> wants to explain or I get distracted before I have solved this.
>
> def fullname(self, stop_at=None):
> """Get this file's full path."""
> assert(self != stop_at) # would be the empty string; too weird
> if self.parent and self.parent != stop_at:
> return os.path.join(self.parent.fullname(stop_at=stop_at),
> self.name)
> else:
> return self.name


It'd help if you pasted bup's output in the e-mail [and maybe while
using bup's debug mode, too]. This way, we could have more clue about
what bup is actually doing to reach this point.

--
Gabriel Filion

Dickon Reed

unread,
Mar 27, 2011, 4:08:22 PM3/27/11
to bup-...@googlegroups.com, Dickon Reed, Gabriel Filion
I can trigger the assertion error thus: 

mkdir -p /tmp/buptest/tmp
bup index /tmp/buptest
bup save -n cycle-check /tmp/buptest
bup restore -C buprestore.tmp cycle-check/latest/tmp/buptest

This gives me:

Traceback (most recent call last): File "/usr/lib/bup/cmd/bup-restore", line 90, in <module> do_node(n.parent, n) File "/usr/lib/bup/cmd/bup-restore", line 54, in do_node do_node(top, sub) File "/usr/lib/bup/cmd/bup-restore", line 35, in do_node fullname = n.fullname(stop_at=top) File "/usr/lib/bup/bup/vfs.py", line 183, in fullname assert(self != stop_at) # would be the empty string; too weird AssertionError


Dickon

Gabriel Filion

unread,
Mar 27, 2011, 11:29:42 PM3/27/11
to bup-...@googlegroups.com, Dickon Reed
On 11-03-27 04:08 PM, Dickon Reed wrote:
> I can trigger the assertion error thus:
>
> mkdir -p /tmp/buptest/tmp
> bup index /tmp/buptest
> bup save -n cycle-check /tmp/buptest
> bup restore -C buprestore.tmp cycle-check/latest/tmp/buptest

I was able to reproduce this in /tmp and in another arbitrary top level
directory, but not anywhere else further down the filesystem hierarchy..

> This gives me:
>
> Traceback (most recent call last): File "/usr/lib/bup/cmd/bup-restore",
> line 90, in <module> do_node(n.parent, n) File
> "/usr/lib/bup/cmd/bup-restore", line 54, in do_node do_node(top, sub)
> File "/usr/lib/bup/cmd/bup-restore", line 35, in do_node fullname =
> n.fullname(stop_at=top) File "/usr/lib/bup/bup/vfs.py", line 183, in
> fullname assert(self != stop_at) # would be the empty string; too weird
> AssertionError

somehow, in this assertion, self.name (the 'name' argument for
/tmp/buptest/tmp's node) is the same as stop_at.name (the 'name'
argument for /tmp's node) so this makes those nodes look like they're
equal in Node.__cmp__'s view even though they represent different files
on the filesystem.

--
Gabriel Filion

Avery Pennarun

unread,
May 8, 2011, 3:21:03 AM5/8/11
to Gabriel Filion, bup-...@googlegroups.com, Dickon Reed
On Sun, Mar 27, 2011 at 11:29 PM, Gabriel Filion <lel...@gmail.com> wrote:
> somehow, in this assertion, self.name (the 'name' argument for
> /tmp/buptest/tmp's node) is the same as stop_at.name (the 'name'
> argument for /tmp's node) so this makes those nodes look like they're
> equal in Node.__cmp__'s view even though they represent different files
> on the filesystem.

Sorry, I'm behind on my bup mails and just saw this one now. It does
sound serious... does someone want to try their hand at a patch?

Thanks,

Avery

Gabriel Filion

unread,
May 8, 2011, 5:25:25 PM5/8/11
to Avery Pennarun, bup-...@googlegroups.com, Dickon Reed

I'm still kinda perplexed by what's actually causing this.

Do you have any suggestion of how we could go further down the debugging
road so that we identify the problem exactly?

--
Gabriel Filion

Avery Pennarun

unread,
May 8, 2011, 8:02:29 PM5/8/11
to Gabriel Filion, bup-...@googlegroups.com, Dickon Reed
On Sun, May 8, 2011 at 5:25 PM, Gabriel Filion <lel...@gmail.com> wrote:
> I'm still kinda perplexed by what's actually causing this.
>
> Do you have any suggestion of how we could go further down the debugging
> road so that we identify the problem exactly?

I admit that I haven't read the context very carefully (and I don't
really have time). But if you have a reproducible test case, that's
about 95% of the really hard part :)

I'd think you just need to add a few print and/or assert() commands in
various places to narrow down exactly what isn't happening where you
expect it to happen... famous last words, of course, but there really
aren't that many layers in bup, so the problem has to be relatively
shallow.

Have fun,

Avery

Shane Geiger

unread,
May 8, 2011, 11:16:28 PM5/8/11
to bup-...@googlegroups.com
I saw a request on the list for adding an option to turn off
encryption. In my enthusiasm for bup, I jumped at it without knowing
much of anything about the code.

I'm a little stuck and too tired to continue tonight. Here's a start.
It fails to run now, but I suspect there only needs to be one or two
more lines of code to go.


sg:bup sg$ bash encryption_test.sh
(then all looks good until...

bloom: adding 1 file (1 object).
error: failed to validate delta base reference at offset 33 from
/Users/sg/.bup/objects/pack/pack-87243f038756bb42c70107ac50f5493b32ad521a.pack
error: failed to read object cc9a6ca559a5980e83376b8e47cfd7fcb23d11e4 at
offset 12 from
/Users/sg/.bup/objects/pack/pack-87243f038756bb42c70107ac50f5493b32ad521a.pack
fatal: object cc9a6ca559a5980e83376b8e47cfd7fcb23d11e4 is corrupted


Traceback (most recent call last):

File "./cmd/bup-split", line 176, in <module>
git.update_ref(refname, commit, oldref)
File "/Users/sg/bup/lib/bup/git.py", line 793, in update_ref
_git_wait('git update-ref', p)
File "/Users/sg/bup/lib/bup/git.py", line 881, in _git_wait
raise GitError('%s returned %d' % (cmd, rv))
bup.git.GitError: git update-ref returned 128


From 2deb1dc1cd74c1c298da5cd84af344f734048a4e Mon Sep 17 00:00:00 2001
From: Shane Geiger <shane....@gmail.com>
Date: Sun, 8 May 2011 23:14:28 -0400
Subject: [PATCH] no encryption option (mostly done)

---
cmd/split-cmd.py | 9 +++++----
encryption_test.sh | 2 ++
lib/bup/git.py | 5 ++++-
3 files changed, 11 insertions(+), 5 deletions(-)
create mode 100644 encryption_test.sh

diff --git a/cmd/split-cmd.py b/cmd/split-cmd.py
index b243c61..3ca3a3e 100755
--- a/cmd/split-cmd.py
+++ b/cmd/split-cmd.py
@@ -3,9 +3,8 @@ import sys, time
from bup import hashsplit, git, options, client
from bup.helpers import *

-
optspec = """
-bup split <-t|-c|-b|-n name|--copy|--noop> [--bench] [filenames...]
+bup split <-t|-c|-b|-n name|--copy|-0|--noop> [--bench] [filenames...]
--
Modes:
b,blobs output a series of blob ids
@@ -14,6 +13,7 @@ c,commit output a commit id
n,name= save the result under the given name
noop split the input, but throw away the result
copy split the input, copy it to stdout, don't save to repo
+0,noenc no encryption
Options:
r,remote= remote repository path
d,date= date for the commit (seconds since the epoch)
@@ -34,7 +34,7 @@ handle_ctrl_c()
git.check_repo_or_die()
if not (opt.blobs or opt.tree or opt.commit or opt.name or
opt.noop or opt.copy):
- o.fatal("use one or more of -b, -t, -c, -n, -N, --copy")
+ o.fatal("use one or more of -b, -t, -c, -n, -N, --copy, -0")
if (opt.noop or opt.copy) and (opt.blobs or opt.tree or
opt.commit or opt.name):
o.fatal('-N and --copy are incompatible with -b, -t, -c, -n')
@@ -60,7 +60,8 @@ if opt.date:
date = parse_date_or_fatal(opt.date, o.fatal)
else:
date = time.time()
-
+if opt.noenc:
+ git.encryption = False

total_bytes = 0
def prog(filenum, nbytes):
diff --git a/encryption_test.sh b/encryption_test.sh
new file mode 100644
index 0000000..ec36331
--- /dev/null
+++ b/encryption_test.sh
@@ -0,0 +1,2 @@
+cd ~/bup
+tar -cvf - /Users/sg/sysadmin/ | ./main.py split -n local-etc -vv -0
diff --git a/lib/bup/git.py b/lib/bup/git.py
index 233a2ee..b1681a2 100644
--- a/lib/bup/git.py
+++ b/lib/bup/git.py
@@ -21,6 +21,7 @@ _typermap = { 3:'blob', 2:'tree', 1:'commit', 4:'tag' }
_total_searches = 0
_total_steps = 0

+encryption = True

class GitError(Exception):
pass
@@ -540,7 +541,9 @@ class PackWriter:
log('>')
if not sha:
sha = calc_hash(type, content)
- size, crc = self._raw_write(_encode_packobj(type, content),
sha=sha)
+ if encryption:
+ content = _encode_packobj(type, content)
+ size, crc = self._raw_write(content, sha=sha)
if self.outbytes >= max_pack_size or self.count >=
max_pack_objects:
self.breakpoint()
return sha
--
1.6.5.2

--
Shane Geiger
shane....@gmail.com
Phone: 347-237-7108

"Principles for the development of a complete mind:
Study the science of art.
Study the art of science.
Develop your senses--especially learn how to see.
Realize how everything connects to everything else."
-- Leonardo DaVinci

Zoran Zaric

unread,
May 9, 2011, 3:46:01 AM5/9/11
to Shane Geiger, bup-...@googlegroups.com
I have two problems with this patch.
1) Your mail client seems to have accidentally broken two lines

> - size, crc = self._raw_write(_encode_packobj(type, content),
> sha=sha)

should be

> - size, crc = self._raw_write(_encode_packobj(type, content), sha=sha)

and

> if self.outbytes >= max_pack_size or self.count >=
> max_pack_objects:

should be

> if self.outbytes >= max_pack_size or self.count >= max_pack_objects:

2) The patch doesn't apply to the current master
(ed7e33fee0c5ba429272e862071e0bb2a54e0ca2). Could you please tell, which
commit your patch is based on?

I have to admit this is the first time I try to apply a patch from an email, so
the problem might be on my side.

So to be sure I didn't break anything (and if so correct my fault) here my
steps:

- copy mail from mailbox to bup source root

- create new branch

- fix broken lines

- `git apply email.patch`
error: patch failed: cmd/split-cmd.py:3
error: cmd/split-cmd.py: patch does not apply
error: patch failed: lib/bup/git.py:21
error: lib/bup/git.py: patch does not apply

Thanks,
Zoran

Zoran Zaric

unread,
May 9, 2011, 6:11:18 AM5/9/11
to Shane Geiger, bup-...@googlegroups.com
On Sun, May 08, 2011 at 11:16:28PM -0400, Shane Geiger wrote:
> I saw a request on the list for adding an option to turn off
> encryption. In my enthusiasm for bup, I jumped at it without
> knowing much of anything about the code.

Thanks for talking the step to contribute to bup!

I took a look at your patch.

You try to completely bypass compression. IIRC this wouldn't produce valid
packfiles [1].

What we want to do instead is to change the compression level. Python's zlib
library does support this [2]. Currently we set this to 1 [3]. I think the
correct way to implement this feature is to make the compression level a
parameter. I implemented this quickly [4], the UI changes are missing here. To
use the new parameter set it when instantiating a PackWriter object:

w = git.PackWrtier()

could become

w = git.PackWrtier(compression_level=0)

The options -0 to -9 would be greatly appreciated in both split and save. The
long version of your -0 option ("noenc") is misleading. I'd call it
"no-compression".

Feel free to use my commit or just use the changes.

Zoran

[1] http://book.git-scm.com/7_the_packfile.html
[2] http://docs.python.org/library/zlib.html#zlib.compressobj
[3] https://github.com/apenwarr/bup/blob/master/lib/bup/git.py#L180
[4] https://github.com/zoranzaric/bup/commit/74bb8b4df2505eb92bb83c0cb6e55ee87c20a5b2

Tony Godshall

unread,
May 9, 2011, 12:17:50 PM5/9/11
to Shane Geiger, bup-...@googlegroups.com
s/encryption/compression/g

--
Best Regards.
This is unedited.
P-)

Gabriele

unread,
May 16, 2011, 11:14:04 AM5/16/11
to bup-list
On 28 Mar, 05:29, Gabriel Filion <lelu...@gmail.com> wrote:

> > I can trigger the assertion error thus:
>
> > mkdir -p /tmp/buptest/tmp
> > bup index /tmp/buptest
> > bup save -n cycle-check /tmp/buptest
> > bup restore -C buprestore.tmp cycle-check/latest/tmp/buptest
>
> I was able to reproduce this in /tmp and in another arbitrary top level
> directory, but not anywhere else further down the filesystem hierarchy..

I don't know if this is going to be of any help, but I was playing
with BUP today because I want to move one backup I have (done using
GIT directly) to BUP (GIT is having troubles there). Everything went
well until I tried a restore:

giesse@imac /home/slow/giesse $ bup restore /luca/latest/home/luca
Traceback (most recent call last):
File "/home/slow/giesse/bup/cmd/bup-restore", line 90, in <module>
do_node(n.parent, n)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 35, in do_node
fullname = n.fullname(stop_at=top)
File "/home/slow/giesse/bup/lib/bup/vfs.py", line 183, in fullname
assert(self != stop_at) # would be the empty string; too weird
AssertionError

Running with -vv gives:

[...]
luca/.gconf/apps/nautilus/desktop-metadata/
Il@32@mio@32@Disco@46@volume/
luca/.gconf/apps/nautilus/desktop-metadata/
Il@32@mio@32@Disco@46@volume/%gconf.xml
luca/.gconf/apps/nautilus/desktop-metadata/LUCA@46@volume/
luca/.gconf/apps/nautilus/desktop-metadata/LUCA@46@volume/%gconf.xml
luca/.gconf/apps/nautilus/desktop-metadata/computer/
luca/.gconf/apps/nautilus/desktop-metadata/computer/%gconf.xml
luca/.gconf/apps/nautilus/desktop-metadata/directory/
luca/.gconf/apps/nautilus/desktop-metadata/directory/%gconf.xml
Traceback (most recent call last):
File "/home/slow/giesse/bup/cmd/bup-restore", line 90, in <module>
do_node(n.parent, n)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 54, in do_node
do_node(top, sub)
File "/home/slow/giesse/bup/cmd/bup-restore", line 35, in do_node
fullname = n.fullname(stop_at=top)
File "/home/slow/giesse/bup/lib/bup/vfs.py", line 183, in fullname
assert(self != stop_at) # would be the empty string; too weird
AssertionError

bup fuse seems to work normally, but I haven't tried copying all the
files from it. The repo is 13G. I'm a programmer, though I've never
programmed in Python, so I'm open to doing whatever test, you just may
have to be patient with me.

Gabriele

unread,
May 16, 2011, 11:17:12 AM5/16/11
to bup-list
On 16 Mag, 17:14, Gabriele <santilli.gabri...@gmail.com> wrote:

[...]

Forgot to mention, the above is with BUP @65c3cba0df from github.

laurento....@gmail.com

unread,
May 19, 2011, 12:54:28 PM5/19/11
to bup-...@googlegroups.com
I think the problem resides in the node object comparison, actually bup only considers node.name and this causes yours problem when the same dir name appears twice or more in a specified path (without the trailing slash).

The attached patch leaves the current compare approach for node objects sorting but defines an hash based comparator (it uses the node.hash attribute) to check for equality... but I assume node.hash is really unique and I'm don't really sure about that yet :)
bup_cmp_fix_2.patch

Gabriele Santilli

unread,
May 19, 2011, 1:14:54 PM5/19/11
to bup-...@googlegroups.com
(Google's new groups UI does not allow responding in plain text?)

I asked my friend Laurento who knows Python better than me to help with the restore problem. We've been discussing this in the past few days, and although his patch fixes the problem with the fullname function, it may not solve the issue entirely; the main reason is that we don't know what the author means by testing for equality of two nodes.

I'm assuming node.hash is the GIT hash of the object (if not, then what I'm saying may not make sense). If that's the case, then it certainly will be unique within a branch of the tree (or, said another way, you cannot have a child tree object with the same hash as a parent tree object, because if you did, they would be the same and you'd have a loop in the tree). However, there is no guarantee that it will be unique across the whole tree. (I'd say it's more likely to be unique than node.name, so this patch is certainly an improvement IMHO.)

So our question is: what should the meaning of the expression "node1 = node2" be?
  1. it should be true if they are the exact same object (ie. same location in memory)
  2. it should be true if the nodes represent a file with the same name in the same directory, ie. if the full path is the same for both nodes (this means that if the nodes come from two different commits, the result would be true if they represent the same file; of course this means that fullname() cannot use = because = would rely on fullname())
  3. same as (2) except that we also take the commit into account, so the result is false for two nodes from two different commits
  4. it should be true if the two nodes represent two files or directories with the same content (this is what happens with Laurento's patch); I guess this could be useful if one wanted to list all files that are copies of a given file, but I'm not sure how common is this scenario so I doubt this would be desirable as the default
My guess is that the original behavior where only node.name is compared is not desirable, but I may be missing something?

Gabriele Santilli

unread,
May 22, 2011, 4:44:44 AM5/22/11
to bup-...@googlegroups.com
Pushed Laurento's changes to my fork of bup, for those interested. (Also sent a pull request.)

https://github.com/giesse/bup

Avery Pennarun

unread,
May 29, 2011, 8:49:18 PM5/29/11
to bup-...@googlegroups.com

Hi,

Comparing object equality using hash() is really dangerous; it should
be the other way around. Hashes can have random overlaps, so
sometimes we'd end up with totally different nodes having the same
hash, which would cause them to compare equal using your version.

I think what might be desired here would be to fix __cmp__ to use both
.name and .parent here, ie.

return (cmp(a and a.parent, b and b.parent) or
cmp(a and a.name, b and b.name)

That would cause us to compare the entire leading path, rather than
just the relative name.

Want to try that and see if it helps?

We also need a patch to test.sh (or something) that verifies the
problem and solution: ie. it should cause a failure without the fix,
and pass if the fix is applied.

Thanks,

Avery

Gabriele Santilli

unread,
May 30, 2011, 5:51:48 AM5/30/11
to bup-...@googlegroups.com
Il giorno lunedì 30 maggio 2011 02:49:18 UTC+2, apenwarr ha scritto:
 
Comparing object equality using hash() is really dangerous; it should be the other way around.  Hashes can have random overlaps, so sometimes we'd end up with totally different nodes having the same hash, which would cause them to compare equal using your version.

Out of curiosity, where does that hash come from? If it's from GIT, you'll only reasonably have the same hash for blobs or trees with the same content. Which is still a problem, but that really depends on the intended semantics for node comparison.

I think what might be desired here would be to fix __cmp__ to use both
.name and .parent here, ie.

    return (cmp(a and a.parent, b and b.parent) or
                cmp(a and a.name, b and b.name)

That would cause us to compare the entire leading path, rather than
just the relative name.

That makes sense indeed. So we can assume that BUP is always going to compare nodes from the same tree? Do you think this could have a performance impact (eg. sorting can get nasty in the worst case)?

Laurento, what do you think?

Want to try that and see if it helps?

I'll give it a try next time I do a restore test (I guess tomorrow).

Avery Pennarun

unread,
May 30, 2011, 4:25:14 PM5/30/11
to bup-...@googlegroups.com
On Mon, May 30, 2011 at 5:51 AM, Gabriele Santilli
<santilli...@gmail.com> wrote:
> Il giorno lunedì 30 maggio 2011 02:49:18 UTC+2, apenwarr ha scritto:
>> Comparing object equality using hash() is really dangerous; it should be
>> the other way around.  Hashes can have random overlaps, so sometimes we'd
>> end up with totally different nodes having the same hash, which would cause
>> them to compare equal using your version.
>
> Out of curiosity, where does that hash come from? If it's from GIT, you'll
> only reasonably have the same hash for blobs or trees with the same content.
> Which is still a problem, but that really depends on the intended semantics
> for node comparison.

Oops, my mistake, it really is the git hash. I thought you were
talking about the python hash. Guess I should have named that
differently :)

Anyway, we shouldn't compare by hash, because we really want to be
able to sort nodes in alphabetical order for things like ls. That, it
turns out, is why we define a __cmp__ function at all. However, we
can still sort them in alphabetical order if we also compare parents.

>> I think what might be desired here would be to fix __cmp__ to use both
>> .name and .parent here, ie.
>>
>>     return (cmp(a and a.parent, b and b.parent) or
>>                 cmp(a and a.name, b and b.name)
>>
>> That would cause us to compare the entire leading path, rather than
>> just the relative name.
>
> That makes sense indeed. So we can assume that BUP is always going to
> compare nodes from the same tree? Do you think this could have a performance
> impact (eg. sorting can get nasty in the worst case)?

The nice thing about the above code is that we don't have to *assume*
it'll compare nodes from the same tree: it just compares all the way
up the tree, so you end up doing an alphabetic comparison of the full
pathname, not just the local filename. That means you can sort
alphabetically inside a particular tree, as well as sorting paths from
different trees in order by their full pathname.

In terms of slowness, I suspected that python's cmp() might just
shortcut if it knows the two objects are actually the same object, but
I just tested it, and it doesn't. So this might be even better
(warning: untested):

if a is b:
return 0


    return (cmp(a and a.parent, b and b.parent) or

            cmp(a and a.name, b and b.name))

Thanks,

Avery

Gabriele Santilli

unread,
May 31, 2011, 6:13:57 AM5/31/11
to bup-...@googlegroups.com
Il giorno lunedì 30 maggio 2011 22:25:14 UTC+2, apenwarr ha scritto:
 
Anyway, we shouldn't compare by hash, because we really want to be able to sort nodes in alphabetical order for things like ls.  That, it turns out, is why we define a __cmp__ function at all.  However, we can still sort them in alphabetical order if we also compare parents.

That is the reason why Laurento left __cmp__ alone and redefined __eq__, I think. Indeed, by changing __cmp__ we had failures in the tests (ls would not sort correctly anymore, as expected). (This is not to argue in favor of using the hash, just pointing out that the problem with sorting can be avoided.)
 
The nice thing about the above code is that we don't have to *assume* it'll compare nodes from the same tree: it just compares all the way up the tree, so you end up doing an alphabetic comparison of the full pathname, not just the local filename.  That means you can sort alphabetically inside a particular tree, as well as sorting paths from different trees in order by their full pathname.

Right, so my question is: if you have two trees, from two different commits for eg., you'll have two nodes with the same full pathname (eg. /path/to/some/file from commit 1 and from commit 2). Is it ok to treat them as equal? Maybe we never plan to compare nodes from two different trees, but if we do, it's not obvious to me that considering them equal is always desirable. (Even worse would be considering machine1:/etc/hosts and machine2:/etc/hosts to be "equal", if you put the backups for the two machines on separate branches.)

Again, if we can assume we're never going to compare nodes from different trees, this discussion is pointless, but if we can't assume that then it is worth thinking about it (maybe just documenting this behavior would be enough).

Gabriele Santilli

unread,
May 31, 2011, 9:38:05 AM5/31/11
to bup-...@googlegroups.com
Il giorno lunedì 30 maggio 2011 22:25:14 UTC+2, apenwarr ha scritto:

     if a is b:
         return 0
     return (cmp(a and a.parent, b and b.parent) or
             cmp(a and a.name, b and b.name))

make test seems to like it:
 
WvTest: 2614 tests, 0 failures, total time 39.788s.

I'm now doing a restore of my ~15G repo, it seems to be going well (the bug would manifest very quickly, so the fact it's running is already good news).

I'll see if I can put together a new test for this bug, or get Laurento to do it. :)

Avery Pennarun

unread,
Jun 8, 2011, 10:55:11 PM6/8/11
to bup-...@googlegroups.com
On Tue, May 31, 2011 at 6:13 AM, Gabriele Santilli
<santilli...@gmail.com> wrote:
> Il giorno lunedì 30 maggio 2011 22:25:14 UTC+2, apenwarr ha scritto:
>> Anyway, we shouldn't compare by hash, because we really want to be able to
>> sort nodes in alphabetical order for things like ls.  That, it turns out, is
>> why we define a __cmp__ function at all.  However, we can still sort them in
>> alphabetical order if we also compare parents.
>
> That is the reason why Laurento left __cmp__ alone and redefined __eq__, I
> think. Indeed, by changing __cmp__ we had failures in the tests (ls would
> not sort correctly anymore, as expected). (This is not to argue in favor of
> using the hash, just pointing out that the problem with sorting can be
> avoided.)

That's random luck. You have no way to tell whether a random bit of
code will use "cmp(x,y)==0" or "x == y". Making the two things act
differently can only lead to insanity.

> Right, so my question is: if you have two trees, from two different commits
> for eg., you'll have two nodes with the same full pathname (eg.
> /path/to/some/file from commit 1 and from commit 2). Is it ok to treat them
> as equal? Maybe we never plan to compare nodes from two different trees, but
> if we do, it's not obvious to me that considering them equal is always
> desirable. (Even worse would be considering machine1:/etc/hosts and
> machine2:/etc/hosts to be "equal", if you put the backups for the two
> machines on separate branches.)

Yes, I think it's safe to make those equal.

There is no such thing as machine1:/etc/hosts and machine2:/etc/hosts;
in vfs.py they are actually /machine1/latest/etc/hosts and
/machine2/latest/etc/hosts, respectively, so the path is all you need
to know.

Have fun,

Avery

Gabriele Santilli

unread,
Jun 9, 2011, 5:04:40 AM6/9/11
to bup-...@googlegroups.com
Il giorno giovedì 9 giugno 2011 04:55:11 UTC+2, apenwarr ha scritto:
 
There is no such thing as machine1:/etc/hosts and machine2:/etc/hosts; in vfs.py they are actually /machine1/latest/etc/hosts and /machine2/latest/etc/hosts, respectively, so the path is all you need to know.

Ah, cool, then I guess this is the best solution.

I'm sorry I haven't had time to write the test case yet, in my own use cases though the change you suggested seems to work fine. I'll try to get the test done before the new release (do you have a target date?).

In case anyone else was having this problem and wants to test this:

diff --git a/lib/bup/vfs.py b/lib/bup/vfs.py
index 4543553..20627ab 100644
--- a/lib/bup/vfs.py
+++ b/lib/bup/vfs.py
@@ -173,7 +173,10 @@ class Node:
         self._subs = None
 
     def __cmp__(a, b):
-        return cmp(a and a.name or None, b and b.name or None)
+        if a is b:
+            return 0
+        return (cmp(a and a.parent, b and b.parent) or
+                cmp(a and a.name, b and b.name))
 
     def __iter__(self):
         return iter(self.subs())

YC

unread,
Jun 9, 2011, 7:28:22 AM6/9/11
to bup-...@googlegroups.com
(Thanks Avery for pointing me here, after that sad demonstration of my
nightly logical-operator skills - and the morning is only marginally better ;))

On Thu, Jun 09, 2011 at 02:04:40AM -0700, Gabriele Santilli wrote:
> I'm sorry I haven't had time to write the test case yet, in my own use cases
> though the change you suggested seems to work fine. I'll try to get the test
> done before the new release (do you have a target date?).

Ok, I had a go at testing. The thing is, the assertion in Node.fullname()
only fails when our offending name is at the top (well, at stop_at), so
having children named after their parents further down the tree doesn't
trip it up: doing "s/d\/e/d\/d/g" on t/test.sh doesn't break the tests.

This does:

diff --git a/t/test.sh b/t/test.sh
index 6058e44..7e1039f 100755
--- a/t/test.sh
+++ b/t/test.sh
@@ -228,6 +228,9 @@ WVSTART "save/git-fsck"
WVSTART "restore"
rm -rf buprestore.tmp
WVFAIL bup restore boink
+touch $TOP/$D/$D
+bup index -u $TOP/$D
+bup save -n master /
WVPASS bup restore -C buprestore.tmp "/master/latest/$TOP/$D"
WVPASSEQ "$(ls buprestore.tmp)" "bupdata.tmp"
rm -rf buprestore.tmp

(tell me if you prefer the changes to go under WVSTART "index" - I figured by
their purpose they belong here)

> In case anyone else was having this problem and wants to test this:
>
> diff --git a/lib/bup/vfs.py b/lib/bup/vfs.py
> index 4543553..20627ab 100644
> --- a/lib/bup/vfs.py
> +++ b/lib/bup/vfs.py
> @@ -173,7 +173,10 @@ class Node:
> self._subs = None
>
> def __cmp__(a, b):
> - return cmp(a and a.name or None, b and b.name or None)
> + if a is b:
> + return 0
> + return (cmp(a and a.parent, b and b.parent) or
> + cmp(a and a.name, b and b.name))
>
> def __iter__(self):
> return iter(self.subs())

Yep, that fixes it for me. Of course, just removing the assertion from
Node.fullname() also fixes it ;)

Best, YC

Rob Browning

unread,
Dec 10, 2011, 2:16:03 PM12/10/11
to YC, santilli...@gmail.com, bup-...@googlegroups.com

> On Thu, Jun 09, 2011 at 02:04:40AM -0700, Gabriele Santilli wrote:
>> I'm sorry I haven't had time to write the test case yet, in my own use cases
>> though the change you suggested seems to work fine. I'll try to get the test
>> done before the new release (do you have a target date?).
>
> Ok, I had a go at testing. The thing is, the assertion in Node.fullname()
> only fails when our offending name is at the top (well, at stop_at), so
> having children named after their parents further down the tree doesn't
> trip it up: doing "s/d\/e/d\/d/g" on t/test.sh doesn't break the tests.
>
> This does:

Would the two of you have time to create and post a Signed-off-by patch
or patches containing this __cmp__() fix and the test(s)?

I'd like to add it to a proposed updates branch.

Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Gabriele Santilli

unread,
Dec 11, 2011, 5:57:52 AM12/11/11
to Rob Browning, YC, bup-...@googlegroups.com
On Sat, Dec 10, 2011 at 8:16 PM, Rob Browning <r...@defaultvalue.org> wrote:

> Would the two of you have time to create and post a Signed-off-by patch
> or patches containing this __cmp__() fix and the test(s)?

The actual debugging was not done by me, as I don't know much Python.
But, if that's ok anyway, I can probably do this next week; I haven't
had time to create an actual test for this bug though (sorry).

Said another way: I'm ok with you taking full credit for the fix, if
you want to, as that's probably the fastest way. :)

Rob Browning

unread,
Dec 11, 2011, 9:12:09 PM12/11/11
to Gabriele Santilli, YC, bup-...@googlegroups.com
Gabriele Santilli <santilli...@gmail.com> writes:

> On Sat, Dec 10, 2011 at 8:16 PM, Rob Browning <r...@defaultvalue.org> wrote:
>
>> Would the two of you have time to create and post a Signed-off-by patch
>> or patches containing this __cmp__() fix and the test(s)?
>
> The actual debugging was not done by me, as I don't know much Python.
> But, if that's ok anyway, I can probably do this next week; I haven't
> had time to create an actual test for this bug though (sorry).

Right, I thought you provided the fix, and Yungchin provided the test,
and so I was soliciting "signed-off" patches from either/both.

> Said another way: I'm ok with you taking full credit for the fix, if
> you want to, as that's probably the fastest way. :)

I suppose this might be a small enough bit of code that I could do that.
Though I'd probably still include some kind of attribution unless I
rewrote it.

For reference, if I recall correctly, the FSF doesn't require
assignments for less than about 15 lines. See "info maintain" on a
Debian system with gnu-standards installed:

4.2 Legally Significant Changes
===============================

If a person contributes more than around 15 lines of code and/or text
that is legally significant for copyright purposes, we need copyright
papers for that contribution, as described above.

A change of just a few lines (less than 15 or so) is not legally
significant for copyright

...

I assume they've done their legal homework, so Avery might consider that
a reasonable standard, but of course he'll have to say for sure.

Gabriele Santilli

unread,
Dec 12, 2011, 6:02:59 AM12/12/11
to Rob Browning, YC, bup-...@googlegroups.com
On Mon, Dec 12, 2011 at 3:12 AM, Rob Browning <r...@defaultvalue.org> wrote:

> Right, I thought you provided the fix, and Yungchin provided the test,
> and so I was soliciting "signed-off" patches from either/both.

I asked a friend to help me to fix it, he provided an initial fix
(search for Laurento Frittella on that thread). Avery didn't like it,
and provided a better alternative (recursive comparison). So I just
used that; anyway, I'll send a signed-off patch as soon as I have some
time (within a couple days I guess), then you can decide what you want
to do.

For the test, you are right, I think Yungchin wrote one. I hope he can
get back to you.

Oei, YC

unread,
Dec 12, 2011, 9:29:47 AM12/12/11
to Rob Browning, santilli...@gmail.com, bup-...@googlegroups.com
On Sat, Dec 10, 2011 at 19:16, Rob Browning <r...@defaultvalue.org> wrote:
> Would the two of you have time to create and post a Signed-off-by patch
> or patches containing this __cmp__() fix and the test(s)?

Hi Rob, I hope the formatted patch I just sent is suitable.
As for the fix, I still wonder what the purpose is of the assertion
that causes the whole problem. Couldn't that just be dropped?

Greetings, YC


PS: I've been meaning to use your metadata work, but am never in one
place with my scratch disk long enough lately. Haven't lost interest!

Gabriele Santilli

unread,
Dec 12, 2011, 11:10:15 AM12/12/11
to Oei, YC, Rob Browning, bup-...@googlegroups.com
On Mon, Dec 12, 2011 at 3:29 PM, Oei, YC <oei.yu...@gmail.com> wrote:

> As for the fix, I still wonder what the purpose is of the assertion
> that causes the whole problem. Couldn't that just be dropped?

The problem is, that two nodes with the same name but on different
points of the tree (eg. /some/file and /someother/file) should not be
considered equal.

The assertion is just checking that fullpath (IIRC) is not being
called with the wrong argument. The check fails in the case shown in
the test because of a bug in the _cmp_ function.

So, the assertion was actually useful in that it led us to discover
the bug; otherwise, that function would have just returned the wrong
value, and it may have gone unnoticed creating other problems.

Rob Browning

unread,
Dec 12, 2011, 8:34:36 PM12/12/11
to Gabriele Santilli, YC, bup-...@googlegroups.com
Gabriele Santilli <santilli...@gmail.com> writes:

> I asked a friend to help me to fix it, he provided an initial fix
> (search for Laurento Frittella on that thread). Avery didn't like it,
> and provided a better alternative (recursive comparison). So I just
> used that; anyway, I'll send a signed-off patch as soon as I have some
> time (within a couple days I guess), then you can decide what you want
> to do.

Oh right -- since Avery wrote the original code, it's probably not an
issue ;> And if the FSF standards are correct, it may not be enough code
to require strictness anyway.

Rob Browning

unread,
Dec 12, 2011, 8:36:15 PM12/12/11
to Oei, YC, santilli...@gmail.com, bup-...@googlegroups.com
"Oei, YC" <oei.yu...@gmail.com> writes:

> Hi Rob, I hope the formatted patch I just sent is suitable.

Looks like it'll probably be good, thanks.

> PS: I've been meaning to use your metadata work, but am never in one
> place with my scratch disk long enough lately. Haven't lost interest!

Great, let me know how it goes.

Reply all
Reply to author
Forward
0 new messages