def fullname(self, stop_at=None):"""Get this file's full path."""assert(self != stop_at) # would be the empty string; too weirdif self.parent and self.parent != stop_at:return os.path.join(self.parent.fullname(stop_at=stop_at),self.name)else:return self.name
On 11-03-26 03:48 AM, Dickon Reed wrote:
> I was doing a "bup restore" (with bup 1fe7783) and hit the assertion in
> vfs.Node.fullname, which I have pasted below. What I think is happening
> (and I'm new to the code base so don't believe me) is that the top level
> directory node in my restore has the same name as a directory node 8
> levels deep, so the __cmp__ method invoked by the assertion correctly
> indicate that names match.
Hmm, if this is true then it's kinda bad.. can you easily replicate this
in a simpler repository?
> I'm experimenting but I thought I would mention this here in case anyone
> wants to explain or I get distracted before I have solved this.
>
> def fullname(self, stop_at=None):
> """Get this file's full path."""
> assert(self != stop_at) # would be the empty string; too weird
> if self.parent and self.parent != stop_at:
> return os.path.join(self.parent.fullname(stop_at=stop_at),
> self.name)
> else:
> return self.name
It'd help if you pasted bup's output in the e-mail [and maybe while
using bup's debug mode, too]. This way, we could have more clue about
what bup is actually doing to reach this point.
--
Gabriel Filion
I was able to reproduce this in /tmp and in another arbitrary top level
directory, but not anywhere else further down the filesystem hierarchy..
> This gives me:
>
> Traceback (most recent call last): File "/usr/lib/bup/cmd/bup-restore",
> line 90, in <module> do_node(n.parent, n) File
> "/usr/lib/bup/cmd/bup-restore", line 54, in do_node do_node(top, sub)
> File "/usr/lib/bup/cmd/bup-restore", line 35, in do_node fullname =
> n.fullname(stop_at=top) File "/usr/lib/bup/bup/vfs.py", line 183, in
> fullname assert(self != stop_at) # would be the empty string; too weird
> AssertionError
somehow, in this assertion, self.name (the 'name' argument for
/tmp/buptest/tmp's node) is the same as stop_at.name (the 'name'
argument for /tmp's node) so this makes those nodes look like they're
equal in Node.__cmp__'s view even though they represent different files
on the filesystem.
--
Gabriel Filion
Sorry, I'm behind on my bup mails and just saw this one now. It does
sound serious... does someone want to try their hand at a patch?
Thanks,
Avery
I'm still kinda perplexed by what's actually causing this.
Do you have any suggestion of how we could go further down the debugging
road so that we identify the problem exactly?
--
Gabriel Filion
I admit that I haven't read the context very carefully (and I don't
really have time). But if you have a reproducible test case, that's
about 95% of the really hard part :)
I'd think you just need to add a few print and/or assert() commands in
various places to narrow down exactly what isn't happening where you
expect it to happen... famous last words, of course, but there really
aren't that many layers in bup, so the problem has to be relatively
shallow.
Have fun,
Avery
I'm a little stuck and too tired to continue tonight. Here's a start.
It fails to run now, but I suspect there only needs to be one or two
more lines of code to go.
sg:bup sg$ bash encryption_test.sh
(then all looks good until...
bloom: adding 1 file (1 object).
error: failed to validate delta base reference at offset 33 from
/Users/sg/.bup/objects/pack/pack-87243f038756bb42c70107ac50f5493b32ad521a.pack
error: failed to read object cc9a6ca559a5980e83376b8e47cfd7fcb23d11e4 at
offset 12 from
/Users/sg/.bup/objects/pack/pack-87243f038756bb42c70107ac50f5493b32ad521a.pack
fatal: object cc9a6ca559a5980e83376b8e47cfd7fcb23d11e4 is corrupted
Traceback (most recent call last):
File "./cmd/bup-split", line 176, in <module>
git.update_ref(refname, commit, oldref)
File "/Users/sg/bup/lib/bup/git.py", line 793, in update_ref
_git_wait('git update-ref', p)
File "/Users/sg/bup/lib/bup/git.py", line 881, in _git_wait
raise GitError('%s returned %d' % (cmd, rv))
bup.git.GitError: git update-ref returned 128
From 2deb1dc1cd74c1c298da5cd84af344f734048a4e Mon Sep 17 00:00:00 2001
From: Shane Geiger <shane....@gmail.com>
Date: Sun, 8 May 2011 23:14:28 -0400
Subject: [PATCH] no encryption option (mostly done)
---
cmd/split-cmd.py | 9 +++++----
encryption_test.sh | 2 ++
lib/bup/git.py | 5 ++++-
3 files changed, 11 insertions(+), 5 deletions(-)
create mode 100644 encryption_test.sh
diff --git a/cmd/split-cmd.py b/cmd/split-cmd.py
index b243c61..3ca3a3e 100755
--- a/cmd/split-cmd.py
+++ b/cmd/split-cmd.py
@@ -3,9 +3,8 @@ import sys, time
from bup import hashsplit, git, options, client
from bup.helpers import *
-
optspec = """
-bup split <-t|-c|-b|-n name|--copy|--noop> [--bench] [filenames...]
+bup split <-t|-c|-b|-n name|--copy|-0|--noop> [--bench] [filenames...]
--
Modes:
b,blobs output a series of blob ids
@@ -14,6 +13,7 @@ c,commit output a commit id
n,name= save the result under the given name
noop split the input, but throw away the result
copy split the input, copy it to stdout, don't save to repo
+0,noenc no encryption
Options:
r,remote= remote repository path
d,date= date for the commit (seconds since the epoch)
@@ -34,7 +34,7 @@ handle_ctrl_c()
git.check_repo_or_die()
if not (opt.blobs or opt.tree or opt.commit or opt.name or
opt.noop or opt.copy):
- o.fatal("use one or more of -b, -t, -c, -n, -N, --copy")
+ o.fatal("use one or more of -b, -t, -c, -n, -N, --copy, -0")
if (opt.noop or opt.copy) and (opt.blobs or opt.tree or
opt.commit or opt.name):
o.fatal('-N and --copy are incompatible with -b, -t, -c, -n')
@@ -60,7 +60,8 @@ if opt.date:
date = parse_date_or_fatal(opt.date, o.fatal)
else:
date = time.time()
-
+if opt.noenc:
+ git.encryption = False
total_bytes = 0
def prog(filenum, nbytes):
diff --git a/encryption_test.sh b/encryption_test.sh
new file mode 100644
index 0000000..ec36331
--- /dev/null
+++ b/encryption_test.sh
@@ -0,0 +1,2 @@
+cd ~/bup
+tar -cvf - /Users/sg/sysadmin/ | ./main.py split -n local-etc -vv -0
diff --git a/lib/bup/git.py b/lib/bup/git.py
index 233a2ee..b1681a2 100644
--- a/lib/bup/git.py
+++ b/lib/bup/git.py
@@ -21,6 +21,7 @@ _typermap = { 3:'blob', 2:'tree', 1:'commit', 4:'tag' }
_total_searches = 0
_total_steps = 0
+encryption = True
class GitError(Exception):
pass
@@ -540,7 +541,9 @@ class PackWriter:
log('>')
if not sha:
sha = calc_hash(type, content)
- size, crc = self._raw_write(_encode_packobj(type, content),
sha=sha)
+ if encryption:
+ content = _encode_packobj(type, content)
+ size, crc = self._raw_write(content, sha=sha)
if self.outbytes >= max_pack_size or self.count >=
max_pack_objects:
self.breakpoint()
return sha
--
1.6.5.2
--
Shane Geiger
shane....@gmail.com
Phone: 347-237-7108
"Principles for the development of a complete mind:
Study the science of art.
Study the art of science.
Develop your senses--especially learn how to see.
Realize how everything connects to everything else."
-- Leonardo DaVinci
should be
> - size, crc = self._raw_write(_encode_packobj(type, content), sha=sha)
and
> if self.outbytes >= max_pack_size or self.count >=
> max_pack_objects:
should be
> if self.outbytes >= max_pack_size or self.count >= max_pack_objects:
2) The patch doesn't apply to the current master
(ed7e33fee0c5ba429272e862071e0bb2a54e0ca2). Could you please tell, which
commit your patch is based on?
I have to admit this is the first time I try to apply a patch from an email, so
the problem might be on my side.
So to be sure I didn't break anything (and if so correct my fault) here my
steps:
- copy mail from mailbox to bup source root
- create new branch
- fix broken lines
- `git apply email.patch`
error: patch failed: cmd/split-cmd.py:3
error: cmd/split-cmd.py: patch does not apply
error: patch failed: lib/bup/git.py:21
error: lib/bup/git.py: patch does not apply
Thanks,
Zoran
Thanks for talking the step to contribute to bup!
I took a look at your patch.
You try to completely bypass compression. IIRC this wouldn't produce valid
packfiles [1].
What we want to do instead is to change the compression level. Python's zlib
library does support this [2]. Currently we set this to 1 [3]. I think the
correct way to implement this feature is to make the compression level a
parameter. I implemented this quickly [4], the UI changes are missing here. To
use the new parameter set it when instantiating a PackWriter object:
w = git.PackWrtier()
could become
w = git.PackWrtier(compression_level=0)
The options -0 to -9 would be greatly appreciated in both split and save. The
long version of your -0 option ("noenc") is misleading. I'd call it
"no-compression".
Feel free to use my commit or just use the changes.
Zoran
[1] http://book.git-scm.com/7_the_packfile.html
[2] http://docs.python.org/library/zlib.html#zlib.compressobj
[3] https://github.com/apenwarr/bup/blob/master/lib/bup/git.py#L180
[4] https://github.com/zoranzaric/bup/commit/74bb8b4df2505eb92bb83c0cb6e55ee87c20a5b2
--
Best Regards.
This is unedited.
P-)
Hi,
Comparing object equality using hash() is really dangerous; it should
be the other way around. Hashes can have random overlaps, so
sometimes we'd end up with totally different nodes having the same
hash, which would cause them to compare equal using your version.
I think what might be desired here would be to fix __cmp__ to use both
.name and .parent here, ie.
return (cmp(a and a.parent, b and b.parent) or
cmp(a and a.name, b and b.name)
That would cause us to compare the entire leading path, rather than
just the relative name.
Want to try that and see if it helps?
We also need a patch to test.sh (or something) that verifies the
problem and solution: ie. it should cause a failure without the fix,
and pass if the fix is applied.
Thanks,
Avery
Comparing object equality using hash() is really dangerous; it should be the other way around. Hashes can have random overlaps, so sometimes we'd end up with totally different nodes having the same hash, which would cause them to compare equal using your version.
I think what might be desired here would be to fix __cmp__ to use both
.name and .parent here, ie.return (cmp(a and a.parent, b and b.parent) or
cmp(a and a.name, b and b.name)That would cause us to compare the entire leading path, rather than
just the relative name.
Want to try that and see if it helps?
Oops, my mistake, it really is the git hash. I thought you were
talking about the python hash. Guess I should have named that
differently :)
Anyway, we shouldn't compare by hash, because we really want to be
able to sort nodes in alphabetical order for things like ls. That, it
turns out, is why we define a __cmp__ function at all. However, we
can still sort them in alphabetical order if we also compare parents.
>> I think what might be desired here would be to fix __cmp__ to use both
>> .name and .parent here, ie.
>>
>> return (cmp(a and a.parent, b and b.parent) or
>> cmp(a and a.name, b and b.name)
>>
>> That would cause us to compare the entire leading path, rather than
>> just the relative name.
>
> That makes sense indeed. So we can assume that BUP is always going to
> compare nodes from the same tree? Do you think this could have a performance
> impact (eg. sorting can get nasty in the worst case)?
The nice thing about the above code is that we don't have to *assume*
it'll compare nodes from the same tree: it just compares all the way
up the tree, so you end up doing an alphabetic comparison of the full
pathname, not just the local filename. That means you can sort
alphabetically inside a particular tree, as well as sorting paths from
different trees in order by their full pathname.
In terms of slowness, I suspected that python's cmp() might just
shortcut if it knows the two objects are actually the same object, but
I just tested it, and it doesn't. So this might be even better
(warning: untested):
if a is b:
return 0
return (cmp(a and a.parent, b and b.parent) or
cmp(a and a.name, b and b.name))
Thanks,
Avery
Anyway, we shouldn't compare by hash, because we really want to be able to sort nodes in alphabetical order for things like ls. That, it turns out, is why we define a __cmp__ function at all. However, we can still sort them in alphabetical order if we also compare parents.
The nice thing about the above code is that we don't have to *assume* it'll compare nodes from the same tree: it just compares all the way up the tree, so you end up doing an alphabetic comparison of the full pathname, not just the local filename. That means you can sort alphabetically inside a particular tree, as well as sorting paths from different trees in order by their full pathname.
That's random luck. You have no way to tell whether a random bit of
code will use "cmp(x,y)==0" or "x == y". Making the two things act
differently can only lead to insanity.
> Right, so my question is: if you have two trees, from two different commits
> for eg., you'll have two nodes with the same full pathname (eg.
> /path/to/some/file from commit 1 and from commit 2). Is it ok to treat them
> as equal? Maybe we never plan to compare nodes from two different trees, but
> if we do, it's not obvious to me that considering them equal is always
> desirable. (Even worse would be considering machine1:/etc/hosts and
> machine2:/etc/hosts to be "equal", if you put the backups for the two
> machines on separate branches.)
Yes, I think it's safe to make those equal.
There is no such thing as machine1:/etc/hosts and machine2:/etc/hosts;
in vfs.py they are actually /machine1/latest/etc/hosts and
/machine2/latest/etc/hosts, respectively, so the path is all you need
to know.
Have fun,
Avery
There is no such thing as machine1:/etc/hosts and machine2:/etc/hosts; in vfs.py they are actually /machine1/latest/etc/hosts and /machine2/latest/etc/hosts, respectively, so the path is all you need to know.
On Thu, Jun 09, 2011 at 02:04:40AM -0700, Gabriele Santilli wrote:
> I'm sorry I haven't had time to write the test case yet, in my own use cases
> though the change you suggested seems to work fine. I'll try to get the test
> done before the new release (do you have a target date?).
Ok, I had a go at testing. The thing is, the assertion in Node.fullname()
only fails when our offending name is at the top (well, at stop_at), so
having children named after their parents further down the tree doesn't
trip it up: doing "s/d\/e/d\/d/g" on t/test.sh doesn't break the tests.
This does:
diff --git a/t/test.sh b/t/test.sh
index 6058e44..7e1039f 100755
--- a/t/test.sh
+++ b/t/test.sh
@@ -228,6 +228,9 @@ WVSTART "save/git-fsck"
WVSTART "restore"
rm -rf buprestore.tmp
WVFAIL bup restore boink
+touch $TOP/$D/$D
+bup index -u $TOP/$D
+bup save -n master /
WVPASS bup restore -C buprestore.tmp "/master/latest/$TOP/$D"
WVPASSEQ "$(ls buprestore.tmp)" "bupdata.tmp"
rm -rf buprestore.tmp
(tell me if you prefer the changes to go under WVSTART "index" - I figured by
their purpose they belong here)
> In case anyone else was having this problem and wants to test this:
>
> diff --git a/lib/bup/vfs.py b/lib/bup/vfs.py
> index 4543553..20627ab 100644
> --- a/lib/bup/vfs.py
> +++ b/lib/bup/vfs.py
> @@ -173,7 +173,10 @@ class Node:
> self._subs = None
>
> def __cmp__(a, b):
> - return cmp(a and a.name or None, b and b.name or None)
> + if a is b:
> + return 0
> + return (cmp(a and a.parent, b and b.parent) or
> + cmp(a and a.name, b and b.name))
>
> def __iter__(self):
> return iter(self.subs())
Yep, that fixes it for me. Of course, just removing the assertion from
Node.fullname() also fixes it ;)
Best, YC
Would the two of you have time to create and post a Signed-off-by patch
or patches containing this __cmp__() fix and the test(s)?
I'd like to add it to a proposed updates branch.
Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
> Would the two of you have time to create and post a Signed-off-by patch
> or patches containing this __cmp__() fix and the test(s)?
The actual debugging was not done by me, as I don't know much Python.
But, if that's ok anyway, I can probably do this next week; I haven't
had time to create an actual test for this bug though (sorry).
Said another way: I'm ok with you taking full credit for the fix, if
you want to, as that's probably the fastest way. :)
> On Sat, Dec 10, 2011 at 8:16 PM, Rob Browning <r...@defaultvalue.org> wrote:
>
>> Would the two of you have time to create and post a Signed-off-by patch
>> or patches containing this __cmp__() fix and the test(s)?
>
> The actual debugging was not done by me, as I don't know much Python.
> But, if that's ok anyway, I can probably do this next week; I haven't
> had time to create an actual test for this bug though (sorry).
Right, I thought you provided the fix, and Yungchin provided the test,
and so I was soliciting "signed-off" patches from either/both.
> Said another way: I'm ok with you taking full credit for the fix, if
> you want to, as that's probably the fastest way. :)
I suppose this might be a small enough bit of code that I could do that.
Though I'd probably still include some kind of attribution unless I
rewrote it.
For reference, if I recall correctly, the FSF doesn't require
assignments for less than about 15 lines. See "info maintain" on a
Debian system with gnu-standards installed:
4.2 Legally Significant Changes
===============================
If a person contributes more than around 15 lines of code and/or text
that is legally significant for copyright purposes, we need copyright
papers for that contribution, as described above.
A change of just a few lines (less than 15 or so) is not legally
significant for copyright
...
I assume they've done their legal homework, so Avery might consider that
a reasonable standard, but of course he'll have to say for sure.
> Right, I thought you provided the fix, and Yungchin provided the test,
> and so I was soliciting "signed-off" patches from either/both.
I asked a friend to help me to fix it, he provided an initial fix
(search for Laurento Frittella on that thread). Avery didn't like it,
and provided a better alternative (recursive comparison). So I just
used that; anyway, I'll send a signed-off patch as soon as I have some
time (within a couple days I guess), then you can decide what you want
to do.
For the test, you are right, I think Yungchin wrote one. I hope he can
get back to you.
Hi Rob, I hope the formatted patch I just sent is suitable.
As for the fix, I still wonder what the purpose is of the assertion
that causes the whole problem. Couldn't that just be dropped?
Greetings, YC
PS: I've been meaning to use your metadata work, but am never in one
place with my scratch disk long enough lately. Haven't lost interest!
> As for the fix, I still wonder what the purpose is of the assertion
> that causes the whole problem. Couldn't that just be dropped?
The problem is, that two nodes with the same name but on different
points of the tree (eg. /some/file and /someother/file) should not be
considered equal.
The assertion is just checking that fullpath (IIRC) is not being
called with the wrong argument. The check fails in the case shown in
the test because of a bug in the _cmp_ function.
So, the assertion was actually useful in that it led us to discover
the bug; otherwise, that function would have just returned the wrong
value, and it may have gone unnoticed creating other problems.
> I asked a friend to help me to fix it, he provided an initial fix
> (search for Laurento Frittella on that thread). Avery didn't like it,
> and provided a better alternative (recursive comparison). So I just
> used that; anyway, I'll send a signed-off patch as soon as I have some
> time (within a couple days I guess), then you can decide what you want
> to do.
Oh right -- since Avery wrote the original code, it's probably not an
issue ;> And if the FSF standards are correct, it may not be enough code
to require strictness anyway.
> Hi Rob, I hope the formatted patch I just sent is suitable.
Looks like it'll probably be good, thanks.
> PS: I've been meaning to use your metadata work, but am never in one
> place with my scratch disk long enough lately. Haven't lost interest!
Great, let me know how it goes.