On Fri, Dec 28, 2012 at 1:45 PM, Gabriel Filion <lel...@gmail.com> wrote:
> On 11/06/2012 12:38 PM, Stefan Buller wrote:
>> All of this requires code for reloading or attaching new commits in the
>> fuse layer of bup. I, unfortunately, don't feel prepared to tackle this
>> myself. I feel that this is beyond my understanding of the code, and am
>> working on other projects at the moment.
>
> zoran and I just tried to bite the bullet and implement some kind of
> proof of concept for this... and it was harder than we first thought. I
> actually decided to abandon the idea and review stuff from rob to make
> my time useful. but here's a report of what we saw/thought about:
>
> we thought about implementing this by having the bup-fuse process set up
> an inotify watch on $BUP_DIR/refs and then make it poll inotify and
> update things.
>
> the polling / updating objects looks like it can only be achieved from
> inside methods of the fuse object (we thought about injecting it into
> readdir() ). so things will only get updated if there is interaction with
> the fuse mount (possibly making things unresponsive after a long moment
> of inactivity)

This should actually not be so hard. I apologize for being lazy and
not implementing this the first time around :)

You shouldn't need any tricks like inotify, because you can cheat
instead. The important things to note are:

- commits, trees, and objects never disappear from the repo once
  they're there (other than Zoran's repacking patches, but we can treat
  repacking as a special case later)
- even if an object wasn't present when you supplied the readdir()
  contents, people can still read it if they know the exact name
- therefore only the list of branches and the list of contents for
  each branch will ever change.

Thus, the easy way to implement this would be to just update the list
of objects *every* time someone does readdir() on the list of refs or
the contents of a given ref. Something like this:

- move the list of child objects to a tmpdict
- for each object that now exists:
  - if the object existed before: move it back from the tmpdict
  - else: create a new object
- delete any objects (trees) still left in the tmpdict
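
A minimal sketch of those steps, in case it helps. RefDir and its
refresh() method are made-up names for illustration, not bup's actual
fuse/vfs classes, and current_names stands in for whatever the repo
reports as the refs (or the commits of one ref) right now:

```python
class RefDir:
    """Hypothetical directory node; NOT bup's real vfs class."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # child name -> RefDir (normal references)

    def refresh(self, current_names):
        """Bring self.children in line with current_names.

        Nodes that still exist are reused, so anything hanging off
        them (cached subtrees, open handles) survives the refresh.
        """
        tmpdict = self.children        # move the old children aside
        self.children = {}
        for name in current_names:
            node = tmpdict.pop(name, None)
            if node is None:
                node = RefDir(name)    # appeared since the last readdir()
            self.children[name] = node
        tmpdict.clear()                # leftovers no longer exist: drop them
        return self.children
```

Calling refresh() at the top of readdir() for the refs list and for each
ref would be enough; nothing needs to run while the mount is idle.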

To prevent memory leaks, make sure that the tree structure is held
correctly: child objects should either not have references to parent
objects, or if they do, those references should be made using the
Python weakref module. Parent objects just hold *normal* (non-weak)
references to child objects. I can't remember whether the current code
gets this right (and since objects currently never disappear because
the tree never refreshes, we wouldn't suffer even if there were a bug
in this respect).
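
Roughly what I mean by the weakref arrangement, as a sketch only.
TreeNode, add_child() and the parent property are hypothetical names
for this example; only weakref.ref() itself is the real standard
library call:

```python
import weakref


class TreeNode:
    """Hypothetical tree node: strong refs downward, weak ref upward."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = {}  # parent -> child: normal references
        # child -> parent: weak, so a child never keeps its parent alive
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        """Return the parent node, or None if it is gone (or never existed)."""
        return self._parent() if self._parent is not None else None

    def add_child(self, name):
        child = TreeNode(name, parent=self)
        self.children[name] = child
        return child
```

With this layout there is no reference cycle between parent and child,
so dropping a node from its parent's dict frees the whole subtree as
soon as nothing else is holding on to it.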

> while thinking about how to place things we thought about a corner case
> that's gonna need discussion here:
>
> suppose you've launched a script for processing and/or copying stuff
> from a branch's "latest" symlink. then a new backup gets pushed while
> this is happening. the latest link gets updated to point to the latest
> commit, as expected of it, however your command might start working on
> stuff from a different commit, which is very not expected.
>
> so we're wondering if there could be a way to work around this issue, or
> if we should remove the "latest" symlink for adding this feature so that
> we don't expose a "dangerous" link.

symlink-to-latest is a standard Unix technique and bup is doing it
correctly. It's perfectly okay for bup to change that link whenever
it wants, since that's the point of the link.

However, if you want to avoid the race condition you're talking about,
the client program *using* the link needs to be written carefully,
using one of these tricks:

a) chdir() to the 'latest' directory (or one of its children) before
   accessing any of the contents. When the symlink changes, your cwd
   will still point at the old (symlink-resolved) location so you'll
   keep getting the old set of files.

b) open(filename, O_DIRECTORY) the 'latest' directory or a dir inside
   it, and then use openat() (and similar whateverat() functions) to
   access files using relative paths.

c) readlink() the 'latest' link to find out where it points, then
   explicitly use that as the base path.

I recommend (a) unless you have to use this trick on two symlinks
simultaneously, at which point you have to use (b) since you can only
chdir() to one place at a time. So far I've never needed two symlinks
at once. I don't like (c) because it's inelegant and makes
assumptions about the directory structure (i.e. that 'latest' is always
a symlink, and the only symlink), whereas (a) works in all cases and is
very easy even from a shell script.
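
To make (a) and (b) concrete, here is a small self-contained sketch. It
builds a throwaway directory with two fake "backups" and a 'latest'
symlink instead of touching a real bup mount; the layout and the helper
names (pin_dir, read_at, demo) are invented for the example. It assumes
a Unix system where os.open() supports dir_fd:

```python
import os
import tempfile


def pin_dir(path):
    """Trick (b): open the directory itself; the symlink is resolved once, here."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def read_at(dirfd, relpath):
    """Read relpath relative to an already-pinned directory (openat() style)."""
    fd = os.open(relpath, os.O_RDONLY, dir_fd=dirfd)
    with os.fdopen(fd, 'rb') as f:
        return f.read()


def demo():
    results = {}
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as top:
        for name in ('backup1', 'backup2'):
            os.mkdir(os.path.join(top, name))
            with open(os.path.join(top, name, 'file'), 'w') as f:
                f.write(name)
        latest = os.path.join(top, 'latest')
        os.symlink('backup1', latest)

        dirfd = pin_dir(latest)        # trick (b)
        os.chdir(latest)               # trick (a): cwd is now backup1 itself
        try:
            # Simulate a new backup arriving: 'latest' moves to backup2.
            os.remove(latest)
            os.symlink('backup2', latest)

            with open('file', 'rb') as f:          # relative to the cwd
                results['chdir'] = f.read()
            results['dirfd'] = read_at(dirfd, 'file')
            with open(os.path.join(latest, 'file'), 'rb') as f:
                results['via_link'] = f.read()     # follows the new link
        finally:
            os.chdir(old_cwd)
            os.close(dirfd)
    return results
```

In this demo the 'chdir' and 'dirfd' reads keep seeing backup1 after the
link has moved, while going through the 'latest' path again lands in
backup2, which is exactly the race described above.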

Have fun,

Avery