Thanks. I mean, no thanks. No, wait...!
> PS: when's bup-restore coming? :)
Pfft, people need to get a proper sense of priorities!
Seriously, though, pretty soon. I've been thinking about the
indexfile API, which is too gross, and which will be needed in order
to implement the restore stuff in as cool a way as I'd like. I'm
considering just redoing it using sqlite3 instead of its own special
file format, because that would solve basically all the problems with
it. Also, metadata support is sort of important for restoring, and
someone has been threatening to send me patches to add metadata
support for a while now. So I might wait for that.
Meanwhile, bup fuse and bup ftp are most of the way there. A
sufficiently motivated person could turn those into a restore-like
feature (minus the indexfile integration I was thinking about, which
isn't *that* important for use by average people) without much work.
Basically it's a matter of implementing a recursive version of the bup
ftp 'get' command.
Have fun,
Avery
> Also, metadata support is sort of important for restoring, and
> someone has been threatening to send me patches to add metadata
> support for a while now.
So that would be me. I've been working on the issue off and on for a
bit.
Originally, I came up with an implementation of a very dynamic (and not
very efficient) format. If we're planning to gzip the result, that
might not be critical, but the format was quite general-purpose, and
arguably far more dynamic than necessary.
In any case, I got the impression that Avery would prefer something more
compact, even if it's not as flexible, so I've come up with an alternate
representation, more toward the other end of the spectrum.
While I'm still not sure this is even broadly what we're likely to want,
we need to start somewhere, so here's a summary; consider it a strawman.
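First, one key type used by the format(s) is the vint, a
variable-length integer. See here for a detailed explanation:
http://lucene.apache.org/java/1_4_3/fileformats.html. Each byte holds
seven bits of the value, least significant group first, and its high
bit is set when more bytes follow, so values less than 128 require only
one byte.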
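For reference, here's a minimal Python sketch of that encoding as I read the Lucene description: unsigned values only, seven bits per byte, least significant group first. encode_vint and decode_vint are hypothetical helper names, not existing bup functions.

```python
def encode_vint(value):
    """Encode a non-negative integer as a Lucene-style vint."""
    if value < 0:
        raise ValueError("vint cannot represent negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low seven bits of what's left
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_vint(data, pos=0):
    """Decode one vint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

So encode_vint(127) is the single byte 0x7f, while encode_vint(128) needs two bytes, 0x80 0x01. Since the length grows with the magnitude of the value, small fields like file types and counts stay at one byte.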
We may end up with several top-level formats. For now I'll just
describe a possible Linux format, and I'll leave aside the question of
whether we'll eventually need per-platform, per-filesystem, or modular,
compositional formats.
I think it may be best to get started, adjust as we find out what's
required, and just warn people that the formats will not be stable for a
while (and perhaps also leave metadata disabled by default).
With respect to the actual representation, each Linux metadata record
would look like this:
file-type - vint representing the file type, initially using
    ord('r') for a regular file
    ord('b') for a block device
    ord('l') for a symlink, etc.
uid - vint (store the string names more globally)
gid - vint (store the string names more globally)
standard-perms - vint representing a bitfield with the permission bits
in this order: suid sgid sticky ru wu xu rg wg xg ro wo xo
The vint makes it possible to easily add more bits later if needed.
atime - vint nanoseconds
mtime - vint nanoseconds
ctime - vint nanoseconds
if symlink:
    target-name-length - vint
    target-name - bytes
if device:
    major-number - vint
    minor-number - vint
attributes - vint bitfield for the chattr/lsattr attributes (encoded like standard-perms)
acl-text-length - vint
acl-text - bytes from acl_to_any_text() via posix1e (omitted when the length is 0)
    I'm not yet certain the text format is stable/appropriate.
extended-attributes-count - vint
for each extended attribute (via pyxattr):
    attribute-length - vint
    attribute - bytes
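To make the record layout concrete, here's a rough Python sketch of an encoder for a single record, assuming the fields are written exactly in the order listed. It is not the encoder mentioned below. The names encode_linux_record, perms_bitfield, FILE_TYPES and length_prefixed are hypothetical; the letters for directories, character devices, FIFOs and sockets are guesses standing in for the "etc."; the permission bits are assumed to be packed with suid as the most significant bit; and the chattr flags, ACL text and xattrs are passed in by the caller instead of being read via posix1e or pyxattr, so only the standard library is needed.

```python
import os
import stat


def encode_vint(value):
    """Lucene-style vint: 7 bits per byte, low bits first, high bit = more follow."""
    if value < 0:
        raise ValueError("vint cannot represent negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# 'r', 'b' and 'l' come from the proposal; the other letters are
# hypothetical stand-ins for its "etc.".
FILE_TYPES = [
    (stat.S_ISREG, 'r'), (stat.S_ISBLK, 'b'), (stat.S_ISLNK, 'l'),
    (stat.S_ISDIR, 'd'), (stat.S_ISCHR, 'c'), (stat.S_ISFIFO, 'p'),
    (stat.S_ISSOCK, 's'),
]

# Permission bits in the proposed order, assumed to run from the most
# significant bit (suid) down to the least significant (xo).
PERM_BITS = [
    stat.S_ISUID, stat.S_ISGID, stat.S_ISVTX,
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
]


def file_type_code(mode):
    for is_type, letter in FILE_TYPES:
        if is_type(mode):
            return ord(letter)
    raise ValueError("unknown file type in mode %o" % mode)


def perms_bitfield(mode):
    field = 0
    for flag in PERM_BITS:
        field = (field << 1) | (1 if mode & flag else 0)
    return field


def length_prefixed(data):
    """A vint length followed by the raw bytes (just the length if empty)."""
    return encode_vint(len(data)) + data


def encode_linux_record(path, attributes=0, acl_text=b"", xattrs=()):
    """Encode one metadata record for path in the listed field order.

    attributes, acl_text and xattrs are hypothetical caller-supplied
    parameters; a real encoder would read them from the filesystem.
    """
    st = os.lstat(path)  # lstat so a symlink itself is described
    mode = st.st_mode
    out = bytearray()
    out += encode_vint(file_type_code(mode))
    out += encode_vint(st.st_uid)
    out += encode_vint(st.st_gid)
    out += encode_vint(perms_bitfield(mode))
    # Nanosecond times; encode_vint raises for negative (pre-1970) values.
    out += encode_vint(st.st_atime_ns)
    out += encode_vint(st.st_mtime_ns)
    out += encode_vint(st.st_ctime_ns)
    if stat.S_ISLNK(mode):
        out += length_prefixed(os.fsencode(os.readlink(path)))
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):  # block or character device
        out += encode_vint(os.major(st.st_rdev))
        out += encode_vint(os.minor(st.st_rdev))
    out += encode_vint(attributes)
    out += length_prefixed(acl_text)
    out += encode_vint(len(xattrs))
    for attr in xattrs:
        out += length_prefixed(attr)
    return bytes(out)
```

With the most-significant-first packing assumed here, perms_bitfield(mode) comes out equal to stat.S_IMODE(mode); a 0644 file yields 420, which takes two vint bytes. The three nanosecond timestamps account for most of the bytes in a plain file's record.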
I've written the encoder, and this representation ends up requiring
about 40 bytes per file when the file has no attributes or ACLs and
isn't a symlink or device node. Roughly 800 files produce about 28k,
which gzip compresses to less than 5k.
There are a number of details that I'm not certain about, but for now,
I'd mostly just like to see what people think about this approach
in general.
Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4