Hi,
I am confused about how files stored by bup can be restored with a standard git checkout when they are split into multiple blobs. I hope you can help me, and that I am not bothering you too much these days. The file DESIGN says:
> Anyway, so we're dividing up those files into chunks based on the rolling
> checksum. Then we store each chunk separately (indexed by its sha1sum) as a
> git blob.
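To make sure I understand this step, here is a minimal Python sketch of how I picture it. The rolling sum, the window size and the split mask are simplified values I made up for illustration and are not bup's actual algorithm or parameters; the function names are mine as well. Only the blob id computation (sha1 over a `blob <size>\0` header plus the content) is the standard git one.

```python
import hashlib

WINDOW = 64          # bytes covered by the rolling sum (illustrative value)
MASK = (1 << 6) - 1  # split when the low 6 bits are all ones (illustrative value)


def split_chunks(data: bytes) -> list[bytes]:
    """Cut data wherever a rolling sum over the last WINDOW bytes matches MASK."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def git_blob_id(chunk: bytes) -> str:
    """Object id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(chunk)
    return hashlib.sha1(header + chunk).hexdigest()


if __name__ == "__main__":
    data = bytes(range(256)) * 8
    for chunk in split_chunks(data):
        print(git_blob_id(chunk), len(chunk))
```

Since the split points depend only on the bytes inside the window, concatenating the chunks in order gives back the original file.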
And then a sequence of multiple blobs is stored as a tree:
> The next problem is less obvious: after you store your series of chunks as
> git blobs, how do you store their sequence?
> (...)
> We didn't split this list in the same way. We could
> have, in fact, but it wouldn't have been very "git-like", since we'd like to
> store the list as a git 'tree' object in order to make sure git's
> refcounting and reachability analysis doesn't get confused. Never mind the
> fact that we want you to be able to 'git checkout' your data without any
> special tools.
But that is strange, since the recommended reading "git for computer scientists" and the git internals documentation say that a tree represents a directory structure, not a list of blobs that are concatenated into one single file:
> tree: Directories are represented by tree object. They refer to blobs that
> have the contents of files (filename, access mode, etc is all stored in the
> tree), and to other
>
> The next type of Git object we’ll examine is the tree, (...) All the content
> is stored as tree and blob objects, with trees corresponding to UNIX
> directory entries and blobs corresponding more or less to inodes or file
> contents. A single tree object contains one or more entries, each of which is
> the SHA-1 hash of a blob or subtree with its associated mode, type, and
> filename.
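To check my reading of that description, here is a minimal Python sketch of how I understand a tree object to be serialized and hashed. The function name `git_tree_id` is my own, and I assume the caller passes the entries already in the order git expects (sorted by name). As far as I can tell, the entry type is not stored separately but follows from the mode (for example `100644` for a regular file and `40000` for a subtree).

```python
import hashlib


def git_tree_id(entries: list[tuple[str, str, str]]) -> str:
    """Hash a tree built from (mode, name, hex object id) entries."""
    body = b""
    for mode, name, hex_id in entries:
        # Each entry is "<mode> <name>\0" followed by the raw 20-byte SHA-1.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    header = b"tree %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    print(git_tree_id([("100644", "README", empty_blob)]))
```

Every entry here carries a name and a mode, which is what makes me think of a tree as a directory listing rather than as a list of pieces of one file.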
How is it possible to assemble one file from multiple blobs?
Best regards,
Moritz