Well, but these changes propagate pretty minimally, due to the focus
on "composability": any subtree that doesn't change at all will always
result in the exact same bit image. But yes, if one file changes size,
then this will affect the immediate chunks around that change, plus
the end-of-directory record of each directory the file is contained
in, all the way up the tree. I.e. if you insert one byte in
/foo/bar/baz.txt, then this will affect 4 locations in the stream: the
serialization of the file /foo/bar/baz.txt's payload itself, plus the
end-of-directory records of /foo/bar, of /foo and of /. However,
that's it. The number of chunks changing depends on the depth of the
directory tree, if you will. And given that directories are a grouping
concept, when multiple things change they tend to be close together,
and thus the end-of-directory records affected are going to be the
same ones.
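To make the propagation pattern above concrete, here is a small sketch (in Python for illustration; casync itself is C, and the function name and record labels are hypothetical, not the real catar record types) that enumerates the stream records touched when a single file changes size:

```python
from pathlib import PurePosixPath

def affected_records(path: str) -> list[str]:
    """Records in the serialization that change when `path` changes size.

    Illustrative sketch only: the real catar format has more record
    types, but the propagation pattern is the file's own payload plus
    one end-of-directory record per ancestor directory, up to the root.
    """
    p = PurePosixPath(path)
    records = [f"payload of {p}"]
    for parent in p.parents:  # /foo/bar, /foo, /
        records.append(f"end-of-directory record of {parent}")
    return records

# Inserting one byte in /foo/bar/baz.txt touches 4 records: its payload
# plus the end-of-directory records of /foo/bar, /foo and /.
print(affected_records("/foo/bar/baz.txt"))
```

Note how a subtree that the edit never enters contributes no records at all, which is what keeps its bit image stable.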
> worked around by making deriving where reflinks could happen when
> expanding a caidx to disk. But for the case of /usr/bin on a building
> "layered" containers, it will cause a fair amount of chunk churn.
The end-of-directory records never hit the disk when extracting
archives. In fact, if you extract an archive serially (as you normally
do), then the end-of-directory records are pretty much ignored (not
entirely: they are always validated, since we strictly validate every
byte passing through, to ensure reproducibility at every step).
Currently, if you use casync to extract a caidx/catar onto an existing
directory tree (and the tree was never seen before), then for each
file in the stream it will first unpack it into a temporary file,
placed in the directory it shall end up in. While extracting, it will
try to reflink as much as it can from the existing tree (to be
precise: from any file in the tree; the paths don't have to match,
i.e. this is very efficient for file renames and for moving files to
different subtrees within the tree). When it is done extracting the
file, it will check if the file already exists and, if so, whether it
is identical in contents and metadata. If so, it will remove the
temporary file again and leave the old file in place. If they are
different, on the other hand, we'll atomically replace the old file
with the temporary file. It does this to optimize disk space: if the
old file is good enough, we'll just keep it in place, and thus
continue sharing any data it shares with other files. Only if it
doesn't match the existing one will we replace it and make a change to
the disk image. But even then we'll use reflinks as much as we can, so
that we share as much as possible. I wrote it that way with btrfs
subvols + reflinks and xfs reflinks in mind: so that you can take a
btrfs subvol snapshot or a btrfs/xfs reflink copy, and then "mutate"
it with casync, replacing only the files that actually changed, with
absolutely minimal disk operations in the end. overlayfs should
benefit from this too, as it means the temporary files are always
written to the top layer, and copy-up never has to take place: we
always atomically write each full file, and then either discard it as
a whole or keep it as a whole.
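The unpack/compare/replace dance described above can be sketched roughly like this (Python for illustration, not casync's actual C code; the function name is hypothetical, and the reflink and metadata-comparison parts are omitted, so this only shows the content comparison and the atomic rename):

```python
import filecmp
import os
import tempfile

def extract_file(target_path: str, payload: bytes) -> bool:
    """Unpack one file into an existing tree (simplified sketch).

    Writes the payload to a temporary file in the target's own
    directory, then either discards it (the old file already matches,
    so we keep it and its data sharing) or atomically renames it over
    the old file. Returns True if the target was replaced.
    """
    dirname = os.path.dirname(target_path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".casync-tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        if os.path.exists(target_path) and \
                filecmp.cmp(tmp, target_path, shallow=False):
            os.unlink(tmp)  # old file is good enough: keep it in place
            return False
        # Atomic replace: readers see the old or the new file,
        # never a half-written one.
        os.rename(tmp, target_path)
        return True
    except BaseException:
        os.unlink(tmp)
        raise
```

Because rename() within one directory is atomic on Linux, this is also what makes the overlayfs case cheap: the whole temporary file lands on the top layer, and no copy-up of the old file is ever triggered.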
This behaviour has lots of benefits, including somewhat "atomic"
semantics: apps either see the old or the new files, but never
half-written files. However, it also has some negative effects: for
every file we process, the containing directory's mtime will be
changed while we unpack it, and then reset after we are done with the
directory. I still think this is the best behaviour we can have, given
the available Linux APIs...
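The mtime compensation mentioned above amounts to recording the directory's timestamps before unpacking into it and restoring them once the directory is fully processed. A hedged sketch of that idea (again Python for illustration, helper name hypothetical):

```python
import os
from contextlib import contextmanager

@contextmanager
def preserved_dir_times(dirname: str):
    """Restore a directory's atime/mtime after files were unpacked into it.

    Creating and renaming temporary files bumps the containing
    directory's mtime; this puts the recorded timestamps back once we
    are done with the directory (a sketch of the behaviour described
    above, not casync's actual code).
    """
    st = os.stat(dirname)
    try:
        yield
    finally:
        os.utime(dirname, ns=(st.st_atime_ns, st.st_mtime_ns))
```

The window where the mtime is visibly wrong is unavoidable with this approach, since there is no Linux API to create and rename files in a directory without touching its timestamps.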