Karl-Philipp Richter <karl.ri...@gmail.com> writes:
> Is the use case of saving 2TB of data covered by bup? I suspect that
> part of the software doesn't scale and causes the large amount of
> unnecessary reads as soon as the amount of data or the number of files grows.
I'm still not sure I have a good understanding of your arrangement, but
in case it helps you reason about it (and ignoring the underlying
filesystem type): bup should only need to read the contents of files
that have "changed"[1] between index/save runs. It will, however, have
to read the entire contents of each of those files in order to compute
their fingerprints.
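To make that cost model concrete, here is a minimal Python sketch. It assumes a simplified index that remembers only size, mtime, ctime, and inode per path; bup's real index (index.py, linked in [1]) records more than this and hashsplits large files rather than hashing them whole. The names `Meta`, `snapshot`, `fingerprint`, and `save` are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import NamedTuple


class Meta(NamedTuple):
    """Hypothetical subset of stat() fields remembered between runs."""
    size: int
    mtime_ns: int
    ctime_ns: int
    ino: int


def snapshot(path: str) -> Meta:
    st = os.stat(path)
    return Meta(st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino)


def fingerprint(path: str, bufsize: int = 1 << 16) -> tuple[str, int]:
    """Hash the *entire* file; returns (hex digest, bytes read)."""
    h = hashlib.sha1()
    nread = 0
    with open(path, "rb") as f:
        while block := f.read(bufsize):
            h.update(block)
            nread += len(block)
    return h.hexdigest(), nread


def save(paths, previous):
    """Read only files whose metadata differs from the previous run."""
    current, bytes_read = {}, 0
    for p in paths:
        meta = snapshot(p)
        if previous.get(p) != meta:
            # "Changed": the whole file must be read to fingerprint it.
            # (A real tool would store the digest; here only the cost matters.)
            _, n = fingerprint(p)
            bytes_read += n
        current[p] = meta
    return current, bytes_read


with tempfile.TemporaryDirectory() as d:
    a, b = os.path.join(d, "a"), os.path.join(d, "b")
    for p, n in ((a, 100_000), (b, 5_000)):
        with open(p, "wb") as f:
            f.write(b"x" * n)
    idx, first = save([a, b], {})      # nothing known yet: read everything
    idx, second = save([a, b], idx)    # nothing changed: read nothing
    with open(a, "ab") as f:
        f.write(b"y")                  # a one-byte change to "a"...
    idx, third = save([a, b], idx)     # ...means re-reading all of "a"

print(first, second, third)
```

The third run shows the point above: a one-byte append to a 100,000-byte file costs a read of the whole file, while the untouched file costs nothing.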
So, for example, if the tree being indexed/saved contained only large
VM images, each of which changed in small ways between backups, then bup
would have to read all of the data on every save, even though it would
(likely) end up writing very little new data to the repository.
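Here is a minimal Python sketch of why a small change still costs a full read, using content-defined chunking in the spirit of bup's hashsplitting. The polynomial rolling hash, window size, mask, and the `split_chunks`/`chunk_ids` helpers are illustrative assumptions, not bup's actual rollsum code or parameters.

```python
import hashlib
import random

# Illustrative parameters only; these are NOT bup's actual rollsum settings.
WINDOW = 64                        # bytes covered by the rolling hash
MASK = (1 << 12) - 1               # split when low 12 bits are all ones (~4 KiB avg)
BASE = 257
MOD = (1 << 61) - 1                # a Mersenne prime modulus
BASE_POW = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window


def split_chunks(data: bytes) -> tuple[list[bytes], int]:
    """Split data at content-defined boundaries.

    Returns (chunks, bytes_scanned). bytes_scanned is always len(data):
    every byte has to go through the rolling hash to find the boundaries.
    """
    chunks, start, scanned, h = [], 0, 0, 0
    for i, byte in enumerate(data):
        leaving = data[i - WINDOW] if i >= WINDOW else 0
        # Slide the window: add the new byte, drop the one WINDOW bytes back.
        h = (h * BASE + byte - leaving * BASE_POW) % MOD
        scanned += 1
        if (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks, scanned


def chunk_ids(chunks: list[bytes]) -> set[str]:
    return {hashlib.sha1(c).hexdigest() for c in chunks}


# A 256 KiB stand-in "VM image" and a copy with one byte flipped mid-file.
rng = random.Random(42)
old = rng.randbytes(1 << 18)
modified = bytearray(old)
modified[len(modified) // 2] ^= 0xFF
new = bytes(modified)

old_chunks, _ = split_chunks(old)
new_chunks, scanned = split_chunks(new)
known = chunk_ids(old_chunks)
fresh = [c for c in new_chunks if hashlib.sha1(c).hexdigest() not in known]

print(f"scanned {scanned} bytes, {len(new_chunks)} chunks, {len(fresh)} new")
```

Every byte of the modified image is scanned to find the boundaries, but only the chunk (or rarely a few) around the flipped byte hashes to something not already stored. So a save like this reads a lot and writes little.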
In terms of the storage for the repository itself, bup currently relies
very heavily on mmap, and during index/save it's likely to issue a lot
of scattered reads to the idx/midx/bloom files in the repository, along
with writes for the actual data being saved, which are appended to the
new packfile(s) it creates.
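For a rough picture of the "scattered reads" part, here is a minimal Python sketch of looking up object hashes in an mmap'd, sorted table with a 256-entry fanout. The layout is a simplification loosely modeled on the git pack .idx format, not the exact on-disk format of bup's idx/midx files, and `write_table`/`contains` are hypothetical helpers.

```python
import hashlib
import mmap
import os
import struct
import tempfile

HASH_LEN = 20                      # SHA-1 digest size
FANOUT = struct.Struct(">256I")    # cumulative counts by first hash byte


def write_table(path: str, hashes: list[bytes]) -> None:
    """Write a fanout table followed by the hashes in sorted order."""
    hashes = sorted(hashes)
    counts = [0] * 256
    for h in hashes:
        counts[h[0]] += 1
    fanout, total = [], 0
    for c in counts:
        total += c
        fanout.append(total)
    with open(path, "wb") as f:
        f.write(FANOUT.pack(*fanout))
        f.write(b"".join(hashes))


def contains(table, h: bytes, pages=None) -> bool:
    """Binary-search the mmap'd table for h; optionally record touched pages."""
    first = h[0]
    lo = struct.unpack_from(">I", table, 4 * (first - 1))[0] if first else 0
    hi = struct.unpack_from(">I", table, 4 * first)[0]
    while lo < hi:
        mid = (lo + hi) // 2
        off = FANOUT.size + mid * HASH_LEN
        if pages is not None:
            pages.add(off // mmap.PAGESIZE)   # each probe may fault in a page
        cur = table[off:off + HASH_LEN]
        if cur < h:
            lo = mid + 1
        elif cur > h:
            hi = mid
        else:
            return True
    return False


stored = [hashlib.sha1(str(i).encode()).digest() for i in range(10_000)]
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.tbl")
    write_table(path, stored)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as table:
            pages = set()
            hits = sum(contains(table, h, pages) for h in stored[:100])
            missing = contains(table, hashlib.sha1(b"not stored").digest())

print(f"{hits} hits touched {len(pages)} distinct pages; miss -> {missing}")
```

Because hashes are effectively random, consecutive lookups land in unrelated parts of the table. With mmap these show up as page faults rather than explicit read() calls, so repository storage latency likely matters a good deal during save.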
So unless something unusual is going on due to your particular
arrangement, say an issue with cifs or something (which we might be able
to improve), if you really do have either 2TB of actual change each
time, or files that bup thinks have changed (via modification time etc.)
whose sizes add up to ~2TB, then bup will have to read that much data.
There's no real alternative until/unless we have a filesystem interface
that provides a way to find out what regions of a large file have
changed, or something similar.
Oh, and I think I mentioned this, but if your arrangement needs
--no-check-device and you aren't specifying it, that can also cause bup
index to have to "start from scratch" every time, which means re-reading
*all* of the data.
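Here is a minimal Python sketch of why the device check matters. It assumes a per-file record that includes the device number (stat's st_dev) alongside other metadata; the `Entry` type, `looks_changed` function, field list, and device numbers are all hypothetical, meant only to mirror what --no-check-device toggles.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical per-file record; bup's index stores more fields."""
    dev: int
    ino: int
    size: int
    mtime_ns: int


def looks_changed(saved: Entry, current: Entry, check_device: bool = True) -> bool:
    """Treat a file as changed if its recorded metadata differs."""
    if check_device and saved.dev != current.dev:
        return True
    return (saved.ino, saved.size, saved.mtime_ns) != (
        current.ino, current.size, current.mtime_ns)


# Same file, same contents and timestamps, but after a remount (e.g. an
# automounted or snapshot filesystem) it reports a different st_dev.
before = Entry(dev=41, ino=1234, size=10**9, mtime_ns=1_700_000_000_000_000_000)
after = before._replace(dev=57)

reread_default = looks_changed(before, after)
reread_no_check = looks_changed(before, after, check_device=False)
print(reread_default, reread_no_check)
```

If the device number isn't stable between runs, every entry fails the first test, so every file gets re-read in full; skipping that comparison leaves only the other metadata to decide.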
[1] https://github.com/bup/bup/blob/86e9359072ac5b11fe9eec17ad352ca39949be1a/lib/bup/index.py