"mle...@gmail.com" <mle...@gmail.com> writes:
> ... currently, they allow file uploads via a database frontend where you
> can enter descriptions. I was maybe a bit vague because I think I can
> negotiate with them as long as the space does not get cluttered with
> unordered files that no one knows/cares about.
Right, one thing I wasn't sure about either was the tape system. If
that doesn't behave like a "normal" filesystem somehow, i.e. if it
requires streaming, then you wouldn't be able to use bup directly for
the archival step.
>> How can I move saves from one bup store (group drive) to another
>> (tape-backed data freezer)? Using git, I would maybe register the big tape
>> drive as a remote and then do something like a cherry-pick onto branch
>> "remote/backup-set", and then git push. Is this the right way?
> bup get *copies* saves. The scary/tricky bit is removing them from the
> original.
Right, you might use bup to manage the working set, though the main
benefit of using bup there, as compared to a pile of compressed tar
archives, would (I imagine) mostly be deduplication.
With that arrangement, access to the working set would be via
restore/fuse/web (though most likely restore for more intensive use). I
suppose you could also handle giant "blob" data sets (if there are any)
via split/join.
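As a rough sketch of the split/join idea (throwaway paths, and assuming
bup is on PATH -- the branch name "blobs" is made up):

```shell
# Demonstration only; skip quietly when bup isn't available.
command -v bup >/dev/null 2>&1 || { echo "bup not installed; skipping"; exit 0; }

export BUP_DIR=$(mktemp -d)   # throwaway repo for the example
bup init

# Store a large opaque blob, chunked and deduplicated, on branch "blobs" ...
head -c 100000 /dev/urandom > big.blob
bup split -n blobs big.blob

# ... and stream it back out later.
bup join blobs > restored.blob
cmp big.blob restored.blob && echo "round-trip OK"
```

split handles data that isn't a filesystem tree (database dumps, etc.);
join just reassembles the chunk stream on stdout.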
In terms of archiving, as I think you suggest, one option might be to
create a new repo with the things you want to archive using "bup get",
then run "bup rm" to drop the refs from the working repo, and finally
run "bup gc" to reclaim space.
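Something like this, say (repo locations and the branch name are made up
here, and note that current bup versions require --unsafe for get/rm/gc):

```shell
# Demonstration with throwaway repos; in practice WORK would be the
# group-drive repo and XFER the transfer repo destined for tape.
command -v bup >/dev/null 2>&1 || { echo "bup not installed; skipping"; exit 0; }

WORK=$(mktemp -d)
XFER=$(mktemp -d)
bup -d "$WORK" init
bup -d "$XFER" init

# A tiny save on branch "data" so there is something to archive.
mkdir -p srcdata && echo example > srcdata/file
bup -d "$WORK" index srcdata
bup -d "$WORK" save -n data srcdata

# 1. copy the branch into the transfer repo,
bup -d "$XFER" get --unsafe -s "$WORK" --ff data
# 2. drop the ref from the working repo,
bup -d "$WORK" rm --unsafe data
# 3. reclaim the space.
bup -d "$WORK" gc --unsafe
bup -d "$XFER" ls
```

After that, "$XFER" holds the saves to be shipped to tape, and the
working repo no longer references them.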
That leaves you with the new "transfer repo" which would need to be
stored on the tapes somehow. Of course you'd have effectively no
deduplication across transfer repos (unless the tape system can do that
somehow itself -- doubtful(?), since each transfer repo would be a set
of new, custom-subset packfiles).
Another possibility I wondered about would be streaming the archive set
directly to the tapes via "git archive". I'm not sure whether "git
archive" would handle big saves/repos well, but if it does, it could be
interesting.
And of course if git-archive can't handle big saves/repos, something
like that could be added to bup, given the time.
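To illustrate with a toy stand-in repo (a bup repo is a git repo
underneath, so the same invocation should apply -- the redirection
target here stands in for the tape/staging path):

```shell
command -v git >/dev/null 2>&1 || { echo "git not installed; skipping"; exit 0; }

# Build a tiny git repo standing in for a bup repo.
REPO=$(mktemp -d)
git -C "$REPO" init -q
echo example > "$REPO/file"
git -C "$REPO" add file
git -C "$REPO" -c user.name=t -c user.email=t@example.com commit -qm save

# Stream the tree of the latest save as a tar; in the real setup this
# would be piped/redirected to the tape system instead of a local file.
git -C "$REPO" archive --format=tar HEAD > save.tar
tar -tf save.tar
```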
(Hmm, I also hadn't really thought about how "git bundle" might work for
bup repos, though I don't think it can stream, so if it worked, it'd be
more an alternative to a "bup get" subset.)
> Hmm, 'bup get --pick' could be what I'm looking for, it seems to 'get'
> a single save and not also all its ancestors, that would be the way to
> transfer to the long-term archive
Right, bup-get is (overly?) flexible. You can cherry-pick saves from a
branch (or even across branches) to create a new branch (even within the
same repo), you can also promote subtrees to saves, etc.
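A sketch of the cherry-pick style transfer, say (throwaway repos, the
branch names are made up, and --unsafe is required on current versions):

```shell
command -v bup >/dev/null 2>&1 || { echo "bup not installed; skipping"; exit 0; }

SRCR=$(mktemp -d); DSTR=$(mktemp -d)
bup -d "$SRCR" init
bup -d "$DSTR" init

mkdir -p pickdata && echo example > pickdata/file
bup -d "$SRCR" index pickdata
bup -d "$SRCR" save -n main pickdata

# Copy just the latest save (not its ancestors) onto branch
# "archive-set" in the destination repo; a specific save name like
# main/2024-01-02-030405 would work here too.
bup -d "$DSTR" get --unsafe -s "$SRCR" --pick main/latest:archive-set
bup -d "$DSTR" ls
```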
> As the pruning would then only affect the group drive, we have
> the Shadow Copy Service (aka Windows File History) as a failsafe.
> Then I would need a way to check for backup integrity, maybe by keeping
> a list of paths/md5sums that I expect in the backup and trying to
> recover random samples thereof... Then I would notice before the
> shadow copies time out...
If you had enough scratch space, you could also do what I do in some
cases, and double-check that a restore (or part of a restore) matches
the original (filesystem or other restore) via rsync -ni... We have a
helper for that in the source tree called compare-trees which the tests
rely on heavily.
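The rsync spot-check amounts to something like this (the two mktemp
dirs stand in for the original tree and a scratch restore of it; no
itemized output means no differences):

```shell
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

ORIG=$(mktemp -d); RESTORED=$(mktemp -d)
echo same > "$ORIG/a"; echo same > "$RESTORED/a"

# -n: dry run, -i: itemize changes, -r: recursive, -c: compare by
# checksum; add options like -aH if you also care about perms, times,
# hardlinks, etc.
DIFFS=$(rsync -nirc "$ORIG/" "$RESTORED/")
if [ -z "$DIFFS" ]; then echo "trees match"; else printf 'differences:\n%s\n' "$DIFFS"; fi
```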
One other issue -- you mentioned the working set being on a remote
filesystem. That might introduce some additional questions/concerns.
The happy path for bup is still either direct filesystem or ssh access
for saves/restores, etc. There is also (currently) no locking, or other
carefully vetted coordination with respect to concurrency.
Some operations may well be fairly safe. We do try to handle various
operations safely, at least for local "normal" filesystems, but we don't
do NFS-safe locking for example, and I doubt *all* of the code was
written with concurrency in mind (since that wasn't a goal). Given
that, you may need to handle locking, if you need it, at a higher level
somehow.
I also don't know how well bup will perform with NFS in general[1],
particularly with respect to concurrency, and also with respect to
performance for larger repos, given how much bup (at the moment) and git
rely on mmapping, etc.
I suppose I don't really have enough recent, careful experience with NFS
to have a well informed opinion.
[1] ...and we've had reports of actual trouble that might or might not
have been related to CIFS -- including one recent report.
Hope this helps.