Request: Auto-joining large files by 'bup fuse'

Michael Witten

Jun 24, 2011, 1:42:20 AM
to bup-...@googlegroups.com
I've used bup to make a crude backup of the IMAGE of my entire
internal hard disk (including partition table, etc.) by splitting
it and storing the result in a bup repo on an external hard disk.

Now I'd like to mount a partition in that image via a loop device,
but I don't have the storage capacity to join the bup chunks into
a copy of the image that can then be mounted, and the `bup fuse'
interface just provides access to the individual chunks (which is
clearly completely useless).

Is there some way to access the image data in place, or do you
think it's possible to modify `bup fuse' to provide such a
feature rather than the current braindead access?

I think such a feature would be horrendously slow, but it would
be better than having nothing (and it would make `bup fuse'
actually useful in this case).

It might also be useful to add flags to bup for adding a log message
so that information like disk geometry can be recorded, which would
be useful in my case.

Sincerely,
Michael Witten

Zoran Zaric

Jun 24, 2011, 1:54:34 AM
to Michael Witten, bup-...@googlegroups.com
On Fri, Jun 24, 2011 at 05:42:20AM -0000, Michael Witten wrote:
> I've used bup to make a crude backup of the IMAGE of my entire
> internal hard disk (including partition table, etc.) by splitting
> it and storing the result in a bup repo on an external hard disk.

I guess you used `bup split` to create the backup?

> Now I'd like to mount a partition in that image via a loop device,
> but I don't have the storage capacity to join the bup chunks into
> a copy of the image that can then be mounted, and the `bup fuse'
> interface just provides access to the individual chunks (which is
> clearly completely useless).
>
> Is there some way to access the image data in place, or do you
> think it's possible to modify `bup fuse' to provide such a
> feature rather than the current braindead access?

`bup index` and `bup save` are the combination that produces backups compatible
with `bup ls`, `bup fuse`, etc.

> It might also be useful to add flags to bup for adding a log message
> so that information like disk geometry can be recorded, which would
> be useful in my case.

You can just save this information into a text file that you back up in the same
run.

> Sincerely,
> Michael Witten

Zoran

Avery Pennarun

Jun 24, 2011, 1:53:55 AM
to Michael Witten, bup-...@googlegroups.com
On Fri, Jun 24, 2011 at 1:42 AM, Michael Witten <mfwi...@gmail.com> wrote:
> Now I'd like to mount a partition in that image via a loop device,
> but I don't have the storage capacity to join the bup chunks into
> a copy of the image that can then be mounted, and the `bup fuse'
> interface just provides access to the individual chunks (which is
> clearly completely useless).
>
> Is there some way to access the image data in place, or do you
> think it's possible to modify `bup fuse' to provide such a
> feature rather than the current braindead access?

Yes, this would be possible, although I'm not 100% sure you're allowed
to mount files from a fuse filesystem on a loopback fs anyway.
Moreover, I'm not sure how you would find a particular partition
inside the device in general, if you mount it as a loopback.

But anyway, it shouldn't be *too* hard to get 'bup fuse' to see the
group of files as a single file. You're right, bup should probably do
this by default, and I guess the right way to do it would be for 'bup
split' to generate a tree with a single toplevel file instead of
producing the raw split. Anyone who wants to send a patch to do that
should please do so, as it'll probably unconfuse a lot of people.
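Conceptually, seeing the group of chunks as a single file means treating them as consecutive byte ranges of one stream. A minimal sketch, assuming a flat list of chunks whose names are their starting byte offsets written in hex (the "byte-offset names" of the raw `bup split` layout mentioned later in this thread). Real bup nests subtrees for large files and reads blobs lazily from packfiles; `ChunkedFileReader` is a hypothetical name, not bup's own vfs class:

```python
import bisect


class ChunkedFileReader:
    """Present offset-named chunks as one seekable file (illustrative sketch).

    Assumes a flat, contiguous list of (hex_offset_name, data) pairs held in
    memory; bup itself walks nested trees and fetches blobs on demand.
    """

    def __init__(self, named_chunks):
        # Sort by numeric offset rather than by name string.
        items = sorted((int(name, 16), data) for name, data in named_chunks)
        self._starts = [start for start, _ in items]
        self._chunks = [data for _, data in items]
        self.size = (self._starts[-1] + len(self._chunks[-1])) if items else 0
        self._pos = 0

    def seek(self, pos):
        self._pos = max(0, pos)

    def tell(self):
        return self._pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self._pos
        out = bytearray()
        while n > 0 and self._pos < self.size:
            # Find the last chunk starting at or before the current position.
            i = bisect.bisect_right(self._starts, self._pos) - 1
            within = self._pos - self._starts[i]
            piece = self._chunks[i][within:within + n]
            if not piece:
                break  # gap in the chunk list; stop rather than invent data
            out += piece
            self._pos += len(piece)
            n -= len(piece)
        return bytes(out)


if __name__ == "__main__":
    r = ChunkedFileReader([("0", b"hello "), ("6", b"wor"), ("9", b"ld")])
    r.seek(4)
    print(r.read(5))  # b'o wor'
```

Because a lookup is a binary search over chunk start offsets, a random read only touches the chunks it overlaps, which is what makes mounting an image in place feasible at all.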

Anyway, you can take the current backup set and mangle it a bit too,
using git commands. You might have to play with it a little, but it
should be something like this (replace BRANCHNAME with the name of
your backup set):

cd ~/.bup
tree=$(git rev-parse BRANCHNAME:)
newtree=$(printf "040000 tree $tree\tdisk.img.bup" | git mktree)
commit=$(echo fixup | git commit-tree $newtree)
git branch fixup $commit

Now the branch 'fixup' should appear in your bup fuse, with a single
file named disk.img in it.
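The `.bup` suffix in the `git mktree` line is what makes the subtree appear as a single file. A sketch of that naming rule, modelled on bup's `git.mangle_name(name, mode, gitmode)` (the function the later patch in this thread calls); `demangle_name` below is a simplified hypothetical counterpart, and the two constants are git's standard tree-entry modes:

```python
import stat

GIT_MODE_FILE = 0o100644  # git tree-entry mode for a regular-file blob
GIT_MODE_TREE = 0o40000   # git tree-entry mode for a subtree


def mangle_name(name, mode, gitmode):
    """Name to store in the git tree (sketch of bup's rule, not its code)."""
    if stat.S_ISREG(mode) and not stat.S_ISREG(gitmode):
        # A regular file stored as a tree of chunks gets a '.bup' suffix.
        return name + ".bup"
    if name.endswith(".bup") or name[:-1].endswith(".bup"):
        # A genuine file whose name already looks mangled is escaped.
        return name + ".bupl"
    return name


def demangle_name(name):
    """Hypothetical inverse: return (display_name, is_chunked_file)."""
    if name.endswith(".bupl"):
        return name[:-5], False
    if name.endswith(".bup"):
        return name[:-4], True
    return name, False


if __name__ == "__main__":
    stored = mangle_name("disk.img", GIT_MODE_FILE, GIT_MODE_TREE)
    print(stored, demangle_name(stored))  # disk.img.bup ('disk.img', True)
```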

Please try it out and see if that helps!

Have fun,

Avery

Michael Witten

Jun 24, 2011, 2:05:01 AM
to Zoran Zaric, bup-...@googlegroups.com
On Fri, 24 Jun 2011 07:54:34 +0200, Zoran Zaric wrote:

> On Fri, Jun 24, 2011 at 05:42:20AM -0000, Michael Witten wrote:
>> I've used bup to make a crude backup of the IMAGE of my entire
>> internal hard disk (including partition table, etc.) by splitting
>> it and storing the result in a bup repo on an external hard disk.
>
> I guess you used `bup split` to create the backup?

That is correct.

>> Now I'd like to mount a partition in that image via a loop device,
>> but I don't have the storage capacity to join the bup chunks into
>> a copy of the image that can then be mounted, and the `bup fuse'
>> interface just provides access to the individual chunks (which is
>> clearly completely useless).
>>
>> Is there some way to access the image data in place, or do you
>> think it's possible to modify `bup fuse' to provide such a
>> feature rather than the current braindead access?
>
> `bup index` and `bup save` are the combination that produces backups compatible
> with `bup ls`, `bup fuse`, etc.

Yes, but I'm saying `bup ls', `bup fuse', etc. should be extended for the
usage case I've presented; it is my humble opinion that a backup system
should provide access to the data that has been backed up, and it is
unconscionable to require large files to be combined and stored in full
before accessing them.

>> It might also be useful to add flags to bup for adding a log message
>> so that information like disk geometry can be recorded, which would
>> be useful in my case.
>
> You can just save this information into a text file that you back up in the same
> run.

Certainly, but would it not be useful to make more use of git's
commit messages?

Avery Pennarun

Jun 24, 2011, 2:12:32 AM
to Michael Witten, Zoran Zaric, bup-...@googlegroups.com
On Fri, Jun 24, 2011 at 2:05 AM, Michael Witten <mfwi...@gmail.com> wrote:
> it is my humble opinion that a backup system
> should provide access to the data that has been backed up, and it is
> unconscionable to require large files to be combined and stored in full
> before accessing them.

Okay, that's a bit of a stretch - other than bup, I haven't run into
any other backup packages that even *try* to do this, apart from the
ones that actually just store a full uncompressed copy of the file on
your filesystem (like rsnapshot and Time Machine), with all the
disadvantages of that.

The fact that bup can do it at all is, IMHO, kind of amazing :)

Have fun,

Avery

Michael Witten

Jun 24, 2011, 2:25:14 AM
to Avery Pennarun, Zoran Zaric, bup-...@googlegroups.com
On Fri, 24 Jun 2011 02:12:32 -0400, Avery Pennarun wrote:

> On Fri, Jun 24, 2011 at 2:05 AM, Michael Witten <mfwi...@gmail.com> wrote:
>> it is my humble opinion that a backup system should provide access
>> to the data that has been backed up, and it is unconscionable to
>> require large files to be combined and stored in full before
>> accessing them.
>
> Okay, that's a bit of a stretch - other than bup, I haven't run into
> any other backup packages that even *try* to do this. Other than the
> ones that actually just store a full uncompressed copy of the file on
> your filesystem (like rsnapshot and Time Machine), with all the
> disadvantages of that.

And that is, in my humble opinion, unconscionable. :-)

> The fact that bup can do it at all is, IMHO, kind of amazing :)

Indeed. bup showcases the power of simple concepts and the availability
of low-level access; I look forward to trying your suggestion:

Message-ID: <BANLkTi=5Ux3Qt3M+=mfM-=YaMLyz...@mail.gmail.com>

Michael Witten

Jun 24, 2011, 4:57:34 AM
to Avery Pennarun, bup-...@googlegroups.com
On Fri, 24 Jun 2011 01:53:55 -0400, Avery Pennarun wrote:

> On Fri, Jun 24, 2011 at 1:42 AM, Michael Witten <mfwi...@gmail.com> wrote:
>> Now I'd like to mount a partition in that image via a loop device,
>> but I don't have the storage capacity to join the bup chunks into
>> a copy of the image that can then be mounted, and the `bup fuse'
>> interface just provides access to the individual chunks (which is
>> clearly completely useless).
>>
>> Is there some way to access the image data in place, or do you
>> think it's possible to modify `bup fuse' to provide such a
>> feature rather than the current braindead access?
>
> Yes, this would be possible, although I'm not 100% sure you're allowed
> to mount files from a fuse filesystem on a loopback fs anyway.

See below.

> Moreover, I'm not sure how you would find a particular partition
> inside the device in general, if you mount it as a loopback.
>

> Anyway, you can take the current backup set and mangle it a bit too,
> using git commands. You might have to play with it a little, but it
> should be something like this (replace BRANCHNAME with the name of
> your backup set):
>
> cd ~/.bup
> tree=$(git rev-parse BRANCHNAME:)
> newtree=$(printf "040000 tree $tree\tdisk.img.bup" | git mktree)
> commit=$(echo fixup | git commit-tree $newtree)
> git branch fixup $commit
>
> Now the branch 'fixup' should appear in your bup fuse, with a single
> file named disk.img in it.
>
> Please try it out and see if that helps!

Avery, you are a GENIUS!!!!!!1111

So, here's what went down (hold on tight):

The external hard disk that I have available for my backup bup repo
is HFS+ formatted. Now, the `hfsplus' file system driver for Linux
[currently] refuses to provide write access when journaling is enabled
for the file system; unfortunately, somebody else is also using this
particular disk as the store for Apple's Time Machine feature, thereby
making it impossible to disable journaling. Consequently, I installed
bup on that person's Mac OS X machine and then made a remote backup
through it.

Now, to get access to the bup repo, I could have directly mounted this
external HFS+ hard disk in read only mode, but it was already attached
to the Macintosh, so I instead mounted it across the network with the
FUSE file system `sshfs'.

Next, I followed your steps to get the `fixup' branch, and then
mounted the repo and made my way to the image:

$ bup -d /path/to/sshfs/mounted/bup/repo fuse /path/to/bup-fuse
$ cd /path/to/bup-fuse/fixup/latest
$ ls
disk.img

BWAAAAAAA hahah HAAA!

I ran GNU `parted' on `disk.img' in order to find the byte offset of the
partition I wanted (let that be "$offset").
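`parted` reports the offset directly; the arithmetic behind it is simply the partition's starting sector times the sector size. A minimal sketch for a classic MBR partition table, assuming 512-byte sectors; GPT disks and extended/logical partitions are not handled, and `mbr_partition_offsets` is a hypothetical helper:

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte logical sectors


def mbr_partition_offsets(mbr):
    """Return (number, byte_offset, byte_length) for each used primary entry.

    The MBR holds four 16-byte entries starting at byte 446 and ends with the
    0x55AA signature at byte 510. Each entry stores its partition type at
    byte 4 and the little-endian starting LBA and sector count at bytes 8-15.
    """
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]
        lba_start, n_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and n_sectors:
            parts.append((i + 1, lba_start * SECTOR_SIZE, n_sectors * SECTOR_SIZE))
    return parts


if __name__ == "__main__":
    with open("disk.img", "rb") as f:
        for number, offset, length in mbr_partition_offsets(f.read(512)):
            print(number, offset, length)  # offset is the value for offset=
```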

Now for the loop device:

$ sudo mount disk.img /path/to/partition -o loop,ro,offset="$offset"
/path/to/bup-fuse/.commit/fb/2b7261bf8ef05e8e00872c8ee59952beda6674/disk.img: Permission denied

DAMNIT!

It turns out that FUSE does not take kindly to a mix of users trying
to access a file system mounted under its control (in this case, user
`mfwitten' as owner of the `bup fuse' file system and user `root'
running the `sudo mount...'). So, I told FUSE that I intend to tell my
user-space file system drivers to allow other users access:

$ sudo sh -c 'echo user_allow_other >> /etc/fuse.conf'

and then actually told the `bup fuse' driver to allow other users access;
this was achieved by unmounting the bup repo and then remounting it with
the `-o' (allow-other) flag:

$ fusermount -u /path/to/bup-fuse
$ bup -d /path/to/sshfs/mounted/bup/repo fuse -o /path/to/bup-fuse

Now for the loop device:

$ sudo mount disk.img /path/to/partition -o loop,ro,offset="$offset"
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

GREAT GOOGLY MOOGLY!

$ dmesg | tail -2
EXT4-fs (loop0): INFO: recovery required on readonly filesystem
EXT4-fs (loop0): write access unavailable, cannot proceed

Ah yes. So, it turns out that Linux's `ext4' driver wanted to replay the
file system's dirty journal (dirty because the hard disk image was
produced by dumping an online file system), and replaying it requires
write access, which a read-only mount does not provide. The trick is to
tell `ext4' not to load the journal at all, which is done by mounting
with the `noload' option:

$ sudo mount disk.img /path/to/partition -o loop,ro,noload,offset="$offset"

Et voila!
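The kernel's complaint can be checked before mounting by reading the "needs recovery" feature flag from the ext2/3/4 superblock. A sketch, assuming the caller passes at least the first 2048 bytes of the partition (read from `disk.img` at `$offset`); the field offsets are those of the on-disk superblock (primary copy at byte 1024, `s_magic` at 0x38, `s_feature_incompat` at 0x60), and `needs_journal_recovery` is a hypothetical helper. Note that the ext4 documentation warns that skipping the replay with `noload` may expose inconsistencies, so the mounted view may not be exactly what the live system saw:

```python
import struct

EXT4_SUPER_MAGIC = 0xEF53
INCOMPAT_RECOVER = 0x0004  # feature bit meaning "journal needs recovery"


def needs_journal_recovery(partition_head):
    """True if the superblock says the journal must be replayed."""
    sb = partition_head[1024:2048]
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_SUPER_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    return bool(incompat & INCOMPAT_RECOVER)


if __name__ == "__main__":
    import sys
    path, offset = sys.argv[1], int(sys.argv[2])  # e.g. disk.img "$offset"
    with open(path, "rb") as f:
        f.seek(offset)
        print(needs_journal_recovery(f.read(2048)))
```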

As I write this email, I'm listening to an mp3 from the desired
partition; of course, this is all happening via a quite circuitous route:

HFS+ Hard Disk --> Mac OS X --> sshfs on Linux --> bup fuse --> loop device

Thanks!

Sincerely,
Michael Witten

Mark J Hewitt

Jun 24, 2011, 12:07:38 PM
to bup-...@googlegroups.com
I have been thinking of a similar application in which I have used bup
to maintain ntfsclone images of Windows partitions.
I was wondering if it were possible to mount and explore versions of
those partitions like this, but I have not explored
the feasibility of this very far yet.

Mark.

Avery Pennarun

Jun 24, 2011, 1:15:52 PM
to m.he...@computer.org, bup-...@googlegroups.com
On Fri, Jun 24, 2011 at 12:07 PM, Mark J Hewitt <m...@idnet.com> wrote:
> I have been thinking of a similar application in which I have used bup to
> maintain ntfsclone images of Windows partitions.
> I was wondering if it were possible to mount and explore versions of those
> partitions like this, but I have not explored
> the feasibility of this very far yet.

If someone ported an NBD client to Windows, you could export NBD from
a bup server and mount it that way.

Or if you run Windows in a virtual machine on top of Linux, maybe you
could connect the image file as a virtual disk somehow...

Let us know if you find anything!

Have fun,

Avery

Rob Browning

Jun 25, 2011, 12:51:15 PM
to Avery Pennarun, Michael Witten, bup-...@googlegroups.com
Avery Pennarun <apen...@gmail.com> writes:

> Moreover, I'm not sure how you would find a particular partition
> inside the device in general, if you mount it as a loopback.

I'm not sure it will help in this case, but at least for Linux, kpartx
might be useful:

Package: kpartx
Description: create device mappings for partitions
Kpartx can be used to set up device mappings for the partitions of any
partitioned block device. It is part of the Linux multipath-tools.
Homepage: http://christophe.varoqui.free.fr/

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Korkman

Jul 6, 2011, 2:37:27 PM
to bup-list
Yes, kpartx can be used on loop devices to create mappings in
/dev/mapper (do not forget to remove the mappings before removing the
loop device!).

Many disk-imaging projects create some form of NBD server for this
kind of task, though. No fuse, no loop, no kpartx. Another
advantage: NBD can fake write access on read-only devices for
filesystems like XFS, which require write access on mount (journal
recovery). So a "bup nbd ~1" would be really great, but in the
meantime, the git command sequence is a nice workaround.
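Faking write access usually means a copy-on-write overlay: writes land in a private block map and reads prefer those dirty blocks over the backing image. A minimal sketch under that assumption; the 4096-byte block size and in-memory dictionary are illustrative choices, and `CopyOnWriteOverlay` is a hypothetical name, not part of bup or any NBD server:

```python
import io


class CopyOnWriteOverlay:
    """Writable view over a read-only file object; the base is never modified."""

    def __init__(self, base, size, block_size=4096):
        self.base = base            # anything with seek() and read()
        self.size = size
        self.block_size = block_size
        self.dirty = {}             # block index -> private bytearray copy

    def _block(self, idx):
        if idx in self.dirty:
            return self.dirty[idx]
        self.base.seek(idx * self.block_size)
        data = self.base.read(self.block_size)
        return bytearray(data.ljust(self.block_size, b"\0"))

    def read(self, offset, length):
        length = max(0, min(length, self.size - offset))  # clamp to device end
        out = bytearray()
        while length > 0:
            idx, within = divmod(offset, self.block_size)
            take = min(length, self.block_size - within)
            out += self._block(idx)[within:within + take]
            offset += take
            length -= take
        return bytes(out)

    def write(self, offset, data):
        if offset + len(data) > self.size:
            raise ValueError("write past end of device")
        pos = 0
        while pos < len(data):
            idx, within = divmod(offset + pos, self.block_size)
            take = min(len(data) - pos, self.block_size - within)
            block = self._block(idx)
            block[within:within + take] = data[pos:pos + take]
            self.dirty[idx] = block  # keep the modified copy, not the base
            pos += take


if __name__ == "__main__":
    base = io.BytesIO(b"A" * 10000)
    ov = CopyOnWriteOverlay(base, 10000)
    ov.write(4094, b"xyz")  # e.g. a journal replay touching two blocks
    print(ov.read(4092, 6))  # b'AAxyzA'
```

Discarding `dirty` throws away every "write", so the journal replay that XFS insists on can succeed without touching the backup.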


On 25 Jun., 18:51, Rob Browning <r...@defaultvalue.org> wrote:

Avery Pennarun

Jul 6, 2011, 3:07:42 PM
to Korkman, bup-list
On Wed, Jul 6, 2011 at 2:37 PM, Korkman <goo...@pierre-beck.de> wrote:
> Many projects (disk imaging) create some form of NBD server for this
> kind of task, though. No fuse, no loop, no kpartx. And another
> advantage: NBD can fake write access on read-only devices for
> filesystems like XFS which require write access on mount (journal
> recovery). So a "bup nbd ~1" would be really great, but in the
> meantime, the git command sequence is a nice workaround.

Well, the git command sequence is orthogonal to the fuse vs. nbd
discussion. The git commands are only needed because 'bup split'
creates an inconvenient file layout right now, and it probably just
shouldn't.

Anyone who would like to take a crack at a 'bup nbd' command is
welcome to give it a try :) I think I heard the nbd protocol is
relatively painless (the magic is all on the client side, which is
already written), so it might be a pretty fun weekend project.

Have fun,

Avery

Yung-Chin Oei

Oct 8, 2012, 10:08:34 AM
to bup-...@googlegroups.com, Avery Pennarun, Michael Witten
When viewing branches that were generated by bup-split through bup-fuse
(or any other frontend relying on vfs.py), these are presented as trees
of the hashsplit blobs. This means that bup-split output is only
usefully accessible through bup-join.

This change makes bup-split store named commits such that they appear as
files, named with the last component of their branch name(*). That is,
from the vfs layer, they now appear like so:
branch_name/latest/branch_basename

(*) While bup doesn't currently handle slashes in branch names, patches
to this end are on the mailing list, so this patch should handle
them, in anticipation of their general support in bup.

To address potential concerns: the storage format is changed in subtle
ways, in that the top level tree now contains a "normally" named object,
rather than byte-offset names. However, bup-join doesn't care about
this, and as bup-join was previously the only way to use these commits,
the user experience is not affected.

We also add a test for the new functionality. (The test uses an empty
string as input data, because this is the second way in which this patch
changes the behaviour of bup-split: previously, passing empty strings to
bup-split would make it generate an empty git tree, whereas now it
relies on hashsplit.split_to_blob_or_tree() to make a blob for the empty
string. This is meaningful because vfs.py chokes on empty git trees.)

Signed-off-by: Yung-Chin Oei <yung...@yungchin.nl>
---

On Fri, Jun 24, 2011 at 01:53:55AM -0400, Avery Pennarun wrote:
> On Fri, Jun 24, 2011 at 1:42 AM, Michael Witten <mfwi...@gmail.com> wrote:
> > Now I'd like to mount a partition in that image via a loop device,
> > but I don't have the storage capacity to join the bup chunks into
> > a copy of the image that can then be mounted, and the `bup fuse'
> > interface just provides access to the individual chunks (which is
> > clearly completely useless).

> But anyway, it shouldn't be *too* hard to get 'bup fuse' to see the
> group of files as a single file. You're right, bup should probably do
> this by default, and I guess the right way to do it would be for 'bup
> split' to generate a tree with a single toplevel file instead of
> producing the raw split. Anyone who wants to send a patch to do that
> should please do so, as it'll probably unconfuse a lot of people.


cmd/split-cmd.py | 18 ++++++++++++------
lib/bup/hashsplit.py | 5 +++--
t/test.sh | 5 +++++
3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/cmd/split-cmd.py b/cmd/split-cmd.py
index 438fe40..7da9ad4 100755
--- a/cmd/split-cmd.py
+++ b/cmd/split-cmd.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python
-import sys, time
+import os, sys, time
from bup import hashsplit, git, options, client
from bup.helpers import *

@@ -135,11 +135,17 @@ if pack_writer and opt.blobs:
print sha.encode('hex')
reprogress()
elif pack_writer: # tree or commit or name
- shalist = hashsplit.split_to_shalist(pack_writer.new_blob,
- pack_writer.new_tree,
- files,
- keep_boundaries=opt.keep_boundaries,
- progress=prog)
+ if opt.name: # insert dummy_name which may be used as a restore target
+ (mode, sha) = hashsplit.split_to_blob_or_tree(
+ pack_writer.new_blob, pack_writer.new_tree, files,
+ keep_boundaries=opt.keep_boundaries, progress=prog)
+ dummy_name = git.mangle_name(os.path.basename(opt.name),
+ hashsplit.GIT_MODE_FILE, mode)
+ shalist = [(mode, dummy_name, sha)]
+ else:
+ shalist = hashsplit.split_to_shalist(
+ pack_writer.new_blob, pack_writer.new_tree, files,
+ keep_boundaries=opt.keep_boundaries, progress=prog)
tree = pack_writer.new_tree(shalist)
else:
last = 0
diff --git a/lib/bup/hashsplit.py b/lib/bup/hashsplit.py
index 2c4ec3a..1466da2 100644
--- a/lib/bup/hashsplit.py
+++ b/lib/bup/hashsplit.py
@@ -167,9 +167,10 @@ def split_to_shalist(makeblob, maketree, files,
return _make_shalist(stacks[-1])[0]


-def split_to_blob_or_tree(makeblob, maketree, files, keep_boundaries):
+def split_to_blob_or_tree(makeblob, maketree, files,
+ keep_boundaries, progress=None):
shalist = list(split_to_shalist(makeblob, maketree,
- files, keep_boundaries))
+ files, keep_boundaries, progress))
if len(shalist) == 1:
return (shalist[0][0], shalist[0][2])
elif len(shalist) == 0:
diff --git a/t/test.sh b/t/test.sh
index 2f1f24c..5754090 100755
--- a/t/test.sh
+++ b/t/test.sh
@@ -149,6 +149,7 @@ WVPASSEQ "$(cat tag[ab].tmp | bup split -b --git-ids)" \
"$(cat tagab.tmp)"
WVPASS bup split --bench -b <t/testfile1 >tags1.tmp
WVPASS bup split -vvvv -b t/testfile2 >tags2.tmp
+WVPASS echo -n "" | bup split -n split_empty_string.tmp
WVPASS bup margin
WVPASS bup midx -f
WVPASS bup midx --check -a
@@ -210,6 +211,7 @@ WVPASS diff -u t/testfile1 out1.tmp
WVPASS diff -u t/testfile2 out2.tmp
WVPASS diff -u t/testfile2 out2t.tmp
WVPASS diff -u t/testfile2 out2c.tmp
+WVPASSEQ "$(bup join split_empty_string.tmp)" ""

WVSTART "save/git-fsck"
(
@@ -237,6 +239,9 @@ rm -rf buprestore.tmp
WVPASS bup restore -C buprestore.tmp "/master/latest/$TOP/$D/"
touch $D/non-existent-file buprestore.tmp/non-existent-file # else diff fails
WVPASS diff -ur $D/ buprestore.tmp/
+rm -f split_empty_string.tmp
+WVPASS bup restore split_empty_string.tmp/latest/split_empty_string.tmp
+WVPASSEQ "$(cat split_empty_string.tmp)" ""

WVSTART "ftp"
WVPASS bup ftp "cat /master/latest/$TOP/$D/b" >$D/b.new
--
1.7.2.5
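The empty-input behaviour described in the commit message comes down to how the single top-level object is chosen. A sketch of that decision, assuming `shalist` holds `(mode, name, sha)` entries and that `makeblob`/`maketree` stand in for the pack writer's `new_blob`/`new_tree` (the hunk above shows the first two branches; the tree branch is assumed). `git_blob_sha` is a hypothetical in-memory stand-in that computes git's blob id without writing anything:

```python
import hashlib

GIT_MODE_FILE = 0o100644
GIT_MODE_TREE = 0o40000


def git_blob_sha(data):
    """Git object id of a blob: SHA-1 over 'blob <len>\\0' + data."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def blob_or_tree(shalist, makeblob, maketree):
    """Pick the (mode, sha) for a split result (sketch of the patch's logic)."""
    if len(shalist) == 1:
        # Input fit in one chunk: reference that object directly.
        mode, _name, sha = shalist[0]
        return mode, sha
    if not shalist:
        # Empty input: an empty blob, since vfs.py chokes on empty trees.
        return GIT_MODE_FILE, makeblob(b"")
    return GIT_MODE_TREE, maketree(shalist)


if __name__ == "__main__":
    print(blob_or_tree([], git_blob_sha, None))
```

Whatever mode comes back is then passed to `git.mangle_name`, so only a multi-chunk result gets the `.bup` suffix.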

Tim Riemenschneider

Oct 27, 2013, 10:42:21 AM
to bup-...@googlegroups.com
Hi,

I know that this is a rather old thread, but....


On Wednesday, 6 July 2011 21:07:42 UTC+2, apenwarr wrote:

> Anyone who would like to take a crack at a 'bup nbd' command is
> welcome to give it a try :) I think I heard the nbd protocol is
> relatively painless (the magic is all on the client side, which is
> already written), so it might be a pretty fun weekend project.


... I did that. And yes, it was quite easy, even with my beginner-level Python skills.
It took me one day to hack together a working nbd-server and another to clean it up a bit.
(I'll send it as a patchset in a moment.)

To get it working, one basically needs something implementing read() and seek(),
and the class _FileReader from vfs already does that.
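Once the handshake is done, serving reads really does reduce to `seek()` plus `read()`. A sketch of handling one transmission-phase request, assuming the fixed 28-byte request header and 16-byte simple reply described in the NBD protocol specification; the negotiation phase is omitted, and `handle_request` is a hypothetical helper, not the code from the patchset:

```python
import io
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
EPERM, EIO, EINVAL = 1, 5, 22

REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, handle, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, handle


def handle_request(header, image, image_size):
    """Answer one request against any object with seek()/read().

    Returns the reply bytes (header plus data for reads), or None when
    the client asks to disconnect.
    """
    magic, _flags, cmd, handle, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("bad request magic")
    if cmd == NBD_CMD_DISC:
        return None
    if cmd == NBD_CMD_WRITE:
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, EPERM, handle)  # read-only export
    if cmd != NBD_CMD_READ or offset + length > image_size:
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, EINVAL, handle)
    image.seek(offset)
    data = image.read(length)
    if len(data) != length:
        return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, EIO, handle)
    return REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, handle) + data


if __name__ == "__main__":
    disk = io.BytesIO(b"0123456789")
    req = REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_READ, 1, 2, 3)
    print(handle_request(req, disk, 10)[REPLY.size:])  # b'234'
```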

Avery Pennarun

Oct 27, 2013, 4:40:59 PM
to Tim Riemenschneider, bup-list
Fun! What did you do with it after?