Bup vs. Obnam?


Adam Porter

Feb 23, 2012, 3:26:43 AM2/23/12
to bup-list
Hi list,

Having little experience with git, and none with bup, I wonder how bup
compares to Obnam. I've been testing Obnam lately and it seems to
work quite well for remote backups of large files; its bottleneck
seems to be SFTP RTT command latency when backing up many small files
(though I think Lars may address that eventually). But its feature
set is quite impressive, and I also like its thorough build-time
tests.

brackup also sounds interesting, but I haven't even touched it yet.
What I do know is that, while it's a reliable and powerful tool,
Duplicity falls short when it comes to removing old backup data and
restoring even small files from long chains.

Thanks for any thoughts.

Michael Alan Dorman

Feb 23, 2012, 7:54:18 AM2/23/12
to bup-list
Adam Porter <ad...@alphapapa.net> writes:
> Having little experience with git, and none with bup, I wonder how bup
> compares to Obnam. I've been testing Obnam lately and it seems to
> work quite well for remote backups of large files; its bottleneck
> seems to be SFTP RTT command latency when backing up many small files
> (though I think Lars may address that eventually). But its feature
> set is quite impressive, and I also like its thorough build-time
> tests.

I liked it, but found it intolerably slow.

Mike.

Avery Pennarun

Feb 23, 2012, 4:24:10 PM2/23/12
to Michael Alan Dorman, bup-list

If it uploads small files one by one, then it definitely doesn't have
the potential to be as fast as bup. However, bup currently doesn't
support sftp. I really think someone should take a shot at an sftp
version of 'bup server' - it would probably not be very hard, and
would be pretty cool.

Alternatively, David Anderson recently repackaged the "upload yourself
to the server" feature of my sshuttle program to make it more flexible
and reusable. With that package, it should actually be possible for
bup to upload itself to the remote end (even if the remote end doesn't
have bup installed) and run the very efficient bup-server protocol
that way. If you have sftp access, you probably have full ssh access,
right? So that would be pretty powerful.

Have fun,

Avery

Adam Porter

Mar 1, 2012, 6:13:19 AM3/1/12
to bup-list
Thanks for your replies.

One of bup's claims is very fast backing up of VM images. I tried
this, and it seems to have decent speed backing up, but restoring is
very slow. I'm guessing it's because of the 8 KB chunks? (Assuming I
understand bup correctly.)

In contrast, Obnam backs up faster than bup, and also restores much
faster: restoring a 15 GB VirtualBox image took 27 minutes with Obnam
vs. 47 minutes for bup.

Is there a way to tweak bup to make it faster, maybe using larger
chunks, or is Obnam just faster for this scenario?

Greg Troxel

Mar 1, 2012, 8:53:49 AM3/1/12
to Adam Porter, bup-list

Adam Porter <ad...@alphapapa.net> writes:

> Thanks for your replies.
>
> One of bup's claims is very fast backing up of VM images. I tried
> this, and it seems to have decent speed backing up, but restoring is
> very slow. I'm guessing it's because of the 8 KB chunks? (Assuming I
> understand bup correctly.)
>
> In contrast, Obnam backs up faster than bup, and also restores much
> faster: restoring a 15 GB VirtualBox image took 27 minutes with Obnam
> vs. 47 minutes for bup.

When you do the restores, what are the cpu and disk loads?

It would be interesting to create a composite performance metric for
measuring backup systems. My personal weighting is that 47 vs. 27
minutes of restore time is not a big deal; I'm more concerned with
backup time, space efficiency, correctness when backing up a changing
filesystem (where matching each file either before or after is ok),
and especially the probability of actually getting data back. But all
other things being equal, if it can be sped up, that's great.

Jim Wilcoxson

Mar 1, 2012, 2:07:39 PM3/1/12
to bup-list
Are these backup and restore times to local storage?

Tony Godshall

Mar 1, 2012, 2:57:17 PM3/1/12
to bup-list
Also, whether you're on spinning media vs. an SSD should have a large
effect: writes will probably be unfragmented, while reads will require
many seeks (deduplicated chunks are obviously retrieved from wherever
they were first written).

Adam- are you using spinning platter media?

--
Best Regards.
This is unedited.

Adam Porter

Mar 2, 2012, 8:44:10 AM3/2/12
to bup-list
Thanks for all your replies.

This test I did was on my Acer Aspire One netbook--not a powerhouse by
any means! I'm sure my other systems would have done it more quickly.
However, I was only interested in the times relative to each other.
And, yes, it was on a single, local hard disk, both the backup repo
and the restore target.

I'm not an expert on Obnam either, but from what I can tell, it uses
chunks up to at least 1.1 MB, which is much larger than 8 KB.

Michael Alan Dorman

Mar 2, 2012, 8:59:09 AM3/2/12
to bup-list
Adam Porter <ad...@alphapapa.net> writes:
> And, yes, it was on a single, local hard disk, both the backup repo
> and the restore target.

I think you will find obnam slows to a crawl when going over the net.
That may not be important for you, but if it is, be sure to test.

Mike.

Adam Porter

Mar 2, 2012, 10:03:46 AM3/2/12
to Michael Alan Dorman, bup-list

I did mention this in the first post. :) For small files, Obnam over
the net is quite slow. For large files, the chunks are large enough
that it seems to do about as well as, say, Duplicity. I guess you
mean that bup is fast over the net even for small files--but since it
doesn't encrypt, it's not as useful for network backups for me.

Zoran Zaric

Mar 4, 2012, 9:39:51 PM3/4/12
to Avery Pennarun, bup-list
On Thu, Feb 23, 2012 at 04:24:10PM -0500, Avery Pennarun wrote:
> Alternatively, David Anderson recently repackaged the "upload yourself
> to the server" feature of my sshuttle program to make it more flexible
> and reusable. With that package, it should actually be possible for
> bup to upload itself to the remote end (even if the remote end doesn't
> have bup installed) and run the very efficient bup-server protocol
> that way. If you have sftp access, you probably have full ssh access,
> right? So that would be pretty powerful.

Do you have a link for us to read about this?

Thanks,
Zoran

t.riemen...@detco.de

Jul 4, 2012, 6:40:32 AM7/4/12
to bup-...@googlegroups.com
I have tested both on a server of mine, and obnam is too slow there, even to local storage.
Currently I use duplicity for backing up, but that is suboptimal, because the backup set contains many duplicate files, so I am looking for a program with deduplication.
The partition to be saved contains 70GB, spread over roughly 2 million files.

The first obnam backup (which thus has to save everything) took ~22 hours; the backup was deduplicated down to 20GB (that's what obnam reports as "uploading 20GB"), and the repository is 14 GB (so that's the gain from compression, I think).
A second backup uploaded only 50MB, so there was not much change, but it still took 10 hours!

Then I tried bup (master branch):
The initial "bup save" was done in about 2 to 4 hours (it started with about "2 hours remaining" and was done 4 hours later, but it ran in the background and I forgot to time it).
The initial "bup index" was also quite fast (I could wait for it to finish without losing patience....), and the repository is about the same size as with obnam.
But additional backup runs are FAST (because not much is changing, I think?):

server:~/src/bup# time ./bup index -ux /home ; time ./bup save -n home /home
Indexing: 2162381, done.
bup: merging indexes (2162978/2162978), done.

real    6m44.133s
user    3m57.839s
sys     0m19.953s
Reading index: 8100, done.
Saving: 100.00% (5571406/5571406k, 8100/8100 files), done.
bloom: adding 1 file (1193 objects).

real    4m21.420s
user    3m1.459s
sys     0m7.200s
server:~/src/bup#


PS: I just ran "du" on the repositories; because of the many small files, "du" on the obnam repo takes ages:
server:~/src/bup# du -s /mnt/bup-test /mnt/obnam-test
12703592        /mnt/bup-test
14225812        /mnt/obnam-test
server:~/src/bup# find /mnt/bup-test/ |wc -l
71
server:~/src/bup# find /mnt/obnam-test/ |wc -l
387297

Some more info: the disk is a RAID-1 (Linux software RAID) over two hard disks:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3750528AS      Rev: CC38
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: SAMSUNG HD753LJ  Rev: 1AA0
  Type:   Direct-Access                    ANSI  SCSI revision: 05

/mnt is an LVM volume I created for this test; /home is also an LVM volume

cu
Tim

Zoran Zaric

Jul 4, 2012, 8:18:23 AM7/4/12
to t.riemen...@detco.de, bup-...@googlegroups.com
Hello Tim,

On Wed, Jul 04, 2012 at 03:40:32AM -0700, t.riemen...@detco.de wrote:
> The partition to be saved contains 70GB, spread over roughly 2 million
> files.
>
> The first obnam backup (which thus has to save everything) took ~22 hours;
> the backup was deduplicated down to 20GB (that's what obnam reports as
> "uploading 20GB"), and the repository is 14 GB (so that's the gain from
> compression, I think)
> A second backup uploaded only 50MB, so there was not much change, but it
> still took 10 hours!
>
> Then I tried bup (master branch):
> The initial "bup save" was done in about 2 to 4 hours (it started with
> about "2 hours remaining" and was done 4 hours later, but it ran in the
> background and I forgot to time it)
> The initial "bup index" was also quite fast (I could wait for it to finish
> without losing patience....), and the repository is about the same size as
> with obnam
> But additional backup runs are FAST (because not much is changing, I
> think?):
>
> PS: I just ran "du" on the repositories, because of the many small files,
> "du obnam" takes ages
> server:~/src/bup# du -s /mnt/bup-test /mnt/obnam-test
> 12703592 /mnt/bup-test
> 14225812 /mnt/obnam-test
> server:~/src/bup# find /mnt/bup-test/ |wc -l
> 71
> server:~/src/bup# find /mnt/obnam-test/ |wc -l
> 387297

thanks a lot for your numbers!

Thanks,
Zoran

Adam Porter

Jul 7, 2012, 5:49:40 PM7/7/12
to t.riemen...@detco.de, bup-...@googlegroups.com
On Wed, Jul 4, 2012 at 5:40 AM, <t.riemen...@detco.de> wrote:

> PS: I just ran "du" on the repositories, because of the many small files,
> "du obnam" takes ages
> server:~/src/bup# du -s /mnt/bup-test /mnt/obnam-test
> 12703592 /mnt/bup-test
> 14225812 /mnt/obnam-test
> server:~/src/bup# find /mnt/bup-test/ |wc -l
> 71
> server:~/src/bup# find /mnt/obnam-test/ |wc -l
> 387297

Does bup really use only 71 files for 20 GB of deduplicated data? If
so, I guess that's impressive. :) But over time, Obnam will let you
easily expire old backup data without having to rewrite large files
just to free disk space.

Another interesting idea is to run Obnam locally and rsync the entire
repository to a remote server. This lets you make backups at local
speed, and rsync handles the many small files that Obnam creates more
efficiently (and you end up with a handy local backup, too). I'm
experimenting with this now. Lars is also thinking about ways to
optimize or cache remote transfers, so Obnam will probably see speed
improvements over time.

Gabriel Filion

Jul 9, 2012, 12:06:23 AM7/9/12
to Adam Porter, t.riemen...@detco.de, bup-...@googlegroups.com
On 12-07-07 05:49 PM, Adam Porter wrote:
> On Wed, Jul 4, 2012 at 5:40 AM, <t.riemen...@detco.de> wrote:
>
>> PS: I just ran "du" on the repositories, because of the many small files,
>> "du obnam" takes ages
>> server:~/src/bup# du -s /mnt/bup-test /mnt/obnam-test
>> 12703592 /mnt/bup-test
>> 14225812 /mnt/obnam-test
>> server:~/src/bup# find /mnt/bup-test/ |wc -l
>> 71
>> server:~/src/bup# find /mnt/obnam-test/ |wc -l
>> 387297
>
> Does bup really use only 71 files for 20 GB of deduplicated data? If
> so, I guess that's impressive. :)

Yep, bup saves data to git-style .pack files, so it can pack a whole
bunch of objects together, hence the small number of files.
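
As a toy illustration of that principle: many blobs get appended into one pack, with an index mapping each content hash to an offset. Real git/bup packfiles add headers, compression, deltas, and a separate .idx file; the sketch below only shows why the file count stays tiny.

```python
# A toy model of why bup's repository has so few files: blobs are
# appended into one big "pack" plus an index of (offset, length) per
# content hash.  Not git's or bup's real pack format.
import hashlib

def write_pack(blobs):
    """Concatenate blobs into one pack; identical blobs are stored once."""
    pack = bytearray()
    index = {}                                  # sha1 hex -> (offset, length)
    for blob in blobs:
        digest = hashlib.sha1(blob).hexdigest()
        if digest not in index:                 # dedup by content hash
            index[digest] = (len(pack), len(blob))
            pack += blob
    return bytes(pack), index

def read_blob(pack, index, digest):
    """Random access into the pack via the index."""
    offset, length = index[digest]
    return pack[offset:offset + length]
```

However many blobs you store, the on-disk result is one pack (plus its index), which is why `find` counted 71 files instead of hundreds of thousands.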

> But over time, Obnam will let you
> easily expire old backup data without having to rewrite large files to
> actually free disk space.

That feature is one we'll want to include in bup very soon.

but as you point out, bup will have to rewrite large files, like git
does when running "git gc" (or "git repack"), so it's possible it
will be slow and I/O intensive.

maybe there could be optimizations to help balance file count vs.
rewrite size when removing history. but we'll need to get the
backup-expiry feature in place before optimizing it :P

> Another interesting idea is to run Obnam locally and rsync the entire
> repository to a remote server.

bup backups can be duplicated with rsync, too.

> This lets you make backups at local
> speed,

for bup, it lets you have a local copy of your backup, which could be
interesting if you really need to have backups easily accessible on the
machine.. but it takes up some disk space on the machine.

like Avery and others pointed out earlier, bup is pretty efficient with
what it transfers when doing a remote backup (bup save with the -r
option), so rsync is recommended only when duplicating a backup
repository to an additional machine.

> and rsync handles more efficiently the many small files that
> Obnam creates (and you end up with a handy local backup, too).

in my experience, rsync is pretty slow with a multitude of small files,
like pretty much all backup/transfer tools, since it needs to stat each
and every file to make sure they haven't changed from one
backup/transfer to another.

here's one personal experience:

one web server has different web sites in /var/www. It's not that big
(around 24Gb), but there's a good amount of small files in there (around
150K files).

rsync'ing the whole server even though nothing much has changed (rsync
was run just before this run) takes about 1h43mins

rsync'ing a near-fresh transfer, while excluding this directory, takes
about 1min and a half

--
Gabriel Filion

Jos van den Oever

Jul 9, 2012, 2:21:22 AM7/9/12
to bup-...@googlegroups.com
On Monday 09 July 2012 06:06:23 AM Gabriel Filion wrote:
> On 12-07-07 05:49 PM, Adam Porter wrote:
>
> That feature is one we'll want to include in bup very soon.
>
> but like you're pointing out, bup will have to rewrite large files, like
> git does when running "git gc" (or "git repack"), so it's possible it
> will be slow and I/O intensive.
>
> maybe there could be optimizations to help balance file number vs.
> rewrite size when removing history. but we'll need to first get the
> feature of expiring backups before optimizing it :P

For faster removal of old data, bup could borrow ideas from Lucene. The
text search library Lucene has a documented file format that makes it
very efficient at both writing and reading.

One important idea is to merge index files (or pack files, in bup's
case) when they reach a certain size. This merging behaviour is
determined by a merge factor, e.g. 10.
When there are ten 1 MB files, Lucene merges them into one 10 MB file,
discarding entries that are marked as deleted in the process. Then it
starts writing the next 1 MB file until there are ten again, which are
merged to form the second 10 MB file. When there are ten 10 MB files,
they are merged into one 100 MB file, and so on.

In addition, you can force rewrites. A rewrite can take a lot of disk
space: in the example of ten 1 MB files, up to 10 MB of extra space is
needed while the merged file is being written.

http://lucene.apache.org/core/3_6_0/fileformats.html
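
The logarithmic merge policy described above can be sketched in a few lines. Segment sizes here are abstract units (think pack files); this is an illustration of the scheme, not bup or Lucene code.

```python
# Sketch of a Lucene-style logarithmic merge policy with merge factor 10.
# levels[k] holds segments of size MERGE_FACTOR**k; whenever a level
# accumulates MERGE_FACTOR segments, they are merged into one segment on
# the next level (which is where deleted entries could be dropped).
MERGE_FACTOR = 10

def add_segment(levels, size=1):
    """Add one new unit-size segment and cascade any merges it triggers."""
    levels.setdefault(0, []).append(size)
    k = 0
    while len(levels.get(k, [])) >= MERGE_FACTOR:
        merged = sum(levels[k][:MERGE_FACTOR])  # merge cost ~ combined size
        del levels[k][:MERGE_FACTOR]
        levels.setdefault(k + 1, []).append(merged)
        k += 1
    return levels
```

The appealing property for backup expiry is that the file count stays around MERGE_FACTOR per level (logarithmic in total data), and most merges touch only small files; the rare big rewrites are the only expensive ones.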

Cheers,
Jos

Oei, YC

Jul 9, 2012, 8:33:44 AM7/9/12
to Jos van den Oever, z...@zoranzaric.de, bup-...@googlegroups.com
On 9 July 2012 07:21, Jos van den Oever <j...@vandenoever.info> wrote:
> One important idea is to merge index files (or pack files in bup) when they
> reach a certain size. This merging behaviour is determined by a merge factor,
> e.g. 10.

That algorithm sounds like an especially good fit for the (probably
very typical) use case of root-partition backups; I find that on most
days it's the same smallish files that change. Repacking only those
wouldn't cost much time or space, but would allow for a bit of
delta compression.

Zoran, I lost track a bit of what the status of the repack branch is -
can I just quickly ask what the open loops are currently, or where I
can go to read up on it? Thanks!

YC

Zoran Zaric

Jul 9, 2012, 5:56:43 PM7/9/12
to Oei, YC, Jos van den Oever, bup-...@googlegroups.com
On Mon, Jul 09, 2012 at 01:33:44PM +0100, Oei, YC wrote:
> Zoran, I lost track a bit of what the status of the repack branch is -
> can I just quickly ask what the open loops are currently, or where I
> can go to read up on it? Thanks!

My repack branch on GitHub is working, more or less. For small
repositories there's no problem, but past a certain size my tree-traversal
algorithm breaks and yields elements multiple times, and the repack step
then gets even slower.

It would really help if someone could read the traversal code and maybe
spot the problem.

My current code is on GitHub in the locked-repack branch [1].

Thanks,
Zoran

[1]: https://github.com/zoranzaric/bup/tree/locked-repack

Zoran Zaric

Jul 9, 2012, 7:26:19 PM7/9/12
to Oei, YC, Jos van den Oever, bup-...@googlegroups.com
On Mon, Jul 09, 2012 at 01:33:44PM +0100, Oei, YC wrote:
> Zoran, I lost track a bit of what the status of the repack branch is -
> can I just quickly ask what the open loops are currently, or where I
> can go to read up on it? Thanks!

My repack command works for small repos. With bigger repos my traversal
code starts to yield elements multiple times; I haven't found the
threshold for "big" repos.

If somebody finds time to review the traversal code of my locked-repack
branch on GitHub, that would be very helpful.

Thanks,
Zoran

Adam Porter

Jul 12, 2012, 9:44:10 PM7/12/12
to Gabriel Filion, t.riemen...@detco.de, bup-...@googlegroups.com
On Sun, Jul 8, 2012 at 11:06 PM, Gabriel Filion <lel...@gmail.com> wrote:
> here's one personal experience:
>
> one web server has different web sites in /var/www. It's not that big
> (around 24Gb), but there's a good amount of small files in there (around
> 150K files).
>
> rsync'ing the whole server even though nothing much has changed (rsync
> was run just before this run) takes about 1h43mins
>
> rsync'ing a near-fresh transfer, while excluding this directory, takes
> about 1min and a half

That's interesting, but not entirely unexpected or unreasonable, I
suppose. How busy is the server besides rsync? Do you think the
files are pushed out of the cache between runs?

I think I was thinking more about initial transfers of many smaller
files, in which case there aren't any files on the server to compare
with.

Gabriel Filion

Jul 14, 2012, 4:29:41 PM7/14/12
to Adam Porter, t.riemen...@detco.de, bup-...@googlegroups.com


On 12-07-12 09:44 PM, Adam Porter wrote:
> On Sun, Jul 8, 2012 at 11:06 PM, Gabriel Filion <lel...@gmail.com> wrote:
>> here's one personal experience:
>>
>> one web server has different web sites in /var/www. It's not that big
>> (around 24Gb), but there's a good amount of small files in there (around
>> 150K files).
>>
>> rsync'ing the whole server even though nothing much has changed (rsync
>> was run just before this run) takes about 1h43mins
>>
>> rsync'ing a near-fresh transfer, while excluding this directory, takes
>> about 1min and a half
>
> That's interesting, but not entirely unexpected or unreasonable, I
> suppose. How busy is the server besides rsync? Do you think the
> files are pushed out of the cache between runs?

actually, in the case cited above, the server is doing almost nothing:
it's hosting a great number of very low-traffic sites, and the total
amount of traffic is not very high.

> I think I was thinking more about initial transfers of many smaller
> files, in which case there aren't any files on the server to compare
> with.

right. I'm guessing this case will be slow with all tools. the
bottleneck here is all the read operations on the disk, so for better
performance you need to crank up the hardware (e.g. use an SSD).

--
Gabriel Filion