bup re-index slow

Niklas Hambüchen

Oct 30, 2012, 6:02:20 PM
to bup-...@googlegroups.com
Hi,

why is it that bup index being run the first time runs in 18s for me,
but subsequent runs take 36s?

I find it weird that a re-index takes longer - could I not just discard
the old index if this is so? Also, where does the increased time come
from? I would expect that listing all files is the majority work for
index, and that is the same for both cases.

Can you explain to me what is going on here?

Thank you
Niklas

Avery Pennarun

Oct 30, 2012, 6:08:34 PM
to Niklas Hambüchen, bup-...@googlegroups.com
On Tue, Oct 30, 2012 at 6:02 PM, Niklas Hambüchen <ma...@nh2.me> wrote:
> why is it that bup index being run the first time runs in 18s for me,
> but subsequent runs take 36s?
>
> I find it weird that a re-index takes longer - could I not just discard
> the old index if this is so? Also, where does the increased time come
> from? I would expect that listing all files is the majority work for
> index, and that is the same for both cases.

The second time, bup needs to merge changes between the old index file
and the new one, which is a slow process because the bupindex file
format has not been well-optimized. It involves grinding through the
old and new files in python, which is slow.
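
To make the merge step concrete, here is a minimal sketch of merging two
path-sorted entry lists into a completely rewritten output. It is only an
illustration, not bup's actual code: the function name merge_indexes, the
(path, info) tuple layout, and the ascending sort order are all assumptions.

```python
def merge_indexes(old_entries, new_entries):
    """Merge two path-sorted lists of (path, info) tuples.

    Hypothetical illustration: when a path appears in both lists, the
    entry from new_entries wins. Every entry is copied to the output,
    so the cost grows with the size of the whole index, not just with
    the number of changed entries.
    """
    merged = []
    i = j = 0
    while i < len(old_entries) and j < len(new_entries):
        old_path = old_entries[i][0]
        new_path = new_entries[j][0]
        if old_path < new_path:
            merged.append(old_entries[i])
            i += 1
        elif old_path > new_path:
            merged.append(new_entries[j])
            j += 1
        else:
            # Same path in both: keep the newer entry, skip the old one.
            merged.append(new_entries[j])
            i += 1
            j += 1
    # One list is exhausted; copy whatever is left of the other.
    merged.extend(old_entries[i:])
    merged.extend(new_entries[j:])
    return merged
```

Even when new_entries is tiny, every old entry is still visited and copied
once, which is why a pure-Python loop like this gets slow on a large index.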

We could either come up with a better file format (one that doesn't
require totally rewriting the file to add new entries) or rewrite the
bupindex code in C, which would probably be "fast enough" for
virtually all use cases.

Have fun,

Avery

Niklas Hambüchen

Oct 30, 2012, 6:13:14 PM
to Avery Pennarun, bup-...@googlegroups.com
Hello Avery,

thanks for your swift answer! That makes sense.

It would be interesting to know how much speed-up you would get from a C
(or Cython?) implementation (the nice thing is that this would also
speed up initial indexing).

Indexing is what takes most time on my servers.

Damien Robert

Oct 31, 2012, 9:00:58 AM
to bup-...@googlegroups.com

Niklas Hambüchen wrote in message <50904E6C...@nh2.me>:
> why is it that bup index being run the first time runs in 18s for me,
> but subsequent runs take 36s?

Hi Niklas,

This is not an answer, but if you are passing a lot of paths to bup index,
you can try my patch here:
https://groups.google.com/forum/?fromgroups=#!topic/bup-list/SxS3-P4Oars
It should improve the speed of bup index.

As the commit says:

The current implementation of bup index goes through each path in paths and
calls update_index on it. Each update_index call writes a temporary index
corresponding to this path and merges this index into the bupindex. When
passing a lot of paths we get a lot of merges, and this is slow.

I modified the code so that update_index now takes the list of paths. We
then iterate through the paths inside update_index to write to the temporary
index, so that we only have to merge into the bupindex once.

In my setup, where I use a script to generate the files I want to save and
so typically call bup index with a lot of files, it gives a huge
speed-up.
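
As a rough back-of-the-envelope illustration of why one merge beats many,
the sketch below counts how many index entries get written in each case. It
is a simplified cost model, not bup code: the function names are
hypothetical, and it assumes every merge rewrites the whole index and every
indexed entry is new.

```python
def entries_written_per_path(index_size, batches):
    """Entries written when each batch of paths triggers its own merge.

    Assumes (hypothetically) that each merge rewrites the entire index
    and that every entry in a batch is new.
    """
    written = 0
    size = index_size
    for batch in batches:
        size += batch      # the index grows by this batch
        written += size    # ...and is rewritten in full
    return written


def entries_written_single_merge(index_size, batches):
    """Entries written when all batches are collected and merged once."""
    return index_size + sum(batches)


# 100 paths of one entry each, added to an index of 1000 entries.
many_merges = entries_written_per_path(1000, [1] * 100)
one_merge = entries_written_single_merge(1000, [1] * 100)
print(many_merges, one_merge)
```

Under these assumptions the per-path approach writes 105050 entries while the
single merge writes 1100, so the work scales with (number of paths) x (index
size) instead of with the index size alone.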

Rob Browning

Nov 3, 2012, 11:58:40 AM
to Avery Pennarun, Niklas Hambüchen, bup-...@googlegroups.com
Avery Pennarun <apen...@gmail.com> writes:

> The second time, bup needs to merge changes between the old index file
> and the new one, which is a slow process because the bupindex file
> format has not been well-optimized. It involves grinding through the
> old and new files in python, which is slow.
>
> We could either come up with a better file format (one that doesn't
> require totally rewriting the file to add new entries) or rewrite the
> bupindex code in C, which would probably be "fast enough" for
> virtually all use cases.

Avery,

Given our discussion about metadata (cf. "bup save could fail trying to
read the metadata of a deleted file"[1]), we're already contemplating the
possibility/wisdom of changing the index format to include metadata:

Any thoughts?

[1] http://thread.gmane.org/gmane.comp.sysutils.backup.bup/1657

Thanks
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Rob Browning

Nov 3, 2012, 3:21:25 PM
to Avery Pennarun, Niklas Hambüchen, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Given our discussion about metadata (cf. "bup save could fail trying to
> read the metadata of a deleted file"[1]), we're already contemplating the
> possibility/wisdom of changing the index format to include metadata:
>
> Any thoughts?

Assuming I understand the bits I've read so far, it looks like the
current index implementation may rely on the fact that index entries
(not the names) are fixed length (i.e. repack()). Obviously, if the
metadata is directly included, that will no longer be true.
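
To illustrate what fixed-length entries buy you, here is a small sketch using
Python's struct module. The record layout (mtime, size, flags) and the helper
names are made up for the example and are not bup's actual index format; the
point is only that entry n lives at offset n * ENTRY.size, so it can be
overwritten in place without touching its neighbours.

```python
import struct

# Hypothetical fixed-length record: mtime (u64), size (u64), flags (u16).
# This is NOT bup's real on-disk layout.
ENTRY = struct.Struct("!QQH")


def pack_entries(entries):
    """Serialize a list of (mtime, size, flags) tuples back to back."""
    return b"".join(ENTRY.pack(*e) for e in entries)


def read_entry(buf, n):
    """Read entry n directly by computing its offset."""
    return ENTRY.unpack_from(buf, n * ENTRY.size)


def rewrite_entry(buf, n, entry):
    """Overwrite entry n in place; buf must be writable (e.g. bytearray)."""
    ENTRY.pack_into(buf, n * ENTRY.size, *entry)


buf = bytearray(pack_entries([(1, 10, 0), (2, 20, 0), (3, 30, 1)]))
rewrite_entry(buf, 1, (5, 50, 1))   # only the bytes of entry 1 change
```

If each entry carried variable-length metadata inline, the offset of entry n
could no longer be computed by multiplication, and an in-place overwrite
could change the entry's length and shift everything after it.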