[PATCH 1/1] Use bup.repo.id for client index-cache/NAME when possible

Rob Browning

Apr 16, 2025, 2:52:08 PM
to bup-...@googlegroups.com, Johannes Berg
From: Johannes Berg <joha...@sipsolutions.net>

If you have multiple repos on the same machine, or your host name
changes, or something similar, the index-cache gets confused and
can end up downloading everything over and over again.

So whenever we can fetch a bup.repo.id from the remote, use it as the
client index-cache directory name, prefixed with "id--" so it can't
collide with the existing names (which currently can't contain a
dash).
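
As a rough illustration of why the prefix avoids collisions, here is a
minimal sketch. It assumes legacy names are sanitized the way the code
removed by this patch does it (every byte outside [@\w] becomes "_").
The function names legacy_cache_name and id_cache_name are hypothetical
and not part of bup.

```python
import re


def legacy_cache_name(host, path):
    """Hypothetical helper: build a host/path based name as the removed code did."""
    cachehost = b'None' if host is None else host
    cachedir = b'None' if path is None else path
    # Anything that is not '@' or a word character becomes '_',
    # so a legacy name can never contain '-'.
    return re.sub(br'[^@\w]', b'_', b'%s:%s' % (cachehost, cachedir))


def id_cache_name(repo_id):
    """Hypothetical helper: build a repo-id based name with the id-- prefix."""
    return b'id--' + repo_id


if __name__ == '__main__':
    legacy = legacy_cache_name(b'my-host.example', b'/srv/bup')
    new = id_cache_name(b'abc123')  # made-up id, for illustration only
    # The dash in the prefix is what keeps the two name spaces apart.
    print(legacy, new, b'-' in legacy, b'-' in new)
```

Because the sanitizer rewrites every dash, no host/path combination can
produce a name starting with "id--".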

Also try to move the cache from the old location to the new one if the
old one exists, so the upgrade path doesn't re-download it all.
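
The upgrade path can be sketched roughly as follows. This is only a
standalone approximation of the logic in the patch, under the
assumption that both names live under one cache root; pick_cache_dir
and its parameters are hypothetical names, not bup's API.

```python
import os
import shutil
import tempfile


def pick_cache_dir(cache_root, legacy_name, repo_id):
    """Hypothetical helper: choose the cache dir, moving a legacy one if needed."""
    legacy = os.path.join(cache_root, legacy_name)
    if repo_id is None:
        # The remote did not provide an id: keep the old naming.
        return legacy
    new = os.path.join(cache_root, b'id--' + repo_id)
    # Only move when the old dir exists and the new one does not,
    # so an already populated new cache is never overwritten.
    if os.path.exists(legacy) and not os.path.exists(new):
        shutil.move(legacy, new)
    return new


if __name__ == '__main__':
    root = os.fsencode(tempfile.mkdtemp())
    os.mkdir(os.path.join(root, b'host_'))
    # The legacy directory is renamed on first use with a repo id.
    print(pick_cache_dir(root, b'host_', b'abc123'))
    shutil.rmtree(root)
```

Calling it again after the move is harmless: the legacy directory no
longer exists, so the existing id-based directory is returned as is.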

Signed-off-by: Johannes Berg <joha...@sipsolutions.net>
Reviewed-by: Rob Browning <r...@defaultvalue.org>
[r...@defaultvalue.org: rebase; handle servers without config-get]
[r...@defaultvalue.org: add id-- prefix]
[r...@defaultvalue.org: add release note]
[r...@defaultvalue.org: adjust commit message]
Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---

Now that we have the bup.repo.id, use it for the index-cache name,
rather than the previous host/path based names that might well
collide.

Proposed for main.

lib/bup/client.py | 39 +++++++++++++++++++++++++++++----------
lib/bup/protocol.py | 3 ++-
note/main.md | 5 +++++
test/ext/test-on | 2 +-
4 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/lib/bup/client.py b/lib/bup/client.py
index 82597599..15846195 100644
--- a/lib/bup/client.py
+++ b/lib/bup/client.py
@@ -3,9 +3,9 @@ from binascii import hexlify, unhexlify
from contextlib import closing
from functools import partial
import os, re, struct, sys, time, zlib
-import socket
+import socket, shutil

-from bup import git, ssh, vint, protocol, path
+from bup import git, ssh, vint, protocol
from bup.git import PackWriter
from bup.helpers import \
(Conn,
@@ -23,6 +23,7 @@ from bup.helpers import \
qprogress,
DemuxConn)
from bup.io import path_msg
+from bup.path import indexcache
from bup.vint import read_vint, read_vuint, read_bvec, write_bvec


@@ -241,6 +242,31 @@ class Client:
closing(self._sockw):
pass

+    def _prep_cache(self, host, port, path):
+        # Set up the index-cache directory, prefer repo-id derived
+        # dirs when the remote repo has one (that can be accessed).
+        repo_id = None
+        if b'config-get' in self._available_commands:
+            try:
+                repo_id = self.config_get(b'bup.repo.id')
+            except PermissionError:
+                repo_id = None
+        # The b'None' here matches python2's behavior of b'%s' % None == 'None',
+        # python3 will (as of version 3.7.5) do the same for str ('%s' % None),
+        # but crashes instead when doing b'%s' % None.
+        legacy = indexcache(b':'.join((b'None' if host is None else host,
+                                       b'None' if path is None else path)))
+        if repo_id is None:
+            return legacy
+        # legacy ids can't include -, so avoid aliasing with an id--
+        # prefix, and terminate with double-dash to leave some future
+        # flexibility.
+        new = indexcache(b'id--' + repo_id)
+        # upgrade path - if we have the old but not the new name, move it
+        if os.path.exists(legacy) and not os.path.exists(new):
+            shutil.move(legacy, new)
+        return new
+
def __init__(self, remote, create=False):
# only hand over to __del__ -> close() if complete, which
# means it's fine to initialize attrs incrementally.
@@ -249,14 +275,6 @@ class Client:
self._call = partial(_TypicalCall, self)
self._line_based_call = partial(_LineBasedCall, self)
self.protocol, self.host, self.port, self.dir = parse_remote(remote)
- # The b'None' here matches python2's behavior of b'%s' % None == 'None',
- # python3 will (as of version 3.7.5) do the same for str ('%s' % None),
- # but crashes instead when doing b'%s' % None.
- cachehost = b'None' if self.host is None else self.host
- cachedir = b'None' if self.dir is None else self.dir
- self.cachedir = path.indexcache(re.sub(br'[^@\w]',
- b'_',
- b'%s:%s' % (cachehost, cachedir)))
self._busy = None
if self.protocol == b'bup-rev':
self._transport = Client.ViaBupRev()
@@ -276,6 +294,7 @@ class Client:
else:
self.conn.write(b'set-dir %s\n' % self.dir)
self.check_ok()
+ self.cachedir = self._prep_cache(self.host, self.port, self.dir)
self.sync_indexes()
ctx.pop_all()
self.closed = False
diff --git a/lib/bup/protocol.py b/lib/bup/protocol.py
index b39e9875..99109b50 100644
--- a/lib/bup/protocol.py
+++ b/lib/bup/protocol.py
@@ -385,7 +385,8 @@ class Server:
assert not args
key, opttype = vint.recv(self.conn, 'ss')
# git is case-insensitve, and the client sends lower-case
- if key in (b'bup.split.trees',
+ if key in (b'bup.repo.id',
+ b'bup.split.trees',
b'bup.split.files',
b'pack.packsizelimit',
b'core.compression',
diff --git a/note/main.md b/note/main.md
index a60831cc..1d6c5c2f 100644
--- a/note/main.md
+++ b/note/main.md
@@ -56,6 +56,11 @@ General
creation, or when run again on an existing repository. See
`bup-config`(5) for more information.

+* The REMOTE directory name in the client index cache (typically
+ `~/.bup/index-cache/REMOTE`) is now the `bup.repo.id` when the
+ remote repository provides one, and existing directories will be
+ renamed when appropriate and possible.
+
* `bup init DIRECTORY` is now supported, and places the repository in
the given `DIRECTORY` which takes precedence over `-d` and
`BUP_DIR`.
diff --git a/test/ext/test-on b/test/ext/test-on
index d0aa893d..2edabfc0 100755
--- a/test/ext/test-on
+++ b/test/ext/test-on
@@ -48,7 +48,7 @@ WVSTART "index-cache"
# the trailing _ is because there's no dir specified
# and that should thus be empty
hostname=$(uname -n)
-idxcache=$(echo "$hostname" | sed 's/[^@a-zA-Z0-9_]/_/g')_
+idxcache="id--$(git config --file "$tmpdir"/bup/config bup.repo.id)"
# there should be an index-cache now
for idx in "$tmpdir"/bup/objects/pack/*.idx ; do
cachedidx="$tmpdir/bup/index-cache/$idxcache/$(basename "$idx")"
--
2.47.2

Rob Browning

Apr 26, 2025, 6:09:20 PM
to bup-...@googlegroups.com, Johannes Berg
Rob Browning <r...@defaultvalue.org> writes:

> From: Johannes Berg <joha...@sipsolutions.net>
>
> If you have multiple repos on the same machine, or your host name
> changes, or something similar, the index-cache gets confused and
> can end up downloading everything over and over again.
>
> So whenever we can fetch a bup.repo.id from the remote, use it as the
> client index-cache directory name, prefixed with "id--" so it can't
> collide with the existing names (which currently can't contain a
> dash).
>
> Also try to move the cache from the old location to the new one if the
> old one exists, so the upgrade path doesn't re-download it all.
>
> Signed-off-by: Johannes Berg <joha...@sipsolutions.net>
> Reviewed-by: Rob Browning <r...@defaultvalue.org>
> [r...@defaultvalue.org: rebase; handle servers without config-get]
> [r...@defaultvalue.org: add id-- prefix]
> [r...@defaultvalue.org: add release note]
> [r...@defaultvalue.org: adjust commit message]
> Signed-off-by: Rob Browning <r...@defaultvalue.org>
> Tested-by: Rob Browning <r...@defaultvalue.org>
> ---
>
> Now that we have the bup.repo.id, use it for the index-cache name,
> rather than the previous host/path based names that might well
> collide.
>
> Proposed for main.

Pushed.

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4