Overhaul --remote argument semantics/syntax


Rob Browning

Mar 2, 2026, 7:07:33 PM
to bup-...@googlegroups.com
Formalize the --remote argument syntax and semantics, including the
previously undocumented support for URLs, in preparation for the
addition of URL-oriented arguments (e.g. "bup get --source-url").

See the individual commit messages for additional information.

Proposed for main.

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Rob Browning

Mar 2, 2026, 7:07:34 PM
to bup-...@googlegroups.com
Replace the current parse_remote URL parsing with a (hopefully) fairly
complete RFC 3986 parser via bup.url, but intentionally diverge from
the RFC in at least two ways for the current ssh: and bup: schemes.

First, because URLs with an authority (i.e. with a ://, so any URL
with a host/user/port) cannot express a relative path, support a "dot
encoding" where any URL path that begins with a /./ is taken to
indicate a relative path, so ssh://host/./x indicates the relative
path x.
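
The dot encoding described above can be sketched in a few lines. This
is only an illustration: the helper names dot_encode and dot_decode
are hypothetical stand-ins rather than the functions added in
bup.url, and the sketch ignores the authority flag that the real URL
type carries.

```python
def dot_encode(path: bytes) -> bytes:
    """Prefix a non-empty relative path with /./ so that it can follow
    an authority; absolute and empty paths pass through unchanged."""
    if path and not path.startswith(b'/'):
        return b'/./' + path
    return path

def dot_decode(path: bytes) -> bytes:
    """Undo dot_encode: strip one leading /./ if present."""
    if path.startswith(b'/./'):
        return path[3:]
    return path

# ssh://host/./x carries the relative path x ...
assert dot_decode(b'/./x') == b'x'
# ... while ssh://host/x still carries the absolute path /x.
assert dot_decode(b'/x') == b'/x'
assert dot_encode(b'x') == b'/./x'
assert dot_encode(b'/x') == b'/x'
```

One consequence of this scheme is that an absolute path that really
does begin with /./ decodes to a relative one, so such a path would
have to be written some other way (e.g. without the redundant dot
component).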

Second, because URLs will primarily be provided via program arguments,
arrange for the "path" of the bup: and ssh: schemes to be all of the
uninterpreted (undecoded) bytes, if any, after the authority. This
allows their path to be provided on the command line without any extra
encoding (say, for a filesystem path that includes a directory created
via "mkdir $'foo\xb5'").
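
To make the difference from a generic URL parser concrete, here is a
rough sketch comparing urllib.parse.urlsplit with a "path-oriented"
split. The function split_path_oriented is hypothetical and does no
validation of the scheme or authority; it only shows the idea that
everything after the authority is kept verbatim.

```python
from urllib.parse import urlsplit

def split_path_oriented(url: bytes):
    """Return (scheme, authority, path), leaving every byte after the
    authority undecoded and unsplit (no query, no fragment)."""
    scheme, sep, rest = url.partition(b'://')
    if not sep:
        raise ValueError('URL has no authority')
    authority, slash, tail = rest.partition(b'/')
    return scheme, authority, slash + tail

# A generic parser treats ?z as a query ...
generic = urlsplit(b'ssh://host/x?z')
assert (generic.path, generic.query) == (b'/x', b'z')

# ... while the path-oriented split keeps it as part of the path.
assert split_path_oriented(b'ssh://host/x?z') == (b'ssh', b'host', b'/x?z')

# Non-ASCII bytes in the path need no percent encoding.
assert split_path_oriented(b'ssh://host/foo\xb5/repo') \
    == (b'ssh', b'host', b'/foo\xb5/repo')
```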

When parsing a non-URL remote, treat everything before the last @
character as the user, as SSH does.
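
As a small illustration of the rule above, splitting on the rightmost
@ can be done with bytes.rpartition. The function name
split_non_url_remote is hypothetical, and the sketch omits the special
cases (such as the "-" test host) handled by the actual change.

```python
def split_non_url_remote(remote: bytes):
    """Split [user@]host:[path] into (user, host, path), taking
    everything before the last @ as the user."""
    user, _at, hostpath = remote.rpartition(b'@')
    host, colon, path = hostpath.partition(b':')
    if not colon:
        raise ValueError('remote has no colon')
    if not host:
        raise ValueError('remote has no host')
    return user, host, path

assert split_non_url_remote(b'h:p') == (b'', b'h', b'p')
assert split_non_url_remote(b'x@y@z:p') == (b'x@y', b'z', b'p')
# A colon before the last @ belongs to the user, not the host.
assert split_non_url_remote(b'w:x@y:z') == (b'w:x', b'y', b'z')
```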

Handle URL addresses as IPv[46]Address instances (previously they were
just bytes and so could alias with hostnames), and rework bup.client
to rely on URLs. Rename the client dir to path (just to avoid the
built-in), store the dot-decoded url path there, and don't mangle it.
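
The reason for typed addresses can be seen with the standard ipaddress
module alone. The sketch below uses a hypothetical classify_host
helper (not the parser in this patch) to show that an address and a
registered name stay distinguishable once addresses are parsed.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address

def classify_host(host: bytes):
    """Return an IPv6Address for a bracketed literal, an IPv4Address
    for a dotted quad, and the original bytes for anything else."""
    try:
        text = host.decode('ascii')
    except UnicodeDecodeError:
        return host
    if text.startswith('[') and text.endswith(']'):
        # Raises ValueError when the bracketed text is not an address.
        return IPv6Address(text[1:-1])
    try:
        addr = ip_address(text)
    except ValueError:
        return host
    return addr if isinstance(addr, IPv4Address) else host

assert classify_host(b'192.168.1.1') == IPv4Address('192.168.1.1')
assert classify_host(b'[ff:fe::1]') == IPv6Address('ff:fe::1')
assert classify_host(b'example.org') == b'example.org'
# Different spellings of one address compare equal once parsed.
assert classify_host(b'[00ff:00fe::1]') == classify_host(b'[ff:fe::1]')
# An address never compares equal to the bytes that spell it.
assert IPv4Address('192.168.1.1') != b'192.168.1.1'
```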

Rename the internal bup+ssh scheme to just be ssh.

Make ssh.connect a bit stricter about the subcommand name.

Expand the remote parsing tests (URL and non-URL) accordingly, and add
tests for bup.url.

See the changes to bup(1) for some additional details, and note that
although we document file:, nothing actually allows it yet.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---
Documentation/bup.1.md | 46 +++++++++++++-
lib/bup/client.py | 101 ++++++++++++++++-------------
lib/bup/config.py | 7 ++-
lib/bup/repo/__init__.py | 4 +-
lib/bup/ssh.py | 2 +-
lib/bup/url.py | 133 +++++++++++++++++++++++++++++++++++++++
note/main.md | 2 +-
test/int/test_client.py | 64 +++++++++++--------
test/int/test_url.py | 71 +++++++++++++++++++++
9 files changed, 352 insertions(+), 78 deletions(-)
create mode 100644 lib/bup/url.py
create mode 100644 test/int/test_url.py

diff --git a/Documentation/bup.1.md b/Documentation/bup.1.md
index 3690a788..5194bde6 100644
--- a/Documentation/bup.1.md
+++ b/Documentation/bup.1.md
@@ -158,7 +158,8 @@ Subcommands are described in separate man pages. For example
# REMOTE OPTIONS

Some options (currently just `--reverse`) allow the specification of a
-remote path as either a URL or a `[*user*@]*host*:[*path*]`.
+remote path as either a URL (see `REPOSITORY URLS` below) or a
+`[*user*@]*host*:[*path*]`.

For either format, when there is no path, the default path on the
server will be used, and SSH settings for the connection can be
@@ -170,7 +171,9 @@ valid URL scheme prefix that contains an "authority" (meaning that it
begins with `SCHEME://` as `ssh://...` does), and the scheme must be
either `ssh` or `bup`; others will be rejected.

-For the `[*user*@]*host*:[*path*]` syntax the *host* must always be
+For the `[*user*@]*host*:[*path*]` syntax, if there is an @ symbol,
+then everything before the rightmost @ is included in the *user* so
+`-r x@y@z` indicates user `x@y`, host `z`. The *host* must always be
followed by a colon, and anything after the first colon is the *path*.

For fully general purposes, prefer URLs to `[*user*@]*host*:[*path*]`,
@@ -179,6 +182,45 @@ so that there is no potential ambiguity. For example, consider the
and path `//x/y` which would be interpreted as a URL with host `x` and
path `/y`.

+# REPOSITORY URLS
+
+Bup supports the following URL schemes (i.e. `scheme:`) for referring
+to a repository. Note that the term "authority" below just means the
+URL section after the `scheme://` and before the path, for example the
+"user@host:port" of an SSH URL.
+
+As an exception to the standard, a scheme may be "path-oriented",
+which means that there is no separate query or fragment. Anything
+after the (optional) authority is taken as the "path" and the
+constituent bytes are not decoded (e.g. percent decoded). This allows
+URLs provided on the command line to work naturally. So
+`ssh://host/x?z` has a path of `/x?z`.
+
+`ssh:`
+: A path-oriented scheme (see above) that specifies access to a
+ repository via a `bup-server(1)` launched on a host via SSH. This
+ scheme has syntax and semantics matching a typical `ssh:` URL,
+ including support for a user and port
+ (e.g. `ssh://user@host:2222/some/repo`), and the user and host can
+ be percent encoded.
+
+ As an extension to the standard, because a URL with an authority
+ cannot otherwise specify a relative path, a leading `/./` is
+ taken to indicate a relative path. So
+ `ssh://host/./x` indicates the path `x`.
+
+`bup:`
+: Specifies a direct network connection to an existing
+ `bup-server(1)`. Otherwise identical to `ssh:`, except that it
+ does not support a user.
+
+`file:`
+: A path-oriented scheme (see above) that specifies a repository's
+ filesystem path. This scheme has syntax and semantics matching a
+ typical `file:` URL, except that it does not allow an authority
+ (i.e. user, host, etc.). To avoid rejection, ensure absolute
+ paths have a single leading slash, i.e. `file:/x`, not `file://x`.
+
# ENVIRONMENT

`BUP_ASSUME_GIT_VERSION_IS_FINE`
diff --git a/lib/bup/client.py b/lib/bup/client.py
index ee005682..b76f95a8 100644
--- a/lib/bup/client.py
+++ b/lib/bup/client.py
@@ -2,6 +2,7 @@
from binascii import hexlify, unhexlify
from contextlib import ExitStack, closing
from functools import partial
+from ipaddress import IPv4Address, IPv6Address
import os, re, struct, sys, time, zlib
import socket, shutil

@@ -24,6 +25,7 @@ from bup.helpers import \
DemuxConn)
from bup.io import path_msg as pm
from bup.path import index_cache
+from bup.url import parse_bytes_path_url, dot_decoded_url_path, dot_encoded_url
from bup.vint import read_vint, read_vuint, read_bvec, write_bvec


@@ -117,41 +119,35 @@ def _raw_write_bwlimit(f, buf, bwcount, bwtime):
return (bwcount, bwtime)


-_protocol_rs = br'([-a-z]+)://'
-_host_rs = br'(?P<sb>\[)?((?(sb)[0-9a-f:]+|[^:/]+))(?(sb)\])'
-_port_rs = br'(?::(\d+))?'
-_path_rs = br'(/.*)?'
-_url_rx = re.compile(br'%s(?:%s%s)?%s' % (_protocol_rs, _host_rs, _port_rs, _path_rs),
- re.I)
-
def parse_remote(remote):
def parse_non_url(remote):
- if b':' not in remote:
+ user, at, hostpath = remote.rpartition(b'@') # ssh x@y@z has user x@y
+ if b':' not in hostpath:
raise ClientError(f'remote {pm(remote)} has no colon')
- host, path = remote.split(b':', 1)
+ host, path = hostpath.split(b':', 1)
if host == b'-': # use a subprocess for testing
- return b'ssh', None, None, path if path else None
+ return dot_encoded_url(scheme=b'ssh', path=path)
if not host:
raise ClientError(f'remote {pm(remote)} has no host')
- return b'ssh', host, None, path
- if remote.startswith(b'file:'):
- raise ClientError(f'unexpected file scheme for {pm(remote)}')
- if remote.startswith(b'bup-rev://'):
- # It should be a hostname, so just make the value the host for now
- return b'bup-rev', remote[len(b'bup-rev://'):], None, None
- m = re.match(br'([a-zA-Z][-+.a-zA-Z0-9]+):', remote) # has valid scheme
- if m:
- scheme = m.group(1)
- if scheme not in (b'ssh', b'bup'):
- raise ClientError(f'unexpected {scheme} scheme for {pm(remote)}')
- if remote[3:6] != b'://':
- raise ClientError(f'{scheme} URL {pm(remote)} has no host')
- url_match = _url_rx.match(remote)
- if not url_match:
- raise ClientError(f'invalid URL {pm(remote)}')
- assert url_match.group(1) == scheme
- return url_match.group(1,3,4,5)
- return parse_non_url(remote)
+ return dot_encoded_url(scheme=b'ssh', host=host, user=user, path=path)
+ url = parse_bytes_path_url(remote, require_auth=True)
+ if not url:
+ return parse_non_url(remote)
+ if url.scheme == b'bup':
+ if url.user:
+ raise ClientError(f'bup URL {pm(remote)} has a user')
+ elif url.scheme in (b'ssh', b'bup'): # for now
+ if not url.host: # i.e. b''
+ raise ClientError(f'remote {pm(remote)} has no host')
+ elif url.scheme == b'bup-rev':
+ def raise_unexpected(attr):
+ raise ClientError(f'bup-rev remote {pm(remote)} has a {attr}')
+ if url.user: raise_unexpected('user')
+ if url.path: raise_unexpected('path')
+ if url.port is not None: raise_unexpected('port')
+ else:
+ raise ClientError(f'unexpected {pm(url.scheme)} scheme for {pm(remote)}')
+ return url


def _legacy_cache_id(remote, reversed=False):
@@ -215,14 +211,22 @@ class Client:
pass

class ViaSsh:
- def __init__(self, host, port):
+ def __init__(self, url):
self._closed = True # only false when ready for close
+ host = url.host
+ if isinstance(host, (IPv4Address, IPv6Address)):
+ host = host.compressed
+ elif not isinstance(host, bytes):
+ raise Exception(f'unexpected host type for {host}')
+ dest = host if not url.user else b'%s@%s' % (url.user, host)
try:
- # FIXME: ssh and file (ViaBup) shouldn't use the same module
- self._proc = ssh.connect(host, port, b'server')
+ self._proc = ssh.connect(dest, url.port, b'server')
except OSError as e:
raise ClientError('connect: %s' % e) from e
try:
+ assert not url.host, url
+ assert not url.user, url
+ assert url.port is None, url
self.conn = Conn(self._proc.stdout, self._proc.stdin)
except:
self._proc.terminate()
@@ -253,8 +257,13 @@ class Client:
raise ClientError(e) from e

class ViaBup:
- def __init__(self, host, port):
+ def __init__(self, url):
self._closed = True # only false when ready for close
+ host, port = url.host, url.port
+ if isinstance(host, (IPv4Address, IPv6Address)):
+ host = host.compressed
+ elif not isinstance(host, bytes):
+ raise Exception(f'unexpected host type for {host}')
with ExitStack() as ctx:
self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ctx.enter_context(closing(self._sock))
@@ -308,14 +317,16 @@ class Client:
with ExitStack() as ctx:
self._call = partial(_TypicalCall, self)
self._line_based_call = partial(_LineBasedCall, self)
- self.protocol, self.host, self.port, self.dir = parse_remote(remote)
+ url = parse_remote(remote)
+ self.url = url
+ self.path = dot_decoded_url_path(url)
self._busy = None
- if self.protocol == b'bup-rev':
+ if url.scheme == b'bup-rev':
self._transport = Client.ViaBupRev()
- elif self.protocol == b'ssh':
- self._transport = Client.ViaSsh(self.host, self.port)
- elif self.protocol == b'bup':
- self._transport = Client.ViaBup(self.host, self.port)
+ elif url.scheme == b'ssh':
+ self._transport = Client.ViaSsh(url)
+ elif url.scheme == b'bup':
+ self._transport = Client.ViaBup(url)
else:
raise ClientError(f'unrecognized remote {pm(remote)}')
ctx.enter_context(self._transport)
@@ -323,15 +334,15 @@ class Client:
self._available_commands = self._get_available_commands()
self._require_command(b'init-dir')
self._require_command(b'set-dir')
- if self.dir:
- self.dir = re.sub(br'[\r\n]', b' ', self.dir)
+ if self.path:
+ mangled_path = re.sub(br'[\r\n]', b' ', self.path)
if create:
- self.conn.write(b'init-dir %s\n' % self.dir)
+ self.conn.write(b'init-dir %s\n' % mangled_path)
else:
- self.conn.write(b'set-dir %s\n' % self.dir)
+ self.conn.write(b'set-dir %s\n' % mangled_path)
self.check_ok()
- if self.protocol == b'bup-rev':
- self.cachedir = self._prep_cache(self.host, True)
+ if url.scheme == b'bup-rev':
+ self.cachedir = self._prep_cache(url.host, True)
else:
self.cachedir = self._prep_cache(remote, False)
self.sync_indexes()
diff --git a/lib/bup/config.py b/lib/bup/config.py
index d36e25a2..50736ed2 100644
--- a/lib/bup/config.py
+++ b/lib/bup/config.py
@@ -1,5 +1,6 @@

from os import environb as environ
+from urllib.parse import quote_from_bytes

import bup.path

@@ -14,7 +15,9 @@ def derive_repo_addr(*, remote, die):
if remote:
if reverse:
die("don't use -r in reverse mode; it's automatic")
- return b'bup+ssh://' + remote
+ return b'ssh://' + remote
if reverse:
- return (b'bup-rev://' + reverse)
+ # Since it should effectively always be a hostname provided by
+ # on--server, make it the URL host.
+ return b'bup-rev://' + quote_from_bytes(reverse, safe='').encode('ascii')
return b'file://' + bup.path.defaultrepo()
diff --git a/lib/bup/repo/__init__.py b/lib/bup/repo/__init__.py
index b04163f1..7dd7703c 100644
--- a/lib/bup/repo/__init__.py
+++ b/lib/bup/repo/__init__.py
@@ -15,8 +15,8 @@ def make_repo(address, create=False, compression_level=None,
if create:
LocalRepo.create(path)
return LocalRepo(repo_dir=path, **opts)
- if address.startswith(b'bup+ssh://'):
- address = address[len(b'bup+ssh://'):]
+ if address.startswith(b'ssh://'):
+ address = address[len(b'ssh://'):]
elif not address.startswith(b'bup-rev://'):
raise Exception(f'unrecognized repository address {address}')
return RemoteRepo(address, create=create, **opts)
diff --git a/lib/bup/ssh.py b/lib/bup/ssh.py
index c086c652..aff8e24c 100644
--- a/lib/bup/ssh.py
+++ b/lib/bup/ssh.py
@@ -20,7 +20,7 @@ def connect(destination, port, subcmd, stderr=None):
ssh.

"""
- assert not re.search(br'[^\w-]', subcmd)
+ assert re.fullmatch(br'[-_a-zA-Z0-9]+', subcmd), subcmd
if not destination:
if b'BUP_TEST_LEVEL' not in environ:
raise Exception(f'no ssh destination')
diff --git a/lib/bup/url.py b/lib/bup/url.py
new file mode 100644
index 00000000..7cff04fe
--- /dev/null
+++ b/lib/bup/url.py
@@ -0,0 +1,133 @@
+
+from ipaddress import IPv4Address, IPv6Address, ip_address
+from typing import Optional, Union
+from urllib.parse import unquote_to_bytes
+import re
+
+from bup.compat import dataclass
+from bup.io import path_msg as pm
+
+## Current schemes
+
+# bup-rev: for reversed connections; only the host should be set.
+# ssh: ssh connection to a bup server; has dot-encoded bytes path
+# bup: net connection to a bup server; has dot-encoded bytes path
+# file: filesystem repository; has dot-encoded bytes path
+
+
+@dataclass(slots=True, frozen=True)
+class URL:
+ scheme: bytes
+ host: Union[IPv4Address, IPv6Address, bytes] = b''
+ port: Optional[int] = None
+ user: bytes = b''
+ path: bytes = b''
+ auth: bool = False # there was an authority, even if empty (URL had a ://)
+ def __post_init__(self):
+ assert self.scheme, self.scheme
+ assert isinstance(self.host, (IPv4Address, IPv6Address, bytes)), self.host
+ assert isinstance(self.port, (int, type(None))), self.port
+ assert isinstance(self.user, bytes), self.user
+ assert isinstance(self.path, bytes), self.path
+ if not isinstance(self.auth, bool):
+ object.__setattr__(self, 'auth', bool(self.auth))
+
+# As an extension, some schemes support having a path with a leading
+# "/./" to indicate a relative path for a URL with an authority
+# (ie. one that starts with scheme://). For example
+# scheme://host/./foo and scheme:///./foo both specify the relative
+# path foo. See REPOSITORY URLS in bup(1).
+
+def dot_encode_path(path):
+ if path and not path.startswith(b'/'):
+ return b'/./' + path
+ return path
+
+def dot_encoded_url(**kwargs):
+ """Return a URL for kwargs, with a dot encoded path."""
+ # REVIEW: fine to leave auth alone for absolute paths?
+ path = kwargs.get('path', b'')
+ if not path:
+ kwargs['auth'] = True # i.e. scheme://MAYBE-AUTH/
+ elif not path.startswith(b'/'):
+ kwargs['auth'] = True
+ kwargs['path'] = b'/./' + path
+ return URL(**kwargs)
+
+def dot_decoded_url_path(url):
+ path = url.path
+ if not url.auth:
+ return path
+ if path.startswith(b'/./'):
+ return path[3:]
+ return path
+
+def for_path(path):
+ if path.startswith(b'://'):
+ return dot_encoded_url(scheme=b'file', path=path)
+ return URL(scheme=b'file', path=path)
+
+
+_scheme_and_rest_rx = re.compile(br'([a-zA-Z][-+.a-zA-Z0-9]*):(//)?(.*)')
+_userinfo_host_port_rx = re.compile(b"(?:([-._~0-9a-zA-Z!$&'()*+,;='%:]*)@)?(.*?)(?::([0-9]*))?")
+# ^------------- user ----------------^ ^--- port --^
+_host_reg_name_rx = re.compile(b"[-._~0-9a-zA-Z!$&'()*+,;='%]*")
+_port_int_rx = re.compile(br'[0-9]+')
+
+class ParseError(Exception): pass
+
+def parse_bytes_path_url(url, require_auth=False):
+ """Parse URL mostly according to RFC 3986. Return None if it
+ doesn't appear to be a URL at all (or doesn't start with a scheme
+ and authority when require_auth is true). Return a string
+ summarizing what's wrong if part of the URL is invalid
+ (e.g. "invalid host 'foo\xb5'"). Return a URL instance on
+ success.
+
+ Return the URL's path bytes without any interpretation or decoding
+ so that this function is suitable for URL-like references
+ referring to filesystem paths provided via the command line.
+ Parse the rest of the URL mostly according to the RFC, including
+ percent decoding the host and user.
+
+ RFC 3986 Uniform Resource Identifier (URI): Generic Syntax
+ https://datatracker.ietf.org/doc/html/rfc3986
+
+ """
+ def parse_addr(addr):
+ try:
+ return ip_address(addr.decode('ascii'))
+ except ValueError:
+ return None
+ m = _scheme_and_rest_rx.fullmatch(url)
+ if not m:
+ return None
+ scheme, slashes, rest = m.group(1, 2, 3)
+ if not slashes: # no authority (not even an empty one) x:... not x://...
+ if require_auth:
+ return None
+ return URL(scheme=scheme, path=rest, auth=False)
+ auth, slash, path = rest.partition(b'/')
+ if slash: path = b'/' + path
+ if not auth: # Use a subprocess for testing
+ return URL(scheme=scheme, path=path, auth=True) # auth, even if empty
+ m = _userinfo_host_port_rx.fullmatch(auth)
+ if not m:
+ user, host, port = b'', auth, None
+ else:
+ user, host, port = m.groups(b'')
+ user, colon, passwd = user.partition(b':')
+ user = unquote_to_bytes(user)
+ port = int(port) if port else None
+ # REVIEW: is ip_address exactly right for this?
+ if host and host[0] == b'['[0] and host[-1] == b']'[0]:
+ addr = parse_addr(host[1:-1])
+ if isinstance(addr, IPv6Address):
+ return URL(scheme=scheme, host=addr, port=port, user=user, path=path, auth=True)
+ addr = parse_addr(host)
+ if isinstance(addr, IPv4Address):
+ return URL(scheme=scheme, host=addr, port=port, user=user, path=path, auth=True)
+ if not _host_reg_name_rx.fullmatch(host):
+ return f'invalid host {pm(host)}'
+ host = unquote_to_bytes(host)
+ return URL(scheme=scheme, host=host, port=port, user=user, path=path, auth=True)
diff --git a/note/main.md b/note/main.md
index fb62948c..c5682dc2 100644
--- a/note/main.md
+++ b/note/main.md
@@ -88,7 +88,7 @@ General
value is treated as a URL if it begins with a syntactically valid
URL scheme prefix that contains an "authority" (meaning that it
begins with `SCHEME://` as `ssh://...` does) and anything else is
- interpreted as a `host:[path]` where the `host` is no longer
+ interpreted as a `[user@]host:[path]` where the `host` is no longer
optional). `file:` URLs are no longer allowed; the semantics were
potentially surprising (e.g. `file://p` would ssh to host `p`). Use
`ssh:` URLs instead. The URL support, though long standing, was
diff --git a/test/int/test_client.py b/test/int/test_client.py
index b48ebf2b..5ab40888 100644
--- a/test/int/test_client.py
+++ b/test/int/test_client.py
@@ -1,9 +1,12 @@

+from ipaddress import IPv4Address, IPv6Address
+from pytest import raises
import os, time, random, subprocess, glob
import pytest

from bup import client, git, path
-#from bup.client import ClientError
+from bup.url import URL
+from bup.client import ClientError
from bup.compat import environ
from bup.config import ConfigError
from bup.repo import LocalRepo
@@ -102,7 +105,7 @@ def test_dumb_client_server_conflict(tmpdir):
open(git.repo(b'bup-dumb-server'), 'w').close()
ex((b'git', b'config', b'bup.server.deduplicate-writes', b'true'))
# FIXME: propagate server ConfigError to Client()
- with pytest.raises(ConfigError) as ex_info, \
+ with raises(ConfigError) as ex_info, \
LocalRepo() as repo:
repo.config_get(b'bup.server.deduplicate-writes', opttype='bool')
assert str(ex_info.value) \
@@ -169,22 +172,36 @@ def test_midx_refreshing(tmpdir):


def test_remote_parsing():
- tests = (
- (b'-:/bup', (b'ssh', None, None, b'/bup')),
- (b'192.168.1.1:/bup', (b'ssh', b'192.168.1.1', None, b'/bup')),
- (b'ssh://192.168.1.1:2222/bup', (b'ssh', b'192.168.1.1', b'2222', b'/bup')),
- (b'ssh://[ff:fe::1]:2222/bup', (b'ssh', b'ff:fe::1', b'2222', b'/bup')),
- (b'bup://foo.com:1950', (b'bup', b'foo.com', b'1950', None)),
- (b'bup://foo.com:1950/bup', (b'bup', b'foo.com', b'1950', b'/bup')),
- (b'bup://[ff:fe::1]/bup', (b'bup', b'ff:fe::1', None, b'/bup')),
- (b'bup://[ff:fe::1]/bup', (b'bup', b'ff:fe::1', None, b'/bup')),
- (b'bup-rev://', (b'bup-rev', b'', None, None)),
- (b'bup-rev://host/dir', (b'bup-rev', b'host/dir', None, None)),
- )
- for remote, values in tests:
- assert client.parse_remote(remote) == values
-
- with pytest.raises(client.ClientError):
+ def ssh(**kwargs): return URL(scheme=b'ssh', **kwargs)
+ def ssha(**kwargs): return URL(scheme=b'ssh', auth=True, **kwargs)
+ def bup(**kwargs): return URL(scheme=b'bup', **kwargs)
+ def bupa(**kwargs): return URL(scheme=b'bup', auth=True, **kwargs)
+ def bup_rev(**kwargs): return URL(scheme=b'bup-rev', **kwargs)
+ ip4 = IPv4Address
+ ip6 = IPv6Address
+ pr = client.parse_remote
+ with raises(ClientError, match='remote : has no host'): pr(b':')
+ with raises(ClientError, match='remote :x has no host'): pr(b':x')
+ assert pr(b'x:') == ssh(host=b'x', auth=True)
+ assert pr(b'x:y') == ssh(host=b'x', path=b'/./y', auth=True)
+ assert pr(b'x:y:z') == ssh(host=b'x', path=b'/./y:z', auth=True)
+ assert pr(b'u@x:') == ssh(host=b'x', user=b'u', auth=True)
+ assert pr(b'u@u@x:') == ssh(host=b'x', user=b'u@u', auth=True)
+ assert pr(b'u@x:/') == ssh(host=b'x', user=b'u', path=b'/', auth=False)
+ assert pr(b'w:x@y:z') == ssh(host=b'y', user=b'w:x', path=b'/./z', auth=True)
+ assert pr(b'-:/bup') == ssh(path=b'/bup')
+ assert pr(b'192.168.1.1:/bup') == ssh(host=b'192.168.1.1', path=b'/bup')
+ assert pr(b'ssh://192.168.1.1:2222/bup') == ssha(host=ip4('192.168.1.1'), port=2222, path=b'/bup')
+ assert pr(b'ssh://[ff:fe::1]:2222/bup') == ssha(host=ip6('ff:fe::1'), port=2222, path=b'/bup')
+ assert pr(b'bup://foo.com:1950') == bupa(host=b'foo.com', port=1950)
+ assert pr(b'bup://foo.com:1950/bup') == bupa(host=b'foo.com', port=1950, path=b'/bup')
+ assert pr(b'bup://[ff:fe::1]/bup') == bupa(host=ip6('ff:fe::1'), path=b'/bup')
+ assert pr(b'bup://[ff:fe::1]/bup') == bupa(host=ip6('ff:fe::1'), path=b'/bup')
+ assert pr(b'bup-rev://%2f') == bup_rev(host=b'/', auth=True)
+ with raises(ClientError, match='has a port'): pr(b'bup-rev://:1')
+ with raises(ClientError, match='has a user'): pr(b'bup-rev://u@')
+ with raises(ClientError, match='has a path'): pr(b'bup-rev:///dir')
+ with raises(ClientError, match='unexpected http scheme'):
client.parse_remote(b'http://asdf.com/bup')


@@ -198,9 +215,9 @@ def test_legacy_cache_ids():
assert not remote, remote
return client._legacy_cache_id(reverse, True)
return client._legacy_cache_id(remote)
- with pytest.raises(AssertionError):
+ with raises(AssertionError):
assert cid(b'x', b'y')
- with pytest.raises(TypeError):
+ with raises(TypeError):
cid(None, None)
# remotes
assert cid(None, b'') == b'None_'
@@ -210,9 +227,6 @@ def test_legacy_cache_ids():
assert cid(None, b'h:') == b'h_'
assert cid(None, b':p') == b'None_p'
assert cid(None, b'h:p') == b'h_p'
- # FIXME: document unusual -r behavior if we're not going to change it, e.g.
- # file:p means ssh with host file, path p
- # file://p means a "file" with host p and path ''
assert cid(None, b'file:p') == b'file_p'
assert cid(None, b'file:/p') == b'file__p'
assert cid(None, b'file://p') == b'p_None' # bug if not rejected elsewhere?
@@ -225,7 +239,7 @@ def test_legacy_cache_ids():

# reverses - note that on__server always sets BUP_SERVER_REVERSE
# to the hostname so most of these cases *should* be irrelevant.
- with pytest.raises(TypeError):
+ with raises(TypeError):
cid(b'', None)
assert cid(b':', None) == b'None__'
assert cid(b'-', None) == b'None_'
@@ -246,6 +260,6 @@ def test_config(tmpdir):
assert c.config_get(b'bup.split.trees', opttype='int') == 0
ex((b'git', b'config', b'bup.split.trees', b'1'))
assert c.config_get(b'bup.split.trees', opttype='bool') == True
- with pytest.raises(PermissionError) as exinfo:
+ with raises(PermissionError) as exinfo:
c.config_get(b'bup.not-an-allowed-key')
assert 'does not allow remote access' in str(exinfo.value)
diff --git a/test/int/test_url.py b/test/int/test_url.py
new file mode 100644
index 00000000..7b56c3bb
--- /dev/null
+++ b/test/int/test_url.py
@@ -0,0 +1,71 @@
+
+from ipaddress import IPv4Address, IPv6Address
+
+from bup.url import URL
+import bup.url
+
+
+def test_dot_encode_path():
+ enc = bup.url.dot_encode_path
+ assert enc(b'') == b''
+ assert enc(b'x') == b'/./x'
+ assert enc(b'/x') == b'/x'
+
+def test_dot_encoded_url():
+ enc = bup.url.dot_encoded_url
+ assert enc(scheme=b'x', path=b'') == URL(scheme=b'x', path=b'', auth=True)
+ assert enc(scheme=b'x', path=b'x') == URL(scheme=b'x', path=b'/./x', auth=True)
+ assert enc(scheme=b'x', path=b'/x') == URL(scheme=b'x', path=b'/x', auth=False)
+ assert enc(scheme=b'x', path=b'/x', auth=True) == URL(scheme=b'x', path=b'/x', auth=True)
+
+def test_dot_decoded_url_path():
+ dec = bup.url.dot_decoded_url_path
+ assert dec(URL(scheme=b'x', path=b'', auth=True)) == b''
+ assert dec(URL(scheme=b'x', path=b'/./x', auth=True)) == b'x'
+ assert dec(URL(scheme=b'x', path=b'/x', auth=False)) == b'/x'
+ assert dec(URL(scheme=b'x', path=b'/x', auth=True)) == b'/x'
+
+# FIXME: more error paths
+
+def test_parse_bytes_path_url():
+ def urla(**kwargs): return URL(auth=True, **kwargs)
+ ip4 = IPv4Address
+ ip6 = IPv6Address
+ parse = bup.url.parse_bytes_path_url
+ assert parse(b'x:') == URL(scheme=b'x')
+ assert parse(b'x:', require_auth=True) is None
+ assert parse(b'x:/', require_auth=True) is None
+ assert parse(b'x://', require_auth=True) == urla(scheme=b'x')
+ assert parse(b'x:') == URL(scheme=b'x')
+ assert parse(b'x:p') == URL(scheme=b'x', path=b'p')
+ assert parse(b'x://h') == urla(scheme=b'x', host=b'h')
+ assert parse(b'x://192.168.1.1') == urla(scheme=b'x', host=ip4('192.168.1.1'))
+ assert parse(b'x://[::]') == urla(scheme=b'x', host=ip6('::'))
+ assert parse(b'x://[ff::1]') == urla(scheme=b'x', host=ip6('ff::1'))
+ assert parse(b'x://-') == urla(scheme=b'x', host=b'-')
+ assert parse(b'x:///p') == urla(scheme=b'x', path=b'/p')
+ assert parse(b'x:///\xb5') == urla(scheme=b'x', path=b'/\xb5')
+ assert parse(b'x://h/') == urla(scheme=b'x', host=b'h', path=b'/')
+ assert parse(b'x://-/') == urla(scheme=b'x', host=b'-', path=b'/')
+ assert parse(b'x://:') == urla(scheme=b'x')
+ assert parse(b'x://:/') == urla(scheme=b'x', path=b'/')
+ assert parse(b'x://:1') == urla(scheme=b'x', port=1)
+ assert parse(b'x://p:1') == urla(scheme=b'x', host=b'p', port=1)
+ assert parse(b'x://p:1/') == urla(scheme=b'x', host=b'p', port=1, path=b'/')
+ assert parse(b'x://@/') == urla(scheme=b'x', path=b'/')
+ assert parse(b'x://@:/') == urla(scheme=b'x', path=b'/')
+ assert parse(b'x://u@/') == urla(scheme=b'x', user=b'u', path=b'/')
+ assert parse(b'x://@h/') == urla(scheme=b'x', host=b'h', path=b'/')
+ assert parse(b'x://u@h:1') == urla(scheme=b'x', host=b'h', port=1, user=b'u')
+ assert parse(b'x://u@h:1/') == urla(scheme=b'x', host=b'h', port=1, user=b'u', path=b'/')
+ assert parse(b'x://%75@h:1') == urla(scheme=b'x', host=b'h', port=1, user=b'u')
+ assert parse(b'x://%75@h:1/') == urla(scheme=b'x', host=b'h', port=1, user=b'u', path=b'/')
+ assert parse(b'x://u@%68:1') == urla(scheme=b'x', host=b'h', port=1, user=b'u')
+ assert parse(b'x://u@%68:1/') == urla(scheme=b'x', host=b'h', port=1, user=b'u', path=b'/')
+ assert parse(b'x://u@%68:1/p') == urla(scheme=b'x', host=b'h', port=1, user=b'u', path=b'/p')
+ assert parse(b'ssh://u@%68:1/p') == urla(scheme=b'ssh', host=b'h', port=1, user=b'u', path=b'/p')
+ assert parse(b':') is None
+ assert parse(b':y') is None
+ assert parse(b'-') is None
+ assert parse(b'-:') is None
+ assert parse(b'x://h:x') == 'invalid host h:x'
--
2.47.3
