Assertion failure

7 views
Skip to first unread message

Mark J Hewitt

unread,
May 11, 2023, 4:42:53 AM5/11/23
to bup-...@googlegroups.com
I have a number of small (mostly Raspberry Pi) systems that use bup to
backup to a bup server.  The current master is now failing to build for me:

bup: lib/bup/_hashsplit.c:663: hashsplit_init: Assertion `sizeof(off_t)
<= sizeof(size_t)' failed.

which is because off_t = 8, size_t = 4

Is that the end of supporting small systems?

Mark

--
Mark J Hewitt

Rob Browning

unread,
May 11, 2023, 2:11:57 PM5/11/23
to m.he...@computer.org, bup-...@googlegroups.com
Nope. Most, if not all of those assertions are there to guard
(generally simplifying) assumptions that current code depends on.

This likely just means we'll have to make adjustments to avoid depending
on the assertion.

I'll plan to take a look later.

Thanks for the report.
--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

Greg Troxel

unread,
May 11, 2023, 2:22:29 PM5/11/23
to Rob Browning, m.he...@computer.org, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Mark J Hewitt <m...@idnet.com> writes:
>
>> I have a number of small (mostly Raspberry Pi) systems that use bup to
>> backup to a bup server.  The current master is now failing to build for me:
>>
>> bup: lib/bup/_hashsplit.c:663: hashsplit_init: Assertion `sizeof(off_t)
>> <= sizeof(size_t)' failed.
>>
>> which is because off_t = 8, size_t = 4
>>
>> Is that the end of supporting small systems?
>
> Nope. Most, if not all of those assertions are there to guard
> (generally simplifying) assumptions that current code depends on.
>
> This likely just means we'll have to make adjustments to avoid depending
> on the assertion.
>
> I'll plan to take a look later.

Having off_t 8 and size_t 4 is not only POSIX-compliant but normal on
systems with 32-bit CPU types that natively support what used to be
called large files.

I would suggest that bup just try not to mix types, and drop this
assertion.

I see this assertion fire on NetBSD 9 earmv7hf-el and NetBSD 9 i386.

./bup features
assertion "sizeof(off_t) <= sizeof(size_t)" failed: file "lib/bup/_hashsplit.c", line 662, function "hashsplit_init"


I wonder if those self-> variables should just be all off_t. It doesn't
seem right to cross-assign off_t (which is about file positions) and
size_t (which is about the size on in-memory objects).

Johannes Berg

unread,
May 11, 2023, 3:30:27 PM5/11/23
to Greg Troxel, Rob Browning, m.he...@computer.org, bup-...@googlegroups.com
On Thu, 2023-05-11 at 14:22 -0400, Greg Troxel wrote:
>
> Having off_t 8 and size_t 4 is not only POSIX-compliant but normal on
> systems with 32-bit CPU types that natively support what used to be
> called large files.
>
> I would suggest that bup just try not to mix types, and drop this
> assertion.
>
> I see this assertion fire on NetBSD 9 earmv7hf-el and NetBSD 9 i386.
>
> ./bup features
> assertion "sizeof(off_t) <= sizeof(size_t)" failed: file "lib/bup/_hashsplit.c", line 662, function "hashsplit_init"
>
>
> I wonder if those self-> variables should just be all off_t. It doesn't
> seem right to cross-assign off_t (which is about file positions) and
> size_t (which is about the size on in-memory objects).

Hmm. Most of the size_t ones are about in-memory objects?

Though agree that >uncached/>read (and then the local 'len' in _uncache
and maybe a few more there) should be off_t, and I guess those were the
"cross-assign" you're referring to?

johannes

Greg Troxel

unread,
May 11, 2023, 6:16:49 PM5/11/23
to Johannes Berg, Rob Browning, m.he...@computer.org, bup-...@googlegroups.com
Yes, I meant those. I did not study the rest of self.

Stepping back, any assignment between variables/expressions where one is
off_t and one is size_t feels like a bug. That's probably a bit
overstated, but I suspect it's mostly right.

Rob Browning

unread,
May 11, 2023, 10:32:31 PM5/11/23
to Greg Troxel, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> Yes, I meant those. I did not study the rest of self.
>
> Stepping back, any assignment between variables/expressions where one is
> off_t and one is size_t feels like a bug. That's probably a bit
> overstated, but I suspect it's mostly right.

Many of the assertions were to just avoid extra work/complexity that
might not be worth it (though there's also an implicit, but likely I
suspect fine these days, twos-complement assumption in some of them
too).

One other complicating factor is that (c)python doesn't have solid
unsigned handling in its python -> C api, i.e. in a number of cases it
clearly states it won't check for overflow when the argument is
unsigned.

And python doesn't/didn't provide many of the posix types, other than
some handling for ssize_t, e.g. Py_ssize_t, etc. i.e. no off_t, mode_t,
gid_t, ...

In any case, I'd just assumed we'd fix the code (adding the relevant
conversions, etc.) if/when (as now) the assertions failed somewhere.

Thanks

Rob Browning

unread,
May 12, 2023, 3:05:20 AM5/12/23
to Johannes Berg, Greg Troxel, m.he...@computer.org, bup-...@googlegroups.com
Johannes Berg <joha...@sipsolutions.net> writes:

> Though agree that >uncached/>read (and then the local 'len' in _uncache
> and maybe a few more there) should be off_t, and I guess those were the
> "cross-assign" you're referring to?

Hmm, I'm now wondering if we (I?) might have gotten the assertion
backwards, regardless:

off_t start = self->uncached; // see assumptions (off_t <= size_t)

suggests we assume sizeof(size_t) <= sizeof(off_t), not the reverse?

(I also noticed at least another place where we might need an overflow
check to be strictly correct.)

Thanks

Greg Troxel

unread,
May 12, 2023, 7:09:50 AM5/12/23
to Rob Browning, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> One other complicating factor is that (c)python doesn't have solid
> unsigned handling in its python -> C api, i.e. in a number of cases it
> clearly states it won't check for overflow when the argument is
> unsigned.
>
> And python doesn't/didn't provide many of the posix types, other than
> some handling for ssize_t, e.g. Py_ssize_t, etc. i.e. no off_t, mode_t,
> gid_t, ...

I'm not 100% following, but if this is all in C, why does it matter what
Python provides? Doesn't it only matter if we try to turn a value into
a python value?

Rob Browning

unread,
May 12, 2023, 7:31:28 PM5/12/23
to Greg Troxel, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> I'm not 100% following, but if this is all in C, why does it matter what
> Python provides? Doesn't it only matter if we try to turn a value into
> a python value?

In a any number of cases we gather values on the C side that we need to
return to Python (e.g. xstat), or (more likely to have the issue) we
pass values from python to our C code.

For example

https://codeberg.org/bup/bup/src/branch/main/lib/bup/_helpers.c#L1381-L1404

or

https://codeberg.org/bup/bup/src/branch/main/lib/bup/_helpers.c#L955-L956

Oh, and for the typical way to handle values (which has the unsigned
problem in various places):

https://docs.python.org/3/c-api/arg.html#c.PyArg_ParseTuple

Related -- with the broader ACL overhaul I've been toying with, I've
been wondering if we might want to add some additional helpers for
uid_t, gid_t, etc., insstead of just trying to use a "wide enough"
signed or unsigned integer type via ./configure and/or runtime
checks/assertions.

Greg Troxel

unread,
May 13, 2023, 10:10:07 AM5/13/23
to Rob Browning, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
Rob Browning <r...@defaultvalue.org> writes:

> Greg Troxel <g...@lexort.com> writes:
>
>> I'm not 100% following, but if this is all in C, why does it matter what
>> Python provides? Doesn't it only matter if we try to turn a value into
>> a python value?
>
> In a any number of cases we gather values on the C side that we need to
> return to Python (e.g. xstat), or (more likely to have the issue) we
> pass values from python to our C code.
>
> For example
>
> https://codeberg.org/bup/bup/src/branch/main/lib/bup/_helpers.c#L1381-L1404
>
> or
>
> https://codeberg.org/bup/bup/src/branch/main/lib/bup/_helpers.c#L955-L956
>
> Oh, and for the typical way to handle values (which has the unsigned
> problem in various places):
>
> https://docs.python.org/3/c-api/arg.html#c.PyArg_ParseTuple
>
> Related -- with the broader ACL overhaul I've been toying with, I've
> been wondering if we might want to add some additional helpers for
> uid_t, gid_t, etc., insstead of just trying to use a "wide enough"
> signed or unsigned integer type via ./configure and/or runtime
> checks/assertions.

So it sounds like

the current issue can be fixed without solving the larger issues

we want various types in Python

and it seems like either Python needs support for various widths, or
support for intmax_t and we can use that, or something else. Or, for
now, support for int64_t and uint64_t and we can assert that lots of
things fit in that.

Rob Browning

unread,
May 13, 2023, 1:52:18 PM5/13/23
to Greg Troxel, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
Greg Troxel <g...@lexort.com> writes:

> So it sounds like
>
> the current issue can be fixed without solving the larger issues
>
> we want various types in Python
>
> and it seems like either Python needs support for various widths, or
> support for intmax_t and we can use that, or something else. Or, for
> now, support for int64_t and uint64_t and we can assert that lots of
> things fit in that.

Yep. I wish they provided (u)intmax_t support, but we have some
converters from (unsigned) long, and (unsigned) long long, etc.
Although they're not as convenient as direct support in say ParseTuple,
and not comprehensive, i.e. wrt anything larger than (unsigned) long
long.

(e.g. see the INTEGER_TO_PY() in _helpers.c and the code just after it.)

Rob Browning

unread,
May 13, 2023, 2:59:02 PM5/13/23
to bup-...@googlegroups.com, Mark J Hewitt
Reverse the order of the check since we'd originally intended it to
guard the assignment to the uncached HashSplitter field (allowing a
simple assignment), meaning that it needs to check that any size_t
will fit in an off_t.

Thanks to Mark J Hewitt for reporting the problem, which appeared on a
32-bit system where, unsurprisingly, off_t was 8 and size_t was 4.

Signed-off-by: Rob Browning <r...@defaultvalue.org>
Tested-by: Rob Browning <r...@defaultvalue.org>
---

Proposed. This isn't relevant to 0.33.x (or anything older).

lib/bup/_hashsplit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/bup/_hashsplit.c b/lib/bup/_hashsplit.c
index 289509638..ade413181 100644
--- a/lib/bup/_hashsplit.c
+++ b/lib/bup/_hashsplit.c
@@ -383,7 +383,7 @@ static int HashSplitter_uncache(HashSplitter *self, int last)
size_t pages = len / page_size;

// now track where and how much to uncache
- off_t start = self->uncached; // see assumptions (off_t <= size_t)
+ off_t start = self->uncached; // see assumptions (size_t <= off_t)

// Check against overflow up front
size_t pgstart = self->uncached / page_size;
@@ -659,7 +659,7 @@ int hashsplit_init(void)
{
// Assumptions the rest of the code can depend on.
assert(sizeof(Py_ssize_t) <= sizeof(size_t));
- assert(sizeof(off_t) <= sizeof(size_t));
+ assert(sizeof(size_t) <= sizeof(off_t));
assert(CHAR_BIT == 8);
assert(sizeof(Py_ssize_t) <= sizeof(size_t));

--
2.39.2

Mark J Hewitt

unread,
May 14, 2023, 2:51:01 PM5/14/23
to Rob Browning, bup-...@googlegroups.com
Thank you Rob: Passes all the bup tests on a 32 bit Raspberry Pi, and my
own sanity checks too.

Mark.
Mark J Hewitt

Rob Browning

unread,
May 14, 2023, 6:59:47 PM5/14/23
to m.he...@computer.org, bup-...@googlegroups.com
Mark J Hewitt <m...@idnet.com> writes:

> Thank you Rob: Passes all the bup tests on a 32 bit Raspberry Pi, and my
> own sanity checks too.

Great, and thanks for the testing.

Nix

unread,
May 15, 2023, 6:08:50 PM5/15/23
to Rob Browning, Greg Troxel, Johannes Berg, m.he...@computer.org, bup-...@googlegroups.com
On 13 May 2023, Rob Browning spake thusly:
> Yep. I wish they provided (u)intmax_t support, but we have some

Alas, that won't help for long -- due to ABI constraints, (u)intmax_t is
the max size of integers *when the typedef was added*: there are already
128-bit integral types that are longer.

--
NULL && (void)

Greg Troxel

unread,
May 15, 2023, 6:41:02 PM5/15/23
to Nix, Rob Browning
That seems broken. AIUI, intmax_t is compile time, and if you can't
store and extract any other type, that doesn't meet specs.

I also don't see why it is part of any ABI; it seems like an error for
any program or the OS to include it in any structures used by the ABI --
it seems the point is to be able to use it within a program. ABIs
should use things like int32_t or whatever is needed.

Is this a Linux practice, or does "intmax_t too small" happen in other
environments? I can ask my C99/POSIX lawyers; it's fun to get them
stirred up!

Nix

unread,
May 16, 2023, 4:19:26 PM5/16/23
to Greg Troxel, Rob Browning, bup-...@googlegroups.com
On 15 May 2023, Greg Troxel verbalised:
Well, uintmax_t is an integral type (compilers generally agree on this
even when they can't decide whether __int128 is integral or not).

*Obviously* its size becomes part of the ABI, just like the size of any
other integral type, because you might use it in an exported function
(examples in the link below). This then means its size and alignment
cannot change without an ABI break.

A hilarious (knowledgeable) take on this from the C standard's current
editor (otherwise known as "the guy who didn't run fast enough to get
out of the way"): <https://thephd.dev/intmax_t-hell-c++-c>.

--
NULL && (void)

Rob Browning

unread,
May 28, 2023, 6:49:21 PM5/28/23
to bup-...@googlegroups.com, Mark J Hewitt
Rob Browning <r...@defaultvalue.org> writes:

> Reverse the order of the check since we'd originally intended it to
> guard the assignment to the uncached HashSplitter field (allowing a
> simple assignment), meaning that it needs to check that any size_t
> will fit in an off_t.
>
> Thanks to Mark J Hewitt for reporting the problem, which appeared on a
> 32-bit system where, unsurprisingly, off_t was 8 and size_t was 4.
>
> Signed-off-by: Rob Browning <r...@defaultvalue.org>
> Tested-by: Rob Browning <r...@defaultvalue.org>
> ---
>
> Proposed. This isn't relevant to 0.33.x (or anything older).

Pushed.

Thanks
Reply all
Reply to author
Forward
0 new messages