
Is this really necessary


Bonita Montero

Jun 5, 2021, 6:53:49 AM
Consider the following code:


size_t s( int *begin, int *end )
{
    return (end - begin) * sizeof(int);
}

This is MSVC's disassembly:

sub rdx, rcx
and rdx, -4
mov rax, rdx
ret 0

This is gcc's disassembly:

movq %rsi, %rax
subq %rdi, %rax
ret

Is the masking really necessary, or is it just an optimizer weakness?
It seems to me MSVC sees that two bits are shifted out and then back
in, and replaces this with a mask.
I think the language standard assumes equally aligned data among
objects of the same type for the above code, so gcc is correct, but I'm
not sure.
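
To make the question concrete, here is roughly what the two instruction
sequences compute, written out as plain C++; the function names are just
labels for this sketch, and uintptr_t stands in for the raw register values:

#include <cstddef>
#include <cstdint>

// What the MSVC sequence computes: the raw byte difference with the low
// two bits masked off, i.e. roughly (diff / 4) * 4 folded into one "and".
std::size_t s_msvc_style( int *begin, int *end )
{
    std::uintptr_t diff = reinterpret_cast<std::uintptr_t>(end)
                        - reinterpret_cast<std::uintptr_t>(begin);
    return diff & ~std::uintptr_t{ 3 };
}

// What the gcc sequence computes: just the byte difference. gcc assumes
// both pointers are int-aligned, so diff is already a multiple of 4 and
// the mask would be redundant.
std::size_t s_gcc_style( int *begin, int *end )
{
    std::uintptr_t diff = reinterpret_cast<std::uintptr_t>(end)
                        - reinterpret_cast<std::uintptr_t>(begin);
    return diff;
}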

David Brown

Jun 5, 2021, 8:20:07 AM
I can't help thinking that putting a non-aligned value into an int
pointer is undefined behaviour - but I can't find a reference to that in
the C standards (and I don't know the C++ standards well enough to look
there). However, when you subtract two pointers, the behaviour is only
defined if they both point to elements inside the same array (or one
past the end). It doesn't matter how they are aligned, as the offset
from the standard "int" alignment will be the same for both pointers.
(This applies also to 8-bit systems where you might have 16-bit int but
8-bit alignment for int.) Thus gcc's code is fine.

MSVC has a tradition of being more conservative in its optimisations,
while gcc has a tradition of expecting people to write valid code and
optimise on the assumption that undefined behaviour does not occur.
This is, I think, because MSDOS and Windows programmers have a tradition
of assuming their code runs on one platform and one compiler, and "it
worked when I tested it" means "it is correct". (Your own misstatements
about undefined behaviour have demonstrated that.) gcc users are more
likely to understand that their code could be used on different
platforms and different processors, and pay a bit more attention to the
rules of the language.

Bonita Montero

Jun 5, 2021, 8:28:10 AM
> I can't help thinking that putting a non-aligned value into an int
> pointer is undefined behaviour - but I can't find a reference to that
> in the C standards (and I don't know the C++ standards well enough to
> look there). However, when you subtract two pointers, the behaviour
> is only defined if they both point to elements inside the same array
> (or one past the end). ...

And in a struct with two ints ?

> MSVC has a tradition of being more conservative in its optimisations,
> ...

MSVC is not more conservative, but more stupid because it lacks many
safe optimizations.

David Brown

Jun 5, 2021, 8:34:07 AM
On 05/06/2021 14:27, Bonita Montero wrote:
>> I can't help thinking that putting a non-aligned value into an int
>> pointer is undefined behaviour - but I can't find a reference to that
>> in the C standards (and I don't know the C++ standards well enough to
>> look there).  However, when you subtract two pointers, the behaviour
>> is only defined if they both point to elements inside the same array
>> (or one past the end). ...
>
> And in a struct with two ints ?
>

No. Subtraction of pointers is defined as the difference in their
indexes within a single array.

>> MSVC has a tradition of being more conservative in its optimisations,
>> ...
>
> MSVC is not more conservative, but more stupid because it lacks many
> safe optimizations.

I was not being judgemental about what is a good or bad implementation.
I personally prefer gcc's philosophy, but I know other people have
preferences that are somewhere in between. (You have posted in the past
about your beliefs about how compilers handle some kinds of undefined
behaviour - and shown why there is a market for such conservative
compilers.)

Öö Tiib

Jun 5, 2021, 8:38:44 AM
On Saturday, 5 June 2021 at 15:20:07 UTC+3, David Brown wrote:
> On 05/06/2021 12:53, Bonita Montero wrote:
> > Consider the following code:
> >
> >
> > size_t s( int *begin, int *end )
> > {
> > return (end - begin) * sizeof(int);
> > }
> >
> > This is MSVC's disassembly:
> >
> > sub rdx, rcx
> > and rdx, -4
> > mov rax, rdx
> > ret 0
> >
> > This is gcc's disassembly:
> >
> > movq %rsi, %rax
> > subq %rdi, %rax
> > ret
> >
> > Is the masking really necessary, or is it just an optimizer weakness?
> > It seems to me MSVC sees that two bits are shifted out and then back
> > in, and replaces this with a mask.
> > I think the language standard assumes equally aligned data among
> > objects of the same type for the above code, so gcc is correct, but I'm
> > not sure.
> >
> I can't help thinking that putting a non-aligned value into an int
> pointer is undefined behaviour - but I can't find a reference to that in
> the C standards (and I don't know the C++ standards well enough to look
> there).

C and C++ programs that use unaligned pointers have undefined behaviour
(in the sense of the standard) regardless of the target architecture
(which may allow unaligned accesses), though implementations can extend
the language and define it.

> However, when you subtract two pointers, the behaviour is only
> defined if they both point to elements inside the same array (or one
> past the end). It doesn't matter how they are aligned, as the offset
> from the standard "int" alignment will be the same for both pointers.
> (This applies also to 8-bit systems where you might have 16-bit int but
> 8-bit alignment for int.) Thus gcc's code is fine.

Also MSVC's code is fine, as that masking should not have any ill effects
on conforming code.

> MSVC has a tradition of being more conservative in its optimisations,
> while gcc has a tradition of expecting people to write valid code and
> optimise on the assumption that undefined behaviour does not occur.
> This is, I think, because MSDOS and Windows programmers have a tradition
> of assuming their code runs on one platform and one compiler, and "it
> worked when I tested it" means "it is correct". (Your own misstatements
> about undefined behaviour have demonstrated that.) gcc users are more
> likely to understand that their code could be used on different
> platforms and different processors, and pay a bit more attention to the
> rules of the language.

I think the most important requirement on MSVC is that it should build
good binaries out of Microsoft's own code base, regardless of how tricky
that code base is. gcc as a whole does not have that sort of obligation.
Perhaps some people working on the gcc code base do, but they come from
a wide variety of companies.

Bonita Montero

Jun 5, 2021, 9:06:41 AM
> No. Subtraction of pointers is defined as the difference in their
> indexes within a single array.

That couldn't be true because you can cast any pointer-pair
to char *, subtract them and use the difference for memcpy().

David Brown

Jun 5, 2021, 9:30:07 AM
Look it up.

You can do lots of things in C and C++ that are syntactically correct,
but might have undefined behaviour.

Öö Tiib

Jun 5, 2021, 9:34:58 AM
So it couldn't be true that the standard specifies it so:
| If the expressions P and Q point to, respectively, elements
| x[i] and x[j] of the same array object x, the expression P - Q has
| the value i − j; otherwise, the behavior is undefined.

I don't understand what supposedly stops it?

David Brown

Jun 5, 2021, 9:41:35 AM
Of course implementations can add whatever definitions they want beyond
the requirements of the standard.

And while dereferencing unaligned pointers is undefined behaviour (by
the standards), I haven't found anything that says that merely assigning
an unaligned value to a pointer is undefined behaviour. But that could
easily be something I missed - hopefully someone can then give the
reference (in the C or C++ standards).

>> However, when you subtract two pointers, the behaviour is only
>> defined if they both point to elements inside the same array (or one
>> past the end). It doesn't matter how they are aligned, as the offset
>> from the standard "int" alignment will be the same for both pointers.
>> (This applies also to 8-bit systems where you might have 16-bit int but
>> 8-bit alignment for int.) Thus gcc's code is fine.
>
> Also MSVC's code is fine, as that masking should not have any ill effects
> on conforming code.
>

Sure. Suboptimal, but correct.

>> MSVC has a tradition of being more conservative in its optimisations,
>> while gcc has a tradition of expecting people to write valid code and
>> optimise on the assumption that undefined behaviour does not occur.
>> This is, I think, because MSDOS and Windows programmers have a tradition
>> of assuming their code runs on one platform and one compiler, and "it
>> worked when I tested it" means "it is correct". (Your own misstatements
>> about undefined behaviour have demonstrated that.) gcc users are more
>> likely to understand that their code could be used on different
>> platforms and different processors, and pay a bit more attention to the
>> rules of the language.
>
> I think the most important requirement on MSVC is that it should build
> good binaries out of Microsoft's own code base, regardless of how tricky
> that code base is.

That is a reasonable requirement!

> gcc as a whole does not have that sort of obligation.

gcc needs to be able to compile gcc and all its dependencies, libraries,
etc. That in itself is a rather massive and complex code base, full of
all kinds of weird stuff for historic reasons (including garbage
collection, mixes of C and C++, and code that dates back 30+ years that
no one really understands).

They also work with the Linux kernel folk and distributions like Debian
to test on a huge variety of existing software. I'm not sure whether
you could call that an "obligation" or a "requirement" for gcc as a
whole or, as you say, just for some people working on gcc. But it is
certainly something they do in the process of testing and preparing
releases.

Öö Tiib

Jun 5, 2021, 11:18:17 AM
When you attempt to form a pointer that is unaligned, then the "resulting
pointer value is unspecified", or equivalent wording, in a couple of places
in the C++ standard. What I meant is that usage such as dereferencing that
unspecified pointer value is undefined (unless the implementation gives
some better guarantees).

> >> MSVC has a tradition of being more conservative in its optimisations,
> >> while gcc has a tradition of expecting people to write valid code and
> >> optimise on the assumption that undefined behaviour does not occur.
> >> This is, I think, because MSDOS and Windows programmers have a tradition
> >> of assuming their code runs on one platform and one compiler, and "it
> >> worked when I tested it" means "it is correct". (Your own misstatements
> >> about undefined behaviour have demonstrated that.) gcc users are more
> >> likely to understand that their code could be used on different
> >> platforms and different processors, and pay a bit more attention to the
> >> rules of the language.
> >
> > I think the most important requirement on MSVC is that it should build
> > good binaries out of Microsoft's own code base, regardless of how tricky
> > that code base is.
>
> That is a reasonable requirement!
>
> > gcc as a whole does not have that sort of obligation.
>
> gcc needs to be able to compile gcc and all its dependencies, libraries,
> etc. That in itself is a rather massive and complex code base, full of
> all kinds of weird stuff for historic reasons (including garbage
> collection, mixes of C and C++, and code that dates back 30+ years that
> no one really understands).

I agree. Still, at Microsoft, if code that ran with compiler version A does
not run with compiler version B, then it is a question of what is cheaper
for the business: (1) to fix the undefined behavior in that code or (2) to
adjust compiler B. That (2) is more common with MSVC than with gcc,
though gcc also compiles undefined behavior in popular benchmarks
"correctly" (an example of (2) with gcc).

> They also work with the Linux kernel folk and distributions like Debian
> to test on a huge variety of existing software. I'm not sure whether
> you could call that an "obligation" or a "requirement" for gcc as a
> whole or, as you say, just for some people working on gcc. But it is
> certainly something they do in the process of testing and preparing
> releases.

That code base can't be declared sacred by a business or by being a
popular benchmark. So there we see Linus being vulgar, but complying
and fixing such legacy code.

Bonita Montero

Jun 5, 2021, 11:38:49 AM
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().

> Look it up.
> You can do lots of things in C and C++ that are syntactically correct,
> but might have undefined behaviour.

I don't believe that memcpy()ing this way is UB.

David Brown

Jun 5, 2021, 11:51:53 AM
Can you give an example of what you are thinking about?


Richard Damon

Jun 5, 2021, 11:57:04 AM
The memcpy might not be, but subtracting two pointers that don't point
to elements of the same array is.

Richard Damon

Jun 5, 2021, 11:59:02 AM
My understanding is that the unaligned pointer has an unspecified value
that might be a trap value, so any operation that uses that value can
cause Undefined Behavior.

>

MrSpoo...@4hhtozmpj299zx.tv

Jun 5, 2021, 12:00:46 PM
On Sat, 5 Jun 2021 14:19:52 +0200
David Brown <david...@hesbynett.no> wrote:
>On 05/06/2021 12:53, Bonita Montero wrote:
>> Is the masking really necessary, or is it just an optimizer weakness?
>> It seems to me MSVC sees that two bits are shifted out and then back
>> in, and replaces this with a mask.
>> I think the language standard assumes equally aligned data among
>> objects of the same type for the above code, so gcc is correct, but I'm
>> not sure.
>>
>
>I can't help thinking that putting a non-aligned value into an int
>pointer is undefined behaviour - but I can't find a reference to that in

Alignment only matters on certain architectures, and even then, not always.
E.g. this compiles and runs fine on x86 MacOS using clang, setting non-aligned
ints on both the stack and the heap:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main()
{
    uint32_t i;
    uint32_t j;
    uint32_t *p1;
    uint32_t *p2;

    p1 = (&j < &i ? &j : &i);
    p2 = (uint32_t *)((char *)p1 + 1);
    printf("p1 = %p, p2 = %p\n", p1, p2);
    *p2 = (uint32_t)-1;
    printf("*p2 = %u\n", *p2);

    p1 = (uint32_t *)malloc(sizeof(uint32_t) * 2);
    p2 = (uint32_t *)((char *)p1 + 1);
    printf("p1 = %p, p2 = %p\n", p1, p2);
    *p2 = (uint32_t)-1;
    printf("*p2 = %u\n", *p2);
    return 0;
}

David Brown

Jun 5, 2021, 12:17:07 PM
On 05/06/2021 18:00, MrSpoo...@4hhtozmpj299zx.tv wrote:
> On Sat, 5 Jun 2021 14:19:52 +0200
> David Brown <david...@hesbynett.no> wrote:
>> On 05/06/2021 12:53, Bonita Montero wrote:
>>> Is the masking really necessary, or is it just an optimizer weakness?
>>> It seems to me MSVC sees that two bits are shifted out and then back
>>> in, and replaces this with a mask.
>>> I think the language standard assumes equally aligned data among
>>> objects of the same type for the above code, so gcc is correct, but I'm
>>> not sure.
>>>
>>
>> I can't help thinking that putting a non-aligned value into an int
>> pointer is undefined behaviour - but I can't find a reference to that in
>
> Alignment only matters on certain architectures, and even then , not always.

The discussion is about behaviour that is not defined by the C and C++
standards. If you can point to /documentation/ that says clang on x86
MacOS defines the behaviour of unaligned access, that would be
interesting. But other than that, a sample of "this example happens to
work on this compiler with these flags on this target" is irrelevant.

We all know that on many - but not all - cpu targets, unaligned accesses
work as expected, albeit usually at a performance cost. But we are
talking about the standards definition of C++ here (and perhaps C, in
that C++ inherits such things from C), not cpus.


Bo Persson

Jun 5, 2021, 1:50:38 PM
You did notice (right?) that the optimiser transformed

printf("*p2 = %u\n",*p2);

into

printf("*p2 = %u\n", (uint32_t)-1);

resulting in the code

00007FF71990106D mov rcx,rsi
00007FF719901070 mov edx,0FFFFFFFFh
00007FF719901075 call printf (07FF719901090h)




James Kuyper

Jun 5, 2021, 5:43:01 PM
On 6/5/21 8:19 AM, David Brown wrote:
...
> I can't help thinking that putting a non-aligned value into an int
> pointer is undefined behaviour - but I can't find a reference to that in
> the C standards (and I don't know the C++ standards well enough to look
> there).

It's not something either standard says explicitly. Rather, it's
something that needs to be derived from what it says about other things.
If you start from a pointer that is correctly aligned for its type,
most pointer operations give a result that still points to the same
type, and is still correctly aligned for that type. This includes
conversion to an integer type and back to the original pointer type.
The only operations that could result in a mis-aligned pointer all have
undefined behavior for one reason or another. For instance, conversion
to a pointer to a more strictly aligned type has undefined behavior if
the original pointer doesn't meet the alignment requirements of the new
type. Converting a pointer to an integer, performing any kind of
arithmetic on that integer to produce a different integer value, and
converting back again, has undefined behavior due to the omission of any
explicit definition of the behavior.
So what about starting with a mis-aligned pointer? If you have a packed
struct, the members of that struct might not be correctly aligned for
their type. But packing a struct is not a core language feature - it's
only available as an extension. On platforms with strong alignment
requirements, implementations that allow struct packing will generally
provide warnings when you use normal pointers to access
objects that might be misaligned.
You could also take an integer that represents a memory location that
is not correctly aligned for a given type, and convert it into a pointer
to that type - but the behavior of such a conversion is undefined.

I'm not sure that the above argument covers every possibility, but I do
believe that every possible way of getting a misaligned pointer is
covered, in some fashion, by both standards.
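
As a concrete illustration of the packed-struct case: this sketch uses the
gcc/clang __attribute__((packed)) extension, which is not core C++, and
the type name is just illustrative. The final dereference is exactly the
kind of use those warnings are about:

#include <cinttypes>
#include <cstdint>
#include <cstdio>

struct __attribute__((packed)) Packet {
    std::uint8_t  tag;
    std::uint32_t value;    // typically placed at offset 1, i.e. misaligned
};

int main()
{
    Packet p{ 1, 42 };
    // &p.value need not satisfy uint32_t's alignment requirement;
    // gcc and clang diagnose this with -Waddress-of-packed-member.
    std::uint32_t *q = &p.value;
    std::printf( "%" PRIu32 "\n", *q );   // happens to work on x86, not guaranteed
}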

James Kuyper

Jun 5, 2021, 7:09:04 PM
Apologies to David, who has already received two versions of this
message as e-mail, because I keep hitting the Thunderbird "Reply" button
instead of their new "Followup" button.

On 6/5/21 9:29 AM, David Brown wrote:
> On 05/06/2021 15:06, Bonita Montero wrote:
>>> No.  Subtraction of pointers is defined as the difference in their
>>> indexes within a single array.
>>
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().
>>
>
> Look it up.


That works in C because every C object can be accessed as an array of
char. C++ allows more complicated possibilities, including objects that
are not contiguous. But it works in C++ if the relevant objects are
required to be contiguous. If two such objects are both sub-objects of
the same larger object, the difference between those pointers satisfies
that requirement, otherwise the subtraction is undefined.

Richard Damon

Jun 5, 2021, 8:03:43 PM
Actually it is true. Char (and its relatives) do have a few special
cases, as Any pointer is allowed to be converted into a pointer to char
and back and still be used. Also I believe that any object can be
treated as an array of char. So you can do the conversion to char and
subtract for two pointers in the same object.

It is still not defined if the pointers are to two separate objects. For
machines with segmented memory, this can be an issue.

David Brown

Jun 6, 2021, 5:05:49 AM
The only major possibility I can think of that is missing above, is type
punning of various types (such as via a union in C, or by accessing the
pointer via a char pointer or memcpy). But I expect they too would have
to provide a valid pointer or the results would be undefined when the
pointer was used.

Thank you for the explanation.

David Brown

Jun 6, 2021, 5:24:20 AM
On 06/06/2021 01:08, James Kuyper wrote:
> Apologies to David, who has already received two versions of this
> message as e-mail, because I keep hitting the Thunderbird "Reply" button
> instead of their new "Followup" button.

No problem. It makes me feel special :-)

>
> On 6/5/21 9:29 AM, David Brown wrote:
>> On 05/06/2021 15:06, Bonita Montero wrote:
>>>> No.  Subtraction of pointers is defined as the difference in their
>>>> indexes within a single array.
>>>
>>> That couldn't be true because you can cast any pointer-pair
>>> to char *, subtract them and use the difference for memcpy().
>>>
>>
>> Look it up.
>
>
> That works in C because every C object can be accessed as an array of
> char. C++ allows more complicated possibilities, including objects that
> are not contiguous. But it works in C++ if the relevant objects are
> required to be contiguous. If two such objects are both sub-objects of
> the same larger object, the difference between those pointers satisfies
> that requirement, otherwise the subtraction is undefined.
>

When you do that, however, you are not subtracting the original pointers
- you are subtracting two char* pointers. And that subtraction works as
the difference of their indexes into an array (of char type).

So if you have:

struct A {
    int x;
    int y;
};

A a;

the expression "&a.x - &a.y" is not defined by the C or C++ standards.
You have to cast the pointers to character type, and then you are
subtracting two pointers that are part of the same array, rather than
subtracting pointers to two ints in a structure.
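
A small sketch of the two formulations (whether the char* form is formally
blessed in C++ is debated further down the thread; the function name is
just illustrative):

#include <cstddef>

struct A {
    int x;
    int y;
};

// &a.y - &a.x subtracts int pointers that are not elements of the same
// int array, so that form is not covered by the standards' rule.
std::ptrdiff_t member_distance( const A &a )
{
    const char *px = reinterpret_cast<const char *>( &a.x );
    const char *py = reinterpret_cast<const char *>( &a.y );
    // px and py point into the bytes of one and the same object, which
    // is the formulation that treats the struct as an array of char.
    return py - px;
}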

Chris Vine

Jun 6, 2021, 5:33:57 AM
Assuming C++20, on my (possibly faulty) reading you could _not_ access
any given object as an array of char, but you could access it as an
array of unsigned char or std::byte. This is because you can construct
any object in an array of unsigned char or of std::byte, so that even
where some other storage for a particular object has in fact been
provided, such an array will implicitly arise in consequence of the
carrying out of any pointer arithmetic which prods around in the
object's internals.

So although the strict aliasing rule is not offended by the use of
char* for this purpose, I suspect the rules on pointer arithmetic are.

Richard Damon

Jun 6, 2021, 7:56:24 AM
I haven't pored over the standard recently, but my memory was that the
type family char / signed char / unsigned char all had this property,
but if you actually accessed the values, signed char (and char is
signed) had the possibility of a trap value (-0).

unsigned char had the natural property that its values were precisely
defined by the standard, but any character type could be used, if only
because of ancient code that used char for this purpose.

MrSpo...@q67rq6_2ly9ut44j.org

Jun 6, 2021, 9:56:05 AM
On Sat, 5 Jun 2021 18:16:52 +0200
David Brown <david...@hesbynett.no> wrote:
>On 05/06/2021 18:00, MrSpoo...@4hhtozmpj299zx.tv wrote:
>> On Sat, 5 Jun 2021 14:19:52 +0200
>> David Brown <david...@hesbynett.no> wrote:
>>> On 05/06/2021 12:53, Bonita Montero wrote:
>>>> Is the masking really necessary, or is it just an optimizer weakness?
>>>> It seems to me MSVC sees that two bits are shifted out and then back
>>>> in, and replaces this with a mask.
>>>> I think the language standard assumes equally aligned data among
>>>> objects of the same type for the above code, so gcc is correct, but I'm
>>>> not sure.
>>>>
>>>
>>> I can't help thinking that putting a non-aligned value into an int
>>> pointer is undefined behaviour - but I can't find a reference to that in
>>
>> Alignment only matters on certain architectures, and even then , not always.
>
>The discussion is about behaviour that is not defined by the C and C++
>standards. If you can point to /documentation/ that says clang on x86
>MacOS defines the behaviour of unaligned access, that would be

Probably isn't any. I suspect most compilers just create the machine code and
the rest is up to the CPU.


MrSpo...@4tzq7j92.gov

Jun 6, 2021, 9:59:58 AM
Nope, guess the compiler didn't notice either:

movq -40(%rbp), %rax ## 8-byte Reload
movq %rax, -24(%rbp)
movq -24(%rbp), %rax
addq $1, %rax
movq %rax, -32(%rbp)
movq -24(%rbp), %rsi
movq -32(%rbp), %rdx
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
movq -32(%rbp), %rcx
movl $-1, (%rcx)
movq -32(%rbp), %rcx
movl (%rcx), %esi
leaq L_.str.1(%rip), %rdi
movl %eax, -44(%rbp) ## 4-byte Spill
movb $0, %al
callq _printf



Chris Vine

Jun 6, 2021, 10:00:11 AM
On Sun, 6 Jun 2021 07:56:07 -0400
Richard Damon <Ric...@Damon-Family.org> wrote:
> On 6/6/21 5:33 AM, Chris Vine wrote:
> > On Sat, 5 Jun 2021 19:08:49 -0400
> > James Kuyper <james...@alumni.caltech.edu> wrote:
[snip]
> >> That works in C because every C object can be accessed as an array of
> >> char. C++ allows more complicated possibilities, including objects that
> >> are not contiguous. But it works in C++ if the relevant objects are
> >> required to be contiguous. If two such objects are both sub-objects of
> >> the same larger object, the difference between those pointers satisfies
> >> that requirement, otherwise the subtraction is undefined.
> >
> > Assuming C++20, on my (possibly faulty) reading you could _not_ access
> > any given object as an array of char, but you could access it as an
> > array of unsigned char or std::byte. This is because you can construct
> > any object in an array of unsigned char or of std::byte, so that even
> > where some other storage for a particular object has in fact been
> > provided, such an array will implicitly arise in consequence of the
> > carrying out of any pointer arithmetic which prods around in the
> > object's internals.
> >
> > So although the strict aliasing rule is not offended by the use of
> > char* for this purpose, I suspect the rules on pointer arithmetic are.
> >
>
> I haven't pored over the standard recently, but my memory was that the
> type family char / signed char / unsigned char all had this property,
> but if you actually accessed the values, signed char (and char is
> signed) had the possibility of a trap value (-0).
>
> unsigned char had the natural property that its values were precisely
> defined by the standard, but any character type could be used, if only
> because of ancient code that used char for this purpose.

I wasn't thinking of the point concerning the assigning of
indeterminate values into narrow character types (permitted for
unsigned but not for signed), although I imagine that may come into
play.

The issue I was referring to was that (i) new objects can be constructed
(either by placement new or as implicit-lifetime types) in an array of
unsigned char or array of std::byte if properly aligned
([intro.object]/3 and /4), (ii) iterating by pointer arithmetic can only
be carried out in respect of arrays ([expr.add]/4), (iii) in C++20
arrays (but not necessarily their elements) are implicit-lifetime
types, (iv) implicit-lifetime types arise spontaneously where necessary
to obtain defined behaviour, and (v) accordingly you can iterate over
an object as if it were stored in an array of unsigned char or
std::byte even if it isn't[1].

I imagine that in practice you can construct an object in an array of
char on any compilers in widespread use but it is not formally supported
by the standard. I don't know why there is that restriction. It cannot
be entirely down to indeterminate values, because you can put an
indeterminate value in a char object if the char type is in fact
unsigned, but that is not true of constructing an object in an array of
char using placement new.

This is as I understand it. But my understanding may not be complete.

[1]: It is this that enables you to implement your own version of
std::memcpy() in standard C++20. That was impossible in C++17.
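
A small sketch of point (i), assuming C++20; Widget and the storage arrays
are just illustrative names. Placement new into an array of unsigned char
or std::byte is what [intro.object]/3 counts as providing storage; the same
thing with an array of char works in practice but is not on that list:

#include <cstddef>
#include <new>

struct Widget { int a; int b; };

int main()
{
    alignas(Widget) std::byte storage[sizeof(Widget)];    // provides storage
    Widget *w = new (storage) Widget{ 1, 2 };             // [intro.object]/3

    alignas(Widget) char storage2[sizeof(Widget)];
    Widget *w2 = new (storage2) Widget{ 3, 4 };           // works in practice,
                                                          // but char is not on
                                                          // the blessed list
    int sum = w->a + w2->b;
    w->~Widget();
    w2->~Widget();
    return sum == 5 ? 0 : 1;
}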

David Brown

Jun 6, 2021, 11:35:08 AM
In the majority of cases, that is correct.

The fun comes when the compiler can see that if it assumes there is no
undefined behaviour, it can make significant efficiency improvements -
then it might well do so. (Optimisations vary by compiler, flags, etc.)
Remember, "undefined behaviour" means the compiler might generate code that
does what you might expect as the "natural" behaviour on the cpu in
question - but it also means it might assume it doesn't happen, and then you
get weird effects if you break the rules.

That's why "it worked when I tested it" is not something you should rely
on for undefined behaviour - you might have got lucky, or maybe things
will change with the next compiler version.
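
A classic illustration of that point - just a sketch, not tied to any
particular compiler or version - is a signed-overflow check that the
optimiser is entitled to fold away:

#include <climits>
#include <cstdio>

bool will_overflow( int x )
{
    // Intended as an overflow check, but x + 1 itself overflows when
    // x == INT_MAX, which is undefined behaviour; an optimiser may
    // therefore treat the comparison as always false.
    return x + 1 < x;
}

int main()
{
    std::printf( "%d\n", will_overflow( INT_MAX ) );   // often 1 at -O0, 0 at -O2
}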


Paavo Helde

Jun 6, 2021, 11:48:22 AM
This holds only for linear memory model. While this is a dominant memory
model nowadays, the C++ language is old enough to take also other memory
models (like segmented ones) into account. In segmented memory models,
pointer arithmetic only works in a single segment, and accordingly the
arrays are limited to a single segment. There is no such limitation for
struct members.

As an example, with Intel 386 you could have a 16-bit program working
simultaneously with at least 4 different 64 kB segments, which might
have been fully separate in the physical memory. Good luck with forming
a difference of pointers in a 16-bit size_t variable when the segments
are more than 64kB separate in the physical memory!


Richard Damon

Jun 6, 2021, 2:27:00 PM
Actually, unless the pointers were specially declared, taking the
difference ignored the segment part of the pointers and only subtracted
the offsets, so if the pointers were to things in different segments,
the difference was largely meaningless.
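
As a rough model of that (SegPtr here is just an illustrative stand-in for
a 16-bit far pointer, not any real compiler's type):

#include <cstdint>

struct SegPtr {                     // model of a real-mode segment:offset pointer
    std::uint16_t segment;
    std::uint16_t offset;
};

// Offset-only subtraction, as many real-mode compilers did for plain
// (non-huge) far pointers: the segment parts are simply ignored, so the
// result is only meaningful when both pointers share a segment.
std::int32_t far_diff( SegPtr a, SegPtr b )
{
    return static_cast<std::int32_t>( a.offset )
         - static_cast<std::int32_t>( b.offset );
}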

Vir Campestris

Jun 6, 2021, 4:28:09 PM
On 05/06/2021 11:53, Bonita Montero wrote:
> Consider the following code:
>
>
> size_t s( int *begin, int *end )
> {
>     return (end - begin) * sizeof(int);
> }
>
> This is MSVC's disassembly:
>
>     sub    rdx, rcx
>     and    rdx, -4
>     mov    rax, rdx
>     ret    0
>
> This is gcc's disassembly:
>
>     movq    %rsi, %rax
>     subq    %rdi, %rax
>     ret
>
> Is the masking really necessary, or is it just an optimizer weakness?
> It seems to me MSVC sees that two bits are shifted out and then back
> in, and replaces this with a mask.
> I think the language standard assumes equally aligned data among
> objects of the same type for the above code, so gcc is correct, but I'm
> not sure.
>
I've gone through the whole thread, and nobody else has commented.

That GCC disassembly can't possibly be a correct and complete rendition
of the function.

Maybe on entry begin is in eax, and end in rdi, which would explain the
second instruction. But what's it doing with rsi?

Andy

Alf P. Steinbach

Jun 6, 2021, 6:22:45 PM
Possibly you've assumed that the examples are expressed with the same
syntax.

The MSVC example uses Intel syntax, where e.g. `sub rdx, rcx` means `rdx
:= rdx - rcx`.

So in the MSVC example `rdx` and `rcx` hold the arguments on entry.

The gcc example uses AT&T syntax, where e.g. `movq %rsi, %rax` means
`rax := rsi`. I.e. the opposite order of the instruction arguments.

So in the gcc example `rsi` and `rdi` hold the arguments on entry.

One can tell gcc (or at least g++) to use the less noisy and more
conventional, in short more reasonable, Intel syntax via option
`-masm=intel`, unless I recall the details of that incorrectly.


Cheers,

- Alf

Keith Thompson

Jun 6, 2021, 7:10:30 PM
Paavo Helde <myfir...@osa.pri.ee> writes:
> 05.06.2021 16:06 Bonita Montero wrote:
>>> No.  Subtraction of pointers is defined as the difference in their
>>> indexes within a single array.
>>
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().
>
> This holds only for linear memory model. While this is a dominant
> memory model nowadays, the C++ language is old enough to take also
> other memory models (like segmented ones) into account. In segmented
> memory models, pointer arithmetic only works in a single segment, and
> accordingly the arrays are limited to a single segment. There is no
> such limitation for struct members.

I believe there is. Without that limitation, the offsetof macro
wouldn't work.

> As an example, with Intel 386 you could have a 16-bit program working
> simultaneously with at least 4 different 64 kB segments, which might
> have been fully separate in the physical memory. Good luck with
> forming a difference of pointers in a 16-bit size_t variable when the
> segments are more than 64kB separate in the physical memory!

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

Jun 6, 2021, 7:20:37 PM
I think you meant "(and char *if* signed)".

As of C++17, signed types including signed char can use two's
complement, ones' complement, or signed magnitude, but:
    For unsigned narrow character types, each possible bit pattern of
    the value representation represents a distinct number. These
    requirements do not hold for other types.
which means that plain char and signed char have no trap representations
and no padding bits.

Drafts of C++20 permit only two's complement for signed integer types.
(You're unlikely to find a pre-C++20 implementation that doesn't already
meet that requirement.)

I haven't looked into what the standard says about aliasing arbitrary
objects with arrays of signed char.

> unsigned char had the natural property that its values were precisely
> defined by the standard, but any character type could be used, if only
> because of ancient code that used char for this purpose.

Richard Damon

Jun 6, 2021, 9:31:00 PM
On 6/6/21 7:10 PM, Keith Thompson wrote:
> Paavo Helde <myfir...@osa.pri.ee> writes:
>> 05.06.2021 16:06 Bonita Montero wrote:
>>>> No.  Subtraction of pointers is defined as the difference in their
>>>> indexes within a single array.
>>>
>>> That couldn't be true because you can cast any pointer-pair
>>> to char *, subtract them and use the difference for memcpy().
>>
>> This holds only for linear memory model. While this is a dominant
>> memory model nowadays, the C++ language is old enough to take also
>> other memory models (like segmented ones) into account. In segmented
>> memory models, pointer arithmetic only works in a single segment, and
>> accordingly the arrays are limited to a single segment. There is no
>> such limitation for struct members.
>
> I believe there is. Without that limitation, the offsetof macro
> wouldn't work.
>
>> As an example, with Intel 386 you could have a 16-bit program working
>> simultaneously with at least 4 different 64 kB segments, which might
>> have been fully separate in the physical memory. Good luck with
>> forming a difference of pointers in a 16-bit size_t variable when the
>> segments are more than 64kB separate in the physical memory!
>

Actually, one reason offsetof is a standard macro is so that it can be made
to use implementation-dependent tricks to make it work here.

The implementation has enough information to handle a structure bigger
than a segment if it wanted to. It might only be able to make that work
easily in 'real' mode, where segments offset from the current segment
can just be computed, or it might need special support from the OS to
make multiple overlapping segments to build a net address space bigger
than one segment.

If you reference a member of the structure whose offset is big enough
that it won't fit in the first segment of the structure, the
implementation just needs to make a segment offset from the start of the
object that does hold that full member.

I will say that I have never heard of a compiler that did that for a
structure; there were implementations that did that for specified arrays,
but pointers to elements of that array had a non-standard type to keep
track of the fact that segment updates might be needed for pointer
arithmetic. These were __huge__ arrays, with __huge__ pointers.

Keith Thompson

Jun 6, 2021, 9:49:57 PM
Sure, and it can do the same thing for arrays.

Both array objects and struct objects have to *act like* they occupy a
contiguous range of memory addresses. If the implementation has to play
some tricks to make it act that way, that's fine. And if it restricts
objects to a single segment, that's fine too (as long as the segment size
is big enough -- 65535 bytes for hosted implementations, C99 and later).

And if an implementation plays tricks for arrays but not for structures,
that's probably fine too. The standard doesn't mention segments.

Keith Thompson

Jun 6, 2021, 9:51:52 PM
Keith Thompson <Keith.S.T...@gmail.com> writes:
[...]
> Both array objects and struct objects have to *act like* they occupy a
> contiguous range of memory addresses. If the implementation has to play
> some tricks to make it act that way, that's fine. And if it restricts
> objects to a single segment, that's fine too (as long as the segment size
> is big enough -- 65535 bytes for hosted implementations, C99 and later).

Or whatever the corresponding limits are for C++, of course. *sigh*

Richard Damon

Jun 6, 2021, 10:58:02 PM
The problem with doing it for arrays is that either you need to do it
for ALL pointers, or you need to make big arrays use a special syntax.

The key thing for structs is that the pointer to the big structure is
already a special type, so it is easy to recognize without cost to other code.

In essence, you can treat a BigStruct* pointer specially when you apply
the member selection operation to it.

Given an int* pointer into a big array, you don't know if it IS a big
array, unless you pessimistically do it for all, or make big arrays an
extension that creates a special type of pointer that needs a
non-standard type definition (like __huge__).

Öö Tiib

Jun 7, 2021, 2:17:27 AM
On Monday, 7 June 2021 at 02:10:30 UTC+3, Keith Thompson wrote:
> Paavo Helde <myfir...@osa.pri.ee> writes:
>
> > This holds only for linear memory model. While this is a dominant
> > memory model nowadays, the C++ language is old enough to take also
> > other memory models (like segmented ones) into account. In segmented
> > memory models, pointer arithmetic only works in a single segment, and
> > accordingly the arrays are limited to a single segment. There is no
> > such limitation for struct members.
>
> I believe there is. Without that limitation, the offsetof macro
> wouldn't work.

In C++ offsetof is required only to work on standard layout types (UB
otherwise). So (either there is some other constraint we haven't
thought about or) only standard layout classes and arrays have to
be limited to a single segment (when memory is segmented).
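
A tiny sketch of that constraint on offsetof (the type names here are just
illustrative):

#include <cstddef>

struct Standard { int a; int b; };       // standard-layout: offsetof is fine

struct NotStandard {
    virtual ~NotStandard() = default;    // virtual function => not standard-layout,
    int a;                               // so offsetof(NotStandard, a) is only
};                                       // conditionally-supported in C++

int main()
{
    std::size_t off = offsetof( Standard, b );   // well-defined
    return off < sizeof(Standard) ? 0 : 1;
}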

Tim Rentsch

Jun 7, 2021, 3:27:33 AM
James Kuyper <james...@alumni.caltech.edu> writes:

> [...] Converting a pointer to an integer, performing any kind of
> arithmetic on that integer to produce a different integer value, and
> converting back again, has undefined behavior due to the omission of
> any explicit definition of the behavior.

Converting an arbitrary integer to a pointer is implementation-defined
behavior, not undefined behavior.

Tim Rentsch

Jun 7, 2021, 3:44:18 AM
Keith Thompson <Keith.S.T...@gmail.com> writes:

> Richard Damon <Ric...@Damon-Family.org> writes:
>
>> I haven't pored over the standard recently, but my memory was that
>> the type family char / signed char / unsigned char all had this
>> property, but if you actually accessed the values, signed char (and
>> char is signed) had the possibility of a trap value (-0).
>
> I think you meant "(and char *if* signed)".
>
> As of C++17, signed types including signed char can use two's
> complement, ones' complement, or signed magnitude, but:
> For unsigned narrow character types, each possible bit pattern of
> the value representation represents a distinct number. These
> requirements do not hold for other types.
> which means that plain char and signed char have no trap representations
> and no padding bits.

The passage quoted is about _un_signed narrow character types.
It doesn't apply to signed char or to plain char of the signed
variety.

An earlier sentence (in the C++17 standard) says this

    For narrow character types, all bits of the object
    representation participate in the value representation.

which does rule out padding bits, but it still admits the
possibility of there being a trap representation.

Scott Lurndal

Jun 7, 2021, 10:52:52 AM
Vir Campestris <vir.cam...@invalid.invalid> writes:
>On 05/06/2021 11:53, Bonita Montero wrote:
>> Consider the following code:
>>
>>
>> size_t s( int *begin, int *end )
>> {
>>     return (end - begin) * sizeof(int);
>> }
>>
>> This is MSVC's disassembly:
>>
>>     sub    rdx, rcx
>>     and    rdx, -4
>>     mov    rax, rdx
>>     ret    0
>>
>> This is gcc's disassembly:
>>
>>     movq    %rsi, %rax
>>     subq    %rdi, %rax
>>     ret
>>
>> Is the masking really necessary, or is it just an optimizer weakness?
>> It seems to me MSVC sees that two bits are shifted out and then back
>> in, and replaces this with a mask.
>> I think the language standard assumes equally aligned data among
>> objects of the same type for the above code, so gcc is correct, but I'm
>> not sure.
>>
>I've gone through the whole thread, and nobody else has commented.
>
>That GCC disassembly can't possibly be a correct and complete rendition
>of the function.

The first input parameter is passed in %rdi, the second in %rsi and
the result is returned in %rax. That's the normal ABI on linux.

Scott Lurndal

Jun 7, 2021, 10:54:05 AM
Purely a matter of opinion. Probably depends on what world you
came from - Unix people prefer the AT&T syntax, DOS/Windows people
prefer the Intel syntax.

Personally, I detest the Intel syntax with passion.

Bonita Montero

Jun 7, 2021, 12:02:47 PM
> Personally, I detest the Intel syntax with passion.

That's a matter of habituation.

Tim Rentsch

Jun 7, 2021, 12:19:39 PM
Paavo Helde <myfir...@osa.pri.ee> writes:

> 05.06.2021 16:06 Bonita Montero wrote:
>
>>> No. Subtraction of pointers is defined as the difference in their
>>> indexes within a single array.
>>
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().
>
> This holds only for linear memory model. While this is a dominant
> memory model nowadays, the C++ language is old enough to take also
> other memory models (like segmented ones) into account. In segmented
> memory models, pointer arithmetic only works in a single segment, and
> accordingly the arrays are limited to a single segment. There is no
> such limitation for struct members.

In C there is, because any object can be treated as an array
of character type, and that includes structs.

In C++ the rules are so complicated that no one knows whether
that reasoning applies, so the safest course of action is not
to use C++ on machines that use segmentation.

Tim Rentsch

Jun 7, 2021, 12:44:12 PM
In C a pointer to any object can be converted to unsigned char *
which then can be used as though it were pointing into a
character array whose extent covers the entire object.

Do you have some reason to believe that property does not apply
to C++?

Öö Tiib

Jun 7, 2021, 12:51:51 PM
Yes; in C++ it is so only for pointers to objects of trivially copyable
or standard-layout types, as only those are required to occupy
contiguous bytes of storage.

james...@alumni.caltech.edu

Jun 7, 2021, 1:14:06 PM
On Monday, June 7, 2021 at 12:44:12 PM UTC-4, Tim Rentsch wrote:
...
> In C a pointer to any object can be converted to unsigned char *
> which then can be used as though it were pointing into a
> character array whose extent covers the entire object.
>
> Do you have some reason to believe that property does not apply
> to C++?

"An object of trivially copyable or standard-layout type (6.8) shall occupy contiguous bytes of storage." (6.7.2p8.4). Which implies that any type that is neither trivially copyable nor a standard-layout type need not occupy contiguous bytes of storage. C's requirement (6.2.6.1p2) has no such exceptions.

Paavo Helde

Jun 7, 2021, 2:58:51 PM
I'm pretty sure that any C++ implementation on segmented architectures
would also require structs and classes to fit in a single segment; it's
much easier that way.

Anyway, this is not really important because Bonita talked about "any
pair of pointers", no mention of structs was made.

When you take one pointer from say DS segment and the other from ES
segment, then calculating their difference might not have any meaning
for the program, not to speak about fitting this difference into a
size_t variable or copying this memory range somewhere. At best you
could copy a tail of the DS segment and the head of the ES segment
somewhere, but why not vice versa?


Keith Thompson

Jun 7, 2021, 3:14:00 PM
True, but C++17 has this rather odd wording later in the same paragraph:

    For each value i of type unsigned char in the range 0 to 255
    inclusive, there exists a value j of type char such that the result
    of an integral conversion (7.8) from i to char is j, and the result
    of an integral conversion from j to unsigned char is i.

I believe this implies that there must be 256 distinct values of type
signed char (and plain char if it's signed), disallowing treating -0 or
-128 as a trap representation.

What's odd about it is that it uses the value 255, so it wouldn't apply
that same requirement if CHAR_BIT > 8.

C++20 (at least in the draft I have) requires 2**CHAR_BIT distinct
values for the narrow character types (char, unsigned char, signed char,
and char8_t).

Bo Persson

Jun 7, 2021, 3:59:00 PM
Not so odd really. On the CHAR_BIT == 9 system that I once used (Univac,
and for C not C++), the char size was *not* chosen to get additional
character values, but because the hardware had part-word operations that
let you extract any quarter of the 36-bit word.

That made using four 9-bit characters per word a lot more efficient than
four-and-a-half 8-bit characters.


Tim Rentsch

Jun 10, 2021, 2:30:05 PM
Yes, as a practical matter no actual C++ implementation is
going to accommodate objects that cross a segment boundary.

My question though was about what the C++ standard allows for
pointers into a single struct object (of the contiguous
variety). I know that C does allow pointer arithmetic for
such pointers (ie, character pointers), but my efforts to
discover whether the C++ standard allows it have not been
successful.

Vir Campestris

Jun 16, 2021, 4:38:41 PM
On 06/06/2021 23:22, Alf P. Steinbach wrote:
>
> Possibly you've assumed that the examples are expressed with the same
> syntax.
>
> The MSVC example uses Intel syntax, where e.g. `sub rdx, rcx` means `rdx
> := rdx - rcx`.
>
> So in the MSVC example `rdx` and `rcx` hold the arguments on entry.
>
> The gcc example uses AT&T syntax, where e.g. `movq %rsi, %rax` means
> `rax := rsi`. I.e. the opposite order of the instruction arguments.
>
> So in the gcc example `rsi` and `rdi` hold the arguments on entry.
>
> One can tell gcc (or at least g++) to use the less noisy and more
> conventional, in short more reasonable, Intel syntax via option
> `-masm=intel`, unless I recall the details of that incorrectly.

Ah, that explains it all.

Except why the GCC/ARM disassembly I occasionally look at in my day job
is in Intel order...

Thanks

Andy

Tim Rentsch

Aug 7, 2021, 1:22:16 PM
Sorry, I thought it was obvious from context that my
question was meant to ask only about contiguous objects.

Tim Rentsch

Aug 7, 2021, 1:22:46 PM

Tim Rentsch

Aug 7, 2021, 1:26:08 PM
Ahhh, as usual for the C++ standard. Yes repeat no.

Thank you for the additional information.

Bo Persson

Aug 7, 2021, 3:21:32 PM
Things change. :-)


One difference is that C++20 requires two's complement (in another
section), so it can know what value a bit pattern represents.

james...@alumni.caltech.edu

Aug 7, 2021, 3:41:05 PM
No, it wasn't. The statement you made about C is about "any object",
not "contiguous objects". It ceases to be true for C++ precisely
because C++ allows objects to not be contiguous. The context of
that statement was a comparison of arrays and structs, and while
arrays are required to be contiguous, structs are not. That discussion, in
turn, evolved from an earlier discussion of offsetof(), which is only
conditionally-supported in C++ for types that aren't standard-layout
types.

Tim Rentsch

Aug 7, 2021, 3:54:24 PM
My statement was and is a true and correct statement.

Tim Rentsch

Aug 7, 2021, 3:59:17 PM
Richard Damon <Ric...@Damon-Family.org> writes:

> On 6/5/21 9:06 AM, Bonita Montero wrote:
>
>>> No. Subtraction of pointers is defined as the difference in their
>>> indexes within a single array.
>>
>> That couldn't be true because you can cast any pointer-pair
>> to char *, subtract them and use the difference for memcpy().
>
> Actually it is true. Char (and its relatives) do have a few special
> cases, as Any pointer is allowed to be converted into a pointer to char
> and back and still be used. Also I believe that any object can be
> treated as an array of char. So you can do the conversion to char and
> subtract for two pointers in the same object.

In C the difference of two pointers might not work, in particular
if the result doesn't fit in ptrdiff_t. I don't know if that
rule is different in C++.

MrSpud...@1ahbfz.gov

Aug 8, 2021, 5:19:02 AM
Huh? Any type that can be used to hold a pointer address can also be used
to hold the difference of 2 pointers. If the difference will be negative then
simply invert it.

James Kuyper

Aug 8, 2021, 9:24:03 AM
On 8/8/21 5:18 AM, MrSpud...@1ahbfz.gov wrote:
> On Sat, 07 Aug 2021 12:59:02 -0700
> Tim Rentsch <tr.1...@z991.linuxsc.com> wrote:
>> Richard Damon <Ric...@Damon-Family.org> writes:
...
>>> Actually it is true. Char (and its relatives) do have a few special
>>> cases, as Any pointer is allowed to be converted into a pointer to char
>>> and back and still be used. Also I believe that any object can be
>>> treated as an array of char. So you can do the conversion to char and
>>> subtract for two pointers in the same object.
>>
>> In C the difference of two pointers might not work, in particular
>> if the result doesn't fit in ptrdiff_t. I don't know if that
>> rule is different in C++.
>
> Huh? Any type that can be used to hold a pointer address can also be used
> to hold the difference of 2 pointers. If the difference will be negative then
> simply invert it.


"When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object; the
result is the difference of the subscripts of the two array elements.
The size of the result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that type, the
behavior is undefined." (C2011 6.5.6p9).

If the result of a pointer subtraction were always guaranteed to be
representable in ptrdiff_t, then that last sentence would be vacuous.

In the C standard, 7.20.1.4p1 describes intptr_t and uintptr_t, and says
"These types are optional". A fully conforming implementation may have
pointers that are too big to be representable using any supported
integer type, and the same is true of pointer differences.

Juha Nieminen

Aug 8, 2021, 11:05:10 AM
Indeed. For example in x86 16-bit real mode (think MS-DOS) it may
well be that a pointer is 32-bit (because it has to contain a
segment and an offset) but ptrdiff_t may well be 16-bit, and the
compiler may limit, e.g., arrays to 64 kilobytes (minus 1 byte) so
that they will fit inside a segment.

A difference between pointers will only give a valid (16-bit)
result when they point to the same array (which would be within
the same segment). Else you get garbage (if the two pointers are
pointing to different segments).

Richard Damon

Aug 8, 2021, 1:14:48 PM
And a key point is that typically ptrdiff_t will be of the same size as
size_t, only signed instead of unsigned. (This doesn't work if size_t is
16 bits, as ptrdiff_t needs to be at least 17 bits, as I remember.)

This means that if size_t is 32 bits, and ptrdiff_t is also 32 bits, an
array of char with size bigger than 0x80000000 can generate differences
bigger than can be handled by ptrdiff_t.
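
A sketch of that scenario, assuming a platform where size_t and ptrdiff_t
are both 32 bits (on 64-bit platforms ptrdiff_t is wide enough and the
issue does not arise):

#include <cstddef>
#include <cstdlib>

int main()
{
    const std::size_t n = 0x90000000u;            // ~2.25 GiB of char
    char *p = static_cast<char *>( std::malloc( n ) );
    if ( !p )
        return 0;                                 // allocation refused; nothing to show
    char *first = p;
    char *last  = p + n;                          // one past the end: a valid pointer
    // On a 32-bit platform the mathematical value of (last - first) is
    // 0x90000000, which does not fit in a 32-bit ptrdiff_t, so performing
    // that subtraction would be undefined behaviour. Keep the element
    // count in a size_t instead of recovering it from the pointers.
    std::size_t count = n;
    (void) first;
    (void) last;
    std::free( p );
    return count != 0 ? 0 : 1;
}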

Chris Vine

Aug 8, 2021, 3:01:31 PM
Note that "array" has a different meaning in C and C++. In C it is
(according to C11 §6.2.5/20) "a contiguously allocated nonempty set of
objects with a particular member object type, called the element type".
Furthermore in C (according to C11 §6.2.6.1/2) "Except for bit-fields,
objects are composed of contiguous sequences of one or more bytes, the
number, order, and encoding of which are either explicitly specified or
implementation-defined". This has been taken to mean that in C the
internal bytes comprising an object can themselves be treated as, and
iterated over as, an array of char, aided of course by the fact that
dereferencing pointers to char pointing to the internals of such objects
does not infringe the strict aliasing rule (§6.5/7 of C11).

That is, on my reading, no longer true in C++. Although by analogy with
C "An object of trivially copyable or standard-layout type shall occupy
contiguous bytes of storage" (C++20 §6.7.2/8.4), arrays in C++ have a
different definition than in C.

As I read C++20 (and I am willing to be corrected), an array is
something meeting the requirements of C++20 §9.3.3.4. Accordingly, an
array is something declared as an array following the C++ array syntax
described there. Merely having a contiguous storage of bytes is not
enough (on that reading) for the object concerned to be treated as an
array of char in C++. This means that any iteration over a standard
layout object using a char pointer type where that object is not
actually an array of char, or any pointer arithmetic respecting such
char pointer types, results in undefined behaviour by virtue of the
restrictions on pointer arithmetic in C++20 §7.6.6/4 and /5.

I don't imagine any compilers do anything other than what is hoped for
when using pointers to char to iterate over standard layout objects:
for one thing, g++ even allows you to carry out pointer arithmetic on
void pointers. But it does follow a pattern of C++ turning common and
reasonable coding practices, as adopted from C, into apparent undefined
behaviour.

I would be interested to know if you have different reading. I would
be pleased to be wrong.

MrSpud...@nldls6_1kg3nl2qnwv.biz

Aug 9, 2021, 4:19:49 AM
On Sun, 8 Aug 2021 09:23:46 -0400
James Kuyper <james...@alumni.caltech.edu> wrote:
>On 8/8/21 5:18 AM, MrSpud...@1ahbfz.gov wrote:
>> On Sat, 07 Aug 2021 12:59:02 -0700
>> Tim Rentsch <tr.1...@z991.linuxsc.com> wrote:
>>> Richard Damon <Ric...@Damon-Family.org> writes:
>....
>>>> Actually it is true. Char (and its relatives) do have a few special
>>>> cases, as Any pointer is allowed to be converted into a pointer to char
>>>> and back and still be used. Also I believe that any object can be
>>>> treated as an array of char. So you can do the conversion to char and
>>>> subtract for two pointers in the same object.
>>>
>>> In C the difference of two pointers might not work, in particular
>>> if the result doesn't fit in ptrdiff_t. I don't know if that
>>> rule is different in C++.
>>
>> Huh? Any type that can be used to hold a pointer address can also be used
>> to hold the difference of 2 pointers. If the difference will be negative then
>
>> simply invert it.
>
>
>"When two pointers are subtracted, both shall point to elements of the
>same array object, or one past the last element of the array object; the
>result is the difference of the subscripts of the two array elements.
>The size of the result is implementation-defined, and its type (a signed
>integer type) is ptrdiff_t defined in the <stddef.h> header.
>If the result is not representable in an object of that type, the
>behavior is undefined." (C2011 6.5.6p9).
>
>If the result of a pointer subtraction were always guaranteed to be
>representable in ptrdiff_t, then that last sentence would be vacuous.

Then don't use ptrdiff_t. I suspect like most people I wasn't even aware
it existed.

>In the C standard, 7.20.1.4p1 describes intptr_t and uintptr_t, and says
>"These types are optional". A fully conforming implementation may have
>pointers that are too big to be representable using any supported
>integer type, and the same is true of pointer differences.

So long as mathematical operations can be done on the pointer types (which
is a given or they'd be no use as pointers) then they are de facto integer
types and that statement is wrong.

MrSpud...@dx58865qyrdw30lqllb.info

unread,
Aug 9, 2021, 4:21:25 AM8/9/21
to
I was discussing this from the POV of a proper OS, not some half-baked
monitor program from the 70s. However, if you can't work out how to get
a valid diff from the above then clearly your maths skills need some
work.

Juha Nieminen

unread,
Aug 9, 2021, 4:53:50 AM8/9/21
to
Neither the C standard nor the C++ standard cares what kind of OS is
running the program. And, on top of that, what I described is a feature
of the x86 architecture, not a feature of the OS. If the OS, any OS,
runs in 16-bit real mode, it will have to deal with that problem
somehow.

The fact is that the standard only supports calculating the difference
between two pointers if they point to the same object or the same array
(or one past the end of the array). In any other situation the behavior
is undefined. I gave a practical example of where this could cause an
issue if you ignore the standard.

Juha Nieminen

unread,
Aug 9, 2021, 4:55:48 AM8/9/21
to
MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:
> So long as mathematical operations can be done on the pointer types (which
> is a given or they'd be no use as pointers) then they are de facto integer
> types and that statement is wrong.

You are assuming that a pointer is internally not only an integer, but a
single integer. That's not necessarily the case.

James Kuyper

unread,
Aug 9, 2021, 12:17:03 PM8/9/21
to
On 8/9/21 4:19 AM, MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:
> On Sun, 8 Aug 2021 09:23:46 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
...
>> "When two pointers are subtracted, both shall point to elements of the
>> same array object, or one past the last element of the array object; the
>> result is the difference of the subscripts of the two array elements.
>> The size of the result is implementation-defined, and its type (a signed
>> integer type) is ptrdiff_t defined in the <stddef.h> header.
>> If the result is not representable in an object of that type, the
>> behavior is undefined." (C2011 6.5.6p9).
>>
>> If the result of a pointer subtraction were always guaranteed to be
>> representable in ptrdiff_t, then that last sentence would be vacuous.
>
> Then don't use ptrdiff_t. I suspect like most people I wasn't even aware
> it existed.

I hope you're wrong about that - it's a fairly basic aspect of C, like
[u]intptr_t or size_t. But my degree was in Physics, not CS, so I don't
have any idea how bad the average CS major's education might have been.

If you never need to store the result of pointer subtractions, there's
no need to use ptrdiff_t. If you're calculating the difference between
pointers, and know enough about the calculation to at least roughly
estimate the minimum and maximum possible values that might result, and
know that both the minimum and maximum differences are small enough to
be stored in an integer of a given type, you can use an integer of that
type to store the result. ptrdiff_t is needed only if you have no other
information to go on about the size of a pointer difference that you
need to store - and if ptrdiff_t cannot represent such a value, it's
likely to be the case that there is no integer type supported by that
implementation that can. If there were such a type, it would have been
used as ptrdiff_t.
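
A small illustration of that point (my own sketch, with hypothetical
names, not code from the post):

#include <cstddef>

// If the distance is known to be small - say, an offset within a line of
// text of bounded length - a plain int comfortably holds it:
int offset_in_line(const char* line, const char* cursor)
{
    // Assumed precondition: cursor points into line, and lines are at
    // most a few thousand characters long.
    return static_cast<int>(cursor - line);
}

// With no a-priori bound on the distance, ptrdiff_t is the fallback:
std::ptrdiff_t offset_general(const char* first, const char* p)
{
    return p - first;
}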

>> In the C standard, 7.20.1.4p1 describes intptr_t and uintptr_t, and says
>> "These types are optional". A fully conforming implementation may have
>> pointers that are too big to be representable using any supported
>> integer type, and the same is true of pointer differences.
>
> So long as mathematical operations can be done on the pointer types (which
> is a given or they'd be no use as pointers) then they are de facto integer
> types and that statement is wrong.

Yes, but when these issues come into play, the relevant mathematical
operations cannot be done on pointer types. The result of a pointer
subtraction has the type ptrdiff_t, and if an actual difference results
in a value that cannot be represented in that type, the behavior is
undefined, and on real-life implementations where this can happen, the
likely result will be the same as trying to calculate a value by any
other means that is too large to be represented in that type. That
wording is deliberately vague, because the "likely result" I'm referring
to can be (and probably is) different for different implementations.
Possible results include rolling over large positive differences to
become large negative ones and vice-versa, or saturation arithmetic, or
the raising of a signal. How many cases do you know of where any one of
those three results would be acceptable?

You're used to systems where ptrdiff_t and [u]intptr_t can be defined by
the implementation to be types that are big enough to store any pointer
difference, and any suitably-converted pointer value, respectively. So
am I. But the standard was written by a committee that, collectively,
had far broader experience than either you or I, and that wording was
added because the committee was well aware of systems for which that was
not the case. Unless you know which systems the committee thought had
that issue, and can authoritatively tell them that they were wrong about
those systems, your beliefs to the contrary don't count for much.

MrSp...@_0b772d8ha3yjo0xb.edu

unread,
Aug 9, 2021, 12:23:33 PM8/9/21
to
On Mon, 9 Aug 2021 12:16:46 -0400
James Kuyper <james...@alumni.caltech.edu> wrote:
>On 8/9/21 4:19 AM, MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:
>> On Sun, 8 Aug 2021 09:23:46 -0400
>> James Kuyper <james...@alumni.caltech.edu> wrote:
>> Then don't use ptrdiff_t. I suspect like most people I wasn't even aware
>> it existed.
>
>I hope you're wrong about that - it's a fairly basic aspect of C, like
>[u]intptr_t or size_t. But my degree was in Physics, not CS, so I don't
>have any idea how bad the average CS major's education might have been.

Ooo, look at you, supercilious and patronising all in one go. Well done,
have a scooby snack.

>If you never need to store the result of pointer subtractions, there's
>no need to use ptrdiff_t. If you're calculating the difference between
>pointers, and know enough about the calculation to at least roughly

Any parser beyond the most basic needs to do pointer arithmetic and I've
written a LOT of them.

>type to store the result. ptrdiff_t is needed only if you have no other
>information to go on about the size of a pointer difference that you
>need to store - and if ptrdiff_t cannot represent such a value, it's
>likely to be the case that there is no integer type supported by that
>implementation that can. If there were such a type, it would have been
>used as ptrdiff_t.

I should have clarified in my original post that I was referring to
programming on grown-up OSs on grown-up CPUs, not on DOS with 1970s x86
segmentation.

>> So long as mathematical operations can be done on the pointer types (which
>> is a given or they'd be no use as pointers) then they are de facto integer
>> types and that statement is wrong.
>
>Yes, but when these issues come into play, the relevant mathematical
>operations cannot be done on pointer types. The result of a pointer

They can in *nix which is good enough for me.

Alf P. Steinbach

unread,
Aug 9, 2021, 1:54:02 PM8/9/21
to
True, but. `std::less` & family impose a total order on pointers of the
same type, where for directly comparable pointers (i.e. within the same
array or single-object-as-size-one-array) that order is the same as the
`<` order. That implies that internally these functions compute some
absolute representation of pointers, mapping segment selectors to lower
level addresses as necessary.

I would guess that an argument can be made that in practice you will
always get that absolute representation by converting to `void*` and
then to `uintptr_t`.
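
In code, the guess above amounts to something like this sketch (mine,
not from the post):

#include <functional>
#include <cstdint>

bool before(const int* a, const int* b)
{
    // a < b is only specified when a and b point into the same array (or
    // at the same object); std::less<const int*> must give a consistent
    // total order even for unrelated pointers.
    return std::less<const int*>{}(a, b);
}

bool before_via_uintptr(const void* a, const void* b)
{
    // The common expectation - not guaranteed by the standard - is that
    // this agrees with std::less on flat-address-space platforms.
    return reinterpret_cast<std::uintptr_t>(a) <
           reinterpret_cast<std::uintptr_t>(b);
}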

However, as far as I know that's not guaranteed or formally implied by
the standard. It would have been nice if the standard /had/ exposed the
internal workings here, just as it would have been nice if the standard
/had/ exposed e.g. the internal non-throwing string representation used
in exceptions. Alas.


- Alf

Vir Campestris

unread,
Aug 9, 2021, 4:50:36 PM8/9/21
to
On 09/08/2021 17:23, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
> I should have clarified in my origional post that I was refering to programming
> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86 segmentation.

You're probably not aware then that the CS DS ES SS registers from that
1970s x86 are still alive and well under the hood, even though they
often point to the same address space, and the offsets are 32 or 64 bit.

What bothers me about ptrdiff_t is the size. The difference between two
32-bit pointers can be anything up to plus or minus 4GB, which needs a
33-bit value :(

It's the same with 64 bit.

As it happens it's never been a practical problem as I've never had a 32
bit system with more than 2GB RAM, and don't expect the 64 bit limit to
be a problem any time soon.

Andy

Richard Damon

unread,
Aug 9, 2021, 8:57:31 PM8/9/21
to
Since ptrdiff_t is required to be at least 17 bits even on systems with
a 16-bit size_t, I think the committee decided that it was unlikely for
an application with a 32-bit address space to have over half of its
memory dedicated to a single char array (the only real case where you
can have this issue).

Machines with a 16-bit size_t are going to be segmented systems, so
having over 32k in an array is plausible, and ptrdiff_t needs to be big
enough for that.

From what I remember of the minimum limits, a machine with only 64k of
address space can't really be fully conforming, so if that machine makes
ptrdiff_t only 16 bits to save space, that isn't its only
non-conformity.

Maybe if 32-bit environments that actually used segments to extend the
memory space, the way the 16-bit segmented world did, had been common,
then the standard might have required a 33-bit ptrdiff_t for those
systems.

Scott Lurndal

unread,
Aug 9, 2021, 9:12:19 PM8/9/21
to
Richard Damon <Ric...@Damon-Family.org> writes:
>On 8/9/21 4:50 PM, Vir Campestris wrote:
>> On 09/08/2021 17:23, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
>>> I should have clarified in my origional post that I was refering to
>>> programming
>>> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86
>>> segmentation.
>>
>> You're probably not aware then that the CD DS ES SS registers from that
>> 1970s x86 are still alive and well under the hood, even though they
>> often point to the same address space, and the offsets are 32 or 64 bit.
>>
>> What bothers me about ptr_diff_t is the size. The difference between two
>> 32-bit pointers is plus or minus 4GB. Which needs a 33 bit value :(
>>
>> It's the same with 64 bit.
>>
>> As it happens it's never been a practical problem as I've never had a 32
>> bit system with more than 2GB RAM, and don't expect the 64 bit limit to
>> be a problem any time soon.
>>
>> Andy
>
>Since for 16 bit size_t ptrdiff_t has a requirement to be at least 17
>bits, I think the committee decided that it was unlikely for an
>application with a 32-bit address space to have over half of its memory
>dedicated to a single char array (the only real case where you can have
>this issue).

In all the years that I've been using ptrdiff_t, every single case
has involved subtracting a base pointer from an element pointer;
the result is always positive and always within positive value range
of ptrdiff_t.
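
That everyday pattern looks roughly like this (my sketch):

#include <cstddef>

// Recover an element's index from a pointer into the array.  The caller
// guarantees elem points at an element of the array starting at base, so
// the result is non-negative and comfortably within range.
std::ptrdiff_t index_of(const double* base, const double* elem)
{
    return elem - base;
}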

And given the split virtual address space of most modern operating
environments, the maximum unsigned difference for a user application will
generally be representable in 31 bits anyway absent buggy code.

It's not like programmers are willy-nilly subtracting random
pointers and expecting a meaningful result.

Richard Damon

unread,
Aug 9, 2021, 9:22:09 PM8/9/21
to
I have had cases where I subtract pointers to elements in the array
where the pointers might be in the reversed order, and thus I get a
negative difference.

I will admit, that it is less common than the case you describe, but it
does happen.

James Kuyper

unread,
Aug 9, 2021, 11:20:45 PM8/9/21
to
On 8/9/21 12:23 PM, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
> On Mon, 9 Aug 2021 12:16:46 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
>> On 8/9/21 4:19 AM, MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:
>>> On Sun, 8 Aug 2021 09:23:46 -0400
>>> James Kuyper <james...@alumni.caltech.edu> wrote:
>>> Then don't use ptrdiff_t. I suspect like most people I wasn't even aware
>>> it existed.
>>
>> I hope you're wrong about that - it's a fairly basic aspect of C, like
>> [u]intptr_t or size_t. But my degree was in Physics, not CS, so I don't
>> have any idea how bad the average CS major's education might have been.
>
> Ooo look at you, supercilious and patronising all in one go. Well done,have
> a scooby snack.

I'm sorry - people who confess to being unfamiliar with fairly basic
aspects of C tend to produce feelings of superiority in me.

>> If you never need to store the result of pointer subtractions, there's
>> no need to use ptrdiff_t. If you're calculating the difference between
>> pointers, and know enough about the calculation to at least roughly
>
> Any parser beyond the most basic needs to do pointer arithmetic and I've
> written a LOT of them.

Unless your parsers were successfully ported to the kinds of platforms I
was talking about, that's not particularly relevant to the point I was
making.

>> type to store the result. ptrdiff_t is needed only if you have no other
>> information to go on about the size of a pointer difference that you
>> need to store - and if ptrdiff_t cannot represent such a value, it's
>> likely to be the case that there is no integer type supported by that
>> implementation that can. If there were such a type, it would have been
>> used as ptrdiff_t.
>
> I should have clarified in my origional post that I was refering to programming
> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86 segmentation.

As I said, I'm not personally familiar with such platforms, but the
impression I get is that they tend to be small embedded CPUs - which
would explain my total lack of familiarity with them. Embedded
programming is a large and rapidly growing part of the C/C++ programming
world, but one that has never played any part in any job I've ever held.

>>> So long as mathematical operations can be done on the pointer types (which
>>> is a given or they'd be no use as pointers) then they are de facto integer
>>> types and that statement is wrong.
>>
>> Yes, but when these issues come into play, the relevant mathematical
>> operations cannot be done on pointer types. The result of a pointer
>
> They can in *nix which is good enough for me.

Yes, but the intended scope of the C++ standard is considerably broader
than what's good enough for you.

Juha Nieminen

unread,
Aug 10, 2021, 1:22:50 AM8/10/21
to
Alf P. Steinbach <alf.p.s...@gmail.com> wrote:
> True, but. `std::less` & family impose a total order on pointers of the
> same type, where for directly comparable pointers (i.e. within the same
> array or single-object-as-size-one-array) that order is the same as the
> `<` order. That implies that internally these functions compute some
> absolute representation of pointers, mapping segment selectors to lower
> level addresses as necessary.

Since pointers, like any object, by necessity have a bit representation,
they can be compared and strictly ordered. However, that doesn't mean
that their difference is meaningful (or that ptrdiff_t needs to be
unambiguous for every single pair of pointers you subtract from
each other).

Juha Nieminen

unread,
Aug 10, 2021, 1:29:30 AM8/10/21
to
Vir Campestris <vir.cam...@invalid.invalid> wrote:
> On 09/08/2021 17:23, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
>> I should have clarified in my origional post that I was refering to programming
>> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86 segmentation.
>
> You're probably not aware then that the CD DS ES SS registers from that
> 1970s x86 are still alive and well under the hood, even though they
> often point to the same address space, and the offsets are 32 or 64 bit.

I might be completely wrong on this, but I remember reading somewhere that
thread-local variables are often implemented (in x86 systems) by using
one of the segment registers, and having it different for each thread.
This way all the code can address these thread-local variables with the
exact same memory address, but they will still be pointing to different
variables.

If that's the case then it means that the segment registers are still
actually useful.
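
A hedged sketch of what that looks like at the source level (my own
example; the per-register details are implementation-specific, but this
is the commonly described mechanism on x86-64 Linux, where compilers
typically address thread_local variables relative to the %fs base):

#include <cstdio>
#include <thread>

thread_local int per_thread_counter = 0;   // one instance per thread

void bump_and_print()
{
    // Typically compiled to an access at a fixed offset from the thread's
    // TLS base (e.g. "%fs:offset" on x86-64 Linux), so the same code
    // touches a different variable in each thread.
    ++per_thread_counter;
    std::printf("%d\n", per_thread_counter);
}

int main()
{
    std::thread t(bump_and_print);   // prints 1 in the new thread
    bump_and_print();                // prints 1 again in the main thread
    t.join();
}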

MrSpud_p...@6urvt16b9fax.gov.uk

unread,
Aug 10, 2021, 3:23:22 AM8/10/21
to
On Mon, 9 Aug 2021 21:50:20 +0100
Vir Campestris <vir.cam...@invalid.invalid> wrote:
>On 09/08/2021 17:23, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
>> I should have clarified in my origional post that I was refering to
>programming
>> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86 segmentation.
>
>You're probably not aware then that the CD DS ES SS registers from that
>1970s x86 are still alive and well under the hood, even though they

And invisible in 32 bit protected mode and not used at all in 64 bit so
irrelevant to this discussion.


MrSpud_m...@u6w.biz

unread,
Aug 10, 2021, 3:27:12 AM8/10/21
to
On Mon, 9 Aug 2021 23:20:27 -0400
James Kuyper <james...@alumni.caltech.edu> wrote:
>On 8/9/21 12:23 PM, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
>> On Mon, 9 Aug 2021 12:16:46 -0400
>> James Kuyper <james...@alumni.caltech.edu> wrote:
>>> On 8/9/21 4:19 AM, MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:
>>>> On Sun, 8 Aug 2021 09:23:46 -0400
>>>> James Kuyper <james...@alumni.caltech.edu> wrote:
>>>> Then don't use ptrdiff_t. I suspect like most people I wasn't even aware
>>>> it existed.
>>>
>>> I hope you're wrong about that - it's a fairly basic aspect of C, like
>>> [u]intptr_t or size_t. But my degree was in Physics, not CS, so I don't
>>> have any idea how bad the average CS major's education might have been.
>>
>> Ooo look at you, supercilious and patronising all in one go. Well done,have
>> a scooby snack.
>
>I'm sorry - people who confess to being unfamiliar with fairly basic
>aspects of C tend to produce feelings of superiority in me.

In 25 years I've never ever seen that type used, so spare me your BS.
The only one with misplaced feelings of superiority is you.

>> Any parser beyond the most basic needs to do pointer arithmetic and I've
>> written a LOT of them.
>
>Unless your parsers were successfully ported to the kinds of platforms I
>was talking about, that's not particularly relevant to the point I was
>making.

They've been used on x86 and ARM on various OS's.

>As I said, I'm not personally familiar with such platforms, but the
>impression I get is that they tend to be small embedded CPUs - which
>would explain my total lack of familiarity with them. Embedded
>programming is a large and rapidly growing part of the C/C++ programming
>world, but one that has never played any part in any job I've ever held.

Unless you're using some prehistoric 16 bit (or less) PIC then all pointers in
embedded C will be 32 bit linear.


MrSpud_1...@bya886olr5o8dhu9.ac.uk

unread,
Aug 10, 2021, 3:28:29 AM8/10/21
to
I don't see how that'll work in 32 or particularly 64 bit.

Juha Nieminen

unread,
Aug 10, 2021, 3:52:19 AM8/10/21
to
MrSpud_m...@u6w.biz wrote:
> In 25 years I've never ever seen that type used so spare me your BS. The only
> ones with misplaced feelings of superiority is you.

I think that you know perfectly well that with that kind of hostile language
and attitude you are not going to persuade anybody, nor are you going to get
much support, neither from the person you are talking to, nor pretty much
anybody else. Thus, I think you know perfectly well that by using that kind
of language and attitude, you are making a pariah of yourself here.

You could express your statements in a neutral way, but instead you
willingly choose to denigrate people and be very antagonistic.

So I have to wonder why. What psychological issue makes you want to become
a hated pariah? Why do you willingly antagonize people? Why do you want
them to find you disgusting and unlikeable? Is it to get some kind of
sense of being a victim, a martyr?

Perhaps some self-reflection could do you some good. In the long run,
being nicer will also make you yourself happier.

MrSpud_...@lxz.tv

unread,
Aug 10, 2021, 3:59:41 AM8/10/21
to
On Tue, 10 Aug 2021 07:51:58 -0000 (UTC)
Juha Nieminen <nos...@thanks.invalid> wrote:
>MrSpud_m...@u6w.biz wrote:
>> In 25 years I've never ever seen that type used so spare me your BS. The
>only
>> ones with misplaced feelings of superiority is you.
>
>I think that you know perfectly well that with that kind of hostile language
>and attitude you are not going to persuade anybody, nor are you going to get
>much support, neither from the person you are talking to, nor pretty much
>anybody else. Thus, I think you know perfectly well that by using that kind
>of language and attitude, you are making a pariah of yourself here.

Ah, a nice bit of early morning irony to go with my coffee :)

>You could express your statements in a neutral way, but instead you
>willingly choose to denigrate people and be very antagonistic.

I suggest you read what he wrote. I'm simply replying in kind.

>So I have to wonder why. What psychological issue makes you want to become
>a hated pariah? Why do you willingly antagonize people? Why do you want
>them to find you disgusting and unlikeable? Is it to get some kind of
>sense of being a victim, a martyr?

If you wish to try out your cod psychology I would suggest you get at least
a vague clue first.

>Perhaps some self-reflection could do you some good. In the long run,
>being nicer will make also you yourself happier.

Aww, bless you :)

David Brown

unread,
Aug 10, 2021, 5:21:12 AM8/10/21
to
That makes it clear that you are so ignorant about the world outside of
*nix that you have no idea how ignorant you are. If all your
programming world is within the specific segment of *nix systems, that's
fine - lucky you, some might say. But please understand there is a
world outside of that, where C and C++ are heavily used but many of the
assumptions you make do not hold.

You'd do well to learn from James here - he knows little about the world
of small-system embedded programming, but he /knows/ he knows little
about it - he knows it is important, and knows it can be different from
the systems he usually works with, and knows it is one of the reasons
for some of the flexibilities in the C and C++ standards.

Juha Nieminen

unread,
Aug 10, 2021, 8:25:05 AM8/10/21
to
MrSpud_...@lxz.tv wrote:
> If you wish to try out your cod psychology I would suggest you get at least
> a vague clue first.
>
>>Perhaps some self-reflection could do you some good. In the long run,
>>being nicer will make also you yourself happier.
>
> Aww, bless you :)

It's not surprising that you would struggle against this kind of
advice, but perhaps some time in the next few years you will think about
it more seriously.

Being nice and polite to people is genuinely more rewarding, and brings
you more happiness in the long run, than being rude, aggressive and
confrontational, which will just make you miserable. Mockery might give
you immediate satisfaction, but in the long run it will only destroy
your own happiness. You might not believe it now, but you will believe
it eventually.

Just think about it. It's never too late to learn and change.

Tim Rentsch

unread,
Aug 10, 2021, 9:11:55 AM8/10/21
to
Vir Campestris <vir.cam...@invalid.invalid> writes:

[..stuff about 64 bit systems removed..]

> What bothers me about ptr_diff_t is the size. The difference between
> two 32-bit pointers is plus or minus 4GB. Which needs a 33 bit value
> :(
>
> As it happens it's never been a practical problem as I've never had a
> 32 bit system with more than 2GB RAM,

It isn't necessary for there to be more than 2GB of RAM for
problems with ptrdiff_t to manifest (in 32-bit linux). A call to
malloc() will gladly return a memory area larger than all of RAM
if there is swap space to hold it.

Tim Rentsch

unread,
Aug 10, 2021, 9:41:37 AM8/10/21
to
MrSpud_HG@_0b772d8ha3yjo0xb.edu writes:

> On Mon, 9 Aug 2021 12:16:46 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
>
>> On 8/9/21 4:19 AM, MrSpud_ifhov@nldls6_1kg3nl2qnwv.biz wrote:

[...]

>>> So long as mathematical operations can be done on the pointer
>>> types (which is a given or they'd be no use as pointers) then
>>> they are de facto integer types and that statement is wrong.
>>
>> Yes, but when these issues come into play, the relevant
>> mathematical operations cannot be done on pointer types.
>> The result of a pointer
>
> They can in *nix which is good enough for me.

A few comments...

One, in many cases C pointers are represented internally as what
are basically integers, but the C standard says pointer types are
distinct from integer types, and the rules for operations on
pointer types, in particular subtraction of pointer values, are
specifed in terms of pointers and arrays and not in terms of
integer values. The rules for pointer subtraction depend on the
range of the implementation-chosen type ptrdiff_t.
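
A small example of that distinction (my own sketch, not from the post):

#include <cstddef>
#include <cstdint>

void demo()
{
    int a[10] = {};

    // Pointer subtraction is specified in terms of array elements:
    std::ptrdiff_t elems = &a[7] - &a[2];   // well-defined, equals 5

    // Subtracting the integer representations is a different operation;
    // on a flat-address-space machine it typically gives the byte
    // distance (5 * sizeof(int)), but the standard does not promise that.
    std::uintptr_t bytes = reinterpret_cast<std::uintptr_t>(&a[7])
                         - reinterpret_cast<std::uintptr_t>(&a[2]);

    (void)elems;
    (void)bytes;
}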

Two, the specific values used for things like ptrdiff_t are
determined by the particular C implementation being used, not the
operating system. The target OS may influence some choices made
by the implementation, but it is still the implementation's
choice whether to observe those influences.

Three, I can tell you from first-hand experience that problems
related to the range of ptrdiff_t can and do occur in ordinary C
code running on a 32-bit linux system, using gcc to compile.

Bo Persson

unread,
Aug 10, 2021, 10:13:31 AM8/10/21
to
There still has to be a contiguous address space that is free of already
allocated memory blocks and not in a range reserved by the operating
system.

I'm no Linux expert, but on Windows configured for LargeAddressAware
programs, with 3 GB user + 1 GB OS, it is extremely unlikely that you
will find a contiguous 2.x GB hole for the heap, or that you have a
program that needs exactly one such block for byte-sized operations.


Tim Rentsch

unread,
Aug 10, 2021, 10:14:19 AM8/10/21
to
Richard Damon <Ric...@Damon-Family.org> writes:

> On 8/9/21 4:50 PM, Vir Campestris wrote:

[...]

>> As it happens it's never been a practical problem as I've never had a 32
>> bit system with more than 2GB RAM, and don't expect the 64 bit limit to
>> be a problem any time soon.
>
> Since for 16 bit size_t ptrdiff_t has a requirement to be at least 17
> bits, I think the committee decided that it was unlikely for an
> application with a 32-bit address space to have over half of its memory
> dedicated to a single char array (the only real case where you can have
> this issue).

A few weeks ago I happened to write a small C program that does
part of its work in a single large dynamically allocated memory
area. In that memory area there are character arrays, character
pointers, and various offset values (unsigned integers). The
program runs just fine on a 64-bit linux system (in one case the
memory area allocated was about 180 GB).

Prompted by this discussion, I took the program and tried
compiling and running it on a 32-bit linux system. The memory
area allocated was a little over 2 GB, and the program fell down
miserably. Limiting the size of the memory area allocated to
PTRDIFF_MAX, with no other changes, got it working again. So
problems due to the limited range of ptrdiff_t definitely can
occur in practical programs.
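
The fix described amounts to something like this (a sketch of the idea
with hypothetical names; the actual program is not shown in the thread):

#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Keep any single allocation within PTRDIFF_MAX bytes, so that
// subtracting two pointers into it is always well defined.
void* allocate_arena(std::size_t requested, std::size_t* actual)
{
    std::size_t n = requested;
    if (n > static_cast<std::size_t>(PTRDIFF_MAX))
        n = static_cast<std::size_t>(PTRDIFF_MAX);

    void* p = std::malloc(n);
    *actual = p ? n : 0;
    return p;
}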

> [...]
>
> From what I remember of minumum limits, a machine with only 64k
> of address space can't really be fully conforming, [...]

The rule for being able to have a 64 KB object applies only to
hosted implementations. It's easy to make a fully conforming
freestanding implementation for a machine with limited address
space, even one much less than 64 KB.

James Kuyper

unread,
Aug 10, 2021, 10:27:34 AM8/10/21
to
On 8/10/21 3:26 AM, MrSpud_m...@u6w.biz wrote:
> On Mon, 9 Aug 2021 23:20:27 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
>> On 8/9/21 12:23 PM, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
...
>>> Any parser beyond the most basic needs to do pointer arithmetic and I've
>>> written a LOT of them.
>>
>> Unless your parsers were successfully ported to the kinds of platforms I
>> was talking about, that's not particularly relevant to the point I was
>> making.
>
> They've been used on x86 and ARM on various OS's.

So, not the kinds of platforms I was talking about.
Keep in mind that you'd only run into trouble doing pointer arithmetic
on pointers into arrays with more than PTRDIFF_MAX elements. Did your
parsers ever need to parse something that big? PTRDIFF_MAX is required
to be at least 65535, but it's the actual value on the platform you're
compiling for that matters.

James Kuyper

unread,
Aug 10, 2021, 10:28:03 AM8/10/21
to
I was hoping someone with more relevant experience would respond.
However, I was, in particular, hoping someone would respond with an
example. Do you know of any particular modern system, preferably as
widely used as possible, where ptrdiff_t was not big enough to store all
possible pointer differences, or where [u]intptr_t is not supported?

Mike Terry

unread,
Aug 10, 2021, 10:34:31 AM8/10/21
to
One of the segment registers is configured by Windows to address the
thread environment block (TEB), which is thread specific. I'm pretty
sure it was the FS register on 32-bit Windows, but maybe that's changed
for 64-bit. The scheduler saves and restores the segment registers when
it context-switches between threads, so the TEB will always be correctly
addressed as threads switch.

For example, for 32-bit Windows, you may notice that as part of the
function prolog code the pointer at FS:0 is updated, then restored on
return - that's the structured exception handling frame pointer, which
is per-thread as you'd expect. (Similarly, the last error stored by many
APIs and retrieved through Get/SetLastError() is somewhere in the TEB.)


Regards,
Mike.

Scott Lurndal

unread,
Aug 10, 2021, 10:55:14 AM8/10/21
to
Juha Nieminen <nos...@thanks.invalid> writes:
>Alf P. Steinbach <alf.p.s...@gmail.com> wrote:
>> True, but. `std::less` & family impose a total order on pointers of the
>> same type, where for directly comparable pointers (i.e. within the same
>> array or single-object-as-size-one-array) that order is the same as the
>> `<` order. That implies that internally these functions compute some
>> absolute representation of pointers, mapping segment selectors to lower
>> level addresses as necessary.
>
>Since pointers, like any object, by necessity have a bit representation,
>they can be compared and strictly ordered.

I would argue against the latter part of your statement by referring to
various extant and future architectures where your statement is not
true, some of which even have C compilers.

I'm familiar with one extant architecture (Clearpath) where pointers
are not as you describe, and one potential future architecture
(which is currently under NDA, but similar to the CHERI research
project) where the pointers are not simple
offsets from the start of memory.

Scott Lurndal

unread,
Aug 10, 2021, 10:57:57 AM8/10/21
to
Juha Nieminen <nos...@thanks.invalid> writes:
>Vir Campestris <vir.cam...@invalid.invalid> wrote:
>> On 09/08/2021 17:23, MrSpud_HG@_0b772d8ha3yjo0xb.edu wrote:
>>> I should have clarified in my origional post that I was refering to programming
>>> in grown up OS's on grown up CPUs. Not on DOS with 1970s x86 segmentation.
>>
>> You're probably not aware then that the CD DS ES SS registers from that
>> 1970s x86 are still alive and well under the hood, even though they
>> often point to the same address space, and the offsets are 32 or 64 bit.
>
>I might be completely wrong on this, but I remember reading somewhere that
>thread-local variables are often implemented (in x86 systems) by using
>one of the segment registers, and having it different for each thread.

That is the case for Linux on x86: %fs is used for the user-mode
thread-local storage base and %gs is used for the kernel-mode
per-CPU storage.


>
>If that's the case then it means that the segment registers are still
>actually useful.

The modern architectures (e.g. x86_64) treat them as general purpose
registers for all intents and purposes - the descriptor tables (gdt/ldt)
don't come into play (see, for example, the SWAPGS instruction).

Scott Lurndal

unread,
Aug 10, 2021, 10:59:20 AM8/10/21
to
If you download a copy of the Intel (or AMD) processor manual set,
you'll find sufficient information within to educate you on this
particular topic. It works and is widely used.

Or, in the usenet vernacular, RTFM.

Scott Lurndal

unread,
Aug 10, 2021, 11:02:59 AM8/10/21
to
Funny, I recall a processor feature addition from 2005 which
honored the DS limit register in long mode. Added specifically
to support XEN (recall that AMD added long mode and Intel adopted it
later) before SVM was introduced to the Opterons.

Scott Lurndal

unread,
Aug 10, 2021, 11:04:05 AM8/10/21
to
Do recall the split address space in 32-bit intel/amd systems, which
by default, limit the application to 2GB of virtual address space.

Malloc can't return more than the available user-mode VA space
allows.

David Brown

unread,
Aug 10, 2021, 11:44:58 AM8/10/21
to
I know of a system where ptrdiff_t is, like size_t, 16-bit - but where
there are pointer types that are 24-bit. The gcc port for the AVR is my
usual first choice of example here, since it is a gcc target and the
microcontrollers concerned are in common use, with new devices being
developed on a regular basis. (i.e., they are not some brain-dead
outdated cpu with only sort-of-C compilers such as the 8051 or 8086 -
these are modern devices with modern C and C++ tools, albeit with only
limited language support libraries for C++). In particular, comparing
(or subtracting) pointers from different address spaces is completely
meaningless, and if you have two 24-bit "__memx" pointers that target
different physical memory types in the chip, then subtracting them is
not going to fit in a 16-bit ptrdiff_t.

<https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html>


For most C and C++ implementations, ptrdiff_t is a signed type of the
same width as the full address space of the device (ignoring devices
like the AVR with multiple independent address spaces). If you have
full control over the linking setup and other aspects of making a binary
for the device - as you often do for freestanding embedded code - you
can easily construct arrays that are bigger than half the address space.
Subtracting a pointer to the first element of such an array from a
pointer to its last element will give you an integer result that is too
big for ptrdiff_t. There is no constraint violation, but it is undefined
behaviour (6.5.6p9).

A quick test using godbolt.org with 32-bit ARM compilers shows that gcc
will not compile "extern char data[0xc0000000];", but clang for these
devices accepts it. ptrdiff_t, however, is not big enough to store the
differences between all pointers to elements within that one array.


I don't know of any systems where uintptr_t does not exist, sorry.
Perhaps some "capability pointer" systems would count, but I am not at
all familiar with them.



MrSpud_8...@l6zhsgj_v6t3dbj.tv

unread,
Aug 10, 2021, 11:50:30 AM8/10/21
to
On Tue, 10 Aug 2021 11:20:52 +0200
David Brown <david...@hesbynett.no> wrote:
>On 10/08/2021 09:26, MrSpud_m...@u6w.biz wrote:
>> On Mon, 9 Aug 2021 23:20:27 -0400
>> James Kuyper <james...@alumni.caltech.edu> wrote:
>
>>> As I said, I'm not personally familiar with such platforms, but the
>>> impression I get is that they tend to be small embedded CPUs - which
>>> would explain my total lack of familiarity with them. Embedded
>>> programming is a large and rapidly growing part of the C/C++ programming
>>> world, but one that has never played any part in any job I've ever held.
>>
>> Unless you're using some prehistoric 16 bit (or less) PIC then all pointers
>in
>> embedded C will be 32 bit linear.
>>
>
>That makes it clear that you are so ignorant about the world outside of
>*nix that you have no idea how ignorant you are. If all your

Oh OK. I guess all my PIC development, not to mention Arduino, counts
for nothing then? If you say so.

