
Is this undefined behavior?


Chris M. Thomasson

Jun 15, 2020, 3:04:04 AM
I am worried about the header::get_buf member function:

struct header
{
    std::size_t m_size;

    char* get_buf()
    {
        return reinterpret_cast<char*>(this + 1);
    }
};


Here is the full code:
__________________
#include <iostream>
#include <cstddef>
#include <cstdlib>
#include <cstring>


struct header
{
    std::size_t m_size;

    char* get_buf()
    {
        return reinterpret_cast<char*>(this + 1);
    }
};


header* header_alloc(std::size_t size)
{
    header* const h = reinterpret_cast<header*>(
        std::malloc(sizeof(*h) + size));

    if (h)
    {
        h->m_size = size;
    }

    return h;
}


void header_free(header* h)
{
    std::free(h);
}


int main()
{
    {
        header* h = header_alloc(123);

        if (h)
        {
            char* buf1 = h->get_buf();

            std::strcpy(buf1, "Hello");

            std::cout << "buf1 = " << buf1 << "\n";
            std::cout << "h->m_size = " << h->m_size << "\n\n";


            char* buf2 = h->get_buf();

            std::strcat(buf2, " World");

            std::cout << "buf2 = " << buf2 << "\n";
            std::cout << "h->m_size = " << h->m_size << "\n\n";

            header_free(h);

            if ((h = header_alloc(42)))
            {
                buf1 = h->get_buf();

                std::strcpy(buf1, "FortyTwo");

                std::cout << "buf1 = " << buf1 << "\n";
                std::cout << "h->m_size = " << h->m_size << "\n";

                header_free(h);
            }
        }
    }

    return 0;
}
______________________

How infested with UB is this? Any bugs?

Sam

Jun 15, 2020, 7:07:19 AM
Chris M. Thomasson writes:

> I am worried about the header::get_buf member function:
>
> struct header
> {
> std::size_t m_size;
>
> char* get_buf()
> {
> return reinterpret_cast<char*>(this + 1);
> }
> };
>
> header* header_alloc(std::size_t size)
> {
> header* const h = reinterpret_cast<header*>(std::malloc(sizeof(*h) +
> size));
>
> How infested with UB is this? Any bugs?

Eh, there's probably some teeny amount of UB here, but I would not worry
about it too much; the compilers' behavior should not be too surprising here.

And gcc even gives its formal blessing to the following:

https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html



Juha Nieminen

Jun 15, 2020, 12:45:02 PM
Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
> header* const h = reinterpret_cast<header*>(std::malloc(sizeof(*h) +
> size));

I can't think of any reason why it would have UB.

std::malloc() is guaranteed to allocate with an alignment that's
fine for even the largest elementary type (which std::size_t
usually is), so it should just work. (If it didn't, then malloc
couldn't be used to allocate anything at all safely.)
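A minimal sketch of the alignment point, assuming a hosted implementation. The helper name malloc_is_max_aligned is made up for illustration; it checks that a block from std::malloc is aligned at least as strictly as std::max_align_t, which is the alignment every fundamental type (std::size_t included) can live with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct header { std::size_t m_size; };

// Hypothetical helper: does std::malloc return a block aligned at least as
// strictly as std::max_align_t for a request of `size` bytes?
bool malloc_is_max_aligned(std::size_t size)
{
    void* p = std::malloc(size);
    if (!p) return false;
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::free(p);
    return addr % alignof(std::max_align_t) == 0;
}

// header needs no more than that, so placing it at the start of the block is fine.
static_assert(alignof(header) <= alignof(std::max_align_t),
              "header is not over-aligned");

void demo_alignment()
{
    assert(malloc_is_max_aligned(sizeof(header) + 123));
}
```

This only covers the start of the block; the trailing char buffer has alignment 1, so it needs no extra care.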

I can't think of any reason why "overallocating" like that would
be UB. After all, std::malloc() just takes a byte amount as a
parameter. It doesn't even know what the type you are allocating
it for is.

Since you are using a char* to point to the extra space, I can't
think of any reason for UB either (as long as you stay inbounds
of the allocated block, obviously). Even if you used a std::size_t*
(or any type that's an exact fraction of that) it should be fine
as well.

The only thing that bothers me about your code is not that it would
be incorrect, just that it's error-prone, as you are using raw
pointers, requiring you to manually free the allocated memory
(although it may well be that you deliberately extracted this code
from its RAII context to simplify it for the sake of example).

It's not wrong per se to use raw pointers and having to manually
free() the allocated memory. It's just that it's very error-prone
(especially if you have exceptions enabled, as they could cause
functions to be exited from pretty much anywhere).

For very small programs it may be completely fine, though. No need
to over-engineer it if you never intend to use it in a larger program.

Paavo Helde

Jun 15, 2020, 1:10:29 PM
15.06.2020 10:03 Chris M. Thomasson wrote:
> I am worried about the header::get_buf member function:
>
> struct header
> {
>   std::size_t m_size;
>
>   char* get_buf()
>   {
>     return reinterpret_cast<char*>(this + 1);
>   }
> };

Interpreting a malloc'ed piece of memory as an array of objects is a
very old practice and standards have been carefully worded to allow that.

One slight problem might appear if size<sizeof(header). In that case
there is no room in the buffer for a header object at address this+1,
meaning that dereferencing the this+1 pointer would be UB. I'm not quite
sure if converting it to a char* pointer (plus later dereferencing it)
would be formally UB or not, the standard seems to speak only about
conversions of pointers pointing to some object.

To be on the safe side you could write instead:

return reinterpret_cast<char*>(this) + sizeof(*this);

I think there is no UB here.

> header* header_alloc(std::size_t size)
> {
>   header* const h = reinterpret_cast<header*>(std::malloc(sizeof(*h) +
> size));

static_cast suffices here, no need to use reinterpret_cast.

For more complicated classes one would need to construct the object in
the buffer with placement new and take care to use the return value of
placement new for referring to the object (or use std::launder() on the
original buffer):

header* header_alloc(std::size_t size)
{
    void* p = std::malloc(sizeof(header) + size);
    if (p)
    {
        return new (p) header {size};
    } else {
        return nullptr;
    }
}
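A minimal sketch that puts the two suggestions from this post together, under the assumption that header stays a trivial struct: get_buf() uses byte-wise arithmetic, and header_alloc() keeps the pointer returned by placement new. header_free() and demo_header() are illustrative additions, and the sketch does not settle the formal questions about accessing the trailing bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct header
{
    std::size_t m_size;

    char* get_buf()
    {
        // Byte-wise arithmetic instead of this + 1.
        return reinterpret_cast<char*>(this) + sizeof(*this);
    }
};

header* header_alloc(std::size_t size)
{
    void* p = std::malloc(sizeof(header) + size);
    if (!p) return nullptr;
    return new (p) header{size};   // refer to the object via this pointer
}

void header_free(header* h)
{
    if (h)
    {
        h->~header();   // trivial here, but mirrors the placement new
        std::free(h);
    }
}

void demo_header()
{
    header* h = header_alloc(16);
    assert(h);
    std::strcpy(h->get_buf(), "Hello");
    assert(std::strcmp(h->get_buf(), "Hello") == 0);
    assert(h->m_size == 16);
    header_free(h);
}
```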

Scott Newman

Jun 15, 2020, 1:15:18 PM
> return reinterpret_cast<char*>(this) + sizeof(*this);

LOL, only esoteric idiots around here.
As if that would make any difference compared to the code shown before ...
On which machine? Never!

Paavo Helde

Jun 15, 2020, 1:28:43 PM
We are talking about formal correctness here, the original code would
likely work fine with all currently existing implementations. And I
admit I'm not sure if there is a formal difference at all.

I hope you are aware there is a difference between

&vector[0] + vector.size()

and

&vector[vector.size()]

(assuming a non-empty vector).

The latter form will abort the program with some existing C++
implementations, so it's not only a formal difference.
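A short sketch of the safe form, assuming a std::vector<int>. The helper name end_ptr is hypothetical; it uses data() so that it also works for an empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one-past-the-end pointer obtained by pointer
// arithmetic, never by indexing.
const int* end_ptr(const std::vector<int>& v)
{
    // Same idea as &v[0] + v.size(). Writing &v[v.size()] instead would call
    // operator[] with an out-of-range index, which checked implementations reject.
    return v.data() + v.size();
}

void demo_end_ptr()
{
    const std::vector<int> v{1, 2, 3};
    assert(end_ptr(v) - v.data() == 3);
    assert(*(end_ptr(v) - 1) == 3);
}
```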

Scott Newman

Jun 15, 2020, 1:43:03 PM
> We are talking about formal correctness here, the original code would
> likely work fine with all currently existing implementations. And I
> admit I'm not sure if there is a formal difference at all.
>
> I hope you are aware there is a difference between
> &vector[0] + vector.size()
> and
> &vector[vector.size()]
> (assuming a non-empty vector).
> The latter form will abort the program with some existing C++
> implementations, so it's not only a formal difference.

That's not what we were discussing. It's just this:
> return reinterpret_cast<char*>(this) + sizeof(*this);
vs.
> return reinterpret_cast<char*>(this + 1);

Even if you use C-style casts in the latter case, there will never be
any implementation that misbehaves here. Your code experiments
are esoteric.

Chris M. Thomasson

Jun 15, 2020, 6:53:01 PM
On 6/15/2020 9:44 AM, Juha Nieminen wrote:
> Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
>> header* const h = reinterpret_cast<header*>(std::malloc(sizeof(*h) +
>> size));
>
> I can't think of any reason why it would have UB.
>
> std::malloc() is guaranteed to allocate with an alignment that's
> fine for even the largest elementary type (which std::size_t
> usually is), so it should just work. (If it didn't, then malloc
> couldn't be used to allocate anything at all safely.)
>
> I can't think of any reason why "overallocating" like that would
> be UB. After all, std::malloc() just takes a byte amount as a
> parameter. It doesn't even know what the type you are allocating
> it for is.
>
> Since you are using a char* to point to the extra space, I can't
> think of any reason for UB either (as long as you stay inbounds
> of the allocated block, obviously). Even if you used a std::size_t*
> (or any type that's an exact fraction of that) it should be fine
> as well.
>
> The only thing that bothers me about your code is not that it would
> be incorrect, just that it's error-prone, as you are using raw
> pointers, requiring you to manually free the allocated memory
> (although it may well be that you deliberately extracted this code
> from its RAII context to simplify it for the sake of example).

I need at least one raii object that can wrap the existing code, to call
header_alloc on ctor, and header_free on dtor. The power of RAII is that
it can most likely be adapted to existing explicit apis, create/destroy.
Automate it. One of my favorite examples is good ol' ScopeGuard. :^)


>
> It's not wrong per se to use raw pointers and having to manually
> free() the allocated memory. It's just that it's very error-prone
> (especially if you have exceptions enabled, as they could cause
> functions to be exited from pretty much anywhere).
>
> For very small programs it may be completely fine, though. No need
> to over-engineer it if you never intend to use it in a larger program.
>

RAII would not hurt here; imagine that the code is as it is. Well, we
can wrap it up. Imvho, it's a nice, convenient feature of C++.
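A minimal sketch of such a wrapper, assuming std::unique_ptr with a custom deleter is acceptable. header_alloc and header_free below are stand-ins for the explicit create/destroy API; header_deleter, header_ptr and make_header are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>

struct header
{
    std::size_t m_size;
    char* get_buf() { return reinterpret_cast<char*>(this) + sizeof(*this); }
};

// Stand-ins for the existing explicit create/destroy API.
header* header_alloc(std::size_t size)
{
    void* p = std::malloc(sizeof(header) + size);
    return p ? new (p) header{size} : nullptr;
}

void header_free(header* h) { std::free(h); }

// The deleter forwards to the existing destroy function.
struct header_deleter
{
    void operator()(header* h) const noexcept { header_free(h); }
};

using header_ptr = std::unique_ptr<header, header_deleter>;

header_ptr make_header(std::size_t size)
{
    return header_ptr(header_alloc(size));
}

void demo_raii()
{
    header_ptr h = make_header(42);
    assert(h);
    std::strcpy(h->get_buf(), "FortyTwo");
    assert(std::strcmp(h->get_buf(), "FortyTwo") == 0);
}   // header_free runs here, also if an exception unwinds the scope
```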

Chris M. Thomasson

Jun 15, 2020, 7:04:36 PM
For some reason this reminds me of some of my older memory allocators
where the header was aligned on a large boundary. I could do it two
ways. One was round down to get at the header, and another experiment
rounded down then subtracted the size of the header.

Take a cache line allocator, if the header fits in a cache line, then it
can be the first element in an array of lines.
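A rough sketch of the round-down variant, assuming a power-of-two block size of 4096 bytes and an implementation that honours large alignas values for static objects. All names (block_align, block_header, block, header_from) are invented for illustration, and the integer round trip relies on the implementation-defined pointer/integer mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t block_align = 4096;   // assumed power of two

struct block_header
{
    std::size_t m_payload_size;
};

// One block: the header sits at the aligned start, the payload follows it.
struct alignas(block_align) block
{
    block_header m_header;
    unsigned char m_payload[block_align - sizeof(block_header)];
};

static_assert(sizeof(block) == block_align,
              "header plus payload must fill exactly one block");

// Round an address inside the block down to the block boundary to find the header.
block_header* header_from(void* inside)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(inside);
    addr &= ~static_cast<std::uintptr_t>(block_align - 1);
    return reinterpret_cast<block_header*>(addr);
}

void demo_round_down()
{
    static block b{};   // static storage, aligned as requested by alignas
    b.m_header.m_payload_size = sizeof(b.m_payload);
    assert(header_from(&b.m_payload[100]) == &b.m_header);
    assert(header_from(&b) == &b.m_header);
}
```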

Chris M. Thomasson

Jun 15, 2020, 7:25:20 PM
On 6/15/2020 4:04 PM, Chris M. Thomasson wrote:
> On 6/15/2020 3:52 PM, Chris M. Thomasson wrote:
>> On 6/15/2020 9:44 AM, Juha Nieminen wrote:
>>> Chris M. Thomasson <chris.m.t...@gmail.com> wrote:
[...]
> For some reason this reminds me of some of my older memory allocators
> where the header was aligned on a large boundary. I could do it two
> ways. One was round down to get at the header, and another experiment
> rounded down then subtracted the size of the header.
>
> Take a cache line allocator, if the header fits in a cache line, then it
> can be the first element in an array of lines.

Check this shi% out:
_____________________________
#include <iostream>
#include <new>
#include <cassert>
#include <cstdlib>
#include <cstddef>
#include <cstdint>


// Doctor Hackinstein!
// Rounds mp_ptr up to the next multiple of mp_align (a power of two).
#define CT_RALLOC_ALIGN_UP(mp_ptr, mp_align) \
    ((unsigned char*)( \
        (((std::uintptr_t)(mp_ptr)) + ((std::uintptr_t)(mp_align) - 1)) \
        & ~((std::uintptr_t)(mp_align) - 1) \
    ))

#define CT_RALLOC_ALIGN_ASSERT(mp_ptr, mp_align) \
    (((unsigned char*)(mp_ptr)) == CT_RALLOC_ALIGN_UP(mp_ptr, mp_align))


// Hackish indeed!
template<std::size_t T_size>
struct ct_local_mem
{
    unsigned char m_bytes[T_size];

    template<typename T>
    unsigned char* align_mem()
    {
        return align_mem<T>(alignof(T));
    }

    template<typename T>
    unsigned char* align_mem(unsigned long align)
    {
        if (!align) align = alignof(T);

        unsigned char* base = m_bytes;
        unsigned char* aligned = CT_RALLOC_ALIGN_UP(base, align);

        assert(CT_RALLOC_ALIGN_ASSERT(aligned, align));

        std::size_t size = aligned - m_bytes;

        if (size + sizeof(T) + align > T_size)
        {
            // A bare "throw;" with no active exception calls std::terminate.
            throw std::bad_alloc();
        }

        return aligned;
    }
};



// A test program...
struct foo
{
    int m_a;
    int m_b;

    foo(int a, int b) : m_a(a), m_b(b)
    {
        std::cout << this << "->foo::foo.m_a = " << m_a << "\n";
        std::cout << this << "->foo::foo.m_b = " << m_b << "\n";
    }

    ~foo()
    {
        std::cout << this << "->foo::~foo.m_a = " << m_a << "\n";
        std::cout << this << "->foo::~foo.m_b = " << m_b << "\n";
    }
};


int main()
{
    {
        // create some memory on the stack
        ct_local_mem<4096> local = { '\0' };


        // create a foo f
        std::cout << "Naturally aligned...\n";
        foo* f = new (local.align_mem<foo>(alignof(foo))) foo(1, 2);

        // destroy f
        f->~foo();



        // create a foo f aligned on a large byte boundary
        std::size_t alignment = 2048;
        std::cout << "\n\nForced aligned on a " << alignment
                  << " byte boundary...\n";

        // ensure the alignment of foo is okay with the boundary
        assert((alignment % alignof(foo)) == 0);


        f = new (local.align_mem<foo>(alignment)) foo(3, 4);

        assert(CT_RALLOC_ALIGN_ASSERT(f, alignment));

        // destroy f
        f->~foo();
    }

    return 0;
}
_____________________________

;^)

Chris Vine

Jun 15, 2020, 7:25:48 PM
On Mon, 15 Jun 2020 20:10:15 +0300
Paavo Helde <ees...@osa.pri.ee> wrote:
> 15.06.2020 10:03 Chris M. Thomasson wrote:
> > I am worried about the header::get_buf member function:
> >
> > struct header
> > {
> >   std::size_t m_size;
> >
> >   char* get_buf()
> >   {
> >     return reinterpret_cast<char*>(this + 1);
> >   }
> > };
>
> Interpreting a malloc'ed piece of memory as an array of objects is a
> very old practice and standards have been carefully worded to allow that.

Unfortunately they haven't. It is a major fail in the standard that
pointer arithmetic on memory obtained by malloc rather than by the
new char[] expression is technically undefined behaviour. See
https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
However practical implementations will allow it for reasonable uses.

Technically, constructing the 'header' object in the malloc'ed buffer
is also reputed to be undefined behaviour if you do it otherwise than
through placement new, even though 'header' is a trivial type. As a
consequence 'h->m_size = size' in header_alloc is technically defective
although I imagine most compilers will accept it. There was an
exchange on this newsgroup about it a year or so ago. Having
constructed 'header' by placement new, I think 'this + 1' in get_buf()
becomes valid (a single object is treated as an array of one element
for this purpose). But even so dereferencing the result of 'this + 1'
is I think technically also undefined behaviour if the buffer at that
address has not been constructed there by placement new[], but it will
probably work in practice. Having constructed the buffer with
placement new[], pointer arithmetic (such as with the std::strcat in
the example code) becomes valid.

Altogether the code looks too tricksy for its own good.

Paavo Helde

Jun 16, 2020, 5:16:09 AM
16.06.2020 02:26 Chris Vine wrote:
> On Mon, 15 Jun 2020 20:10:15 +0300
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> 15.06.2020 10:03 Chris M. Thomasson wrote:
>>> I am worried about the header::get_buf member function:
>>>
>>> struct header
>>> {
>>>   std::size_t m_size;
>>>
>>>   char* get_buf()
>>>   {
>>>     return reinterpret_cast<char*>(this + 1);
>>>   }
>>> };
>>
>> Interpreting a malloc'ed piece of memory as an array of objects is a
>> very old practice and standards have been carefully worded to allow that.
>
> Unfortunately they haven't. It is a major fail in the standard that
> pointer arithmetic on memory obtained by malloc rather than by the
> new char[] expression is technically undefined behaviour. See
> https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
> However practical implementations will allow it for reasonable uses.

Huh, good to know 99% of C programs formally contain UB when compiled as
C++.

>
> Technically, constructing the 'header' object in the malloc'ed buffer
> is also reputed to be undefined behaviour if you do it otherwise than
> through placement new, even though 'header' is a trivial type. As a
> consequence 'h->m_size = size' in header_alloc is technically defective
> although I imagine most compilers will accept it. There was an
> exchange on this newsgroup about it a year or so ago. Having
> constructed 'header' by placement new, I think 'this + 1' in get_buf()
> becomes valid (a single object is treated as an array of one element
> for this purpose). But even so dereferencing the result of 'this + 1'
> is I think technically also undefined behaviour if the buffer at that
> address has not been constructed there by placement new[],

Just to clarify: are you speaking about dereferencing 'this+1' as
another header object? Or about converting 'this+1' to a char* pointer
and dereferencing this? For the latter, at least the dereferencing
should be kosher if the initial buffer was allocated by new char[].

Manfred

Jun 16, 2020, 8:26:14 AM
On 6/16/2020 11:15 AM, Paavo Helde wrote:
> 16.06.2020 02:26 Chris Vine wrote:
>> On Mon, 15 Jun 2020 20:10:15 +0300
>> Paavo Helde <ees...@osa.pri.ee> wrote:
>>> 15.06.2020 10:03 Chris M. Thomasson wrote:
>>>> I am worried about the header::get_buf member function:
>>>>
>>>> struct header
>>>> {
>>>>     std::size_t m_size;
>>>>
>>>>     char* get_buf()
>>>>     {
>>>>       return reinterpret_cast<char*>(this + 1);
>>>>     }
>>>> };
>>>
>>> Interpreting a malloc'ed piece of memory as an array of objects is a
>>> very old practice and standards have been carefully worded to allow
>>> that.
>>
>> Unfortunately they haven't.  It is a major fail in the standard that
>> pointer arithmetic on memory obtained by malloc rather than by the
>> new char[] expression is technically undefined behaviour.  See
>> https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
>>
>> However practical implementations will allow it for reasonable uses.
>
> Huh, good to know 99% of C programs formally contain UB when compiled as
> C++.
>

This sounds strange to me as well. Especially since it was an explicit
goal of Bjarne that most valid C code would be valid C++ code as well.

I think I had already seen the proposal (P0593R6), and it puzzled me
that they had to go to such lengths to try and fix this.
In my ignorance, why shouldn't a type cast on the pointer returned by
malloc() suffice?
Cast operators are explicitly meant to tell the compiler: look, this
pointer, irrespective of where it comes from, is a pointer to T, deal
with it.
The rationale for the ugly notation of reinterpret_cast<> was explicitly
that of not making this statement accidental, so that the compiler knows
the programmer thought about it when writing the thing.

I know there are a lot of details for the devil to hide in with these
things, however what puzzles me is why this wasn't taken care of
properly in the standard in the first place.

>>
>> Technically, constructing the 'header' object in the malloc'ed buffer
>> is also reputed to be undefined behaviour if you do it otherwise than
>> through placement new, even though 'header' is a trivial type.  As a
>> consequence 'h->m_size = size' in header_alloc is technically defective
>> although I imagine most compilers will accept it.  There was an
>> exchange on this newsgroup about it a year or so ago.  Having
>> constructed 'header' by placement new, I think 'this + 1' in get_buf()
>> becomes valid (a single object is treated as an array of one element
>> for this purpose).  But even so dereferencing the result of 'this + 1'
>> is I think technically also undefined behaviour if the buffer at that
>> address has not been constructed there by placement new[],
>
> Just to clarify: are you speaking about dereferencing 'this+1' as
> another header object? Or about converting 'this+1' to a char* pointer
> and dereferencing this? For the latter, at least the dereferencing
> should be kosher if the initial buffer was allocated by new char[].

Why shouldn't a memory area returned by malloc() be good enough, as new
char[] supposedly is?
I mean, I understand that this may not currently be according to
standard (I didn't check), but even so I fail to understand what's the
problem with malloc.

Chris Vine

Jun 16, 2020, 8:34:13 AM
On Tue, 16 Jun 2020 12:15:59 +0300
Paavo Helde <ees...@osa.pri.ee> wrote:
> 16.06.2020 02:26 Chris Vine wrote:
> > On Mon, 15 Jun 2020 20:10:15 +0300
> > Paavo Helde <ees...@osa.pri.ee> wrote:
> >> 15.06.2020 10:03 Chris M. Thomasson wrote:
> >>> I am worried about the header::get_buf member function:
> >>>
> >>> struct header
> >>> {
> >>>   std::size_t m_size;
> >>>
> >>>   char* get_buf()
> >>>   {
> >>>     return reinterpret_cast<char*>(this + 1);
> >>>   }
> >>> };
> >>
> >> Interpreting a malloc'ed piece of memory as an array of objects is a
> >> very old practice and standards have been carefully worded to allow that.
> >
> > Unfortunately they haven't. It is a major fail in the standard that
> > pointer arithmetic on memory obtained by malloc rather than by the
> > new char[] expression is technically undefined behaviour. See
> > https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
> > However practical implementations will allow it for reasonable uses.
>
> Huh, good to know 99% of C programs formally contain UB when compiled as
> C++.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html
does shake your confidence in the competence of those who produce
the standard. You cannot produce a sensible home made std::vector
implementation in standard C++, nor implement an equivalent to
std::memcpy() for yourself, nor can you use std::uninitialized_copy
with a defined effect. The issue is still unresolved as of C++20. If
you want to see further ranting at deficiencies in the standard this is
quite amusing:
https://www.youtube.com/watch?v=IAdLwUXRUvg&list=PLHTh1InhhwT6KhvViwRiTR7I5s09dLCSw&index=32&t=0s
I love the tinfoil hat.

> > Technically, constructing the 'header' object in the malloc'ed buffer
> > is also reputed to be undefined behaviour if you do it otherwise than
> > through placement new, even though 'header' is a trivial type. As a
> > consequence 'h->m_size = size' in header_alloc is technically defective
> > although I imagine most compilers will accept it. There was an
> > exchange on this newsgroup about it a year or so ago. Having
> > constructed 'header' by placement new, I think 'this + 1' in get_buf()
> > becomes valid (a single object is treated as an array of one element
> > for this purpose). But even so dereferencing the result of 'this + 1'
> > is I think technically also undefined behaviour if the buffer at that
> > address has not been constructed there by placement new[],
>
> Just to clarify: are you speaking about dereferencing 'this+1' as
> another header object? Or about converting 'this+1' to a char* pointer
> and dereferencing this? For the latter, at least the dereferencing
> should be kosher if the initial buffer was allocated by new char[].

I am talking about converting 'this+1' to char* and dereferencing it as
char*. That would be OK if you applied placement new[] to that
region of memory first to establish an array of char there and then went
through std::launder (but see the tinfoil hat presentation for why in
fact placement new[], as opposed to placement new, is unusable). It
would also be OK if you abandoned std::malloc and constructed all your
memory with 'new unsigned char[sz]' or 'new std::byte[sz]', and cast
(this+1) to char* (I think that cast would be OK - I find the rules
quite opaque and clearly many on the standard committee share my lack
of a complete understanding).
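A sketch of the 'new unsigned char[sz]' alternative, under the assumption that header is the trivial struct from the original post. The names char_block, char_block_alloc and char_block_free are invented here; the point is only that the pointer arithmetic is done on a real array of unsigned char, and the header is created in it with placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct header { std::size_t m_size; };

// Illustrative bundle: the owning array plus the two views into it.
struct char_block
{
    unsigned char* m_storage;
    header* m_header;
    char* m_buf;
};

char_block char_block_alloc(std::size_t size)
{
    // A real array of unsigned char, so arithmetic on `storage` stays inside an array.
    unsigned char* storage = new unsigned char[sizeof(header) + size];
    header* h = new (storage) header{size};   // header lives in the array's storage
    return char_block{storage, h, reinterpret_cast<char*>(storage + sizeof(header))};
}

void char_block_free(char_block& b)
{
    if (b.m_header) b.m_header->~header();   // trivial, shown for symmetry
    delete[] b.m_storage;
    b = char_block{};
}

void demo_char_block()
{
    char_block b = char_block_alloc(32);
    std::strcpy(b.m_buf, "Hello");
    std::strcat(b.m_buf, " World");
    assert(std::strcmp(b.m_buf, "Hello World") == 0);
    assert(b.m_header->m_size == 32);
    char_block_free(b);
    assert(b.m_storage == nullptr);
}
```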

James Kuyper

Jun 16, 2020, 10:02:29 AM
On 6/16/20 5:15 AM, Paavo Helde wrote:
> 16.06.2020 02:26 Chris Vine wrote:
...
>> Unfortunately they haven't. It is a major fail in the standard that
>> pointer arithmetic on memory obtained by malloc rather than by the
>> new char[] expression is technically undefined behaviour. See
>> https://stackoverflow.com/questions/60465235/does-stdunitialized-copy-have-undefined-behavior
>> However practical implementations will allow it for reasonable uses.
>
> Huh, good to know 99% of C programs formally contain UB when compiled as
> C++.

They don't have to be compiled as C++ for that to come up. Pointer
arithmetic in C is defined solely in terms of positions in arrays. Until
dynamically allocated memory acquires an effective type that is an array
type, technically the only position in that memory that you can create a
pointer to is the first one.

C doesn't have new[], but in C there are four other ways to give
dynamically allocated memory an effective type that is an array type.
One is to write the entire array in one assignment expression using a
lvalue of a struct or union type that contains an array as a member. If
you only wrote individual members of that array, those members would
acquire the effective type of the array element, but that would not give
the entire block of memory an array type. You could also use memmove(),
memcpy(), or explicitly copying something as an array of char, despite
the fact that in all three of those cases the behavior is defined in
terms of pointer arithmetic accessing the elements of an array. That's
because those three methods are explicitly described as giving an object
with no declared type an effective type that matches the effective type
of the object being copied (C2011 6.5p6).

In practice, it's generally understood that dynamically allocated memory
can, for the purposes of pointer arithmetic, be treated as if it were an
array of the pointed-at type - but there's nothing in either standard
that actually says so.

Bo Persson

Jun 16, 2020, 10:12:31 AM
Sorry, no. The compiler has seen all your code and can tell that nowhere
does it create a T object. So how could there be a pointer to T, when
there are no Ts?


Bo Persson

Scott Newman

Jun 16, 2020, 11:12:35 AM
On 15.06.2020 at 09:03, Chris M. Thomasson wrote:
> I am worried about the header::get_buf member function:
> struct header
> {
>   std::size_t m_size;
>   char* get_buf()
>   {
>     return reinterpret_cast<char*>(this + 1);
>   }
> };

Don't care for the spec. It works with any compiler and
it will work with any compiler that will ever exist.

Manfred

Jun 16, 2020, 11:23:11 AM
Short answer: because I say so:
T* p = reinterpret_cast<T*>(ptr);

Longer answer: I think you are missing my point (that you snipped); I
understand that this may be what the standard says (again, I didn't
check in detail) and how the language formally works.

However, my remark is about /why/ the standard states the rules of
object creation and lifetime so as to deny this possibility. In other
words, why did they make it this way?
I may add that at a first look it appears that allowing the following to
work for a non trivial type would be problematic:
T* p = static_cast<T*>(malloc(sizeof(*p)));

But I fail to see this as an impossible problem when T is a C-style
struct, or anyhow a POD or a trivial type.
(Which means that yes, allowing this for a POD would make valid C code
of this kind valid C++ code as well)

By the way, denying the above would make reinterpret_cast (and even
static_cast from void* to T*) effectively unusable when it would be
actually needed.


>
>     Bo Persson
>

Paavo Helde

Jun 16, 2020, 11:29:51 AM
16.06.2020 15:26 Manfred wrote:

> Cast operators are explicitly meant to tell the compiler: look, this
> pointer, irrespective of where it comes from, is a pointer to T, deal
> with it.

The proposal to legislate malloc and friends for object creation
"http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html"
contains explicit remark, stressing the opposite:

"Note that a pointer reinterpret_cast is not considered sufficient to
trigger implicit object creation."

I gather all this fuss is about allowing type-based alias analysis.
Reinterpret_cast works directly against this idea.

Bo Persson

Jun 16, 2020, 11:53:45 AM
Yes, the idea was to not have to decorate every other pointer with a
'restrict' keyword (like some other language does).


Bo Persson

Manfred

Jun 16, 2020, 11:54:08 AM
That's why I find the proposal going in the direction of causing more
trouble than help.
True, reinterpret_cast breaks type-based alias analysis. However, it has
been part of the standard since day one, and since you can't do type
analysis on a buffer returned by malloc anyway, what's the problem with
that?

By the way, I managed to look into Bjarne's book (p. 1260):
"Note that malloc() etc. does not invoke constructors and free() doesn't
invoke destructors. Do not use these functions for types with
constructors and destructors. Also, memset() should never be used for
any type with a constructor".

That makes perfect sense.
I wonder why the standard managed to come up with something that denies
/any/ object allocation with malloc (at least according to the first
example in the proposal).

Paavo Helde

Jun 16, 2020, 12:34:49 PM
16.06.2020 18:53 Manfred wrote:
> On 6/16/2020 5:29 PM, Paavo Helde wrote:
>> 16.06.2020 15:26 Manfred wrote:
>>
>>> Cast operators are explicitly meant to tell the compiler: look, this
>>> pointer, irrespective of where it comes from, is a pointer to T, deal
>>> with it.
>>
>> The proposal to legislate malloc and friends for object creation
>> "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html"
>> contains explicit remark, stressing the opposite:
>>
>> "Note that a pointer reinterpret_cast is not considered sufficient to
>> trigger implicit object creation."
>>
>> I gather all this fuss is about allowing type-based alias analysis.
>> Reinterpret_cast works directly against this idea.
>
> That's why I find the proposal going in the direction of causing more
> trouble than help.
> True, reinterpret_cast breaks type-based alias analysis. However, it has
> been part of the standard since day one, and since you can't do type
> analysis on a buffer returned by malloc anyway, what's the problem with
> that?

With malloc there is no conceptual problem because it produces "fresh"
memory where there are no existing objects which might alias something
else. But this is a property of malloc, not a property of any cast. The
proposal attempts to get this bit correct.


Chris Vine

Jun 16, 2020, 12:44:31 PM
On Tue, 16 Jun 2020 17:53:56 +0200
Manfred <non...@add.invalid> wrote:
[snip]
> That makes perfect sense.
> I wonder why the standard managed to come up with something that denies
> /any/ object allocation with malloc (at least according to the first
> example in the proposal).

It doesn't say you cannot construct objects in raw memory provided by
malloc (or by operator new for that matter). It says that to do it you
have to use placement new. (The problem with pointer arithmetic on
non-arrays is a different but also annoying fault.)

I think this insistence on using placement new is wrong. If an object
is of trivial type it does not have a constructor or destructor which
does anything. It is a C-like type. For trivial types, I see no
reason why the standard cannot take the dynamic type of the memory to
be the type of the first trivial object placed in it, say by memcpy or
by assignment, which is how the C standard determines the "effective
type" of memory allocated by malloc. A standard which regards
inter-operability with C as important ought to provide for this.

In C++ this has now been overlain since C++17 with std::launder. Prior
to C++17 it used to be assumed that the strict aliasing rules regarding
dereferencing pointers were what you needed to comply with. Put
shortly, it used to be that if there really was an object of type T
properly constructed at address n, then you could reinterpret_cast n
to pointer to T and dereference that pointer. That seems fair and
logical. However this no longer applies generally. Even if you are
fully compliant with the strict aliasing rules, unless you fall within
one of the cases described as "pointer-interconvertible" then a
reinterpret_cast of pointers is no longer enough. You also have to use
std::launder. Amongst other things, a pointer to the first element of
an array is not pointer-interconvertible to pointer to array even
though they are mandated by the standard to have the same address
(there is an implicit cast in the reverse direction). You have to use
std::launder as well as a reinterpret_cast. Likewise you have to use
std::launder to access an object allocated in a buffer by placement new
other than through the pointer returned by placement new. What's the
point of this? The strict aliasing rules should be enough.
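
A small sketch of the placement-new case described above, assuming C++17. The names 'point', 'storage' and 'read_x_after_construct' are made up for illustration.

```cpp
#include <cassert>
#include <new>

struct point { int x, y; };

// Storage suitably sized and aligned for one 'point'.
alignas(point) unsigned char storage[sizeof(point)];

int read_x_after_construct()
{
    // Construct a 'point' in the buffer and deliberately discard the
    // pointer that placement new returns.
    ::new (static_cast<void*>(storage)) point{1, 2};

    // The object is now reached through the buffer address. Under the
    // C++17 rules the reinterpret_cast alone still points at the
    // unsigned char element, so std::launder is applied on top of it.
    point* p = std::launder(reinterpret_cast<point*>(storage));
    assert(p != nullptr);
    return p->x;
}
```

Had the pointer returned by placement new been kept and used, neither the reinterpret_cast nor the std::launder would be needed.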

And it seems from P0593 that the standard has problems on other matters
of memory allocation. How on earth, in a serious technical standard,
can you mandate implementers to provide functions such as
std::uninitialized_copy which, if implemented according to the
specification, have undefined behaviour?

Chris M. Thomasson

unread,
Jun 16, 2020, 6:34:15 PM6/16/20
to
Wrt the paper, a natural C program example:

struct X { int a, b; };
X *make_x() {
X *p = (X*)malloc(sizeof(struct X));
p->a = 1;
p->b = 2;
return p;
}

Why would one need to cast the return value of malloc to (X*)?

Chris M. Thomasson

unread,
Jun 16, 2020, 6:39:36 PM6/16/20
to
Why would one need to cast the return value? void* can do it as is,
right? In pure C...

Chris Vine

unread,
Jun 16, 2020, 7:04:55 PM6/16/20
to
On Tue, 16 Jun 2020 15:39:25 -0700
"Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
> > Wrt the paper, a natural C program example:
> >
> > struct X { int a, b; };
> > X *make_x() {
> >   X *p = (X*)malloc(sizeof(struct X));
> >   p->a = 1;
> >   p->b = 2;
> >   return p;
> > }
> >
> > Why would one need to cast the return value of malloc to (X*)?
>
> Why would one need to cast the return value? void* can do it as is,
> right? In pure C...

It is C-style code which will compile in C++ but has undefined
behaviour in C++ - the latter (undefined behaviour) being the point of
the example. As it happens it won't compile in C because it is missing
the struct tag before the various references to X* and there is no
typedef. The cast of the return value of malloc is not needed in C.

I don't think the example is intended as a guide to writing good C.

Chris M. Thomasson

unread,
Jun 16, 2020, 7:11:42 PM6/16/20
to
Touche! So, basically, I need to use placement new and explicit dtor via
calling the dtor ~Type(). Then it becomes a "real object" in C++. I
thought that POD would be different in a sense.

Chris Vine

unread,
Jun 16, 2020, 7:23:57 PM6/16/20
to
Yes, although you don't actually need to call the destructor because
your types are trivial.

But there are other issues with your code which I have alluded to in
other posts. I should use 'new unsigned char[...]' instead of
std::malloc, placement new your 'header' struct into it and (as you do
at present) cast the buffer part to char* if you really want char*
instead of unsigned char* for the buffer. As I have indicated in those
posts I think that cast is valid but you never know with C++17/20: if
the committee don't understand the rules who are we to say.
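
A rough sketch of that suggestion follows. It is a variation rather than the code from the original post: it keeps the unsigned char* returned by new[] alongside the header pointer, so the buffer address is computed from the array itself and not from 'this + 1'. The names 'block', 'block_alloc', 'block_buf' and 'block_free' are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct header
{
    std::size_t m_size;
};

// Hypothetical helper type: keeps the pointer returned by new[] next to
// the header pointer returned by placement new.
struct block
{
    unsigned char* raw;
    header* h;
};

block block_alloc(std::size_t size)
{
    // The unsigned char array provides storage for the header and the buffer.
    unsigned char* const raw = new unsigned char[sizeof(header) + size];
    header* const h = ::new (static_cast<void*>(raw)) header{size};
    return block{raw, h};
}

char* block_buf(const block& b)
{
    assert(b.raw != nullptr);
    // The buffer part starts right after the header's storage.
    return reinterpret_cast<char*>(b.raw + sizeof(header));
}

void block_free(const block& b)
{
    // 'header' is trivial, so no destructor call; release the array.
    delete[] b.raw;
}
```

Usage mirrors the original: block b = block_alloc(123); std::strcpy(block_buf(b), "Hello"); block_free(b);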

As I have also mentioned in other posts I agree with your sentiments
about PODs/trivial types. That seems to me to be another fail in the
standard. It ought to be valid in my view, but it isn't.

Juha Nieminen

unread,
Jun 17, 2020, 2:36:34 AM6/17/20
to
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> Technically, constructing the 'header' object in the malloc'ed buffer
> is also reputed to be undefined behaviour if you do it otherwise than
> through placement new, even though 'header' is a trivial type.

Thinking about it, it might actually have merit to worry about such things
being UB, no matter how "technically" and how obscure the rule may be.

One could easily just think like "who cares if it's "technically" UB?
There's no practical implementation where it would cause anything else
than intended behavior."

The problem is, UB allows the compiler to do whatever it wants. Including
not doing what the programmer "intended" for it to do. Technically speaking
if the compiler detects UB, it's allowed to think "this is UB, I don't need
to do anything here, I'll just skip the whole thing and optimize it all
away". Suddenly you might find yourself with an incredibly obscure
"compiler bug" where the compiler isn't generating the code you wrote...
when in fact it's not a compiler bug at all.

I remember a particularly nasty bug many years ago in the Linux kernel that
was caused precisely by this kind of thing. That part of the kernel code
was technically UB... and the compiler did whatever it wanted and not the
thing that the code was "intending" it to do (and, IIRC, it optimized that
part away, which caused the intended thing to not happen. I don't remember
now the exact reason, but might have had something to do with deliberately
dereferencing a null pointer, which is UB, and which the compiler was
"optimizing" away because the standard allowed it to.)

Juha Nieminen

unread,
Jun 17, 2020, 2:37:49 AM6/17/20
to
Scott Newman <sco...@gmail.com> wrote:
> Don't care for the spec. It works with any compiler and
> it will work with any compiler that will ever exist.

The problem is that if you trigger UB, the compiler is allowed to do
whatever it wants with it. Including not doing what you want.

Scott Newman

unread,
Jun 17, 2020, 2:42:16 AM6/17/20
to
>> Don't care for the spec. It works with any compiler and
>> it will work with any compiler that will ever exist.

> The problem is that if you trigger UB, the compiler is allowed to
> do whatever it wants with it. Including not doing what you want.

Not in this case.

Juha Nieminen

unread,
Jun 17, 2020, 2:47:33 AM6/17/20
to
What do you mean? The compiler is *always* allowed to do whatever it wants
if something is UB.

Scott Newman

unread,
Jun 17, 2020, 3:06:14 AM6/17/20
to
>> Not in this case.

> What do you mean? The compiler is *always* allowed to do whatever it wants
> if something is UB.

There's no UB with what Chris does initially.

Chris Vine

unread,
Jun 17, 2020, 5:49:31 AM6/17/20
to
Yes I agree that is the problem with undefined behaviour. Which I
suppose is one reason why the standard committee should be careful
before spraying it over the standard and undermining programmers' well
established past practices.

Manfred

unread,
Jun 17, 2020, 9:27:39 AM6/17/20
to
In C yes, in C++ a type cast is required. A static_cast would suffice.

This is why in my other posts I wonder why such a type cast could not be
enough (at least for PODs) to overcome the issue about object creation
and lifetime that is described in the proposal.
Instead of going down some contrived path like requiring malloc to "look
into the future" and such.

Manfred

unread,
Jun 17, 2020, 9:35:18 AM6/17/20
to
If you look at the quote from Bjarne's "The C++ Programming Language"
that I posted earlier you see that he himself does not require using
placement new for malloc on a type with no constructor like this.
Apparently, for some reason, the standard landed somewhere else.

bol...@nowhere.co.uk

unread,
Jun 17, 2020, 11:20:45 AM6/17/20
to
A static_cast is just a more verbose C cast and the two are interchangeable.

Manfred

unread,
Jun 17, 2020, 11:41:45 AM6/17/20
to
On 6/17/2020 5:20 PM, bol...@nowhere.co.uk wrote:
> On Wed, 17 Jun 2020 15:27:26 +0200
> Manfred <non...@add.invalid> wrote:
>> On 6/17/2020 12:39 AM, Chris M. Thomasson wrote:
>>> On 6/16/2020 3:33 PM, Chris M. Thomasson wrote:
>>>> On 6/16/2020 8:29 AM, Paavo Helde wrote:
>>>>> 16.06.2020 15:26 Manfred kirjutas:
>>>>>
>>>>>> Cast operators are explicitly meant to tell the compiler: look, this
>>>>>> pointer, irrespective of where it comes from, is a pointer to T,
>>>>>> deal with it.
>>>>>
>>>>> The proposal to legislate malloc and friends for object creation
>>>>> "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html"
>> contains
>>>>> explicit remark, stressing the opposite:
>>>>>
>>>>> "Note that a pointer reinterpret_cast is not considered sufficient to
>>>>> trigger implicit object creation."
>>>>>
>>>>> I gather all this fuss is about allowing type-based alias analysis.
>>>>> Reinterpret_cast works directly against this idea.
>>>>
>>>> Wrt the paper, a natural C program example:
>>>>
>>>> struct X { int a, b; };
>>>> X *make_x() {
>>>>    X *p = (X*)malloc(sizeof(struct X));
>>>>    p->a = 1;
>>>>    p->b = 2;
>>>>    return p;
>>>> }
>>>>
>>>> Why would one need to cast the return value of malloc to (X*)?
>>>
>>> Why would one need to cast the return value? void* can do it as is,
>>> right? In pure C...
>>
>> In C yes, in C++ a type cast is required. A static_cast would suffice.
>
> A static_cast is just a more verbose C cast and the two are interchangeable.
>

No, they are not.

https://en.cppreference.com/w/cpp/language/expressions#Conversions

bol...@nowhere.co.uk

unread,
Jun 17, 2020, 11:46:10 AM6/17/20
to
On Wed, 17 Jun 2020 17:41:33 +0200
It's close enough as a substitute in 99% of circumstances. Though personally
I prefer a C cast, C++ has enough verbiage as it is.

Manfred

unread,
Jun 17, 2020, 1:13:42 PM6/17/20
to
On 6/17/2020 5:46 PM, bol...@nowhere.co.uk wrote:
> On Wed, 17 Jun 2020 17:41:33 +0200
> Manfred <non...@add.invalid> wrote:
>> On 6/17/2020 5:20 PM, bol...@nowhere.co.uk wrote:
>>> On Wed, 17 Jun 2020 15:27:26 +0200
>>> Manfred <non...@add.invalid> wrote:
>>>> On 6/17/2020 12:39 AM, Chris M. Thomasson wrote:
>>>>> On 6/16/2020 3:33 PM, Chris M. Thomasson wrote:
>>>>>> On 6/16/2020 8:29 AM, Paavo Helde wrote:
>>>>>>> 16.06.2020 15:26 Manfred kirjutas:
>>>>>>>
>>>>>>>> Cast operators are explicitly meant to tell the compiler: look, this
>>>>>>>> pointer, irrespective of where it comes from, is a pointer to T,
>>>>>>>> deal with it.
>>>>>>>
>>>>>>> The proposal to legislate malloc and friends for object creation
>>>>>>> "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html"
>>>> contains
>>>>>>> explicit remark, stressing the opposite:
>>>>>>>
>>>>>>> "Note that a pointer reinterpret_cast is not considered sufficient to
>>>>>>> trigger implicit object creation."
>>>>>>>
>>>>>>> I gather all this fuss is about allowing type-based alias analysis.
>>>>>>> Reinterpret_cast works directly against this idea.
>>>>>>
>>>>>> Wrt the paper, a natural C program example:
>>>>>>
>>>>>> struct X { int a, b; };
>>>>>> X *make_x() {
>>>>>>    X *p = (X*)malloc(sizeof(struct X));
>>>>>>    p->a = 1;
>>>>>>    p->b = 2;
>>>>>>    return p;
>>>>>> }
>>>>>>
>>>>>> Why would one need to cast the return value of malloc to (X*)?
>>>>>
>>>>> Why would one need to cast the return value? void* can do it as is,
>>>>> right? In pure C...
>>>>
>>>> In C yes, in C++ a type cast is required. A static_cast would suffice.
>>>
>>> A static_cast is just a more verbose C cast and the two are interchangeable.
>>>
>>
>> No, they are not.
>>
>> https://en.cppreference.com/w/cpp/language/expressions#Conversions
>
> It's close enough as a substitute in 99% of circumstances. Though personally
> I prefer a C cast, C++ has enough verbiage as it is.
>

Read more carefully:
A C-style cast can perform a reinterpret_cast (possibly even combined
with a const_cast). This is a major difference with static_cast.

Moreover, verbosity has been widely publicized as intentional for cast
operators, for obvious reasons.

Chris Vine

unread,
Jun 17, 2020, 1:32:14 PM6/17/20
to
On Wed, 17 Jun 2020 15:35:06 +0200
Manfred <non...@add.invalid> wrote:
[snip]
> If you look at the quote from Bjarne's "The C++ Programming Language"
> that I posted earlier you see that he himself does not require using
> placement new for malloc on a type with no constructor like this.
> Apparently, for some reason, the standard landed somewhere else.

He even gives an example (4th edition) of the dangers of realloc, while
constructing trivial types in malloc'ed memory as part of the
explanation, on the basis that using malloc for trivial types is fine.

The fact is that until C++17's new object lifetime rules everyone
believed this usage was OK with trivial types, and it was commonplace to
static_cast malloc'ed memory to a pointer to such types and manipulate
them. The first time I heard that this was now thought to be incorrect
was about 2 years ago. There will be shed-loads of code out there
which does this. Similarly the new requirement in C++17 for
std::launder as a supplement to reinterpret_cast and the strict
aliasing rules for some cases.

The standard committee probably managed to persuade themselves that
pre-C++17 usage was undefined in some way (you can pretty much persuade
yourself of anything in the C++ standard if you try hard enough).
But instead of supporting ancient usage they decided to deliberately
break it.

I really can't respect that kind of thing, and it has certainly
affected my approach when considering programming languages. I heard
tell that Stroustrup was opposed to the object lifetime changes in
C++17: if so, he lost the argument.

James Kuyper

unread,
Jun 17, 2020, 4:05:46 PM6/17/20
to
On Wednesday, June 17, 2020 at 11:46:10 AM UTC-4, bol...@nowhere.co.uk wrote:
> On Wed, 17 Jun 2020 17:41:33 +0200
> Manfred <non...@add.invalid> wrote:
> >On 6/17/2020 5:20 PM, bol...@nowhere.co.uk wrote:
...
> >> A static_cast is just a more verbose C cast and the two are interchangeable.
> >>
> >
> >No, they are not.
> >
> >https://en.cppreference.com/w/cpp/language/expressions#Conversions
>
> It's close enough as a substitute in 99% of circumstances. Though personally
> I prefer a C cast, C++ has enough verbiage as it is.

No, they are not interchangeable - they're not even close to being 99%
interchangeable.

"The conversions performed by
(4.1) — a const_cast (8.2.11),
(4.2) — a static_cast (8.2.9),
(4.3) — a static_cast followed by a const_cast,
(4.4) — a reinterpret_cast (8.2.10), or
(4.5) — a reinterpret_cast followed by a const_cast,
can be performed using the cast notation of explicit type conversion.
(4.6) — a pointer to an object of derived class type or an lvalue or
rvalue of derived class type may be explicitly converted to a pointer or
reference to an unambiguous base class type, respectively;
(4.7) — a pointer to member of derived class type may be explicitly
converted to a pointer to member of an unambiguous non-virtual base
class type;
(4.8) — a pointer to an object of an unambiguous non-virtual base class
type, a glvalue of an unambiguous non-virtual base class type, or a
pointer to member of an unambiguous non-virtual base class type may be
explicitly converted to a pointer, a reference, or a pointer to member of
a derived class type, respectively." (8.4)

Therefore, if you run into a C-style cast, there's 7 different types of
conversions it might be performing that cannot be performed by a
static_cast<> alone. I've no hard figures on this, but I suspect that
the number of C-style casts that perform one of those 7 other
conversions is a lot higher than 1%, and those casts cannot be replaced
by a static_cast<>.

Replacing a working static_cast<> with a C-style cast always results in
working code, but is a bad idea. However, your comment about preferring
C-style casts suggests that you might not agree. I'll explain why it is
for the benefit of other people - feel free to stop reading here.

Stroustrup split the C style cast into multiple different named casts
precisely because accidentally using a C style cast to perform a
conversion more dangerous than the one you thought you were performing
was one of the most common mistakes made by C programmers. Every
conversion that can be performed by a C-style cast that cannot be done
by a static_cast<> is a more dangerous conversion than a static_cast<>.
Stroustrup's idea was to have multiple different named casts, each of
which could only do some of the things that a C-style cast could do, so
you would only get one of the more dangerous conversions if you
explicitly requested it. It is an error requiring a diagnostic to try to
use one of those casts to perform a conversion that it couldn't do.
Those diagnostics are the key benefit that comes from proper use of the named
casts.
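
To illustrate those diagnostics, a short sketch (the types 'base' and 'derived' and the function name are made up for the example). The commented-out lines are the ones a conforming compiler is expected to reject; the C-style cast next to them compiles without complaint.

```cpp
#include <cassert>

struct base { int id; };
struct derived : base { int extra; };

int read_through_casts()
{
    derived d;
    d.id = 7;
    d.extra = 9;

    base* b = &d;                              // implicit upcast
    derived* pd = static_cast<derived*>(b);    // allowed: related class types
    assert(pd == &d);

    const int limit = 42;
    const int* pc = &limit;
    // int* p1 = static_cast<int*>(pc);        // error: static_cast cannot drop const
    // float* pf = static_cast<float*>(b);     // error: unrelated pointer types
    int* p2 = (int*)pc;                        // compiles: silently acts as a const_cast

    // Only read through p2; writing to 'limit' through it would be undefined.
    return pd->extra + *p2;
}
```

The point is not that the C-style cast misbehaves here, but that it would have compiled just as quietly if the conversion had been one of the dangerous ones.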

Juha Nieminen

unread,
Jun 18, 2020, 2:52:05 AM6/18/20
to
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> Yes I agree that is the problem with undefined behaviour. Which I
> suppose is one reason why the standard committee should be careful
> before spraying it over the standard and undermining programmers' well
> established past practices.

Maybe they should use two categories of UB, one that means "how this should
be compiled is completely undefined and therefore the compiler can do
whatever it wants", and another that means "the compiler must do exactly
what the code is telling to, even if the results are undefined and most
probably incorrect".

Dereferencing a null pointer would be of the latter category: Sure, it's
"undefined behavior", but the compiler would still have to do exactly as
told: Read what's at address 0 (or whatever a "null pointer" points to).
Don't optimize it away or do anything else.

Much of the leeway that the C++ standard gives compilers comes from the
same leeway that the C standard gives compilers, which originates from
the principle that the languages should be as portable as possible, even
to extremely esoteric hypothetical computers that work completely
differently from your common computer.

This principle might have been sound in the late 70's and early 80's,
when there was a huge variety of computer architectures and it was a
mystery which direction computer architectures would go to. Nowadays,
however, there's very little need to have theoretical support for
hypothetical esoteric architectures that don't exist and are unlikely
to ever exist.

(I think the standardization committee is slowly moving towards narrowing
these things. I think there has been discussion that 2's complement
arithmetic should become mandated by the standard, because there's no
practical need to keep supporting anything else.)

Chris Vine

unread,
Jun 18, 2020, 7:09:12 AM6/18/20
to
That's an interesting idea. However the compiler writers seem to have
significant input into the C++ standard (I think that is where the
rather odd requirements concerning the std::launder optimization
barrier came from) and I doubt you would get these ideas past their
desire for optimization opportunities.

I think my general view is that if you want to write close-to-the-metal
code these days you are better doing it in C rather than C++. There is
no way in the world that the C committee would break long-standing
practices with changes like the malloc/trivial type changes we have been
discussing with C++17. With C++, you have the worry that the committee
is going to saw your low-level code off at the knees again when the next
standard comes out.

And for higher level stuff where maximum efficiency is not needed,
garbage collected languages can do the business. Since nearly all such
languages have C FFIs, the sweet spot these days for some cases can be
C for any code which needs to have maximum efficiency and/or interact
with the hardware, with a garbage collected language on top of that for
the rest. However I imagine lots of people will disagree with that:
garbage collection does have disadvantages in terms of memory
requirements and latency and you have to be disciplined when using the
FFIs for such languages if interacting with malloc'ed memory at the C
end. Possibly C for the low-level stuff and C++ for the higher level
stuff may be an alternative choice for those who don't want garbage
collection.

Juha Nieminen

unread,
Jun 18, 2020, 7:35:00 AM6/18/20
to
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> I think my general view is that if you want to write close-to-the-metal
> code these days you are better doing it in C rather than C++. There is
> no way in the world that the C committee would break long-standing
> practices with changes like the malloc/trivial type changes we have been
> discussing with C++17. With C++, you have the worry that the committee
> is going to saw your low-level code off at the knees again when the next
> standard comes out.
>
> And for higher level stuff where maximum efficiency is not needed,
> garbage collected languages can do the business. Since nearly all such
> languages have C FFIs, the sweet spot these days for some cases can be
> C for any code which needs to have maximum efficiency and/or interact
> with the hardware, with a garbage collected language on top of that for
> the rest.

I don't agree. C is too simplistic for large complex projects (mainly
due to its lack of RAII), and C code tends to be really complex,
hard-to-read and error-prone (no matter how much C programmers claim
otherwise).

Especially nowadays that data-oriented-programming (and design principles
closely related to it) is becoming more and more popular, which sees
programs becoming significantly faster and more efficient at number-crunching,
it's important to have a language that both allows for a very low-level
approach at handling and manipulating data, and at the same time provides
a way to make your code more abstract, safer and easier to use.

Bo Persson

unread,
Jun 18, 2020, 7:41:34 AM6/18/20
to
On 2020-06-18 at 08:51, Juha Nieminen wrote:
> Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>> Yes I agree that is the problem with undefined behaviour. Which I
>> suppose is one reason why the standard committee should be careful
>> before spraying it over the standard and undermining programmers' well
>> established past practices.
>
> Maybe they should use two categories of UB, one that means "how this should
> be compiled is completely undefined and therefore the compiler can do
> whatever it wants", and another that means "the compiler must do exactly
> what the code is telling to, even if the results are undefined and most
> probably incorrect".
>
> Dereferencing a null pointer would be of the latter category: Sure, it's
> "undefined behavior", but the compiler would still have to do exactly as
> told: Read what's at address 0 (or whatever a "null pointer" points to).
> Don't optimize it away or do anything else.
>
> Much of the leeway that the C++ standard gives compilers comes from the
> same leeway that the C standard gives compilers, which originates from
> the principle that the languages should be as portable as possible, even
> to extremely esoteric hypothetical computers that work completely
> differently from your common computer.

The rules were not formulated for hypothetical computers, but for the
(then) mainstream members of the 68000 family, as well as segmented
memory on Intel 286.

Loading an invalid pointer into an address register would trap at the
hardware level. As would trying to load a segment descriptor for a
deallocated segment.

>
> This principle might have been sound in the late 70's and early 80's,
> when there was a huge variety of computer architectures and it was a
> mystery which direction computer architectures would go to. Nowadays,
> however, there's very little need to have theoretical support for
> hypothetical esoteric architectures that don't exist and are unlikely
> to ever exist.

There could be an opportunity to revise this now, but why would we like
to allow new code to misbehave in ways that were not possible earlier?

>
> (I think the standardization committee is slowly moving towards narrowing
> these things. I think there has been discussion that 2's complement
> arithmetic should become mandated by the standard, because there's no
> practical need to keep supporting anything else.)
>

This is more than discussions, and already part of the upcoming C++20
standard. The compiler writers couldn't name any 1's complement hardware
they intended to support. :-)


Bo Persson

James Kuyper

unread,
Jun 18, 2020, 8:51:58 AM6/18/20
to
No, they aren't. Even though "undefined behavior" is defined as
"behavior for which this international standard imposes no requirements"
(3.27), there is still always the requirement that the implementation
produce the behavior that Scott Newman expects it to produce. What
happens when implementations violate Scott's expectations is unclear -
so far as I know, whatever it is hasn't happened yet - but he's made it
quite clear that it's not allowed.

James Kuyper

unread,
Jun 18, 2020, 8:52:00 AM6/18/20
to
On 6/17/20 5:49 AM, Chris Vine wrote:
> On Wed, 17 Jun 2020 06:36:24 +0000 (UTC)
> Juha Nieminen <nos...@thanks.invalid> wrote:
...
>> I remember a particularly nasty bug many years ago in the Linux kernel that
>> was caused precisely by this kind of thing. That part of the kernel code
>> was technically UB... and the compiler did whatever it wanted and not the
>> thing that the code was "intending" it to do (and, IIRC, it optimized that
>> part away, which caused the intended thing to not happen. I don't remember
>> now the exact reason, but might have had something to do with deliberately
>> dereferencing a null pointer, which is UB, and which the compiler was
>> "optimizing" away because the standard allowed it to.)
>
> Yes I agree that is the problem with undefined behaviour. Which I
> suppose is one reason why the standard committee should be careful
> before spraying it over the standard and undermining programmers' well
> established past practices.

The thing is, if those past practices were indeed well-established, that
was a serious problem. The particular case Juha was talking about
involved C code that dereferenced a pointer that might have been null,
and not bothering to test whether or not it was null until afterwards.

Dereferencing a null pointer has never had defined behavior in any version of
the C standard, not even K&R C (though particular implementations of C
have sometimes defined it). In this particular case, the behavior
defined by the implementation for such a dereference was that it would
cause that pointer to be treated as if it was guaranteed to be non-null
until such time as it was next changed. This allowed the compiler to
speed up the program by skipping tests of whether or not the pointer was
null, not even generating any code for the branch that was skipped. That
was optional behavior for that compiler, not the default behavior, and
from what I've heard, the developers had deliberately turned on that option.
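
The pattern can be sketched as follows. This is not the kernel code; 'device' and both function names are invented for illustration. As far as I know the GCC option usually associated with this behavior is -fdelete-null-pointer-checks, but treat that as my assumption rather than something stated in this thread.

```cpp
#include <cassert>

struct device { int flags; };

// Risky ordering: the dereference comes first, so a compiler is allowed
// to assume 'dev' is non-null and drop the later test entirely.
int flags_unchecked_first(device* dev)
{
    int f = dev->flags;      // undefined behavior if dev is null
    if (!dev) return -1;     // may be removed by the optimizer
    return f;
}

// Safe ordering: test first, dereference afterwards.
int flags_checked_first(device* dev)
{
    if (!dev) return -1;
    return dev->flags;
}
```

Both functions agree for non-null arguments. Only the second has defined behavior when passed a null pointer.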

They had the false idea that the hardware defined what the behavior
would be, and didn't bother to check. It's always the implementation
that defines the behavior, not the hardware. Persistent rumor to the
contrary notwithstanding, C is not a "portable assembly language". An
implementation is under no obligation to generate the same machine code
that you naively expect it to, so long as the code it actually generates
has the required observable behavior - and there is by definition no
required observable behavior when the code has UB.

Good general rule: don't write code with behavior that the relevant
standard fails to define, unless something else defines the behavior.
And if something else does define the behavior, check to make sure that
the definition it provides is what you want it to be. It's entirely your
own fault if something goes wrong due to the implementation producing
exactly the behavior it documents.

Chris Vine

unread,
Jun 18, 2020, 9:40:13 AM6/18/20
to
On Wed, 17 Jun 2020 09:53:31 -0400
James Kuyper <james...@alumni.caltech.edu> wrote:
> On 6/17/20 5:49 AM, Chris Vine wrote:
> > On Wed, 17 Jun 2020 06:36:24 +0000 (UTC)
> > Juha Nieminen <nos...@thanks.invalid> wrote:
> ...
> >> I remember a particularly nasty bug many years ago in the Linux kernel that
> >> was caused precisely by this kind of thing. That part of the kernel code
> >> was technically UB... and the compiler did whatever it wanted and not the
> >> thing that the code was "intending" it to do (and, IIRC, it optimized that
> >> part away, which caused the intended thing to not happen. I don't remember
> >> now the exact reason, but might have had something to do with deliberately
> >> dereferencing a null pointer, which is UB, and which the compiler was
> >> "optimizing" away because the standard allowed it to.)
> >
> > Yes I agree that is the problem with undefined behaviour. Which I
> > suppose is one reason why the standard committee should be careful
> > before spraying it over the standard and undermining programmers' well
> > established past practices.
>
> The thing is, if those past practices were indeed well-established, that
> was a serious problem. The particular case Juha was talking about
> involved C code that dereferenced a pointer that might have been null,
> and not bothering to test whether or not it was null until afterwards.
[snip]

My reference to past practices wasn't about dereferencing a null pointer,
and you have (I hope unintentionally) snipped the posting to look as if
it was. The particular context of my reference was about constructing a
trivial type in malloc'ed memory without placement new. Prior to C++17
that was generally thought to be OK and appears in Stroustrup's TC++PL
fourth edition.

The point with which I was agreeing, was that compilers have liberty
to, for example, optimize out undefined behaviour instead of, in the
case of a null pointer dereference, immediately crashing with a segfault
(when crashing would be kinder). They do indeed have that liberty.
Possibly all undefined behaviour is the programmer's fault. Possibly it
isn't - I would suggest that it isn't the programmer's fault that the
functions in §23.10.10.2 to §23.10.10.6 of C++17 and the equivalents in
C++20 all have undefined behaviour, nor that further undefined
behaviour arises if you attempt to access other than the first item of
the returned collection.

Either way, Juha's point was that it would be nicer to do literally what
the programmer says in a case of undefined behaviour. It would be
nicer, but as I subsequently said such a rule is unlikely to happen.
It is unlikely to happen because compiler writers like to optimize and
more to the point programmers are programming to a virtual machine and
not to a hardware specification.

bol...@nowhere.co.uk

unread,
Jun 18, 2020, 11:20:09 AM6/18/20
to
On Thu, 18 Jun 2020 11:34:47 +0000 (UTC)
Juha Nieminen <nos...@thanks.invalid> wrote:
>Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>> I think my general view is that if you want to write close-to-the-metal
>> code these days you are better doing it in C rather than C++. There is
>> no way in the world that the C committee would break long-standing
>> practices with changes like the malloc/trivial type changes we have been
>> discussing with C++17. With C++, you have the worry that the committee
>> is going to saw your low-level code off at the knees again when the next
>> standard comes out.
>>
>> And for higher level stuff where maximum efficiency is not needed,
>> garbage collected languages can do the business. Since nearly all such
>> languages have C FFIs, the sweet spot these days for some cases can be
>> C for any code which needs to have maximum efficiency and/or interact
>> with the hardware, with a garbage collected language on top of that for
>> the rest.
>
>I don't agree. C is too simplistic for large complex projects (mainly

Linus Torvalds would disagree with you. And so would the kernel teams at
Microsoft and Apple.

>due to its lack of RAII), and C code tends to be really complex,
>hard-to-read and error-prone (no matter how much C programmers claim
>otherwise).

Error prone, yes and no depending on the developer. Hard to read? Compared to
modern C++ it's simple and elegant.

James Kuyper

unread,
Jun 18, 2020, 11:20:38 AM6/18/20
to
On 6/18/20 2:51 AM, Juha Nieminen wrote:
...
> Maybe they should use two categories of UB, one that means "how this should
> be compiled is completely undefined and therefore the compiler can do
> whatever it wants", and another that means "the compiler must do exactly
> what the code is telling to, even if the results are undefined and most
> probably incorrect".

There's a fundamental problem with that - in order for any such
specification to make sense, either the standard or the implementation
needs to define what "the code is telling [the compiler] to [do]" means.
If something does define that, then the behavior is no longer undefined.
It's standard-defined or implementation-defined, respectively.

> Dereferencing a null pointer would be of the latter category: Sure, it's
> "undefined behavior", but the compiler would still have to do exactly as
> told: Read what's at address 0 (or whatever a "null pointer" points to).
> Don't optimize it away or do anything else.

That's a good example of what I'm talking about - the standard does NOT
currently say that dereferencing a null pointer causes it to "read
what's at address 0 (or whatever a "null pointer" points to)". What it
actually says is much more abstract:

"... the null pointer value of that type ... is distinguishable from
every other value of object pointer or function pointer type." (7.11p1)

Section 8.3.1p1 describes the semantics of dereferencing a pointer: "the
result is an lvalue referring to the object or function to which the
expression points.". Since a null pointer cannot point at any function
or object, neither of those cases apply, and the behavior of
dereferencing a null pointer is undefined because "... this
International Standard omits any explicit definition of behavior ..."
(3.27).

This means that the null pointer of a given type could be something
quite distinct from any other pointer to that type - it could, for
instance, refer to the address 0xFFFFFFFFFFFFFFFF on a machine where
valid addresses are required to be less than 0x8000000000000000 - there
isn't any actual memory at that location. However, on many (most?)
implementations, the null pointer of any given type does point at a
specific address in memory, and dereferencing a null pointer has the
effect of treating the memory in that location as though it contained an
object or function of that type. This is permitted, since the behavior
of such code is undefined, but only if the C++ implementation makes sure
that any object or function whose address can be obtained by well-formed
code must be distinguishable from the corresponding null pointer.
It might, for instance, reserve that address for some other use that is
hidden to user code with defined behavior.

...
> This principle might have been sound in the late 70's and early 80's,
> when there was a huge variety of computer architectures and it was a
> mystery which direction computer architectures would go to. Nowadays,
> however, there's very little need to have theoretical support for
> hypothetical esoteric architectures that don't exist and are unlikely
> to ever exist.

I think one of the LEAST likely possibilities is that all hardware
in the future will always have an architecture fully compatible with
that of our current machines. One of the committee's explicit goals is to
avoid unnecessarily restricting the range of platforms where it's
possible to create an efficient fully-conforming implementation of C++.

James Kuyper

Jun 18, 2020, 12:07:18 PM
I kept everything you wrote, and the entire paragraph written by Juha
that immediately preceded it. If that wasn't sufficient context to
clarify what you were referring to, you should have been more specific.
I certainly had no idea that you were referring to anything that wasn't
mentioned in Juha's paragraph that I quoted.

> ... The particular context of my reference was about constructing a
> trivial type in malloc'ed memory without placement new. Prior to C++17
> that was generally thought to be OK and appears in Stroustrup's TC++PL
> fourth edition.
>
> The point with which I was agreeing, was that compilers have liberty
> to, for example, optimize out undefined behaviour instead of, in the
> case of a null pointer dereference, immediately crashing with a segfault
> (when crashing would be kinder). They do indeed have that liberty.
> Possibly all undefined behaviour is the programmer's fault. Possibly it
> isn't - I would suggest that it isn't the programmer's fault that the
> functions in �23.10.10.2 to �23.10.10.6 of C++17 and the equivalents in
> C++20 all have undefined behaviour, not that further undefined
> behaviour arises if you attempt to access other than the first item of
> the returned collection.
>
> Either way, Juha's point was that it would be nicer to do literally what
> the programmer says in a case of undefined behaviour.

"What the programmer says" when the programmer writes some C++ code is
what the C++ standard says that the code means. When the behavior is
undefined, the standard explicitly fails to say what the code means, so
"literally what the programmer says" becomes meaningless.

For example, consider E1 << E2 for a case where the promoted operands
both have signed integral or unscoped enumeration types. In that case,
no operator overloads apply, just the following paragraphs from section 8.8:

"... The behavior is undefined if the right operand is
negative, or greater than or equal to the length in bits of the promoted
left operand."
"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
are zero-filled. ... If E1 has a signed type and non-negative value, and
E1 × 2 E2 is representable in the corresponding unsigned type of the
result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined."

Note: in the actual standard, "2 E2" is actually a 2 followed by a
superscript E2, formatting that doesn't survive cut-and-paste into my
newsreader.

The behavior of E1 << E2 is NOT defined as using the target platform's
native shift instruction (if it even has one for the appropriate integer
type). Only the required behavior is specified, not how that behavior is
to be achieved, and that behavior is not specified when

* E2 is negative
* E2 is greater than or equal to the length in bits
* E1 is negative
* E1*2^E2 isn't representable in the corresponding unsigned type.

Therefore, despite claims that some people have made to the contrary,
you cannot deduce that an implementation should produce the same results
in any of the undefined cases that would have occurred if it had used
that instruction.
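
To make the four cases concrete, here is a minimal sketch under the C++17 wording quoted above. The helper name shl_is_defined is made up for illustration, and the static_assert records an assumption (unsigned int and int have the same width) that the standard does not guarantee. C++20 later relaxed the rules for signed left shifts, so this only describes C++17 and earlier.

```cpp
#include <limits>

// Hypothetical helper (not part of any library): reports whether the C++17
// wording defines the result of e1 << e2 when both operands are int.
constexpr bool shl_is_defined(int e1, int e2)
{
    // "Length in bits" of int: the value bits plus the sign bit.
    constexpr int width = std::numeric_limits<int>::digits + 1;

    // Assumption: unsigned int has as many value bits as int has bits.
    static_assert(std::numeric_limits<unsigned>::digits == width,
                  "sketch assumes unsigned int and int have the same width");

    if (e2 < 0 || e2 >= width)
        return false;   // right operand negative or too large
    if (e1 < 0)
        return false;   // negative left operand

    // E1 * 2^E2 must be representable in unsigned int.
    const unsigned limit = std::numeric_limits<unsigned>::max() >> e2;
    return static_cast<unsigned>(e1) <= limit;
}
```

Nothing in the function says which machine instruction gets used; it only classifies the operands, which is all the standard's text does.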

> ... It would be
> nicer, but as I subsequently said such a rule is unlikely to happen.
> It is unlikely to happen because compiler writers like to optimize

Compiler writers like to optimize because many compiler users like their
code optimized. You might not like any particular optimization, but I
guarantee you that if it is supported, someone does like it. If it's
optional (as many are), you can also deduce that there's someone who
doesn't like it.

> and
> more to the point programmers are programming to a virtual machine and
> not to a hardware specification.

I agree - that is very much "more to the point".


Chris Vine

Jun 18, 2020, 1:04:39 PM
On Thu, 18 Jun 2020 12:07:04 -0400
You should have had more than an idea.

You cut out this of mine:

'Technically, constructing the 'header' object in the malloc'ed buffer
is also reputed to be undefined behaviour if you do it otherwise than
through placement new, even though 'header' is a trivial type.'

You cut out this from Juha:

'Thinking about it, it might actually have merit to worry about such
things being UB, no matter how "technically" and how obscure the
rule may be.

One could easily just think like "who cares if it's "technically" UB?
There's no practical implementation where it would cause anything
else than intended behavior."

The problem is, UB allows the compiler to do whatever it wants.
Including not doing what the programmer "intended" for it to do.
Technically speaking, if the compiler detects UB, it's allowed to
think "this is UB, I don't need to do anything here, I'll just skip
the whole thing and optimize it all away". Suddenly you might find
yourself with an incredibly obscure "compiler bug" where the compiler
isn't generating the code you wrote... when in fact it's not a
compiler bug at all'

which is exactly what I was referring to when I said I agree that is
"the problem with undefined behaviour". As a matter of comprehension,
my reference to "the problem with undefined behaviour" was clearly not
concerned only with his single paragraph of corroborative detail
concerning a null dereference in the linux kernel.

Your explanation is implausible.

> > ... The particular context of my reference was about constructing a
> > trivial type in malloc'ed memory without placement new. Prior to C++17
> > that was generally thought to be OK and appears in Stroustrup's TC++PL
> > fourth edition.
> >
> > The point with which I was agreeing, was that compilers have liberty
> > to, for example, optimize out undefined behaviour instead of, in the
> > case of a null pointer dereference, immediately crashing with a segfault
> > (when crashing would be kinder). They do indeed have that liberty.
> > Possibly all undefined behaviour is the programmer's fault. Possibly it
> > isn't - I would suggest that it isn't the programmer's fault that the
> > functions in §23.10.10.2 to §23.10.10.6 of C++17 and the equivalents in
> > C++20 all have undefined behaviour, not that further undefined
> > behaviour arises if you attempt to access other than the first item of
> > the returned collection.
> >
> > Either way, Juha's point was that it would be nicer to do literally what
> > the programmer says in a case of undefined behaviour.
>
> "What the programmer says" when the programmer writes some C++ code is
> what the C++ standard says that the code means. When the behavior is
> undefined, the standard explicitly fails to say what the code means, so
> "literally what the programmer says" becomes meaningless.
[more of same snipped]

I know all that. I was explicitly commenting on his identification of
"the problem with undefined behaviour". I was not commenting on Juha's
proposal for what to do about it. I have already told you I don't
think that would work.

By the way, your time stamps seem to be wrong. Do you want to check
that you have the right time zone set?

Chris Vine

Jun 18, 2020, 1:20:21 PM
On Thu, 18 Jun 2020 18:05:04 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> I know all that. I was explicitly commenting on his identification of
> "the problem with undefined behaviour". I was not commenting on Juha's
> proposal for what to do about it. I have already told you I don't
> think that would work.

In case help with context is needed on this, I hope you will have
deduced that I mean "I was explicitly commenting 'in my posting to Juha
to which you have referred' on his identification ...". I have of
course commented to you on whether his proposal would work or not (and
to him in a different posting): I don't want to start another series of
posts on that aspect which miss a point which would otherwise be
contextually clear.

James Kuyper

Jun 18, 2020, 1:55:30 PM
On 6/18/20 1:05 PM, Chris Vine wrote:
> On Thu, 18 Jun 2020 12:07:04 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
...
> By the way, your time stamps seem to be wrong. Do you want to check
> that you have the right time zone set?

The time stamp in the message you were responding to was

Date: Thu, 18 Jun 2020 12:07:04 -0400

I've got my system set to automatically set the date and time based upon
an external time source - I'm not sure which external time source it's
using - it should be configurable, but I haven't found that
configuration option.
I have the time zone set to EDT, and it doesn't change when I tell it to
use an automatically assigned time zone. When I did a Google for "time
zone", at the top of the page Google identified EDT as the correct time
zone for my area (which it identifies down to the city level by name).
It also says that EDT is GMT-4, which matches the -0400 in my timestamp.
The time of 12:07:04 for that message is consistent with what I remember
about the time when I posted that message.
I'm curious - what specific feature of my time stamp made you think it
might be wrong?

What I see on <https://www.time.gov/> is consistent with what my
computer says. Shortly before sending this message, that site said that
the current time is 1:55 PM.

Chris Vine

Jun 18, 2020, 2:13:37 PM
On Thu, 18 Jun 2020 13:55:19 -0400
On looking further at the timestamps, they are OK but your posting of
Wed, 17 Jun 2020 09:53:31 -0400 was not injected into the network
until Thu, 18 Jun 2020 12:51:53 -0000, nearly 23 hours later. So I
didn't reply before you sent it as I thought, but nearly 24 hours after
you sent it. You may have a problem with your posting server.

James Kuyper

Jun 18, 2020, 3:45:23 PM
On 6/18/20 2:14 PM, Chris Vine wrote:
> On Thu, 18 Jun 2020 13:55:19 -0400
> James Kuyper <james...@alumni.caltech.edu> wrote:
>> On 6/18/20 1:05 PM, Chris Vine wrote:
>>> On Thu, 18 Jun 2020 12:07:04 -0400
>>> James Kuyper <james...@alumni.caltech.edu> wrote:
>> ...
>>> By the way, your time stamps seem to be wrong. Do you want to check
>>> that you have the right time zone set?
...
>> What I see on <https://www.time.gov/> is consistent with what my
>> computer says. Shortly before sending this message, that site said that
>> the current time is 1:55 PM.
>
> On looking further at the timestamps, they are OK but your posting of
> Wed, 17 Jun 2020 09:53:31 -0400 was not injected into the network
> until Thu, 18 Jun 2020 12:51:53 -0000, nearly 23 hours later. So I
> didn't reply before you sent it as I thought, but nearly 24 hours after
> you sent it. You may have a problem with your posting server.

Oh yes, now that I know what you're talking about, I understand part of what's
going on. I tried to post two different messages yesterday, both of
which resulted in messages saying that there was an "NNTP error" of some
kind. I told Thunderbird to send them later, and it periodically tried
and failed to send those messages. It didn't actually succeed until
nearly a day after the problem first happened. What I don't know is why
sending of those messages failed.

Chris M. Thomasson

Jun 18, 2020, 4:10:51 PM
On 6/16/2020 4:24 PM, Chris Vine wrote:
> On Tue, 16 Jun 2020 16:11:31 -0700
> "Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
>> So, basically, I need to use placement new and explicit dtor via
>> calling the dtor ~Type(). Then it becomes a "real object" in C++. I
>> thought that POD would be different in a sense.
>
> Yes, although you don't actually need to call the destructor because
> your types are trivial.

Okay. For some reason I was thinking if a ctor is called, then a dtor
must be called, or the object is still considered in a constructed
state? Was wondering about UB if a dtor is not called.


> But there are other issues with your code which I have alluded to in
> other posts. I should use 'new unsigned char[...]' instead of
> std::malloc, placement new your 'header' struct into it and (as you do
> at present) cast the buffer part to char* if you really want char*
> instead of unsigned char* for the buffer.

Okay.


> As I have indicated in those
> posts I think that cast is valid but you never know with C++17/20: if
> the committee don't understand the rules who are we to say.

Damn. Just a little rant, joking in a sense: I wonder if they consider
an unsigned char an object that needs to have its ctor called?

So if I create an array of unsigned char from std::malloc, does each
element need its ctor called? The array form of placement new?


> As I have also mentioned in other posts I agree with your sentiments
> about PODs/trivial types. That seems to me to be another fail in the
> standard. It ought to be valid in my view, but it isn't.

Imvvho, the POD case should be special in a sense. Just like C.

Chris Vine

Jun 18, 2020, 5:12:02 PM
On Thu, 18 Jun 2020 13:10:37 -0700
"Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
> On 6/16/2020 4:24 PM, Chris Vine wrote:
> > On Tue, 16 Jun 2020 16:11:31 -0700
> > "Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
> >> So, basically, I need to use placement new and explicit dtor via
> >> calling the dtor ~Type(). Then it becomes a "real object" in C++. I
> >> thought that POD would be different in a sense.
> >
> > Yes, although you don't actually need to call the destructor because
> > your types are trivial.
>
> Okay. For some reason I was thinking if a ctor is called, then a dtor
> must be called, or the object is still considered in a constructed
> state? Was wondering about UB if a dtor is not called.

No, an object of a trivial type does not need to have its
(non-existent) destructor called. Its lifetime ends when its storage's
lifetime ends or the storage is reused, by rule.

> > But there are other issues with your code which I have alluded to in
> > other posts. I should use 'new unsigned char[...]' instead of
> > std::malloc, placement new your 'header' struct into it and (as you do
> > at present) cast the buffer part to char* if you really want char*
> > instead of unsigned char* for the buffer.
>
> Okay.
>
> > As I have indicated in those
> > posts I think that cast is valid but you never know with C++17/20: if
> > the committee don't understand the rules who are we to say.
>
> Damn. Just a little rant, joking in a sense: I wonder if they consider
> an unsigned char an object that needs to have its ctor called?
>
> So if I create an array of unsigned char from std::malloc, does each
> element need its ctor called? The array form of placement new?

The point here is that std::malloc does not of itself create anything
you can lawfully iterate over (ridiculous I know, but there we are). If
you use malloc to provide storage for an array you have to use array
placement new to establish the array of type in that memory, whether
trivial (such as your array of char) or non-trivial, post C++17.
Without that you cannot iterate over it lawfully, let alone dereference.

There is however a significant problem with this requirement relating
to array placement new. (Read with P0593 you could barely make it up:
readers may think I am being hard on the curators of the C++ standard
with my various posts on its inadequacies and breakages, but I am not.)
The problem is that the new[] expression is allowed by the standard to
request more memory from operator new[] than the size of the array
constructed in it, in order to accommodate a cookie, say for storing the
array size so that the correct number of destructors can be called by
any subsequent delete[] expression. The delete[] expression is never
applied to objects alloced by placement new - you call destructors by
hand - but the additional storage for cookies is still required. With
g++ under the Itanium ABI this cookie is the size of size_t (8 on
64bit, 4 on 32bit) for arrays of non-trivial types, and 0 for arrays of
trivial types. So with arrays of non-trivial types you need a formally
unknowable (but happily ABI defined though not implementation defined)
additional storage for the new[] expression, including placement new[]
expression. You can see where this is going.

So if you must use array placement new, add sizeof(size_t) for your
buffer size, and for preference (2 x size_t) to take account of possible
padding, and hope for the best.

But you should instead do what I suggested. Either use C (see below)
or allocate your memory as 'new unsigned char[sz]' instead of
'std::malloc(sz)'. Then it _is_ an array at its inception and you can
lawfully iterate over it by char. Also the standard explicitly allows
you to construct other objects in the array of unsigned char using
placement new (but to reference that object through the buffer with
reinterpret_cast you have to use std::launder).

> > As I have also mentioned in other posts I agree with your sentiments
> > about PODs/trivial types. That seems to me to be another fail in the
> > standard. It ought to be valid in my view, but it isn't.
>
> Imvvho, the POD case should be special in a sense. Just like C.

To be honest, maybe the optimal approach is to put your specialist
allocator in a .c file, compile it with gcc, put an extern C guard
around its headers and call it up in your C++ code. It is then
guaranteed by the C standard to work, save for the still unresolved
problems for C++ about non-arrays in P0593 for which no compiler vendor
in the world is going to break your code, because none of their system
libraries would be usable in C++ if they did and they would disappear
as quickly as Ratner's prawn sandwiches.

Chris M. Thomasson

Jun 18, 2020, 5:26:39 PM
> constructed in it in order to accommodate a cookie, [...]

Wait a minute. Sorry for the quick response, working on some other
things right now... However, thank you Chris Vine, I now remember that
the array version of new seems to create a header of its own. It knows
how many dtors to call! Way back, many years ago, I stumbled on a
compiler error. I posted about it. Will try to find the post. Thanks
again. Will have more time tonight to give you a proper response.

The error was that MSVC failed to give the original allocation size.

[...]

Chris Vine

Jun 18, 2020, 5:46:43 PM
On Thu, 18 Jun 2020 14:26:27 -0700
"Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
[snip]
> Wait a minute. Sorry for the quick response, working on some other
> things right now... However, thank you Chris Vine, I now remember that
> the array version of new seems to create a header of its own. It knows
> how many dtors to call! Way back, many years ago, I stumbled on a
> compiler error. I posted about it. Will try to find the post. Thanks
> again. Will have more time tonight to give you a proper response.
>
> The error was that MSVC failed to give the original allocation size.

That's a different point. MSVC's version of the delete[] expression
passes operator delete[] the wrong size of memory block to be
released - it doesn't comply with the standard, but they can't fix it
because that would break existing code, more particularly the Windows
kernel.

Placement operations don't use the delete[] expression so that does not
impinge on the issues I mentioned.

Öö Tiib

Jun 18, 2020, 7:03:36 PM
Can you elaborate on that question? I do not understand how
allowing compilers less leeway equates to allowing new code to
misbehave.

By my naive logic it is the direct opposite:
1) Allowing the compiler to generate code that does something else instead
of trying to access the null address in the case of a null pointer
dereference potentially allows some new black magic code to misbehave in
rather spectacular ways.
2) Forcing it to generate code that attempts to access the null address in
the case of a null pointer dereference causes the program to segfault
on the majority of platforms.
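
A minimal sketch of case 1, with a made-up function name. Because the dereference comes first, a compiler is entitled to assume p is not null and may drop the later check altogether. GCC has an option, -fno-delete-null-pointer-checks, that turns this kind of deduction off.

```cpp
// Hypothetical function: the dereference happens before the null check.
int read_then_check(int* p)
{
    int value = *p;     // undefined behaviour if p is null
    if (p == nullptr)   // a compiler may treat this as always false
        return -1;
    return value;
}
```

Called with a valid pointer the function is well-defined and returns the pointed-to value; what happens with a null argument is not something the standard describes.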

Ralf Goertz

Jun 19, 2020, 3:20:00 AM
Am Thu, 18 Jun 2020 15:45:08 -0400
schrieb James Kuyper <james...@alumni.caltech.edu>:


> Oh yes, now that I know what you're talking about, I understand part of
> what's going on. I tried to post two different messages yesterday,
> both of which resulted in messages saying that there was an "NNTP
> error" of some kind. I told Thunderbird to send them later, and it
> periodically tried and failed to send those messages. It didn't
> actually succeed until nearly a day after the problem first happened.
> What I don't know is why sending of those messages failed.

Speaking of this, might that also be the reason why the paragraph
characters § you quoted from Chris' message ended up as �? That seemed
odd given that the header of your message said "charset=utf-8". Or did
only I see the wrong character? One message downthread that particular
quote was restored correctly.

Jorgen Grahn

Jun 19, 2020, 3:31:43 AM
On Thu, 2020-06-18, Chris Vine wrote:
...
> I think my general view is that if you want to write close-to-the-metal
> code these days you are better doing it in C rather than C++. There is
> no way in the world that the C committee would break long-standing
> practices with changes like the malloc/trivial type changes we have been
> discussing with C++17. With C++, you have the worry that the committee
> is going to saw your low-level code off at the knees again when the next
> standard comes out.

Why not just use C++14, then? Compiler writers seem to rarely remove
support for older standards.

The way C++ changes these days bothers me too, but I'm not going to go
back to the Stone Age just to spite the ISO committee.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Juha Nieminen

Jun 19, 2020, 5:25:14 AM
James Kuyper <james...@alumni.caltech.edu> wrote:
> Persistent rumor to the
> contrary notwithstanding, C is not a "portable assembly language". An
> implementation is under no obligation to generate the same machine code
> that you naively expect it to, so long as the code it actually generates
> has the required observable behavior

One of the defenses I have seen several times for the C language over
other languages (including C++) is the claim that you know exactly
what kind of machine code every C statement produces.

I think this just goes to show how so many old-school C coders still
seem to live in the early 90's.

In the 80's and well into the 90's it might have indeed been so:
Most C compilers were extremely straightforward, and did a very
simplistic and rudimentary direct C-to-asm conversion (in the worst
cases you could hardly even call it a C "compiler", more like a
"C to asm transpiler"). Many compilers didn't do even the most
rudimentary optimizations (like converting a division by 2 of an
unsigned int into a right-shift.)

However, rather obviously this hasn't been the case for like two
decades now. C compilers (like gcc, clang, icc and even MSVC) do
all kinds of optimizations that can make the resulting asm not resemble
the original C source code at all. They will unroll loops, they will
optimize things away, they will do as many calculations at compile
time as possible, and especially nowadays they are becoming better
and better at autovectorization (ie. creating SIMD opcodes that
operate on more than one value at a time). What looked like a simple
loop that does some arithmetic on some array may end up in asm as
a series of SSE opcodes that operate on 8 or even 16 values at a time
and have very little resemblance to the original C code.
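
As a rough illustration, these are the kinds of functions meant here. The names are made up, and what a given compiler actually emits depends on the compiler, its flags and the target, so the comments describe what is typical rather than guaranteed.

```cpp
#include <cstddef>

// Division of an unsigned value by 2: typically compiled to a right shift.
unsigned half(unsigned x)
{
    return x / 2;
}

// A plain element-wise loop: a common candidate for auto-vectorization,
// where several elements are added per instruction.
void add_arrays(float* out, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Comparing the generated asm at -O0 and -O2 for add_arrays is an easy way to see how far the output can drift from the source.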

Juha Nieminen

Jun 19, 2020, 5:32:56 AM
bol...@nowhere.co.uk wrote:
>>I don't agree. C is too simplistic for large complex projects (mainly
>
> Linus Torvalds would disagree with you.

I have read his famous anti-C++ rant. It makes absolutely no sense,
is full of errors, and proves that he still thinks we live in 1992.

It is *possible* to create enormous projects in C. However, the code
pretty much invariably ends up being a lot more complicated and
following needless coding conventions than it would have to be,
for little to no benefit (eg. in terms of efficiency or features).

Jorgen Grahn

Jun 19, 2020, 6:37:10 AM
On Fri, 2020-06-19, Juha Nieminen wrote:
> James Kuyper <james...@alumni.caltech.edu> wrote:
>> Persistent rumor to the
>> contrary notwithstanding, C is not a "portable assembly language". An
>> implementation is under no obligation to generate the same machine code
>> that you naively expect it to, so long as the code it actually generates
>> has the required observable behavior
>
> One of the defenses I have seen several times for the C language over
> other languages (including C++) is the claim that you know exactly
> what kind of machine code every C statement produces.
>
> I think this just goes to show how so many old-school C coders still
> seem to live in the early 90's.
>
> In the 80's and well into the 90's it might have indeed been so:
> Most C compilers were extremely straightforward, and did a very
> simplistic and rudimentary direct C-to-asm conversion

It was also more common IME that you saw your code as tied to one OS
and one CPU -- e.g. the 68000 or x86. I didn't stop thinking that way
until I began doing more Unix work. That world was more diverse, and
came with weird RISC CPUs which you weren't supposed to know in
detail.

bol...@nowhere.co.uk

Jun 19, 2020, 7:02:09 AM
On Fri, 19 Jun 2020 09:32:45 +0000 (UTC)
Juha Nieminen <nos...@thanks.invalid> wrote:
>bol...@nowhere.co.uk wrote:
>>>I don't agree. C is too simplistic for large complex projects (mainly
>>
>> Linus Torvalds would disagree with you.
>
>I have read his famous anti-C++ rant. It makes absolutely no sense,
>is full of errors, and proves that he still thinks we live in 1992.

His rant was a long time ago when frankly C++ was an awful language.

>It is *possible* to create enormous projects in C. However, the code
>pretty much invariably ends up being a lot more complicated and
>following needless coding conventions than it would have to be,
>for little to no benefit (eg. in terms of efficiency or features).

The point you are apparently unaware of is that a lot of the nice features
in modern C++ can't be used at the bare metal level because the flow of control
and memory/stack allocation has to be absolutely explicit. For the same reason
you won't find malloc() or free() in kernel code you wouldn't find any of the
STL or shared pointers or virtual functions or frankly anything that makes C++
useful.

Öö Tiib

Jun 19, 2020, 9:00:30 AM
Several things in every sentence wrong.

Nothing needs to be absolutely explicit. The whole point of
structured programming is to abstract away the uninteresting details
that are always same every time. Even all assemblers have macros
to get rid of how tediously repetitive absolutely explicit garbage is.

std::pair, std::tuple, std::array, std::string_view, std::variant,
std::initializer_list etc do not do any dynamic allocations if the elements
don't. C++20 aims to add std::vector and std::string to those.
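
A minimal sketch of that, with made-up names: a lookup table built from std::array and std::pair that lives in static storage and is summed at compile time. Before C++20 a constant expression cannot allocate, so the static_assert-style usage below is only possible because no heap is involved.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical lookup table: three (n, n*n) pairs with static storage.
constexpr std::array<std::pair<int, int>, 3> squares{{{1, 1}, {2, 4}, {3, 9}}};

// Sums the second member of every pair; usable in constant expressions.
constexpr int sum_of_squares()
{
    int total = 0;
    for (std::size_t i = 0; i < squares.size(); ++i)
        total += squares[i].second;
    return total;
}
```

For example, static_assert(sum_of_squares() == 14, "") compiles, which is the kind of thing that works fine in freestanding or embedded code.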

Where no malloc or free is needed in C, no dynamic allocations are
needed in C++ either, so what is the difference?

The std::shared_ptr is typical in newbie code anyway, someone from
Java world attempting to write C++. In actual C++ I see it rarely.

Virtual functions are about as common in embedded C++ as function
pointers in embedded C.

Paavo Helde

Jun 19, 2020, 9:35:59 AM
There is no rule that for OS programming everything must be explicit.
One just has to know what happens when one writes a line of code. But
this is not specific to the kernel; it's just that the consequences are
nastier with OS code when you write a line without understanding what it
does.

In kernel code you indeed do not call malloc. Instead of it there is
kmalloc with lots of additional options. But in principle there is
nothing which would prohibit one from using STL containers with memory
allocators using kmalloc and having the same options.
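
A minimal user-space sketch of the idea. demo_kmalloc and demo_kfree are stand-ins invented for this example (backed by std::malloc so it can run outside a kernel), and their flags parameter is only a placeholder for the extra options; this is not a real kernel interface. Real kernel code would also not report failure by throwing.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-ins for a kernel allocation interface (assumed names and signature).
inline void* demo_kmalloc(std::size_t bytes, int /*flags*/) { return std::malloc(bytes); }
inline void demo_kfree(void* p) { std::free(p); }

// Minimal allocator that routes container storage through the stand-ins.
template <class T>
struct kalloc
{
    using value_type = T;

    kalloc() = default;
    template <class U>
    kalloc(const kalloc<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        void* p = demo_kmalloc(n * sizeof(T), 0);
        if (!p)
            throw std::bad_alloc();   // user-space convention only
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { demo_kfree(p); }
};

// All kalloc instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const kalloc<T>&, const kalloc<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const kalloc<T>&, const kalloc<U>&) noexcept { return false; }

// A vector whose storage comes from the stand-in allocator.
using kvector = std::vector<int, kalloc<int>>;
```

A kvector is then used like any other std::vector; only the source of its memory changes.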

RAII classes might indeed not work well in situations like thread
context switch. But such code constitutes only a little piece of kernel.

In a similar fashion, in a normal user-space C++ program one has to be
extremely careful what one writes in a signal handler or between fork()
and exec() (hint: a malloc call or a std::string construction is a big
no-no!). You will not decide to switch your multi-million line C++
program over to C just because occasionally you want to launch child
processes via fork and exec.



bol...@nowhere.co.uk

Jun 19, 2020, 10:52:14 AM
On Fri, 19 Jun 2020 06:00:13 -0700 (PDT)
Öö Tiib <oot...@hot.ee> wrote:
>On Friday, 19 June 2020 14:02:09 UTC+3, bol...@nowhere.co.uk wrote:
>> The point you are apparently unaware of is that a lot of the nice features
>> in modern C++ can't be used at the bare metal level because the flow of
>> control and memory/stack allocation has to be absolutely explicit. For the
>> same reason you won't find malloc() or free() in kernel code you wouldn't
>> find any of the STL or shared pointers or virtual functions or frankly
>> anything that makes C++ useful.
>
>Several things in every sentence wrong.
>
>Nothing needs to be absolutely explicit. The whole point of
>structured programming is to abstract away the uninteresting details

You can't abstract away when writing kernel code, you need to know what's
going on.

>that are always same every time. Even all assemblers have macros
>to get rid of how tediously repetitive absolutely explicit garbage is.

Macro != container.

>The std::shared_ptr is typical in newbie code anyway, someone from
>Java world attempting to write C++. In actual C++ I see it rarely.

You don't see much modern C++ then.

Manfred

unread,
Jun 19, 2020, 11:21:33 AM6/19/20
to
True, but still I find this a weak argument.
If the committee broke some feature (in the sense that it made it UB) in
C++17, as of today you can still make do with C++14, especially since the
delta between C++14 and C++17 is limited. However, hopefully new useful
features will be added in the future, and you won't be able to use them
unless you drop malloc() for POD objects, for example.

I may add that the use of malloc() might not seem a big deal, but it
seems to me that the same issues apply more generally to manipulating
raw memory, which is a big deal in systems programming.

>
> /Jorgen
>

Jorgen Grahn

unread,
Jun 19, 2020, 12:11:38 PM6/19/20
to
On Fri, 2020-06-19, Manfred wrote:
> On 6/19/2020 9:31 AM, Jorgen Grahn wrote:
>> On Thu, 2020-06-18, Chris Vine wrote:
>> ...
>>> I think my general view is that if you want to write close-to-the-metal
>>> code these days you are better doing it in C rather than C++. There is
>>> no way in the world that the C committee would break long-standing
>>> practices with changes like the malloc/trivial type changes we have been
>>> discussing with C++17. With C++, you have the worry that the committee
>>> is going to saw your low-level code off at the knees again when the next
>>> standard comes out.
>>
>> Why not just use C++14, then? Compiler writers seem to rarely remove
>> support for older standards.
>>
>> The way C++ changes these days bothers me too, but I'm not going to go
>> back to the Stone Age just to spite the ISO committee.
>
> True, but still I find this a weak argument.

Well, it wasn't meant as an argument for the committee to break stuff.
I just meant, let's not overreact by going back to the 1970s.

Jorgen Grahn

unread,
Jun 19, 2020, 12:44:36 PM6/19/20
to
On Fri, 2020-06-19, bol...@nowhere.co.uk wrote:
> On Fri, 19 Jun 2020 06:00:13 -0700 (PDT) <oot...@hot.ee> wrote:
...
>>The std::shared_ptr is typical in newbie code anyway, someone from
>>Java world attempting to write C++. In actual C++ I see it rarely.
>
> You don't see much modern C++ then.

Stroustrup:

"... does make the lifetime of the shared object hard to predict.
Use shared_ptr only if you actually need shared ownership."

That's after he warns against pointers of all kinds and complains that
people use 'new' too much.

Öö Tiib

unread,
Jun 19, 2020, 5:08:54 PM6/19/20
to
On Friday, 19 June 2020 17:52:14 UTC+3, bol...@nowhere.co.uk wrote:
> On Fri, 19 Jun 2020 06:00:13 -0700 (PDT)
> Öö Tiib <oot...@hot.ee> wrote:
> >On Friday, 19 June 2020 14:02:09 UTC+3, bol...@nowhere.co.uk wrote:
> >> The point you are apparently unaware of is that a lot of the nice features
> >> in modern C++ can't be used at the bare metal level because the flow of control
> >> and memory/stack allocation has to be absolutely explicit. For the same reason
> >> you won't find malloc() or free() in kernel code you wouldn't find any of the
> >> STL or shared pointers or virtual functions or frankly anything that makes C++
> >> useful.
> >
> >Several things in every sentence wrong.
> >
> >Nothing needs to be absolutely explicit. The whole point of
> >structured programming is to abstract away the uninteresting details
>
> You can't abstract away when writing kernel code, you need to know what's
> going on.

Whatever code we write, we need to know what is going on. The kernel
is no different. Abstracting away is not meant to lose awareness of
what is going on, but to lose the pointless need to address every
well-known detail explicitly.

> >that are always same every time. Even all assemblers have macros
> >to get rid of how tediously repetitive absolutely explicit garbage is.
>
> Macro != container.

Writing pointless truisms? Are you confused in the dark and trying
to understand what is what? Similarly, a template is not a container
and a pointer is not a function.
I gave a partial list of standard library containers that do not
even involve dynamic allocations, which you snipped.
>
> >The std::shared_ptr is typical in newbie code anyway, someone from
> >Java world attempting to write C++. In actual C++ I see it rarely.
>
> You don't see much modern C++ then.

Newbie C++ does not mean that it is modern but that it is clumsy,
naive, defective and inefficient.
Most C++ I see is relatively modern. Big parts of it won't compile
on a C++11 compiler because it uses C++14 or even C++17 features.

Alf P. Steinbach

unread,
Jun 19, 2020, 6:02:29 PM6/19/20
to
On 19.06.2020 11:25, Juha Nieminen wrote:
> James Kuyper <james...@alumni.caltech.edu> wrote:
>> Persistent rumor to the
>> contrary notwithstanding, C is not a "portable assembly language". An
>> implementation is under no obligation to generate the same machine code
>> that you naively expect it to, so long as the code it actually generates
>> has the required observable behavior
>
> One of the defenses I have seen several times for the C language over
> other languages (including C++) is the claim that you know exactly
> what kind of machine code every C statement produces.

No, not the exact kind of machine code, but knowing that no support
machinery is added, such as

* C++ dynamic initialization of global lifetime variables.
* C++ exception support.
* C++ hidden array size prefix increasing allocation size.
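
The third bullet (the hidden array size prefix) can be observed with a class-specific `operator new[]`. This is only a sketch: `Tracked`, `g_last_array_request` and `array_overhead` are names invented for the example, and the amount of extra space is implementation-specific (it may be zero on some implementations).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Records how many bytes the last `new Tracked[n]` asked for.
std::size_t g_last_array_request = 0;

struct Tracked {
    int value = 0;
    ~Tracked() {}  // non-trivial destructor: delete[] has to know the element count

    static void* operator new[](std::size_t bytes) {
        g_last_array_request = bytes;
        if (void* p = std::malloc(bytes)) return p;
        throw std::bad_alloc{};
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

// Returns how many bytes beyond n * sizeof(Tracked) the array form requested.
std::size_t array_overhead(std::size_t n) {
    Tracked* a = new Tracked[n];
    delete[] a;
    return g_last_array_request - n * sizeof(Tracked);
}
```

Printing `array_overhead(10)` on a given compiler shows the size of that prefix there; the language only guarantees that the request is at least `n * sizeof(Tracked)`.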

There is one major case of not knowing the kind of machine code in C++,
discussed a number of times in this group:

* C++ permission to implement an iterative function as recursive.

While I'm happy that there is a language that has the first three C++
features (one can always use C for critical parts where those features
are undesired), I'm very unhappy about the lunacy in the standard that
formally permits this last point, that the /compiler/ is permitted to
/introduce/ undefined behavior.

From a pedantically theoretical perspective that means that /every/ C++
program has UB, and thus, that the compiler can do whatever the hark it
wants and still be formally correct. Of course, that possibility is also
there by just silently omitting to blink the NumLock indicator, as a
diagnostic about language extension, after which again the compiler can
do pretty much whatever it wants and still be formally correct. But.


> I think this just goes to show how so many old-school C coders still
> seem to live in the early 90's.

I don't think /any/ C coders believe they know the exact kind of machine
code, except that possibly the C standard requires source code loops to
be machine code loops. Essentially a requirement that O(1) resource
usage in the source code is not translated to O(n) resource usage in the
machine code. I don't know if C does, indeed I suspect it's as lunatic
as C++ here, but it would be really nice if one could rely on C.

Original Jensen & Wirth Pascal was designed to support manual
compilation by students, it was a design goal.

But then, J&W Pascal didn't even have dynamic length arrays.

[snip]


- Alf

Ian Collins

unread,
Jun 19, 2020, 7:43:00 PM6/19/20
to
On 20/06/2020 09:08, Öö Tiib wrote:
> On Friday, 19 June 2020 17:52:14 UTC+3, bol...@nowhere.co.uk wrote:
>>
>> You can't abstract away when writing kernel code, you need to know what's
>> going on.
>
> Whatever code we write, we need to know what is going on. The kernel
> is no different. Abstracting away is not meant to lose awareness of
> what is going on, but to lose the pointless need to address every
> well-known detail explicitly.

There be trolls.

--
Ian.

James Kuyper

unread,
Jun 20, 2020, 1:02:35 AM6/20/20
to
On 6/19/20 6:02 PM, Alf P. Steinbach wrote:
...
> * C++ permission to implement an iterative function as recursive.

I agree that C++, like C, gives implementations permission to convert an
iterative function into an equivalent recursive one. But "equivalent" is
the key word here. If the C++ standard requires that the original code
has certain observable behavior, the converted code must meet that same
requirement.

> While I'm happy that there is a language that has the first three C++
> features (one can always use C for critical parts where those features
> are undesired), I'm very unhappy about the lunacy in the standard that
> formally permits this last point, that the /compiler/ is permitted to
> /introduce/ undefined behavior.

I have no idea what in the world you're referring to. Whether or not the
behavior is undefined is determined entirely by the source code. What
the consequences are when the behavior is undefined is entirely up to
the implementation. In what context do you believe that the standard
permits an implementation to take code that does not have undefined
behavior, and do something (and please identify what that something is)
that makes the behavior undefined?

> From a pedantically theoretical perspective that means that /every/ C++
> program has UB, and thus, that the compiler can do whatever the hark it
> wants and still be formally correct. Of course, that possibility is also
> there by just silently omitting to blink the NumLock indicator, as a
> diagnostic about language extension,

An implementation is required to document how diagnostics can be
identified as such. Would you voluntarily use a compiler whose
documentation specifies that method for identifying a diagnostic?

>> I think this just goes to show how so many old-school C coders still
>> seem to live in the early 90's.
>
> I don't think /any/ C coders believe they know the exact kind of machine
> code, ...

People who believe that falsehood post regularly to comp.lang.c, usually
complaining about the failure of their expectations to be met by a
particular compiler. People who believe the corresponding falsehood
about C++ post right here, regularly.

> ... except that possibly the C standard requires source code loops to
> be machine code loops.

No, it imposes no such requirement. Like C++, the only requirements are
imposed on the observable behavior - there are no requirements on how that
behavior is achieved.

> ... Essentially a requirement that O(1) resource
> usage in the source code is not translated to O(n) resource usage in the
> machine code. I don't know if C does, indeed I suspect it's as lunatic
> as C++ here, but it would be really nice if one could rely on C.

The C standard absolutely positively fails to impose any performance
requirements on the generated code. That's entirely a matter of QoI. An
implementation that generates code for "int main(void) {}" which takes 2
billion years to execute could still be fully conforming, so long as,
during those 2 billion years, it produces exactly the required
observable behavior (in this case, none). It would be hard to convince
anybody to use such a compiler - but that's what QoI is about.

C++ actually does slightly better: many templates call functions
provided by the user (usually member functions of classes provided as
template type arguments), and many of those templates have requirements
on how many calls to those functions will occur. However, it still fails
to impose any requirements on how long those calls actually take. Even
when the standard requires that an operation take O(1) time, such as
std::forward_list::splice_after() (26.3.9.6p8), a fully conforming
implementation of the standard library could still take 2 billion years
to execute that function - so long as it takes approximately the same
amount of time regardless of how many items there are in the list.
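
As a small illustration of the "number of calls to user functions" kind of requirement: the standard specifies that `std::min_element` performs exactly max(N - 1, 0) comparisons. The sketch below counts them with a user-supplied comparison; `count_min_element_comparisons` is a name made up for this example. Nothing here says how long each comparison takes, which is the point made above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often std::min_element invokes the user-supplied comparison.
std::size_t count_min_element_comparisons(const std::vector<int>& v) {
    std::size_t calls = 0;
    auto counting_less = [&calls](int a, int b) {
        ++calls;
        return a < b;
    };
    (void)std::min_element(v.begin(), v.end(), counting_less);
    // The complexity clause promises exactly max(N - 1, 0) comparisons.
    assert(calls == (v.empty() ? 0 : v.size() - 1));
    return calls;
}
```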


Manfred

unread,
Jun 20, 2020, 11:50:28 AM6/20/20
to
On 6/20/2020 7:02 AM, James Kuyper wrote:
> On 6/19/20 6:02 PM, Alf P. Steinbach wrote:
> ...
>> * C++ permission to implement an iterative function as recursive.
>
> I agree that C++, like C, gives implementations permission to convert an
> iterative function into an equivalent recursive one. But "equivalent" is
> the key word here. If the C++ standard requires that the original code
> has certain observable behavior, the converted code must meet that same
> requirement.
>

I am no expert on this point, rather I am curious: does the standard say
something explicit (if so, a pointer would be appreciated) about
converting iterations into a recursive function, or is this "permission"
merely based on the lack of prohibition under the umbrella of the "as
if" rule?

>> While I'm happy that there is a language that has the first three C++
>> features (one can always use C for critical parts where those features
>> are undesired), I'm very unhappy about the lunacy in the standard that
>> formally permits this last point, that the /compiler/ is permitted to
>> /introduce/ undefined behavior.
>
> I have no idea what in the world you're referring to. Whether or not the
> behavior is undefined is determined entirely by the source code.

The way I see it, the point is that a recursive function has different
requirements and may be more demanding in terms of resources (not just
time) than iterative ones, so the source code may have well defined
behavior in the iterative form, but run into UB in the recursive form
due to resource exhaustion.

> What
> the consequences are when the behavior is undefined is entirely up to
> the implementation. In what context do you believe that the standard
> permits an implementation to take code that does not have undefined
> behavior, and do something (and please identify what that something is)
> that makes the behavior undefined?
>

[...]

>
> The C standard absolutely positively fails to impose any performance
> requirements on the generated code. That's entirely a matter of QoI. An
> implementation that generates code for "int main(void) {}" which takes 2
> billion years to execute could still be fully conforming, so long as,
> during those 2 billion years, it produces exactly the required
> observable behavior (in this case, none). It would be hard to convince
> anybody to use such a compiler - but that's what QoI is about.

Alf mentioned resource usage, not just time. In this case running out of
stack space is a typical risk to be managed, which does not apply to
iterative loops.
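
For readers following the iteration-versus-recursion point, a hand-written sketch of the two forms (the function names are invented for the example). Both have the same observable result; the difference under discussion is that a naive translation of the recursive form uses one stack frame per step, whereas the loop does not. The standard itself says nothing about stack frames, so how much storage either form really uses is up to the implementation.

```cpp
#include <cassert>
#include <cstdint>

// Iterative form: constant auxiliary storage as written in the source.
std::uint64_t sum_iterative(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// Recursive form: same result, but a straightforward translation keeps
// n nested calls alive at the deepest point.
std::uint64_t sum_recursive(std::uint64_t n) {
    return n == 0 ? 0 : n + sum_recursive(n - 1);
}
```

With small `n` the two are indistinguishable; with a very large `n` the recursive form may exhaust the stack on typical implementations, unless the compiler happens to turn it back into a loop.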

James Kuyper

unread,
Jun 21, 2020, 1:09:25 AM6/21/20
to
On 6/20/20 11:50 AM, Manfred wrote:
> On 6/20/2020 7:02 AM, James Kuyper wrote:
>> On 6/19/20 6:02 PM, Alf P. Steinbach wrote:
>> ...
>>> * C++ permission to implement an iterative function as recursive.
>>
>> I agree that C++, like C, gives implementations permission to convert an
>> iterative function into an equivalent recursive one. But "equivalent" is
>> the key word here. If the C++ standard requires that the original code
>> has certain observable behavior, the converted code must meet that same
>> requirement.
>>
>
> I am no expert on this point, rather I am curious: does the standard say
> something explicit (if so, a pointer would be appreciated) about
> converting iterations into a recursive function, ...

No.

> ... or is this "permission"
> merely based on the lack of prohibition under the umbrella of the "as
> if" rule?

In the context of C or C++, "observable behavior" is a special piece of
jargon defined by the relevant standard - that definition is not
"behavior which is observable". When that phrase is used by someone who
understands that definition, it is always in reference to the as-if
rule, because the relevant standard uses that term only in the context
of that rule.

>>> While I'm happy that there is a language that has the first three C++
>>> features (one can always use C for critical parts where those features
>>> are undesired), I'm very unhappy about the lunacy in the standard that
>>> formally permits this last point, that the /compiler/ is permitted to
>>> /introduce/ undefined behavior.
>>
>> I have no idea what in the world you're referring to. Whether or not the
>> behavior is undefined is determined entirely by the source code.
>
> The way I see it, the point is that a recursive function has different
> requirements and may be more demanding in terms of resources (not just
> time) than iterative ones, so the source code may have well defined
> behavior in the iterative form, but run into UB in the recursive form
> due to resource exhaustion.

A program that exceeds an implementation's limit may not be successfully
processed (See 4.1p2 and Annex B). However, the standard sets no minimum
value on any of those limits (the values shown in Annex B are merely
recommendations and guidelines). It also fails to impose any limits on
how much memory any particular piece of code may require. Therefore, any
program might exceed those limits, whether a loop is handled iteratively
or recursively.
It's entirely a matter of QoI whether an implementation decides to
implement any code in a manner that doesn't exceed its own limits.
During one memorable discussion, the person I was talking with
actually complained about the supposed "misinterpretation" of the
standard as allowing an optimization that greatly reduced the likelihood
of resource exhaustion (though he never acknowledged the truth of the
claim that it would have had that effect - he just ignored that
assertion completely).

>> The C standard absolutely positively fails to impose any performance
>> requirements on the generated code. That's entirely a matter of QoI. An
>> implementation that generates code for "int main(void) {}" which takes 2
>> billion years to execute could still be fully conforming, so long as,
>> during those 2 billion years, it produces exactly the required
>> observable behavior (in this case, none). It would be hard to convince
>> anybody to use such a compiler - but that's what QoI is about.
>
> Alf mentioned resource usage, not just time.

My comment was about all such resources, I just used time as an example.

> ... In this case running out of
> stack space is a typical risk to be managed, which does not apply to
> iterative loops.

Any program, even "int main(void) {}", may be implemented in such a
fashion as to exceed an implementation's resource limits. It's only a
matter of QoI whether or not that actually happens.

Juha Nieminen

unread,
Jun 21, 2020, 9:54:43 AM6/21/20
to
bol...@nowhere.co.uk wrote:
> The point you are apparently unaware of is that a lot of the nice features
> in modern C++ can't be used at the bare metal level because the flow of control
> and memory/stack allocation has to be absolutely explicit. For the same reason
> you won't find malloc() or free() in kernel code you wouldn't find any of the
> STL or shared pointers or virtual functions or frankly anything that makes C++
> useful.

Actually many of the utilities provided by the C++ standard library can
be very useful even if you are required to avoid dynamic memory allocation.

For example, std::unique_ptr is *not* restricted to simply manage
dynamically allocated memory for you. It can perfectly well (and
relatively easily) be used to manage other resources as well. For
example, you can use a `std::unique_ptr` to automatically close
a FILE* that you have std::fopen()'ed, when the pointer goes out
of scope. No dynamic memory allocation involved here (unless
std::fopen() itself does).

Also, in fact, all the dynamic data containers provided by the
standard library can be used without them calling the default
'new'. You can provide them with your own allocator, if you so
wish. This allocator could manage memory in a static array,
for example. There may be situations where this could be useful.
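
A minimal sketch of such an allocator, under loud assumptions: `StaticAllocator`, `g_pool`, `g_used` and `in_pool` are invented names, the pool size of 4096 bytes is arbitrary, the allocator never reclaims memory (it only bumps an offset), it assumes `alignof(T)` does not exceed `alignof(std::max_align_t)`, and it is not thread-safe. It is meant to show the shape of the idea, not a production design.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

namespace {
// Fixed pool in static storage, shared by every StaticAllocator.
alignas(std::max_align_t) unsigned char g_pool[4096];
std::size_t g_used = 0;
}

template <typename T>
struct StaticAllocator {
    using value_type = T;

    StaticAllocator() = default;
    template <typename U>
    StaticAllocator(const StaticAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        const std::size_t align = alignof(T);
        const std::size_t start = (g_used + align - 1) / align * align;  // round up
        const std::size_t bytes = n * sizeof(T);
        if (start > sizeof(g_pool) || bytes > sizeof(g_pool) - start) {
            throw std::bad_alloc{};  // pool exhausted
        }
        g_used = start + bytes;
        return reinterpret_cast<T*>(g_pool + start);
    }

    // Bump allocator: individual blocks are never returned to the pool.
    void deallocate(T*, std::size_t) noexcept {}
};

template <typename T, typename U>
bool operator==(const StaticAllocator<T>&, const StaticAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const StaticAllocator<T>&, const StaticAllocator<U>&) noexcept { return false; }

// Reports whether p points into the static pool.
bool in_pool(const void* p) {
    const auto* c = static_cast<const unsigned char*>(p);
    std::less<const unsigned char*> before;
    return !before(c, g_pool) && before(c, g_pool + sizeof(g_pool));
}
```

It would be used as `std::vector<int, StaticAllocator<int>> v;`, after which `v.data()` points into `g_pool` and no call to the default `new` is made for the element storage.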

There's also, of course, a huge amount of utilities in the standard
library that are not related to dynamic memory allocation, such
as the utilities in <algorithm> and many of the other headers.

It is perfectly possible to get a lot of benefit from C++ without
having to go to dynamic memory allocation and virtual functions.

Öö Tiib

unread,
Jun 21, 2020, 12:59:36 PM6/21/20
to
On Sunday, 21 June 2020 16:54:43 UTC+3, Juha Nieminen wrote:
> bol...@nowhere.co.uk wrote:
> > The point you are apparently unaware of is that a lot of the nice features
> > in modern C++ can't be used at the bare metal level because the flow of control
> > and memory/stack allocation has to be absolutely explicit. For the same reason
> > you won't find malloc() or free() in kernel code you wouldn't find any of the
> > STL or shared pointers or virtual functions or frankly anything that makes C++
> > useful.
>
> Actually many of the utilities provided by the C++ standard library can
> be very useful even if you are required to avoid dynamic memory allocation.
>
> For example, std::unique_ptr is *not* restricted to simply manage
> dynamically allocated memory for you. It can perfectly well (and
> relatively easily) be used to manage other resources as well. For
> example, you can use a `std::unique_ptr` to automatically close
> a FILE* that you have std::fopen()'ed, when the pointer goes out
> of scope. No dynamic memory allocation involved here (unless
> std::fopen() itself does).

That is generally a good example, but it touches on one thing that
most programs handle poorly. If the file was fopen()ed for writing, then
the very rare cases where fclose can fail are likely quite important
to handle properly. An unhandled failure may mean that users lose
their data. That problem does not magically disappear when
fclose is abstracted away into the deleter of a unique_ptr.
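
One way to keep both RAII and error checking, sketched under assumptions: `FileCloser`, `FilePtr` and `close_checked` are names invented for this example. The deleter remains as a last-resort cleanup on early exits, while the normal path closes explicitly and looks at the result of `std::fclose`.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Last-resort cleanup: the result of fclose is ignored here because a
// destructor has no good way to report it.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Explicitly closes the file and reports whether the close succeeded.
// Afterwards `file` is empty, so the deleter will not run a second time.
bool close_checked(FilePtr& file) {
    if (!file) return true;              // nothing to close
    std::FILE* raw = file.release();     // take ownership back from the unique_ptr
    return std::fclose(raw) == 0;        // fclose returns 0 on success, EOF on failure
}
```

Code that writes valuable data would call `close_checked(f)` on the success path and handle a `false` result there; the deleter only covers paths where the function leaves early.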


Jorgen Grahn

unread,
Jun 21, 2020, 2:02:48 PM6/21/20
to
On Sun, 2020-06-21, Öö Tiib wrote:
> On Sunday, 21 June 2020 16:54:43 UTC+3, Juha Nieminen wrote:
...
>> For example, std::unique_ptr is *not* restricted to simply manage
>> dynamically allocated memory for you. It can perfectly well (and
>> relatively easily) be used to manage other resources as well. For
>> example, you can use a `std::unique_ptr` to automatically close
>> a FILE* that you have std::fopen()'ed, when the pointer goes out
>> of scope. No dynamic memory allocation involved here (unless
>> std::fopen() itself does).
>
> That is generally a good example, but it touches on one thing that
> most programs handle poorly. If the file was fopen()ed for writing, then
> the very rare cases where fclose can fail are likely quite important
> to handle properly. An unhandled failure may mean that users lose
> their data. That problem does not magically disappear when
> fclose is abstracted away into the deleter of a unique_ptr.

And it doesn't disappear if you use a std::fstream.

(Not that I always handle the error myself.)

Chris M. Thomasson

unread,
Jun 21, 2020, 3:52:42 PM6/21/20
to
Damn. I almost need to create part of it in C. Still not sure how to use
the placement array version of new, without getting its "real" size
beforehand. Wrt the act of having to call the ctor of each element.

Say I std::malloc 1000 unsigned chars, and the array placement new
adds in some metadata. Well, the size is not big enough to hold
the damn header! Shi%!

What am I missing? ;^o

Öö Tiib

unread,
Jun 21, 2020, 6:14:30 PM6/21/20
to
Yes. When the data is valuable, I prefer to close the stream
explicitly anyway, regardless of RAII. Issues that are discovered in
a destructor are inconvenient to handle.

bol...@nowhere.co.uk

unread,
Jun 22, 2020, 5:04:32 AM6/22/20
to
On 19 Jun 2020 16:44:26 GMT
Jorgen Grahn <grahn...@snipabacken.se> wrote:
>On Fri, 2020-06-19, bol...@nowhere.co.uk wrote:
>> On Fri, 19 Jun 2020 06:00:13 -0700 (PDT) <oot...@hot.ee> wrote:
>....
>>>The std::shared_ptr is typical in newbie code anyway, someone from
>>>Java world attempting to write C++. In actual C++ I see it rarely.
>>
>> You don't see much modern C++ then.
>
>Stroustrup:
>
> "... does make the lifetime of the shared object hard to predict.
> Use shared_ptr only if you actually need shared ownership."
>
>That's after he warns against pointers of all kinds and complains that
>people use 'new' too much.

Presumably Stroustrup loves Java then, with its approach of allocating huge
amounts of memory up front instead of as and when it's needed.

bol...@nowhere.co.uk

unread,
Jun 22, 2020, 5:08:18 AM6/22/20
to
On Fri, 19 Jun 2020 14:08:43 -0700 (PDT)
Öö Tiib <oot...@hot.ee> wrote:
>On Friday, 19 June 2020 17:52:14 UTC+3, bol...@nowhere.co.uk wrote:
>> On Fri, 19 Jun 2020 06:00:13 -0700 (PDT)
>> Öö Tiib <oot...@hot.ee> wrote:
>> >On Friday, 19 June 2020 14:02:09 UTC+3, bol...@nowhere.co.uk wrote:
>> >> The point you are apparently unaware of is that a lot of the nice features
>> >> in modern C++ can't be used at the bare metal level because the flow of control
>> >> and memory/stack allocation has to be absolutely explicit. For the same reason
>> >> you won't find malloc() or free() in kernel code you wouldn't find any of the
>> >> STL or shared pointers or virtual functions or frankly anything that makes C++
>> >> useful.
>> >
>> >Several things in every sentence wrong.
>> >
>> >Nothing needs to be absolutely explicit. The whole point of
>> >structured programming is to abstract away the uninteresting details
>>
>> You can't abstract away when writing kernel code, you need to know what's
>> going on.
>
>Whatever code we write, we need to know what is going on. The kernel
>is no different. Abstracting away is not meant to lose awareness of
>what is going on, but to lose the pointless need to address every
>well-known detail explicitly.

If only life were that simple. To use the containers in the STL usefully in
kernel code every single one would need a custom allocator, and containers like
map and set that go away and rebalance when they feel like it couldn't be used
at all. Ditto unordered_* with hash tables that will need a resize, possibly
at an inconvenient moment.
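
On the hash table resize specifically, one possible mitigation (not a rebuttal of the allocator point) is to size the table up front with `reserve()`. The sketch below, with the invented name `inserts_without_rehash`, checks that the bucket array is not rebuilt while inserting the reserved number of elements. Each inserted node is still allocated individually, so this only moves the rehash, not every allocation, out of the critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Inserts n keys after reserving room for them and reports whether the
// bucket count stayed the same, i.e. no rehash happened during the inserts.
bool inserts_without_rehash(std::size_t n) {
    std::unordered_map<std::size_t, int> table;
    table.reserve(n);  // bucket array is sized here, before the inserts
    const std::size_t buckets = table.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        table.emplace(i, 0);
    }
    return table.bucket_count() == buckets;
}
```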

>> >that are always same every time. Even all assemblers have macros
>> >to get rid of how tediously repetitive absolutely explicit garbage is.
>>
>> Macro != container.
>
>Writing pointless truisms? Are you confused in the dark and trying
>to understand what is what? Similarly, a template is not a container
>and a pointer is not a function.
>I gave a partial list of standard library containers that do not
>even involve dynamic allocations, which you snipped.

Any major STL functionality requires some form of memory allocation.

>> >The std::shared_ptr is typical in newbie code anyway, someone from
>> >Java world attempting to write C++. In actual C++ I see it rarely.
>>
>> You don't see much modern C++ then.
>
>Newbie C++ does not mean that it is modern but that it is clumsy,
>naive, defective and inefficient.

ITYM you've never written complex enough code where shared_ptr is useful. I
suspect you're in a minority.

bol...@nowhere.co.uk

unread,
Jun 22, 2020, 5:10:22 AM6/22/20
to
On Sun, 21 Jun 2020 13:54:34 +0000 (UTC)
Juha Nieminen <nos...@thanks.invalid> wrote:
>bol...@nowhere.co.uk wrote:
>> The point you are apparently unaware of is that a lot of the nice features
>> in modern C++ can't be used at the bare metal level because the flow of control
>> and memory/stack allocation has to be absolutely explicit. For the same reason
>> you won't find malloc() or free() in kernel code you wouldn't find any of the
>> STL or shared pointers or virtual functions or frankly anything that makes C++
>> useful.
>
>Actually many of the utilities provided by the C++ standard library can
>be very useful even if you are required to avoid dynamic memory allocation.
>
>For example, std::unique_ptr is *not* restricted to simply manage
>dynamically allocated memory for you. It can perfectly well (and
>relatively easily) be used to manage other resources as well. For
>example, you can use a `std::unique_ptr` to automatically close
>a FILE* that you have std::fopen()'ed, when the pointer goes out

I didn't know that. What other standard types is it overloaded for? Can it
close file descriptors allocated by open() or socket() for example or is it
pointers only?

>Also, in fact, all the dynamic data containers provided by the
>standard library can be used without them calling the default
>'new'. You can provide them with your own allocator, if you so
>wish. This allocator could manage memory in a static array,
>for example. There may be situations where this could be useful.

It all becomes a bit of a mess when you start making C++ explicit and IMO just
using plain C would be less convoluted. An opinion that appears to be shared
by kernel code authors.

Chris Vine

unread,
Jun 22, 2020, 6:16:28 AM6/22/20
to
On Sun, 21 Jun 2020 12:52:30 -0700
"Chris M. Thomasson" <chris.m.t...@gmail.com> wrote:
[snip]
> Damn. I almost need to create part of it in C. Still not sure how to use
> the placement array version of new, without getting its "real" size
> beforehand. Wrt the act of having to call the ctor of each element.
>
> Say I std::malloc 1000 unsigned chars, and the array placement new
> adds in some metadata. Well, the size is not big enough to hold
> the damn header! Shi%!

The header is OK, as this uses non-array placement new which only
occupies a 'sizeof(header)' memory block. The subsequent allocation
area created by array placement new is the problem, because the memory
block passed in may not be big enough for it, and there is no formal
method under the standard (as opposed to your particular compiler's
ABI) to be sure whether that is the case. In addition, with array
placement new the address of the returned memory block (your allocation
area) may be different from the address of the block passed in,
depending on whether array placement new stores its cookie at the
beginning or the end. This in turn means that although your 'this + 1'
expression is valid, its result may not point to the beginning of the
allocation area, thereby causing get_buf() to return the wrong address.

> What am I missing? ;^o

As I said, either (i) use 'new unsigned char[sz]' to allocate the
block and work from there, remembering to use std::launder as required,
or (ii) use C.

Bo Persson

unread,
Jun 22, 2020, 8:12:02 AM6/22/20
to
On 2020-06-22 at 11:10, bol...@nowhere.co.uk wrote:
> On Sun, 21 Jun 2020 13:54:34 +0000 (UTC)
> Juha Nieminen <nos...@thanks.invalid> wrote:
>
>> Actually many of the utilities provided by the C++ standard library can
>> be very useful even if you are required to avoid dynamic memory allocation.
>>
>> For example, std::unique_ptr is *not* restricted to simply manage
>> dynamically allocated memory for you. It can perfectly well (and
>> relatively easily) be used to manage other resources as well. For
>> example, you can use a `std::unique_ptr` to automatically close
>> a FILE* that you have std::fopen()'ed, when the pointer goes out
>
> I didn't know that. What other standard types is it overloaded for?

It is not even overloaded, it just happens (by design) that
unique_ptr<FILE, fclose> will use fclose instead of operator delete when
it is time to dispose of the pointer.

> Can it
> close file descriptors allocated by open() or socket() for example or is it
> pointers only?
>

Not really, unique_ptr is supposed to handle pointers. However, it is
not that hard to write a small wrapper class that holds your descriptor
and closes it in the destructor. RAII is one of C++'s power tools!
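
A sketch of what such a wrapper might look like. `UniqueFd` is an invented name, and the close function is passed in so that the example stays within standard C++; on a POSIX system one would presumably pass `::close`, which has the matching `int(int)` signature. `fake_close` and `g_last_closed` exist only so the behaviour can be observed.

```cpp
#include <cassert>
#include <utility>

// Owns an integer descriptor and calls the supplied closer exactly once.
class UniqueFd {
public:
    using Closer = int (*)(int);

    UniqueFd(int fd, Closer closer) noexcept : fd_(fd), closer_(closer) {}

    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;

    UniqueFd(UniqueFd&& other) noexcept
        : fd_(std::exchange(other.fd_, -1)), closer_(other.closer_) {}

    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
            closer_ = other.closer_;
        }
        return *this;
    }

    ~UniqueFd() { reset(); }

    int get() const noexcept { return fd_; }

    // Closes the descriptor if one is held; -1 marks "empty".
    void reset() noexcept {
        if (fd_ >= 0) {
            closer_(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
    Closer closer_;
};

// Stand-in for ::close so the demo does not depend on POSIX headers.
int g_last_closed = -1;
int fake_close(int fd) {
    g_last_closed = fd;
    return 0;
}
```

As with fclose, the return value of the closer is dropped in the destructor, so code that cares about close errors would still call it explicitly on the success path.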


Bo Persson

Chris Vine

unread,
Jun 22, 2020, 10:52:14 AM6/22/20
to
On Mon, 22 Jun 2020 14:11:50 +0200
Bo Persson <b...@bo-persson.se> wrote:
> On 2020-06-22 at 11:10, bol...@nowhere.co.uk wrote:
> > On Sun, 21 Jun 2020 13:54:34 +0000 (UTC)
> > Juha Nieminen <nos...@thanks.invalid> wrote:
> >
> >> Actually many of the utilities provided by the C++ standard library can
> >> be very useful even if you are required to avoid dynamic memory allocation.
> >>
> >> For example, std::unique_ptr is *not* restricted to simply manage
> >> dynamically allocated memory for you. It can perfectly well (and
> >> relatively easily) be used to manage other resources as well. For
> >> example, you can use a `std::unique_ptr` to automatically close
> >> a FILE* that you have std::fopen()'ed, when the pointer goes out
> >
> > I didn't know that. What other standard types is it overloaded for?
>
> It is not even overloaded, it just happens (by design) that
> unique_ptr<FILE, fclose> will use fclose instead of operator delete when
> it is time to dispose of the pointer.

This form would need to be std::unique_ptr<FILE, decltype(&fclose)>,
instantiated such as by:

std::unique_ptr<FILE, decltype(&fclose)> p{f, &fclose};

It's a bit of a mouthful, and somewhat subversive of the type system
in that you could move into p any unique_ptr for FILE* having the
same deleter function signature as fclose, even if it is not fclose.
When using deleters from the C standard library it can be better to make
a deleter struct such as:

struct Fclose {
void operator()(void* f){std::fclose((std::FILE*)f);}
};
std::unique_ptr<FILE, Fclose> p{f};
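
For comparison, here is a rough sketch of the two forms side by side. The names FileCloser, file_ptr, file_ptr_fn, open_file and example_file_ptr are hypothetical, and /dev/null is only assumed to exist for the usage function. On common implementations the struct form is usually no larger than the function-pointer form, because the deleter carries no state, but the standard does not promise a particular size.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Stateless deleter: the call to std::fclose is part of the type, so no
// other function with the same signature can be swapped in at run time.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

// Deleter-struct form.
using file_ptr = std::unique_ptr<std::FILE, FileCloser>;

// Function-pointer form: every object stores a deleter pointer as well.
using file_ptr_fn = std::unique_ptr<std::FILE, int (*)(std::FILE*)>;

// Opens a file; a failed fopen yields an empty file_ptr, and unique_ptr
// never invokes the deleter on a null pointer.
file_ptr open_file(const char* path, const char* mode) {
    return file_ptr{std::fopen(path, mode)};
}

void example_file_ptr() {
    file_ptr f = open_file("/dev/null", "r");
    assert(f != nullptr);
}   // FileCloser closes the stream here
```

The struct form also takes std::FILE* directly, so no cast from void* is needed inside the deleter.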

bol...@nowhere.co.uk

Jun 22, 2020, 10:58:59 AM
On Mon, 22 Jun 2020 14:11:50 +0200
Bo Persson <b...@bo-persson.se> wrote:
>On 2020-06-22 at 11:10, bol...@nowhere.co.uk wrote:
>> On Sun, 21 Jun 2020 13:54:34 +0000 (UTC)
>> Juha Nieminen <nos...@thanks.invalid> wrote:
>>
>>> Actually many of the utilities provided by the C++ standard library can
>>> be very useful even if you are required to avoid dynamic memory allocation.
>>>
>>> For example, std::unique_ptr is *not* restricted to simply manage
>>> dynamically allocated memory for you. It can perfectly well (and
>>> relatively easily) be used to manage other resources as well. For
>>> example, you can use a `std::unique_ptr` to automatically close
>>> a FILE* that you have std::fopen()'ed, when the pointer goes out
>>
>> I didn't know that. What other standard types is it overloaded for?
>
>It is not even overloaded, it just happens (by design) that
>unique_ptr<FILE, fclose> will use fclose instead of operator delete when
>it is time to dispose of the pointer.

Oh ok, you give your own deleter. I thought maybe they'd done some
template specialisation for certain pointer types.

>Not really, unique_ptr is supposed to handle pointers. However, it is
>not that hard to write a small wrapper class that holds your descriptor
>and closes it in the destructor. RAII is one of C++'s power tools!

Couldn't get it to work with unique_ptr but shared_ptr obliges:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <memory>

using namespace std;

int fd;

void func()
{
    shared_ptr<int> fdptr(&fd,[](int *fd) { close(*fd); });
    char c;
    if (read(fd,&c,1) == -1) perror("read() 1");
}

int main()
{
    if ((fd = open("/etc/hosts",O_RDONLY)) == -1)
    {
        perror("open()");
        return 0;
    }
    func();

    char c;
    if (read(fd,&c,1) == -1) perror("read() 2");
    return 0;
}


$ ./a.out
read() 2: Bad file descriptor
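
For what it's worth, unique_ptr can be persuaded to hold a descriptor if the deleter supplies a nested `pointer` type that meets the NullablePointer requirements. The following is only a sketch under that assumption; fd_handle, fd_closer, fd_ptr and example_fd_ptr are hypothetical names, -1 is chosen as the "null" descriptor, and /dev/null is assumed to exist.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <memory>
#include <unistd.h>

// Hypothetical handle type: behaves like a pointer as far as unique_ptr
// cares (default/nullptr construction, comparison, bool test).
struct fd_handle {
    int fd = -1;
    fd_handle() = default;
    fd_handle(std::nullptr_t) {}
    fd_handle(int f) : fd(f) {}
    explicit operator bool() const { return fd != -1; }
    friend bool operator==(fd_handle a, fd_handle b) { return a.fd == b.fd; }
    friend bool operator!=(fd_handle a, fd_handle b) { return a.fd != b.fd; }
};

struct fd_closer {
    using pointer = fd_handle;   // unique_ptr stores this instead of int*
    void operator()(fd_handle h) const { ::close(h.fd); }
};

using fd_ptr = std::unique_ptr<int, fd_closer>;

void example_fd_ptr() {
    int raw = -1;
    {
        fd_ptr p{fd_handle{::open("/dev/null", O_RDONLY)}};
        assert(p);
        raw = p.get().fd;
        assert(::fcntl(raw, F_GETFD) != -1);   // still open
    }   // fd_closer runs here
    assert(::fcntl(raw, F_GETFD) == -1);       // closed
}
```

Unlike the shared_ptr version, nothing here points at a global int, so the handle can be moved around and returned from functions like any other unique_ptr.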


Bonita Montero

Jun 22, 2020, 2:36:58 PM
> His rant was a long time ago when frankly C++ was an awful language.

Even C++98 wasn't more awful than C.

Tim Rentsch

Jul 7, 2020, 8:26:20 AM
Manfred <non...@add.invalid> writes:

> On 6/16/2020 5:29 PM, Paavo Helde wrote:
>
>> 16.06.2020 15:26 Manfred kirjutas:
>>
>>> Cast operators are explicitly meant to tell the compiler: look,
>>> this pointer, irrespective of where it comes from, is a pointer to
>>> T, deal with it.
>>
>> The proposal to legislate malloc and friends for object creation
>> "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html"
>> contains explicit remark, stressing the opposite:
>>
>> "Note that a pointer reinterpret_cast is not considered sufficient
>> to trigger implicit object creation."
>>
>> I gather all this fuss is about allowing type-based alias
>> analysis. Reinterpret_cast works directly against this idea.
>
> That's why I find the proposal going in the direction of causing more
> trouble than help.
> True, reinterpret_cast breaks type-based alias analysis. However, it
> has been part of the standard since day one, and since you can't do
> type analysis on a buffer returned by malloc anyway, what's the
> problem with that?
>
> By the way, I managed to look into Bjarne's book (p. 1260):
> "Note that malloc() etc. does not invoke constructors and free()
> doesn't invoke destructors. Do not use these functions for types with
> constructors and destructors. Also, memset() should never be used for
> any type with a constructor".
>
> That makes perfect sense.
> I wonder why the standard managed to come up with something that
> denies /any/ object allocation with malloc (at least according to the
> first example in the proposal).

Having now reviewed the linked document, I will offer a few
remarks and opinions.

First I think the given motivating example (with malloc()) is
wrong. Per basic.life (6.8 p1) the lifetime of an object has
started. Per c.malloc (23.10.11 p1,p2) the semantics of a call
to malloc() is specified by the C standard library. Per the
description of "Memory management functions" (7.22.3) in the
referenced C standard (which is C 2011), a (non-null) pointer
returned by malloc() or calloc() or realloc() "may be assigned to
a pointer to any type of object [not having an extended alignment
requirement] and then used to access such an object or an array
of such objects in the space allocated".

That last part seems pretty airtight. Whatever we might imagine
is implied by other parts of the C++ standard, given an explicit
and specific statement that such pointers may be used to access
objects or arrays of objects, it seems reasonable to conclude that
those pointers are indeed so usable. Indexing operations and
pointer arithmetic are part and parcel of that, those being
necessary to carry out the accessing of array elements (which
doesn't matter to the paper's example but seems worth pointing
out).
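
For readers who would rather sidestep the question than settle it, one possible sketch of the header-plus-buffer layout from the start of the thread follows. It starts the header's lifetime explicitly with placement new and derives the buffer address from the raw allocation rather than from a header pointer. The names block, block_alloc, block_free and example_block are hypothetical, and whether the char arithmetic is strictly blessed before the implicit-lifetime wording is exactly the kind of point under debate here, so treat this as an illustration and not as a ruling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct header {
    std::size_t m_size;
};

// Hypothetical helper: hands back the header and its trailing buffer.
struct block {
    header* hdr;
    char*   buf;
};

block block_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(header) + size);
    if (!raw) return {nullptr, nullptr};

    // Placement new explicitly begins the lifetime of the header object.
    header* h = ::new (raw) header{size};

    // The buffer address is computed from the raw storage, not via h + 1.
    char* buf = static_cast<char*>(raw) + sizeof(header);
    return {h, buf};
}

// header is trivially destructible, so releasing the storage is enough.
void block_free(block b) { std::free(b.hdr); }

void example_block() {
    block b = block_alloc(123);
    if (b.hdr) {
        std::strcpy(b.buf, "Hello");
        assert(b.hdr->m_size == 123);
        assert(std::strcmp(b.buf, "Hello") == 0);
        block_free(b);
    }
}
```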

Second, notwithstanding the first comment, the proposal takes
what I think is a good direction in suggesting that (for most of
it) the topic be treated as a Defect Report, not changing the
meaning of the language but only correcting an oversight in
previous writings. So for what we thought was working all along
it is admitted that indeed it was meant to work all along, and
what is changing is only how the Standard is worded, not the
C++ language.

Third, forgive me for saying this so bluntly, but the writing in
the C++ standard is pretty awful, and unfortunately the paper's
proposed changes don't show any signs that that is improving.

Fourth, and probably most important, regarding TBAA. I see the
motivation for doing TBAA, and in fact I agree with it, but I
think the general approach C++ is taking to accommodate TBAA is
misguided. Little by little, C++ is building up an elaborate
and complicated set of rules, with an unfortunate consequence:

If your program does almost anything at all outside of
vanilla language usage, and the optimizer f**** you, then
it's your fault; or, instead it might be that

The compiler/optimizer screwed up, either in understanding
or in implementing the rules (but unfortunately it is very
hard to be sure, and even harder to convince the compiler
writers of the truth in many cases).

I'm sorry to say that the "implicit lifetime" proposal continues
this trend.

(Editorial comment: does anyone else find it funny that the
set of changes proposed gives so many explicit statements about
so-called "implicit" properties? After all that it hardly
seems right to call them "implicit".)

Tim Rentsch

Jul 8, 2020, 11:23:23 AM
Manfred <non...@add.invalid> writes:

> On 6/20/2020 7:02 AM, James Kuyper wrote:
>
>> On 6/19/20 6:02 PM, Alf P. Steinbach wrote:
>> ...
>>
>>> * C++ permission to implement an iterative function as recursive.
>>
>> I agree that C++, like C, gives implementations permission to convert an
>> iterative function into an equivalent recursive one. But "equivalent" is
>> the key word here. If the C++ standard requires that the original code
>> has certain observable behavior, the converted code must meet that same
>> requirement.
>
> I am no expert on this point, rather I am curious: does the standard
> say something explicit (if so, a pointer would be appreciated) about
> converting iterations into a recursive function, or is this
> "permission" merely based on the lack of prohibition under the
> umbrella of the "as if" rule?

To a certain extent it is both. There is no specific statement in
the C++ standard (or the C standard either) that an iterative
function may be implemented using recursive object code, or vice
versa. But the C++ standard does say (in intro.execution, p1)
"[Conforming implementations] need not copy or emulate the
structure of the abstract machine." Any such mapping does fall
under the "as if" rule, so in that sense the freedom is implicit
rather than explicit. At the same time there is an explicit
freedom granted to disregard what the abstract machine would do
(provided of course the "as if" requirements are met). A C++
implementation can be conforming even if the source code is
"compiled" by translating it to pure Lisp and then running the Lisp
code. Pure Lisp doesn't have any way of iterating; all it has is
recursion. As long as the Lisp code produces the same output that
the abstract machine would, the implementation is conforming.
(Note: there are some other aspects of what is called "observable
behavior" that I've left out, but that doesn't change the key point
that compiling to a recursive-only environment such as pure Lisp
can still be conforming.)
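
To make the "as if" point concrete, here is a small sketch of one computation written both ways. The function names are hypothetical. Both forms have the same result for every input, which is all the abstract machine cares about; how much stack either one uses on a real machine is not part of that result.

```cpp
#include <cassert>

// Iterative form: sums 1..n with a loop.
unsigned long sum_iterative(unsigned n) {
    unsigned long total = 0;
    for (unsigned i = 1; i <= n; ++i) total += i;
    return total;
}

// Recursive form: the loop variable becomes a parameter and the running
// total becomes an accumulator argument.
unsigned long sum_recursive(unsigned n, unsigned long acc = 0) {
    return n == 0 ? acc : sum_recursive(n - 1, acc + n);
}

void example_equivalence() {
    // Same result from both forms.
    assert(sum_iterative(1000) == sum_recursive(1000));
}
```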

>>> While I'm happy that there is a language that has the first three C++
>>> features (one can always use C for critical parts where those features
>>> are undesired), I'm very unhappy about the lunacy in the standard that
>>> formally permits this last point, that the /compiler/ is permitted to
>>> /introduce/ undefined behavior.
>>
>> I have no idea what in the world you're referring to. Whether or not the
>> behavior is undefined is determined entirely by the source code.
>
> The way I see it, the point is that a recursive function has different
> requirements and may be more demanding in terms of resources (not just
> time) than iterative ones, so the source code may have well defined
> behavior in the iterative form, but run into UB in the recursive form
> due to resource exhaustion.

I know some people think that running out of stack space (as easily
might happen in a deeply recursive function) is necessarily
undefined behavior. That's not right. Running out of stack space,
whether because of recursion or otherwise, is a property of the
execution environment, and the implementation has no control over
that. Indeed the implementation might not even be able to discover
if it's about to happen. These things do not affect whether a
program has undefined behavior, which is determined solely by what
is specified (or not) to happen in the abstract machine.

I explained this stuff in more detail, sometime last year, in several
postings in this newsgroup. If you would like more explanation, I
can dig around and see if I can find some of that commentary, to
help answer further questions. Of course I reserve the right to
answer further questions directly, without making any reference to
my previous comments. ;)

Alf P. Steinbach

Jul 8, 2020, 6:59:20 PM
On 08.07.2020 17:23, Tim Rentsch wrote:
> [snip]
> These things [like stack usage] do not affect whether a
> program has undefined behavior, which is determined solely by what
> is specified (or not) to happen in the abstract machine.


No. UB is not solely a static property of a program. It's also a dynamic
property, such as exceeding a buffer size, or an implementation limit.

I seem to have conceded your POV that a C or C++ compiler /can/
translate to recursive function implementation. But only by then
supposing that a compiler /can/ introduce likely dynamic UB, or at the
very least remove a conditional guarantee of well defined operation. So
this is now the point where we differ.

In my view having standards that allow that is very ungood. But it's a
thorny problem. The "conditional" I mention is because there are a
zillion possible dynamic UB sources, such as a `bool` variable: the
compiler is free to willy-nilly decide that in this particular
compilation `sizeof(bool)` is, say, 2M, and furthermore to know that its
implementation limit of stack size is less, then prove to itself that
hence the `main`, which here happens to have a `bool` temporary, would
incur UB, hence that any behavior added to `main` would be fine.

It's the anything goes.

And again, that happens because an informal, practically oriented
standard is treated as a precise formal work, which it clearly isn't.


- Alf

Öö Tiib

Jul 8, 2020, 7:33:19 PM
On Thursday, 9 July 2020 01:59:20 UTC+3, Alf P. Steinbach wrote:
>
> And again, that happens because an informal, practically oriented
> standard is treated as a precise formal work, which it clearly isn't.

Some members of the committee also seem to treat it not as the
specification of a software development tool (the C++ programming
language) but as a legislative document about said tool.

Keith Thompson

Jul 8, 2020, 7:39:48 PM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
> On 08.07.2020 17:23, Tim Rentsch wrote:
>> [snip]
>> These things [like stack usage] do not affect whether a
>> program has undefined behavior, which is determined solely by what
>> is specified (or not) to happen in the abstract machine.
>
> No. UB is not solely a static property of a program. It's also a
> dynamic property, such as exceeding a buffer size, or an
> implementation limit.

Who says the abstract machine specifies only static properties?

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Alf P. Steinbach

Jul 8, 2020, 8:50:28 PM
On 09.07.2020 01:39, Keith Thompson wrote:
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>> On 08.07.2020 17:23, Tim Rentsch wrote:
>>> [snip]
>>> These things [like stack usage] do not affect whether a
>>> program has undefined behavior, which is determined solely by what
>>> is specified (or not) to happen in the abstract machine.
>>
>> No. UB is not solely a static property of a program. It's also a
>> dynamic property, such as exceeding a buffer size, or an
>> implementation limit.
>
> Who says the abstract machine specifies only static properties?

No one so far.

Tim Rentsch maintains that UB is determined solely by the specification.
The specification is static.

He does that in order to eliminate dynamic UB as a concept, by
redefining the term. It's a common exercise by politicians seeking to
influence the unthinking public. Not so common in technical debates.


- Alf

Keith Thompson

Jul 8, 2020, 9:18:02 PM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
> On 09.07.2020 01:39, Keith Thompson wrote:
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>> On 08.07.2020 17:23, Tim Rentsch wrote:
>>>> [snip]
>>>> These things [like stack usage] do not affect whether a
>>>> program has undefined behavior, which is determined solely by what
>>>> is specified (or not) to happen in the abstract machine.
>>>
>>> No. UB is not solely a static property of a program. It's also a
>>> dynamic property, such as exceeding a buffer size, or an
>>> implementation limit.
>>
>> Who says the abstract machine specifies only static properties?
>
> No one so far.
>
> Tim Rentsch maintains that UB is determined solely by the
> specification. The specification is static.

In what sense is it static? The specification (the language standard)
describes dynamic (run-time) behavior.

If your program does this (contrived example):
int n = INT_MAX;
n += rand() % 2;

the second line may or may not have undefined behavior depending on the
*dynamic* value returned by rand(). That can be inferred from the
specification.
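
As a side illustration, a program that wants to stay on the defined side regardless of the run-time value can test before it adds. A minimal sketch, with checked_add and example_checked_add as hypothetical names:

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: stores a + b in out and returns true only when the
// sum is representable as int. The checks themselves never overflow.
bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > INT_MAX - b) return false;   // would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return false;   // would fall below INT_MIN
    out = a + b;
    return true;
}

void example_checked_add() {
    int n = INT_MAX;
    int result = 0;
    // Adding 0 is fine; adding 1 is rejected instead of overflowing.
    assert(checked_add(n, 0, result) && result == INT_MAX);
    assert(!checked_add(n, 1, result));
}
```

Whether the addition is attempted now depends on the run-time operands, which is the same dynamic dependence the two-line example shows.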

> He does that in order to eliminate dynamic UB as a concept, by
> redefining the term. It's a common exercise by politicians seeking to
> influence the unthinking public. Not so common in technical debates.

He didn't use the word "static". Why did you introduce it?