
std::hexfloat


Daniel

May 17, 2019, 6:54:56 PM
Consider

#include <sstream>
#include <iostream>
#include <iomanip>

int main()
{
double x = -10.0;

std::stringstream ss;
ss << std::hexfloat << x << "\n"; // (1)
std::cout << ss.str() << "\n";

double y;
ss >> y; // (2)

std::cout << y << "\n";
}

For (1), vs2017, gcc, and clang all produce something like

-0x1.4p+3

My understanding is that (2) is illegal, that a unary minus operator cannot
appear before a hex floating point representation, but can someone explain
why this is not allowed?

Thanks,
Daniel

Richard Damon

May 17, 2019, 9:46:08 PM
The unary minus isn't part of the hex floating point constant, just as
it isn't in the value -1 (there are no negative integer literals)

That doesn't mean that inserters/extractors can't use that format for
negative numbers.

blt_8o...@32fkxnh90wz2bllefiqy.co.uk

May 18, 2019, 6:14:42 AM
I didn't even know hexfloat existed. It seems a spectacularly useless
manipulator. What on earth is the point of it?

Bonita Montero

May 18, 2019, 7:15:25 AM
> I didn't even know hexfloat existed. It seems a spectacularly
> useless manipulator. What on earth is the point of it?

You can serialize floating-point values loss-free in ASCII files
with it.
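
A minimal sketch of that round trip, under the assumption that the text is parsed back with std::strtod rather than with stream extraction. The helper names to_hexfloat and from_hexfloat are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>

// Format a double as hexadecimal floating-point text, e.g. "0x1.4p+3".
std::string to_hexfloat(double value)
{
    std::ostringstream out;
    out << std::hexfloat << value;
    return out.str();
}

// Parse the text back. std::strtod accepts an optional sign followed by
// a hexadecimal floating-point number.
double from_hexfloat(const std::string& text)
{
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    assert(end == text.c_str() + text.size()); // the whole string was consumed
    return value;
}
```

Because every hex digit maps to exactly four bits of the significand, the value that comes back should compare equal to the one that went in, including for values like 0.1 that have no short exact decimal form.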

Bo Persson

May 18, 2019, 8:28:00 AM
The hexfloat is a bit odd, as it really only works for output...

For numeric input, the spec for the num_get facet specifically says that
the only characters allowed are "0123456789abcdefxABCDEFX+-".

https://en.cppreference.com/w/cpp/locale/num_get/get (see Stage 2)

So ss >> y would stop at the 'p' anyway.

The committee didn't want to break old code reading "1.0p" as one double
and one char, like in ss >> y >> ch.


Bo Persson
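
One possible workaround for the input side, sketched under the assumption that the value is a single whitespace-delimited token: read the token as a string and hand it to std::stod, which goes through the strtod rules and so understands the 'p' exponent. The function name read_hexfloat is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read one whitespace-delimited token and convert it with std::stod.
// Returns false (and sets failbit) if the token is not a complete number.
bool read_hexfloat(std::istream& in, double& value)
{
    std::string token;
    if (!(in >> token))
        return false;

    try {
        std::size_t used = 0;
        const double parsed = std::stod(token, &used);
        if (used != token.size()) {          // trailing junk after the number
            in.setstate(std::ios_base::failbit);
            return false;
        }
        value = parsed;
        return true;
    } catch (const std::exception&) {        // invalid_argument or out_of_range
        in.setstate(std::ios_base::failbit);
        return false;
    }
}
```

With this, the original example could call read_hexfloat(ss, y) in place of ss >> y. The cost is that "1.0p" style input is no longer split into a double and a char, which is exactly the old behaviour the committee wanted to keep for operator>>.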

blt_ny...@93ho0557z0x.edu

May 18, 2019, 11:57:02 AM
You can already do that with standard hex:

float f = 1.234;
cout << hex << *((long *)&f) << endl;

Bonita Montero

May 18, 2019, 12:03:34 PM
>>> I didn't even know hexfloat existed. It seems a spectacularly
>>> useless manipulator. What on earth is the point of it?

>> You can serialize floating-point values loss-free in ASCII files
>> with it.

> You can already do that with standard hex:
> float f = 1.234;
> cout << hex << *((long *)&f) << endl;

That might not run on every platform and isn't
guaranteed to be portable.

Jorgen Grahn

May 18, 2019, 2:13:05 PM
"Not guaranteed to be portable" is too mild: you might as well say
"might accidentally work". Unless there are special language rules
around long/float conversion which I'm unaware of.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ben Bacarisse

May 18, 2019, 4:33:23 PM
Jorgen Grahn <grahn...@snipabacken.se> writes:

> On Sat, 2019-05-18, Bonita Montero wrote:
>>>>> I didn't even know hexfloat existed. It seems a spectacularly
>>>>> useless manipulator. What on earth is the point of it?
>>
>>>> You can serialize floating-point values loss-free in ASCII files
>>>> with it.
>>
>>> You can already do that with standard hex:
>>> float f = 1.234;
>>> cout << hex << *((long *)&f) << endl;
>>
>> That might not run on every platform and isn't
>> guaranteed to be portable.
>
> "Not guaranteed to be portable" is too mild: you might as well say
> "might accidentally work". Unless there are special language rules
> around long/float conversion which I'm unaware of.

It's not technically a conversion, it's a re-interpretation of the bits
in the object. Not that that's any more portable! Not only are there
language issues that make the /code/ non-portable (for example, on my
system long is twice the size of float) but there are also endian issues
that make the output non-portable.

--
Ben.

Bonita Montero

May 19, 2019, 12:34:50 AM
> It's not technically a conversion, it's a re-interpretation of the bits
> in the object. Not that that's any more portable! Not only are there
> language issues that make the /code/ non-portable (for example, on my
> system long is twice the size of float) but there are also endian issues
> that make the output non-portable.

No, there aren't any endianness issues that make the example non-portable,
because the long and the float have the same endianness.

blt_...@48h2n_3.edu

May 19, 2019, 5:21:08 AM
That is a good point, I should have used int. It's almost always the same
size as a float. But nonetheless, the point is that a float is simply a load
of bytes, there's nothing magical about it and it can be stored with no loss
of precision quite easily.

Bo Persson

May 19, 2019, 5:43:23 AM
There are no guarantees about that in the language. But unless you use a
VAX (or some odd ARM systems), it might just accidentally work.

Bonita Montero

May 19, 2019, 5:45:00 AM
Not because of the language. But there never has been an architecture
that has split endianness for integers and floating points.

Bonita Montero

May 19, 2019, 5:49:13 AM
> That is a good point, I should have used int. It's almost always the same
> size as a float. But nonetheless, the point is that a float is simply a
> load of bytes, there's nothing magical about it and it can be stored with
> no loss of precision quite easily.

You can do this more reliably by storing it as a uint32_t. I think
there isn't any platform that is or will be supported by a current
C++ compiler where the floating-point types aren't IEEE-754-compatible,
although there are some special cases like the Arduino platform,
where float and double are both single-precision values.

Bo Persson

May 19, 2019, 6:36:13 AM
Except for the ones I mentioned. :-)


https://en.wikipedia.org/wiki/Endianness#Floating_point

Alf P. Steinbach

May 19, 2019, 7:31:04 AM
:-o Learned something new today, assuming that Wikipedia is correct here.

Wow, holy mackerel!


Cheers!,

- Alf

Ben Bacarisse

May 19, 2019, 7:40:16 AM
That may be widely true now, but I recall problems in the past. I think
the VAX copied the PDP-11's strange mixed-endian order for 32-bit floats
whilst "fixing" the mixed byte-order for longs that the PDP-11 used.
Mind you, it might have been the other way round, with the PDP-11 having
a plain byte ordering for floats, but I remember a mismatch when
transferring data in that way.

--
Ben.

David Brown

May 19, 2019, 7:47:09 AM
Smaller processors can have "int" that is 16-bit. The most reasonable
assumption is that a float is the same size as a uint32_t - that will be
valid on all but the strangest of targets (I think there was a Cray
machine with 64-bit "short", which would not have a uint32_t type at
all, but you are unlikely to meet one in practice). But whatever
assumptions you make, state them in a compiler-checkable manner.

And use a union for type punning.

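#include <stdint.h>
#include <iostream>

static_assert(sizeof(uint32_t) == sizeof(float),
              "Checking type sizes");

union raw_float { float f; uint32_t raw; };

int main()
{
    raw_float rf;
    rf.f = 1.234f;
    // Print the bit pattern of the float as a hexadecimal integer.
    std::cout << std::hex << rf.raw << std::endl;
}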


blt_...@_cuytv9ga8ib68xe.net

May 19, 2019, 12:04:06 PM
I can't imagine C++ gets used too often on 8 or 16 bit machines. Even getting
all of the core non C part of the language to work would be a struggle IMO (the
virtual function table would be fun to implement for a start) and the STL and
2011+ would be next to impossible.

>all, but you are unlikely to meet one in practice). But whatever
>assumptions you make, state them in a compiler-checkable manner.
>
>And use a union for type punning.

You could, but my example was 1 line which was the point.

Marcel Mueller

May 19, 2019, 12:33:32 PM
Am 18.05.19 um 12:14 schrieb blt_8o...@32fkxnh90wz2bllefiqy.co.uk:
> I didn't even know hexfloat existed. It seems a spectacularly useless
> manipulator. What on earth is the point of it?

IBM used hex float format in hardware also. I.e. the exponent counts in
hex digits. But the resulting scaling precision has some disadvantages.


Marcel

David Brown

May 19, 2019, 3:10:57 PM
Well, it turns out that C++ /is/ used regularly on such systems. People
tend to be careful about the use of some C++ features - exceptions and
RTTI are usually disabled, for example, and virtual functions will only
be used if they are really useful. And use of standard containers
(except perhaps std::array) will be limited. But there is a lot of C++
used nonetheless. (A great many C++11 features are zero cost at
run-time, at least with a reasonable compiler.)

>
>> all, but you are unlikely to meet one in practice). But whatever
>> assumptions you make, state them in a compiler-checkable manner.
>>
>> And use a union for type punning.
>
> You could, but my example was 1 line which was the point.
>

One /incorrect/ line is not very useful. It is better to have a few
lines that work than a single line that does not.

(And that's not just theory - faffing around with casting pointer types
as you did does not work on anything but the most limited of compilers,
and usually only with optimisations disabled.)

Alf P. Steinbach

May 19, 2019, 5:22:05 PM
C supports type punning via unions.

C++ does not.

C++ does not, in the strictest interpretation of the formal, support
general type punning except by way of `memcpy`.

I disagree with that interpretation: it's totally impractical, so IMO it
can't be the /intent/.

However, the GCC folks have generally adopted the most silly,
impractically rigid literal interpretations of the formal rules, so I
would absolutely not be surprised if they assume the aforementioned one
too. I don't know of any way to tell the C++ compiler that look, these
two pointers are of different types but access the same bytes in memory.
And the thing about UB, even purely formal UB, is that the compiler can
/assume/ that it will not happen, e.g. it can assume that the code
dereferencing that 2nd pointer and accessing the pointee, will never be
executed. Then the compiler can optimize it away. That's what g++ does
in a number of cases, so perhaps also in this one, if one's unlucky.


Cheers!,

- Alf

Sal LO

May 19, 2019, 5:57:31 PM

Paavo Helde

May 20, 2019, 12:19:51 AM
And what makes this so impractical? IMO, a 'memcpy' is a great way to
tell the reader that one is doing something hackish. Also, memcpy is
recognized specially by compilers and can be optimized away where
appropriate.

You are against 'memcpy' only because you are *used to* doing type
punning some other way and are now reluctant to change your habits.

From https://en.cppreference.com/w/cpp/language/reinterpret_cast :

"The purpose of strict aliasing and related rules is to enable
type-based alias analysis, which would be decimated if a program can
validly create a situation where two pointers to unrelated types (e.g.,
an int* and a float*) could simultaneously exist and both can be used to
load or store the same memory."

In C++20 we will have std::bit_cast() which will make the intent even
clearer than 'memcpy'.
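
For reference, a sketch of the memcpy approach for the float example from earlier in the thread. It assumes a 32-bit IEEE 754 float and checks both assumptions at compile time. The names float_to_bits and bits_to_float are made up here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// State the assumptions so that the compiler checks them.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "float must be 32 bits wide");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must use the IEEE 754 format");

// Copy the object representation of a float into an integer.
std::uint32_t float_to_bits(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Copy an integer bit pattern back into a float.
float bits_to_float(std::uint32_t bits)
{
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

On a platform where floats and integers share a byte order, float_to_bits(1.0f) should give 0x3f800000. Since no pointer of one type is ever used to access an object of another type, the strict aliasing rules are not involved.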


David Brown

May 20, 2019, 2:15:41 AM
This is actually not allowed in C90, IIRC.

>
> C++ does not.

You are right, I should have thought of that.

Using unions here is not strictly portable, but works on all practical
compilers. (Let me know if I am wrong here!) The code could never be
fully portable anyway.

>
> C++ does not, in the strictest interpretation of the formal, support
> general type punning except by way of `memcpy`.

Or other char* access.

>
> I disagree with that interpretation: it's totally impractical, so IMO it
> can't be the /intent/.

It is very rare that you need to mess with types like this. memcpy
would have worked fine. Modern compilers would optimise memcpy away
entirely in a situation like this. (But it can be a pain for older or
weaker compilers.)

>
> However, the GCC folks have generally adopted the most silly,
> impractically rigid literal interpretations of the formal rules, so I
> would absolutely not be surprised if they assume the aforementioned one
> too.

Nonsense - that's just an excuse people use when their incorrect code
fails on optimising compilers.

And in this case it is particularly wrong - gcc makes it clear that
"type punning via unions" is a common technique and supported by the
compiler (even without "-fno-strict-aliasing").

<https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html>

<https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning>


> I don't know of any way to tell the C++ compiler that look, these
> two pointers are of different types but access the same bytes in memory.

There is no standard way that I know about.

(There are non-standard ways, such as gcc's "may_alias" attribute and
its "-fno-strict-aliasing" flag.)

> And the thing about UB, even purely formal UB, is that the compiler can
> /assume/ that it will not happen, e.g. it can assume that the code
> dereferencing that 2nd pointer and accessing the pointee, will never be
> executed. Then the compiler can optimize it away. That's what g++ does
> in a number of cases, so perhaps also in this one, if one's unlucky.
>

That is correct regarding UB. But if the behaviour is defined in the
documentation, it is not undefined.

clang follows gcc in such cases, but I can't be entirely sure that other
C++ compilers say they allow union-based type punning. I /believe/ they
do, but there may be exceptions.




blt_8...@eifgienqb.org

May 20, 2019, 5:19:39 AM
On Sun, 19 May 2019 21:10:44 +0200
David Brown <david...@hesbynett.no> wrote:
>On 19/05/2019 18:03, blt_a937@_cuytv9ga8ib68xe.net wrote:
>> all of the core non C part of the language to work would be a struggle IMO
>(the
>> virtual function table would be fun to implement for a start) and the STL and
>
>> 2011+ would be next to impossible.
>
>Well, it turns out that C++ /is/ used regularly on such systems. People

I've worked with PICs. I've yet to see anyone use C++ on them.

>be used if they are really useful. And use of standard containers
>(except perhaps std::array) will be limited. But there is a lot of C++
>used nonetheless. (A great many C++11 features are zero cost at
>run-time, at least with a reasonable compiler.)

And a lot aren't.

>> You could, but my example was 1 line which was the point.
>>
>
>One /incorrect/ line is not very useful. It is better to have a few
>lines that work than a single line that does not.

Just because you don't like it doesn't make it incorrect.

>(And that's not just theory - faffing around with casting pointer types
>as you did does not work on anything but the most limited of compilers,
>and usually only with optimisations disabled.)

Oh rubbish. It's worked on every compiler I've ever used.

Bonita Montero

May 20, 2019, 5:20:31 AM
I have to correct myself: floating-point-values are de-facto always
encoded with base 2 and every base-2-value, even with fractions, is
representable in base 10 because both share the prime-factor 2 of
the base.
So the advantage is simply that the value might be encoded shorter
than in base-10.
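
A small sketch of the decimal alternative, assuming the library's conversions are correctly rounded. It does not print the exact decimal expansion, only max_digits10 significant digits (17 for an IEEE 754 double), which is enough to identify the double uniquely. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a double with enough significant decimal digits to round-trip.
std::string to_decimal(double value)
{
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10)
        << value;
    return out.str();
}

// Read the decimal text back into a double.
double from_decimal(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

So 0.1 is written as something like 0.10000000000000001 and reads back as the same double. The hexfloat form of the same value is exact and needs no rounding on either side.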

blt_...@5r_jdp24fz5.gov.uk

May 20, 2019, 5:23:37 AM
Do you ever get the feeling that a lot of C++ people are afraid of pointers
and direct memory access? One wonders why they don't just use Java.

>In C++20 we will have std::bit_cast() which will make the intent even
>clearer than 'memcpy'.

Doesn't sound any clearer to me.

Ben Bacarisse

May 20, 2019, 7:29:28 AM
David Brown <david...@hesbynett.no> writes:

> On 19/05/2019 23:21, Alf P. Steinbach wrote:
<cut>
>> C supports type punning via unions.
>
> This is actually not allowed in C90, IIRC.

I don't think that's true. Can you say more?

>> C++ does not.
>
> You are right, I should have thought of that.

Can someone point at the bit (or bits) in the standard that make the
difference here? I find the C++ standard just big enough that I'm never
sure I've seen all the relevant parts to answer any particular question!

<cut>
>> I don't know of any way to tell the C++ compiler that look, these two
>> pointers are of different types but access the same bytes in memory.
>
> There is no standard way that I know about.

That made me sit up! I think some words have got lost. I think you and
Alf are talking about pointers /declared/ to have different "target"
types, though I am not 100% sure. Given

union { char b[sizeof (float)]; float f; } u;

u.b and &u.f are pointers of different type that access the same bytes
in memory. It would be very odd if this were not allowed. Unions /are/
the standard way to tell the compiler that different lvalue expressions
will access the same memory.

<cut>
--
Ben.

Ben Bacarisse

May 20, 2019, 7:41:27 AM
Small nit: they all use a power of 2 as the base (old IBM FP used 16).
The encoding is a bit of a red-herring. If some system used a power of
3, even if encoded with base 2, the values would not be finitely
representable in base 10 (without using tricks like "0.3...").

--
Ben.

Bo Persson

May 20, 2019, 7:44:28 AM
You are allowed to access the different members, but not at the same
time. C++ specifies that an "active member" is the one that was most
recently written to. That is then the only member that can be read.

https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior


A complication is that the major compilers chose to implement the C
rules as an extension (for C compatibility?). So it might work anyway...


Bo Persson

David Brown

May 20, 2019, 8:00:26 AM
On 20/05/2019 11:19, blt_8...@eifgienqb.org wrote:
> On Sun, 19 May 2019 21:10:44 +0200
> David Brown <david...@hesbynett.no> wrote:
>> On 19/05/2019 18:03, blt_a937@_cuytv9ga8ib68xe.net wrote:
>>> all of the core non C part of the language to work would be a struggle IMO
>> (the
>>> virtual function table would be fun to implement for a start) and the STL and
>>
>>> 2011+ would be next to impossible.
>>
>> Well, it turns out that C++ /is/ used regularly on such systems. People
>
> I've worked with PICs. I've yet to see anyone use C++ on them.
>

PICs (the traditional PICs - not things like the PIC32) are at the
bottom end of 8-bit devices. They can barely be used with normal C.

People use C++ on the AVRs, which are 8-bit, and have good gcc support.
(But no exceptions or RTTI, and much of the C++ standard library is
missing.) Vendors like IAR have C++ for a range of different 8-bit
microcontrollers.

16-bit processors are quite rare these days, but devices like the msp430
can be very useful. And they can be programmed in C++, again using gcc
or IAR.

I am not by any means saying that C++ is the most common choice of
language for 8-bit and 16-bit devices - nor am I saying that 8-bit and
16-bit are common targets for C++. I am merely saying that quite a lot
of people successfully and regularly use C++ on such devices.


>> be used if they are really useful. And use of standard containers
>> (except perhaps std::array) will be limited. But there is a lot of C++
>> used nonetheless. (A great many C++11 features are zero cost at
>> run-time, at least with a reasonable compiler.)
>
> And a lot aren't.

Looking briefly through a C++11 feature list (from the Wikipedia article
- it's a reasonable summary), I can see that /all/ the language features
are zero cost. Lambdas, auto, constexpr, range-based for, template
aliases, literals, static assertions, etc. - all zero cost. Threading
may have indirect costs (like thread-safe static initialisation).

Some standard library changes may have a cost, but many of these
actually improve efficiency.

Can you give examples of C++11 features that are costly for small
systems, compared to how they might have been written in C or pre-C++11?
Noting, as I said, that things like the container libraries are often
not implemented on small C++ targets - and are rarely used even if they
/are/ implemented.

>
>>> You could, but my example was 1 line which was the point.
>>>
>>
>> One /incorrect/ line is not very useful. It is better to have a few
>> lines that work than a single line that does not.
>
> Just because you don't like it doesn't make it incorrect.

It is incorrect because it is incorrect - whether you or I like it or
not. In C and C++, you can't use a pointer to one type to access data
of a different type (outside certain exceptions) - that applies even if
you cast the pointer types. In the C standards, this is in 6.5p7. I
don't know where it is expressed in the C++ standards - perhaps someone
more familiar with them can say. (And I'm sure someone will say if I'm
wrong.)

>
>> (And that's not just theory - faffing around with casting pointer types
>> as you did does not work on anything but the most limited of compilers,
>> and usually only with optimisations disabled.)
>
> Oh rubbish. It's worked on every compiler I've ever used.
>

Then you haven't used many compilers - certainly not two of the three
most used C++ compilers (gcc and clang). The details of what they do
with type alias violations depend on the exact code and the option
flags - if it is just as efficient to give you the code you apparently
expect, then they usually will. But try this code - it's simple code so
you can use <https://godbolt.org> to test it with different compilers
and options:


int foo(void) {
float f = 31.234;
*(int*) &f = 123;
return f;
}

With -O2, gcc and clang both give:

xor eax, eax
ret

clang doesn't even give you a warning, even with -Wall -Wextra enabled.

It is quite simple - do not cast the types of pointers like this. The
result will, at best, be entirely dependent on the compiler and flags.
At worst, it will work "fine" until you make other changes to other
parts of the code, and knock-on effects cause the compilation here to
change.



David Brown

May 20, 2019, 9:07:50 AM
On 20/05/2019 13:29, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
> <cut>
>>> C supports type punning via unions.
>>
>> This is actually not allowed in C90, IIRC.
>
> I don't think that's true. Can you say more?

I didn't have a C90 reference handy, but I've looked it up now. In
3.3.2.3, accessing a member of a union when a value has been stored in a
different member is implementation defined. (In C99 and C11, in
6.5.2.3, there is a footnote saying exactly how "type punning" behaves.)

It would have been better to say that type punning via unions is not
clearly allowed by the C90 standard, than to say the standard disallows
it (since it is implementation defined, not undefined).

>
>>> C++ does not.
>>
>> You are right, I should have thought of that.
>
> Can someone point at the bit (or bits) in the standard that make the
> difference here? I find the C++ standard just big enough that I'm never
> sure I've seen all the relevant parts to answer any particular question!
>

I am in the same position regarding C++ - and it doesn't help that the
standard changes every three years. My understanding is that C++
followed C90 here, and did not update with the footnote and
clarification from C99.

I would also be happy to see a clear reference from the standards.

(en.cppreference.com is clear on the matter, and they are usually right.
Look under "Explanation" in
<https://en.cppreference.com/w/cpp/language/union> )


> <cut>
>>> I don't know of any way to tell the C++ compiler that look, these two
>>> pointers are of different types but access the same bytes in memory.
>>
>> There is no standard way that I know about.
>
> That made me sit up! I think some words have got lost. I think you and
> Alf are talking about pointers /declared/ to have different "target"
> types, though I am not 100% sure. Given
>
> union { char b[sizeof (float)]; float f; } u;
>
> u.b and &u.f are pointers of different type that access the same bytes
> in memory. It would be very odd if this were not allowed. Unions /are/
> the standard way to tell the compiler that different lvalue expressions
> will access the same memory.
>

Yes, I think we are talking about slightly different things. unions
tell the compiler that you have different objects at the same address,
and thus a piece of memory can be accessed by different lvalue
expressions. They don't necessarily say what happens when you access
something that was written as a different field in the union (I can't
find a description of that anywhere in the C++14 document).

But I don't think you can take a pointer-to-float, and a pointer-to-int,
and tell the compiler that they point to the same object in any standard
way.


Paavo Helde

May 20, 2019, 9:52:55 AM
On 20.05.2019 15:00, David Brown wrote:
>
> int foo(void) {
> float f = 31.234;
> *(int*) &f = 123;
> return f;
> }
>
> With -O2, gcc and clang both give:
>
> xor eax, eax
> ret
>
> clang doesn't even give you a warning, even with -Wall -Wextra enabled.

This example does not demonstrate what you think it does. If it returned
31 then you could say the compilers have "unexpectedly" optimized away
the invalid "*(int*) &f = 123;" line. However, returning 0 is perfectly
expected because writing int 123 over a float produces a float value
1.724e-43#DEN which of course gets converted to 0 in the output.

So the compiler has dutifully carried out the type punning as written,
even if it was not obliged to. What's there to complain about?
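
The arithmetic can be checked without relying on the cast. This is a sketch assuming a 32-bit IEEE 754 float, with a made-up helper name: the bit pattern of the integer 123 is copied into a float with memcpy and then classified.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "float must be 32 bits wide");
static_assert(std::numeric_limits<float>::is_iec559,
              "float must use the IEEE 754 format");

// Build the float whose object representation equals that of `bits`.
float float_with_int_bits(std::int32_t bits)
{
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For bits == 123 the exponent field is all zeros, so the result is a subnormal of magnitude 123 * 2^-149, roughly 1.724e-43. Converting that to int truncates to 0, which matches the "xor eax, eax" in the generated code.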

blt...@ij4x0pfb0g9yrlo3p9w.com

May 20, 2019, 11:00:28 AM
On Mon, 20 May 2019 13:44:18 +0200
Bo Persson <b...@bo-persson.se> wrote:
>A complication is that the major compilers chose to implement the C
>rules as an extension (for C compatibility?). So it might work anyway...

Given how many C++ programs #include C code, any C++ compiler that didn't
follow the rules of C - as long as they didn't conflict with those of C++ -
would soon be supplanted by something else.

Ben Bacarisse

May 20, 2019, 11:37:29 AM
Sure, but isn't writing to a member considered to be "access"? It is in C
parlance.

> A complication is that the major compilers chose to implement the C
> rules as an extension (for C compatibility?). So it might work
> anyway...

I'm not talking about type punning. It was the blanket ban on access
using pointers of different types that made me take note.

--
Ben.

Ben Bacarisse

May 20, 2019, 11:47:35 AM
David Brown <david...@hesbynett.no> writes:

> On 20/05/2019 13:29, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
>> <cut>
>>>> C supports type punning via unions.
>>>
>>> This is actually not allowed in C90, IIRC.
>>
>> I don't think that's true. Can you say more?
>
> I didn't have a C90 reference handy, but I've looked it up now. In
> 3.3.2.3, accessing a member of a union when a value has been stored in a
> different member is implementation defined. (In C99 and C11, in
> 6.5.2.3, there is a footnote saying exactly how "type punning"
> behaves.)

I consider that to be allowed. There's no way that C can specify the
result, so this is as "defined" a construct as it can be.

> It would have been better to say that type punning via unions is not
> clearly allowed by the C90 standard, than to say the standard disallows
> it (since it is implementation defined, not undefined).

C does not use the terms allowed and disallowed. Did you mean it would
have been better to make it undefined? If so, I disagree (but that's
hardly important).

>>
>>>> C++ does not.
>>>
>>> You are right, I should have thought of that.
>>
>> Can someone point at the bit (or bits) in the standard that make the
>> difference here? I find the C++ standard just big enough that I'm never
>> sure I've seen all the relevant parts to answer any particular question!
>
> I am in the same position regarding C++ - and it doesn't help that the
> standard changes every three years. My understanding is that C++
> followed C90 here, and did not update with the footnote and
> clarification from C99.
>
> I would also be happy to see a clear reference from the standards.
>
> (en.cppreference.com is clear on the matter, and they are usually right.
> Look under "Explanation" in
> <https://en.cppreference.com/w/cpp/language/union> )

That says it's undefined, but the C++ standard, unlike the C one, does
not have an annex listing all UB constructs with normative references.
Oh well... I don't doubt anyone's word about this, I'd just like to see
the wording.

<cut>
--
Ben.

Öö Tiib

May 20, 2019, 12:40:31 PM
In C++ the [class.union] section is relatively short, IIRC. A union has at
most one /active/ non-static data member at any one time.
Most generally, we must use placement new expressions (and,
if there was a previous active member, explicit destructor calls)
to change the active member. However, if the previous active
member was of standard layout then there is no need to call
a (pseudo)destructor on it, and when the newly active member is of
standard layout then we can simply assign to it instead of
using placement new. Reading from a non-active member
is undefined ... except that standard-layout members that have
a common initial sequence can be used to inspect each other within
the limits of that common initial sequence.

For common usage, std::variant is a lot more convenient to use
than a union (but takes more memory), so a union can perhaps be
used as a performance optimization or for compatibility with
other programming languages.
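
To illustrate the active-member rule, here is a sketch of a hand-written tagged union with trivial member types, where plain assignment is enough to switch the active member. The type Value and its member names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// A small tagged union: `kind` records which member is currently active.
struct Value {
    enum class Kind { Int, Float };

    Kind kind;
    union {                 // anonymous union: i and f share storage
        std::int32_t i;
        float f;
    };

    // Assigning to a member of trivial type makes it the active member.
    void set_int(std::int32_t v) { i = v; kind = Kind::Int; }
    void set_float(float v)      { f = v; kind = Kind::Float; }

    // Only the member recorded as active is ever read.
    std::int32_t as_int() const  { assert(kind == Kind::Int);   return i; }
    float as_float() const       { assert(kind == Kind::Float); return f; }
};
```

This is roughly the bookkeeping that std::variant does for you, and it also shows where the extra memory goes: the tag lives next to the storage.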

Chris Vine

May 20, 2019, 12:49:56 PM
On Mon, 20 May 2019 14:00:16 +0200
David Brown <david...@hesbynett.no> wrote:
> On 20/05/2019 11:19, blt_8...@eifgienqb.org wrote:
[snip]
> > Just because you don't like it doesn't make it incorrect.
>
> It is incorrect because it is incorrect - whether you or I like it or
> not. In C and C++, you can't use a pointer to one type to access data
> of a different type (outside certain exceptions) - that applies even if
> you cast the pointer types. In the C standards, this is in 6.5p7. I
> don't know where it is expressed in the C++ standards - perhaps someone
> more familiar with them can say. (And I'm sure someone will say if I'm
> wrong.)

It is §3.10/10 of C++14, and §6.10/8 of C++17. They are modelled
directly on the C equivalent.

The number of people who don't trouble themselves to understand the
strict aliasing rules of C and C++ is surprising. The standard-
conforming way of dealing with aliasing is to use memcpy() instead of a
cast. As memcpy() is an intrinsic on VS and a built-in in gcc/clang,
it will be optimized out where a cast would (but for strict aliasing)
work, and will still work where casting wouldn't (such as when casting
would result in misalignment).

An alternative if using gcc or clang is to type-pun through a union
(the code emitted will be identical to using memcpy()). As a worst
option you can use the -fno-strict-aliasing gcc/clang extension and
have all aliasing optimizations switched off. That still wouldn't save
you when alignment is an issue on the platform in question though.
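
A sketch of the alignment case, with hypothetical helper names: a 32-bit value is stored at and loaded from an arbitrary byte offset in a buffer. Casting buffer + offset to std::uint32_t* would be both an aliasing violation and, at odd offsets, a misaligned access. The memcpy version has neither problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load a 32-bit value that starts at any byte offset in the buffer.
std::uint32_t load_u32(const unsigned char* buffer, std::size_t offset)
{
    std::uint32_t value = 0;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}

// Store a 32-bit value at any byte offset in the buffer.
void store_u32(unsigned char* buffer, std::size_t offset, std::uint32_t value)
{
    std::memcpy(buffer + offset, &value, sizeof value);
}
```

The bytes are copied in host byte order, so this is for in-memory use. A file or wire format would still need an agreed byte order on top.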

Chris Vine

May 20, 2019, 1:08:35 PM
On Mon, 20 May 2019 09:23:28 +0000 (UTC)
blt_f429k@5r_jdp24fz5.gov.uk wrote:
> Do you ever get the feeling that a lot of C++ people are afraid of pointers
> and direct memory access? One wonders why they don't just use Java.

That's the wrong way around. What is a lot more frightening is
incompetent programmers writing code which depends on pointer casts
without realising that their code (a) gives undefined behaviour, and
(b) is total crap, and might break on a compiler upgrade or on a
change of optimization level.

You said elsewhere that "Its worked on every compiler I've ever used"
but it absolutely doesn't work on recent versions of gcc and clang
without the (non-standard) -fno-strict-aliasing switch, if you have
optimizations switched on.

Scott Lurndal

unread,
May 20, 2019, 1:24:32 PM5/20/19
to
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> writes:
>On Mon, 20 May 2019 09:23:28 +0000 (UTC)
>blt_f429k@5r_jdp24fz5.gov.uk wrote:
>> Do you ever get the feeling that a lot of C++ people are afraid of pointers
>> and direct memory access? One wonders why they don't just use Java.
>
>That's the wrong way around. What is a lot more frightening is
>incompetent programmers writing code which depends on pointer casts
>without realising that their code (a) gives undefined behaviour, and
>(b) is total crap, and might break on a compiler upgrade or on a
>change of optimization level.

Perhaps if one didn't hire incompetent programmers, one would need
not worry about programmers writing "total crap"?

Ben Bacarisse

unread,
May 20, 2019, 1:26:32 PM5/20/19
to
Öö Tiib <oot...@hot.ee> writes:
<I've cut attributions because some seem to have got lost>
>> >>>> C supports type punning via unions.
>> >>>
>> >>> This is actually not allowed in C90, IIRC.
>> >>
>> >> I don't think that's true. Can you say more?
>> >>
>> >>>> C++ does not.

I've cut the quoted text because your reply appears to be about the
above, not what I wrote (though I may have misunderstood).

<cut>
> In C++ the [class.union] is relatively short IIRC. Union has at most
> one /active/ non-static data member at any one time.
> Most generally we must use placement new expressions (and
> if there was previous active member then explicit destructor calls)
> to change the active members. However if the previous active
> member was of standard layout then there are no need to call
> (pseudo)destructor to it and when newly active member is of
> standard layout then we can simply assign to it instead of
> that placement new.

Thanks, yes, I read that part, but I could not find where a plain write
sets the active member. It may be wrapped up in other more general text
about assignment, placement new, or some such.

> Reading from not active member
> is undefined ...

Do you know where this is stated?

--
Ben.

blt_x...@x1c.ac.uk

unread,
May 20, 2019, 1:47:08 PM5/20/19
to
On Mon, 20 May 2019 18:08:19 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>On Mon, 20 May 2019 09:23:28 +0000 (UTC)
>blt_f429k@5r_jdp24fz5.gov.uk wrote:
>> Do you ever get the feeling that a lot of C++ people are afraid of pointers
>> and direct memory access? One wonders why they don't just use Java.
>
>That's the wrong way around. What is a lot more frightening is
>incompetent programmers writing code which depends on pointer casts
>without realising that their code (a) gives undefined behaviour, and
>(b) is total crap, and might break on a compiler upgrade or on a
>change of optimization level.

Perhaps use Java. Seriously.

>You said elsewhere that "Its worked on every compiler I've ever used"
>but it absolutely doesn't work on recent versions of gcc and clang
>without the (non-standard) -fno-strict-aliasing switch, if you have
>optimizations switched on.

fenris$ cc -v
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
fenris$ cat t.c
#include <stdio.h>

int main()
{
    float f = 1.234;
    int i = *(int *)&f;
    printf("%f\n",*(float *)&i);
    return 0;
}
fenris$ cc t.c -O3
fenris$ ./a.out
1.234000


baldur$ cc --version
gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

baldur$ cc -O3 t.c
baldur$ ./a.out
1.234000


Sorry, what was that you were saying?


Chris Vine

unread,
May 20, 2019, 2:05:18 PM5/20/19
to
Sorry, what exactly do you think you were proving?

Your crap code with undefined behaviour looks as if it is too
inconsequential for g++ to optimize against it. gcc/g++ will however
warn that your code is not fit for purpose - it tells you that it
breaks strict aliasing rules.

Tim Rentsch

unread,
May 20, 2019, 2:20:21 PM5/20/19
to
David Brown <david...@hesbynett.no> writes:

> On 20/05/2019 13:29, Ben Bacarisse wrote:
>
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
>>
>> <cut>
>>
>>>> C supports type punning via unions.
>>>
>>> This is actually not allowed in C90, IIRC.
>>
>> I don't think that's true. Can you say more?
>
> I didn't have a C90 reference handy, but I've looked it up now.
> In 3.3.2.3, accessing a member of a union when a value has been
> stored in a different member is implementation defined. (In C99
> and C11, in 6.5.2.3, there is a footnote saying exactly how "type
> punning" behaves.)

Despite that, the semantics for union member access is exactly
the same in C90 as it is in C99. Discussion notes in some of the
documents in www.open-std.org (sorry I don't have any more
specific reference) make this clear. The wording in C90 saying
"implementation-defined" is there because what value results
depends on the representations of the types involved. Note that
the footnote uses the term "type punning" (without quotes), much
like the footnote in N1256. The type punning semantics is what
was (implicitly?) assumed in K&R C. If this rule had been
changed in C90, or changed between C90 and C99, such a change
surely would have been mentioned in the Rationale documents.
AFAICT there isn't any.

Tim Rentsch

unread,
May 20, 2019, 3:07:16 PM5/20/19
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> David Brown <david...@hesbynett.no> writes:
>
>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
>
> <cut>
>
>>> C supports type punning via unions.
>>
>> This is actually not allowed in C90, IIRC.
>
> I don't think that's true. Can you say more?
>
>>> C++ does not.
>>
>> You are right, I should have thought of that.
>
> Can someone point at the bit (or bits) in the standard that make
> the difference here? I find the C++ standard just big enough that
> I'm never sure I've seen all the relevant parts to answer any
> particular question!

In N4659:

12.3 p1 says in part:

In a union, a non-static data member is active if its name
refers to an object whose lifetime has begun and has not
ended (6.8).

For a union member or subobject thereof, 6.8 p1 says in part:

[I]ts lifetime only begins if that union member is the
initialized member in the union (11.6.1, 15.6.2), or as
described in 12.3.

12.3 p5 says in part:

In an assignment expression of the form E1 = E2 that uses
either the built-in assignment operator (8.18) or a trivial
assignment operator (15.8), for each element X of S(E1), if
modification of X would have undefined behavior under 6.8,
an object of the type of X is implicitly created in the
nominated storage; no initialization is performed and the
beginning of its lifetime is sequenced after the value
computation of the left and right operands and before the
assignment. [ Note: This ends the lifetime of the
previously-active member of the union, if any (6.8). --end
note ]

To see the note is right, we return to section 6.8 p1, which says
in part:

The lifetime of an object o of type T ends when: [...]
(1.4) -- the storage which the object occupies is released,
or is reused by an object that is not nested within o (4.5)

Note also 6.8 p5, which says in part:

A program may end the lifetime of any object by reusing the
storage which the object occupies [...]


So assigning to a union member, which reuses the storage of any
other (non-static) members in the same union, ends their
lifetimes, which consequently makes them not be active. Does
this mean trying to read them is undefined behavior? Yes, it
does, under the general rule that objects may not be accessed
after their lifetimes are ended. These cases are spelled out
in excruciating detail in section 6.8, paragraphs 4, 6, and 7.


Section 6.8 p4 says in part:

The properties ascribed to objects and references throughout
this International Standard apply for a given object or
reference only during its lifetime. [ Note: In particular,
[...] there are significant restrictions on the use of the
object, as described below [...] --end note ]

Section 6.8 p6 says in part:

[A]fter the lifetime of an object has ended [...] any
pointer that represents the address of the storage location
where the object will be or was located may be used but only
in limited ways. [...] The program has undefined behavior
if: [...]
(6.2) -- the pointer is used to access a non-static data
member or call a non-static member function of the object,
[...]

Section 6.8 p7 says in part:

Similarly, [...] after the lifetime of an object has ended
[...], any glvalue that refers to the original object may be
used but only in limited ways. [...] The program has
undefined behavior if:
(7.1) -- the glvalue is used to access the object, [...]


Taken together I think these passages make the case pretty
airtight.
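
A small sketch of what the quoted passages mean in practice, assuming the reading given above is right. The union and function names are made up for illustration.

```cpp
// Hypothetical union used only to illustrate the active-member rules.
union IntOrFloat {
    int i;
    float f;
};

float last_written() {
    IntOrFloat u;
    u.i = 42;    // built-in assignment: i becomes the active member
    u.f = 1.5f;  // f's lifetime begins here, reusing the storage, so i's ends
    // Reading u.i at this point would access an object after its lifetime
    // has ended, which is the undefined behaviour discussed above.
    return u.f;  // reading the active member is fine
}
```

Under C99 and later the commented-out read would instead be the "type punning" case from the footnote, which is the difference between the two languages that started this sub-thread.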

Manfred

unread,
May 20, 2019, 3:38:41 PM5/20/19
to
On 5/20/19 7:26 PM, Ben Bacarisse wrote:
> Öö Tiib <oot...@hot.ee> writes:
> <I've cut attributions because some seem to have got lost>
>>>>>>> C supports type punning via unions.
>>>>>>
>>>>>> This is actually not allowed in C90, IIRC.
>>>>>
>>>>> I don't think that's true. Can you say more?
>>>>>
>>>>>>> C++ does not.
>
> I've cut the quoted text because your reply appears to be about the
> above, not what I wrote (though I may have misunderstood).
>
> <cut>
<recut>

>> Reading from not active member
>> is undefined ...
>
> Do you know where this is stated?
>

I think it is a consequence of the first sentence:
9.5 p1:
"...at most one of the non-static data members can be active at any
time, that is, the value of at most one of the non-static data members
can be stored in a union at any time."

Which would imply that reading from a non-active member is the same as
reading from an uninitialized variable.

But, 9.2 p19 explicitly allows the case of reading the common initial
sequence.
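
A sketch of the common initial sequence exception, with made-up type names. Both structs are standard-layout and start with members of the same types, so the leading member may be inspected through either union member, whichever one is active.

```cpp
// Hypothetical standard-layout structs that share their initial members.
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

union Shape {
    Circle c;
    Square s;
};

Shape make_circle(double radius) {
    Shape sh;
    sh.c = Circle{1, radius};  // c is now the active member
    return sh;
}

Shape make_square(double side) {
    Shape sh;
    sh.s = Square{2, side};    // s is now the active member
    return sh;
}

// Reads 'kind' through c even when s is active. This is permitted only
// because 'kind' is part of the common initial sequence of Circle and Square.
int kind_of(const Shape& sh) {
    return sh.c.kind;
}
```

Reading sh.c.radius while s is active would not be covered by the same argument if the second members had different types; here they happen to match, but relying on that is fragile.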

David Brown

unread,
May 20, 2019, 3:49:41 PM5/20/19
to
You are right - sorry for the bad example. The principle is right,
however - the casting here is bad code with undefined behaviour and the
compiler can optimise it as it wants. (It is free, for example, to
reduce "foo" to a single "ret" instruction.) And it is certainly the
case that gcc and clang use type-based alias analysis to optimise code,
and it is certainly the case that this happens sometimes in real life
code. I've seen it cause trouble in my own code once, on a platform
where "int32_t" is "long int" and therefore incompatible with "int"
despite both being 32-bit. (I did not say I like these rules!).

But I have to try a little harder to make a clear example:

int foo(float* f, int *i) {
    *i = 1;
    *f = 0.0f;

    return *i;
}

int bar(void) {
    int x = 0;

    x = foo((float*) &x, &x);

    return x;
}

With -O2, gcc and clang generate:

mov eax, 1
ret

With -O1, or "-O2 -fno-strict-aliasing", they return 0.
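
For contrast, a sketch of the same experiment written with memcpy() instead of a store through float*. The names foo_memcpy and bar_memcpy are made up here, and the sketch assumes float and int have the same size and that 0.0f is represented by all-zero bytes, as on IEEE 754 platforms. Because memcpy() may legitimately modify any object, the compiler cannot assume *i is still 1.

```cpp
#include <cstring>

static_assert(sizeof(float) == sizeof(int),
              "sketch assumes float and int have the same size");

int foo_memcpy(void* f, int* i) {
    *i = 1;
    const float zero = 0.0f;
    std::memcpy(f, &zero, sizeof zero);  // may overwrite the bytes of *i
    return *i;                           // has to reflect the bytes copied in
}

int bar_memcpy() {
    int x = 0;
    x = foo_memcpy(&x, &x);
    return x;
}
```

Under those assumptions bar_memcpy() should return 0 at every optimisation level, with or without -fno-strict-aliasing.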

David Brown

unread,
May 20, 2019, 3:55:36 PM5/20/19
to
On 20/05/2019 17:47, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 20/05/2019 13:29, Ben Bacarisse wrote:
>>> David Brown <david...@hesbynett.no> writes:
>>>
>>>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
>>> <cut>
>>>>> C supports type punning via unions.
>>>>
>>>> This is actually not allowed in C90, IIRC.
>>>
>>> I don't think that's true. Can you say more?
>>
>> I didn't have a C90 reference handy, but I've looked it up now. In
>> 3.3.2.3, accessing a member of a union when a value has been stored in a
>> different member is implementation defined. (In C99 and C11, in
>> 6.5.2.3, there is a footnote saying exactly how "type punning"
>> behaves.)
>
> I consider that to be allowed. There's no way that C can't specify the
> result, so this is as "defined" a construct as it can.

Did you mean "there's no way that C /can/ specify the result" ? If so,
then that is not quite true - C99 (and C11) specify it better, even
though the final result is still implementation dependent.

>
>> It would have been better to say that type punning via unions is not
>> clearly allowed by the C90 standard, than to say the standard disallows
>> it (since it is implementation defined, not undefined).
>
> C does not use the term allowed and disallowed. Did you mean it would
> have been better to make it undefined? If so, I disagree (but that's
> hardly important).

Sorry, I meant it would have been better if /I/ had said "type punning
via unions is not clearly allowed by the C90 standard". I did not mean
to suggest the standard should be different.

(I do think the C99 wording is better, however - even if it is a
clarification rather than a change.)

>
>>>
>>>>> C++ does not.
>>>>
>>>> You are right, I should have thought of that.
>>>
>>> Can someone point at the bit (or bits) in the standard that make the
>>> difference here? I find the C++ standard just big enough that I'm never
>>> sure I've seen all the relevant parts to answer any particular question!
>>
>> I am in the same position regarding C++ - and it doesn't help that the
>> standard changes every three years. My understanding is that C++
>> followed C90 here, and did not update with the footnote and
>> clarification from C99.
>>
>> I would also be happy to see a clear reference from the standards.
>>
>> (en.cppreference.com is clear on the matter, and they are usually right.
>> Look under "Explanation" in
>> <https://en.cppreference.com/w/cpp/language/union> )
>
> That says its undefined, but the C++ standard, unlike the C one, does
> not have an annex listing all UB constructs with normative references.
> Oh well... I don't doubt anyone's word about this, I'd just like to see
> the wording.
>

Agreed.

David Brown

unread,
May 20, 2019, 3:58:04 PM5/20/19
to
I don't think that all the changes in C99 are covered in the rationale
documents (at least, not that I have seen). However, I am happy to
believe that the intended behaviour for unions has not changed between
C90 and C99, and it is merely the wording that has been made clearer.

David Brown

unread,
May 20, 2019, 4:02:05 PM5/20/19
to
On 20/05/2019 21:06, Tim Rentsch wrote:

<snip detailed explanation and references>

>
> Taken together I think these passages make the case pretty
> airtight.
>

Thank you. I can see why I failed to find a simple, clear reference
covering this case!

Ben Bacarisse

unread,
May 20, 2019, 4:17:10 PM5/20/19
to
Ah, thank you so much. I hope this did not take too long. I thought it
would be the result of a number of passages but it never occurred to me
to look at lifetime.

--
Ben.

Ben Bacarisse

unread,
May 20, 2019, 4:22:00 PM5/20/19
to
David Brown <david...@hesbynett.no> writes:

> On 20/05/2019 17:47, Ben Bacarisse wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 20/05/2019 13:29, Ben Bacarisse wrote:
>>>> David Brown <david...@hesbynett.no> writes:
>>>>
>>>>> On 19/05/2019 23:21, Alf P. Steinbach wrote:
>>>> <cut>
>>>>>> C supports type punning via unions.
>>>>>
>>>>> This is actually not allowed in C90, IIRC.
>>>>
>>>> I don't think that's true. Can you say more?
>>>
>>> I didn't have a C90 reference handy, but I've looked it up now. In
>>> 3.3.2.3, accessing a member of a union when a value has been stored in a
>>> different member is implementation defined. (In C99 and C11, in
>>> 6.5.2.3, there is a footnote saying exactly how "type punning"
>>> behaves.)
>>
>> I consider that to be allowed. There's no way that C can't specify the
>> result, so this is as "defined" a construct as it can.
>
> Did you mean "there's no way that C /can/ specify the result" ?

<sigh> Yes I did. Far too many of my typos negate my meaning.

> If so, then that is not quite true - C99 (and C11) specify it better,
> even though the final result is still implementation dependent.

Sure, they are clearer in a footnote (I take it you refer to the
"reinterpret the bits" footnote). Given the context (the late 80s) I
don't think anyone was ever in much doubt about what the implementation
defined result would be. Those were simpler times!

<cut>
--
Ben.

Ben Bacarisse

unread,
May 20, 2019, 4:24:35 PM5/20/19
to
Manfred <non...@invalid.add> writes:

> On 5/20/19 7:26 PM, Ben Bacarisse wrote:
>> Öö Tiib <oot...@hot.ee> writes:
>> <I've cut attributions because some seem to have got lost>
>>>>>>>> C supports type punning via unions.
>>>>>>>
>>>>>>> This is actually not allowed in C90, IIRC.
>>>>>>
>>>>>> I don't think that's true. Can you say more?
>>>>>>
>>>>>>>> C++ does not.
>>
>> I've cut the quoted text because your reply appears to be about the
>> above, not what I wrote (though I may have misunderstood).
>>
>> <cut>
> <recut>
>
>>> Reading from not active member
>>> is undefined ...
>>
>> Do you know where this is stated?
>>
>
> I think it is a consequence of the first sentence:
> 9.5 p1:
> "...at most one of the non-static data members can be active at any
> time, that is, the value of at most one of the non-static data members
> can be stored in a union at any time."
>
> Which would imply that reading from a non-active member is the same as
> reading from an uninitialized variable.

I didn't think that was quite enough, though it should have been enough
to prompt me to look for some more details. As you say it implies as
much as it says! Tim has very helpfully provided a thorough collection
of quotes.

--
Ben.

Bonita Montero

unread,
May 21, 2019, 1:46:01 AM5/21/19
to
> That's the wrong way around. What is a lot more frightening is
> incompetent programmers writing code which depends on pointer casts
> without realising that their code (a) gives undefined behaviour, ...

... in theory.

Fred.Zwarts

unread,
May 21, 2019, 3:25:21 AM5/21/19
to
"Chris Vine" schreef in bericht
news:20190520174928.c77ab71214afbd6375a5bbce@cvine--nospam--.freeserve.co.uk...
>
>On Mon, 20 May 2019 14:00:16 +0200
>David Brown <david...@hesbynett.no> wrote:
>> On 20/05/2019 11:19, blt_8...@eifgienqb.org wrote:
>[snip]
>> > Just because you don't like it doesn't make it incorrect.
>>
>> It is incorrect because it is incorrect - whether you or I like it or
>> not. In C and C++, you can't use a pointer to one type to access data
>> of a different type (outside certain exceptions) - that applies even if
>> you cast the pointer types. In the C standards, this is in 6.5p7. I
>> don't know where it is expressed in the C++ standards - perhaps someone
>> more familiar with them can say. (And I'm sure someone will say if I'm
>> wrong.)
>
>It is §3.10/10 of C++14, and §6.10/8 of C++17. They are modelled
>directly on the C equivalent.
>
>The number of people who don't trouble themselves to understand the
>strict aliasing rules of C and C++ is surprising. The standard-
>conforming way of dealing with aliasing is to use memcpy() instead of a
>cast. As memcpy() is an intrinsic on VS and a built-in in gcc/clang,
>it will be optimized out where a cast would (but for strict aliasing)
>work, and will still work where casting wouldn't (such as when casting
>would result in misalignment).

I need type punning often when accessing device registers. E.g., for a given
VME address it makes a difference whether it is accessed in D8 mode (byte
access), D16 (16-bit) mode or D32 (32-bit) mode. memcpy for device registers
is a bad idea, because it is not defined what bit-size will be used for the
copy.
I have the feeling that only type punning, in combination with a volatile
declaration is a good method for this purpose. But I am not sure that it is
well defined in the C++ standard and that it is portable across different
platforms. Fortunately, it works on the platforms that I use.
Or is there a method that is clearly supported by the C++ standard?

Öö Tiib

unread,
May 21, 2019, 4:16:50 AM5/21/19
to
On Monday, 20 May 2019 20:26:32 UTC+3, Ben Bacarisse wrote:
> Öö Tiib <oot...@hot.ee> writes:
> <I've cut attributions because some seem to have got lost>
> >> >>>> C supports type punning via unions.
> >> >>>
> >> >>> This is actually not allowed in C90, IIRC.
> >> >>
> >> >> I don't think that's true. Can you say more?
> >> >>
> >> >>>> C++ does not.
>
> I've cut the quoted text because your reply appears to be about the
> above, not what I wrote (though I may have misunderstood).

If something was misunderstood (in this case, what exactly the puzzle
is), then it was most likely by me. I have some knowledge but poor
communication skills.
I know that C++ does not allow type punning through a union. I have
read that in each version of the standard and also in public
communications and discussions of committee members. Which compiler
optimizations the rule is meant to enable is unclear to me, but so it is.

> <cut>
> > In C++ the [class.union] is relatively short IIRC. Union has at most
> > one /active/ non-static data member at any one time.
> > Most generally we must use placement new expressions (and
> > if there was previous active member then explicit destructor calls)
> > to change the active members. However if the previous active
> > member was of standard layout then there are no need to call
> > (pseudo)destructor to it and when newly active member is of
> > standard layout then we can simply assign to it instead of
> > that placement new.
>
> Thanks, yes, I read that part, but I could not find where a plain write
> sets the active member.

Sorry, I am using an online version of the draft, since I don't have
the books at hand right now. There it is not a plain write but an
assignment operator, either built-in or trivial. That is paragraph 5,
http://eel.is/c++draft/class.union#5
of [class.union] http://eel.is/c++draft/class.union
Otherwise, when there is no trivial way to start the lifetime of a member
(to make it active), we have to use placement new.
http://eel.is/c++draft/class.union#6

> It may be wrapped up in other more general text
> about assignment, placement new, or some such.

Indeed. They always try to say each thing in only one place, so it is
sometimes painful to find where that one place is.

>
> > Reading from not active member
> > is undefined ...
>
> Do you know where this is stated?

In [basic.life]
http://eel.is/c++draft/basic.life#7
The special guarantee that overrides [basic.life] is about the common
initial sequence: http://eel.is/c++draft/class.union#1

blt_14...@2mu00w.co.uk

unread,
May 21, 2019, 4:26:56 AM5/21/19
to
Are you having a slow brain day or something? You said it wouldn't work on
new compilers with optimisation. I just proved it did just as I've said it
works on every compiler I've ever tried it on.

>Your crap code with undefined behaviour looks as if it is too
>inconsequential for g++ to optimize against it. gcc/g++ will however

LOL, oh please, give it up before you make a complete fool of yourself :)

>warn that your code is not fit for purpose - it tells you that it
>breaks strict aliasing rules.

Clang doesn't and clang is a better compiler all round IMO.

Chris Vine

unread,
May 21, 2019, 6:49:31 AM5/21/19
to
The problem with aliasing arising from type punning concerns
dereferencing pointers which do not represent the "dynamic type" (in C
the "effective type") of the object being pointed to. The compiler is
entitled to assume that the object obtained by dereferencing, say, an
int* is actually an int and not a float. Does your case fall foul of
this?

A memcpy() is just another form of assignment: it so happens that
with a cast the dynamic type of the result of the cast remains the
source type, but with memcpy() it becomes the type of the destination.
However I can see that memcpy() might be problematic with device
registers because I don't think it has any atomicity guarantees. I
guess in that case using a union with volatile members might be the
answer: it is probably supported by your compiler. I don't actually
know what the standard says about unions with volatile members - when I
get a chance I must look it up.

Chris

Chris Vine

unread,
May 21, 2019, 6:49:31 AM5/21/19
to
The "it" which doesn't work is type punning through casting pointers.
Your silly toy code with undefined behaviour proves absolutely
nothing. An example has already been given up-thread of the
differences which can arise in code emitted, depending on whether the
-fno-strict-aliasing switch is applied or not.

> >Your crap code with undefined behaviour looks as if it is too
> >inconsequential for g++ to optimize against it. gcc/g++ will however
>
> LOL, oh please, give it up before you make a complete fool of yourself :)

I think you are the only one doing that.

> >warn that your code is not fit for purpose - it tells you that it
> >breaks strict aliasing rules.
>
> Clang doesn't and clang is a better compiler all round IMO.

So the standard says clearly that it is undefined behaviour but you say
"ignore that because although gcc warns that it breaks strict aliasing,
clang doesn't". If that is your approach to programming then "crap
code" seems like too mild a description.

Bart

unread,
May 21, 2019, 7:35:14 AM5/21/19
to
On 21/05/2019 11:49, Chris Vine wrote:
> On Tue, 21 May 2019 08:26:45 +0000 (UTC)

>> Are you having a slow brain day or something? You said it wouldn't work on
>> new compilers with optimisation. I just proved it did just as I've said it
>> works on every compiler I've ever tried it on.
>
> The "it" which doesn't work is type punning through casting pointers.
> Your silly toy code with undefined behaviour proves absolutely
> nothing.

Undefined behaviour because the language says so.

>> Clang doesn't and clang is a better compiler all round IMO.
>
> So the standard says clearly that it is undefined behaviour

Only because the language says so.

but you say
> "ignore that because although gcc warns that it breaks strict aliasing,
> clang doesn't". If that is your approach to programming then "crap
> code" seems like too mild a description.

It's only 'crap code' because the language says so.

My opinion is that such code can be valid, and it can be well-defined
(within a range of machines that might be the only ones you're
interested in). Or the behaviour might be specific to that range of
machines. But that's OK because we're low-level programmers and we know
what we're doing, right?

My approach is not to use C or C++, partly because all their silly rules
on UB (which seem to only exist to enable extreme optimisations) make
life harder.

And to use alternative languages. But if those other languages can
successfully run the same code on the same machines without UB, then
what are C and C++ playing at?

This is a recent bit of code I used (expressed here as C, and inside a
struct definition):

...
union { // anonymous union
int32_t modelist[4];
int32_t mode;
};

I want to be able to access (read or write) the first 4 bytes of that
union interchangeably as either .mode or .modelist[0], including writing
as .modelist[0] then reading immediately as .mode.

Isn't that technically UB in C or C++? I don't know, but the important
thing is that I don't need to care!
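
As far as I can tell, C is fine with this union, since both views have the same type. In C++, by the active-member reading given earlier in the thread, reading .mode after writing .modelist[0] would be reading an inactive member. A sketch of one way to get the same interchangeable access in C++ without a union at all follows; the struct name Entry and the helper function are made up, since the enclosing struct is not shown.

```cpp
#include <cstdint>

// Hypothetical stand-in for the enclosing struct, which the post elides.
struct Entry {
    std::int32_t modelist[4];

    // 'mode' is just another name for the first list element, so writes
    // through either view are visible through the other.
    std::int32_t& mode() { return modelist[0]; }
    const std::int32_t& mode() const { return modelist[0]; }
};

// Hypothetical demo helper: write via the list, read via mode().
std::int32_t write_list_read_mode(std::int32_t v) {
    Entry e{};
    e.modelist[0] = v;
    return e.mode();
}
```

The cost is writing e.mode() instead of e.mode at each use.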

As for type-punning, in the alternate language I use it is an official
feature! int->float type-punning, in C-like syntax, might be written as
(float@)a, where a is an int, and it can also work as (float@)(a+b).

(The C/C++ idiom would be *(float*)&a which only works on lvalues.)

I don't see it as being anything different from this:

a: dd 0 # 32-bit location
mov [a],eax # write 32-bit int
movd xmm0,[a] # read as 32-bit float

What does the code mean? Well if eax contained 0x3F800000, then it's
writing the binary representation of the IEEE float32 value 1.0.

And the [a] could be [esi] where esi contains a pointer that is
interpreted as int32_t* then float32* on successive lines.

All perfectly reasonable things that you might want to do.
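
For what it is worth, the effect of (float@)(a+b) can be sketched in standard C++ with a small memcpy()-based template that takes its argument by value, so it also works on rvalues. The name pun is made up for this sketch; C++20 provides std::bit_cast in <bit> for the same purpose. The size check below assumes int and float are both 32 bits for the example call.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret the bytes of 'from' as a value of type To.
template <class To, class From>
To pun(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}
```

On a platform with IEEE 754 floats, pun<float>(0x3F800000) gives 1.0f, and so does pun<float>(0x3F000000 + 0x00800000), where the argument is an expression rather than an lvalue.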

Fred.Zwarts

unread,
May 21, 2019, 7:44:01 AM5/21/19
to
"Chris Vine" schreef in bericht
news:20190521114749.d756ce67e9563bfa8c3fd2da@cvine--nospam--.freeserve.co.uk...
Device registers normally do not use floating point types. They usually
contain integer values, or bit patterns. But even if they contain a floating
point type, it may not match the format of the host system, so one has to
separate the mantissa and the exponent and construct a floating point value
from it.
I always use pointers to uint8_t, uint16_t, uint32_t or uint64_t to access
such registers. I don't think that will be a problem.
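
A sketch of that style of access, with made-up helper names. The register here is faked with an ordinary volatile variable so the code can run anywhere; on real hardware the pointer would come from a mapped device address. Note the standard only says volatile accesses are observable side effects. Whether a uint16_t access actually becomes a single 16-bit bus cycle is up to the implementation and the platform.

```cpp
#include <cstdint>

// Hypothetical helpers: one volatile access of exactly the declared width.
inline std::uint16_t read_reg16(const volatile std::uint16_t* reg) {
    return *reg;
}

inline void write_reg16(volatile std::uint16_t* reg, std::uint16_t value) {
    *reg = value;
}

// Demo only: a local volatile object stands in for a device register.
std::uint16_t demo_roundtrip(std::uint16_t value) {
    volatile std::uint16_t fake_reg = 0;
    write_reg16(&fake_reg, value);
    return read_reg16(&fake_reg);
}
```

Since the object is always accessed through its own type, no aliasing question arises here.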

Jorgen Grahn

unread,
May 21, 2019, 8:29:26 AM5/21/19
to
On Mon, 2019-05-20, Chris Vine wrote:

> The number of people who don't trouble themselves to understand the
> strict aliasing rules of C and C++ is surprising.

"Don't trouble themselves" is a good way of putting it, because it's
not /hard/ to grasp. If you think of memory as a store for /typed/
objects, plus the extra accommodations for char* and unions, you have
the rough picture.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

David Brown

unread,
May 21, 2019, 9:36:12 AM5/21/19
to
On 21/05/2019 13:35, Bart wrote:
> On 21/05/2019 11:49, Chris Vine wrote:
>> On Tue, 21 May 2019 08:26:45 +0000 (UTC)
>
>>> Are you having a slow brain day or something? You said it wouldn't
>>> work on
>>> new compilers with optimisation. I just proved it did just as I've
>>> said it
>>> works on every compiler I've ever tried it on.
>>
>> The "it" which doesn't work is type punning through casting pointers.
>> Your silly toy code with undefined behaviour proves absolutely
>> nothing.
>
> Undefined behaviour because the language says so.

Yes, exactly.

>
>>> Clang doesn't and clang is a better compiler all round IMO.
>>
>> So the standard says clearly that it is undefined behaviour
>
> Only because the language says so.

Yes.

It's fine to say you don't like this behaviour, and think that it should
be possible to access data via pointers of any kind. It's fine to say
that you choose to use compilers that implement such semantics - whether
it is always the case in the compiler you use, or whether you need a
switch like "-fno-strict-aliasing" to get that behaviour. You would be
far from alone here.

What is /not/ fine is to say that because /you/ think type-based alias
analysis should not be used for optimisation, it's okay to write code
that messes around with accessing data via incompatible types.

If someone wants to write C code, they should write C code. If you want
to write code in a variant of C with additional semantics, then you need
to take appropriate steps to ensure that you use it with tools that
guarantee those semantics. (And by that, I mean a tool that documents
the behaviour - not a tool that "worked on your test code".) There is
nothing wrong with writing code that is restricted like this, but you
should be clear about it.


None of this is new. The C language standards have been clear since C90
that accessing data via incompatible types is undefined behaviour. gcc
introduced optimisations on type-based alias analysis around 20 years
ago, and they did so because that's what many top-range C and C++
compilers did.


>
> but you say
>> "ignore that because although gcc warns that it breaks strict aliasing,
>> clang doesn't". If that is your approach to programming then "crap
>> code" seems like too mild a description.
>
> It's only 'crap code' because the language says so.
>

Yes - but since the discussion is about C or C++, it is those languages
and their standards that matter. The standards say accessing a "float"
object through an "int*" pointer is undefined behaviour - therefore it
is undefined behaviour in C and C++, and if you write such code, it is
crap C or C++ code. Particular implementations are free to give a
definition for it - code can be crap C++ code while being valid "MSVC"
code or valid "gcc -fno-strict-aliasing" code.

> My opinion is that such code can be valid, and it can be well-defined
> (within a range of machines that might be the only ones you're
> interested in). Or the behaviour might be specific to a range of
> machines. But that's OK because we're low-level programmers and we know
> what we're doing, right?

If you use compilers that define the semantics of this kind of
cross-type access, that's fine. If you are writing more general C or
C++, or using other compilers, it is not fine.

>
> My approach is not to use C or C++, partly because all their silly rules
> on UB (which seem to only exist to enable extreme optimisations) make
> life harder.
>

I fail to see how type-based alias analysis makes anything harder. In
general, optimisations mean that you can write code in a clearer,
simpler and more maintainable manner and let the compiler handle the
efficiency details.

I mean, how often do you actually want to access float data via an int*
pointer? In real code, it's very rare that this sort of thing crops up.
Being rare, there is no problem using valid alternatives like unions or
memcpy - both of which can be tightly optimised by a compiler.
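
As a rough illustration of the memcpy alternative mentioned above, here is a minimal sketch. The helper name pun_cast is made up for this example, and the expected bit pattern assumes an IEEE 754 32-bit float, which the language itself does not guarantee.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the thread): copies the object
// representation of `from` into a fresh object of type To.
// No pointer of the "wrong" type is ever dereferenced.
template <typename To, typename From>
To pun_cast(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Reads the bits of a float as a 32-bit unsigned integer.
std::uint32_t float_bits(float f)
{
    return pun_cast<std::uint32_t>(f);
}
```

On a platform with IEEE 754 floats, float_bits(1.0f) should give 0x3F800000. Optimising compilers commonly turn the memcpy into a plain register move, so there is usually no run-time cost compared with the cast.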

> And to use alternative languages. But if those other languages can
> successfully run the same code on the same machines without UB, then
> what are C and C++ playing at?

As always, you miss the point of undefined behaviour. And as always,
you grossly overstate its importance. In the great majority of cases,
things that are undefined behaviour in C or C++ are things that would
not turn up in correct, sensible code in the first place.


>
> This is a recent bit of code I used (expressed here as C, and inside a
> struct definition):
>
>     ...
>     union {              // anonymous union
>         int32_t modelist[4];
>         int32_t mode;
>     };
>
> I want to be able to access (read or write) the first 4 bytes of that
> union interchangeably as either .mode or .modelist[0], including writing
> as .modelist[0] then reading immediately as .mode.
>
> Isn't that technically UB in C or C++? I don't know, but the important
> thing is that I don't need to care!

That will be valid in C, but (AFAIUI) invalid in C++. However, as
others have noted, practical C++ compilers will define this behaviour to
have the same meaning as in C. (This is not type punning, since mode
and modelist[0] have the same types.)

>
> As for type-punning, in the alternate language I use it is an official
> feature! int->float type-punning, in C-like syntax, might be written as
> (float@)a, where a is an int, and it can also work as (float@)(a+b).
>

And how often is this actually relevant in real code? Or is this just
another one of your "features" whose only purpose is to let you pretend
your language is "better" than C?

Just because something is easy to specify and implement does not mean it
is useful.

David Brown

May 21, 2019, 9:45:34 AM
That is strange. I work with device registers all the time, and I
rarely use type punning.

The usual way to access hardware registers is via "volatile uint32_t *"
or similar pointers, with the size you want to use. Often you use
structs rather than individual pointers, but it boils down to the same
kind of volatile accesses.
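
A minimal sketch of that style of access follows. A real driver would get the pointer by casting an absolute address taken from the hardware documentation; here an ordinary volatile variable (fake_control_register, a made-up name) stands in for the register so that the example can run anywhere.

```cpp
#include <cstdint>

// Stand-in for a 32-bit device register. On real hardware this would be
// an absolute address cast to "volatile std::uint32_t *" instead.
volatile std::uint32_t fake_control_register = 0;

// Read-modify-write of the register. Each access through the
// pointer-to-volatile is performed as written, with the pointee's size.
void set_bits(volatile std::uint32_t* reg, std::uint32_t mask)
{
    *reg = *reg | mask;
}

// Plain 32-bit read of the register.
std::uint32_t read_register(const volatile std::uint32_t* reg)
{
    return *reg;
}
```

The access width follows from the pointee type, so a byte-wide or 16-bit access would use a "volatile std::uint8_t *" or "volatile std::uint16_t *" in the same way.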

"Type punning" is when you have told the compiler that object A is of
type T, and you now want to access it while pretending it is type U.
You are not doing that, as far as I can tell.

> E.g., for a
> given VME address it makes a difference whether it is accessed in D8
> mode (byte access), D16 (16-bit) mode or D32 (32-bit) mode. memcpy for
> device registers is a bad idea, because it is not defined what bit-size
> will be used for the copy.
> I have the feeling that only type punning, in combination with a
> volatile declaration is a good method for this purpose. But I am not
> sure that it is well defined in the C++ standard and that it is portable
> across different platforms. Fortunately, it works on the platforms that
> I use.
> Or is there a method that is clearly supported by the C++ standard?

Exactly how "volatile" works, and in particular for accesses via an
absolute address cast to a pointer-to-volatile, is not as clearly
defined as it could be in C and C++. C17 clarifies it - maybe newer C++
standards inherit this improvement. But all compilers have implemented
it in the same obvious manner.

Juha Nieminen

May 21, 2019, 9:52:20 AM
blt_8o...@32fkxnh90wz2bllefiqy.co.uk wrote:
> I didn't even know hexfloat existed. It seems a spectacularly useless
> manipulator. What on earth is the point of it?

If you save a floating point value in ascii using the normal decimal
representation, in many (perhaps even most) cases there's a high
chance of losing accuracy when it's read back, for the simple
reason that the base-10 representation cannot accurately represent
every single base-2 floating point value.

Base-16 representation, however, can. It exactly represents the
original floating point value, to the last bit, and nothing is
lost in the conversion to either direction.

Its advantage is that it's agnostic to the actual floating point
value binary representation in the hardware (eg. it doesn't
assume that it's an IEEE floating point value of a given size).
Thus exact floating point values can be transferred between
computers that may use different native floating point
formats.

If you save the floating point bits as raw data, you'll at the
very least run into the problem of endianness, and of course you'll
be assuming that both the source and target architectures use the
exact same internal floating point bit representation.
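
A small sketch of such a text round trip: the value is written with std::hexfloat and read back with std::strtod, because (as noted earlier in the thread) stream extraction stops at the 'p'. The function names are made up for this example, and it assumes a standard library whose hexfloat output prints every significant digit.

```cpp
#include <cstdlib>
#include <ios>
#include <sstream>
#include <string>

// Writes a double as hexadecimal floating point text, e.g. "-0x1.4p+3".
std::string to_hex_text(double value)
{
    std::ostringstream out;
    out << std::hexfloat << value;
    return out.str();
}

// Parses the text back. std::strtod accepts the hexadecimal form,
// including a leading minus sign.
double from_hex_text(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

With these two helpers, from_hex_text(to_hex_text(x)) is expected to give back exactly x for any finite double.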

Bart

May 21, 2019, 10:46:00 AM
On 21/05/2019 14:36, David Brown wrote:
> On 21/05/2019 13:35, Bart wrote:

>> It's only 'crap code' because the language says so.
>>
>
> Yes - but since the discussion is about C or C++, it is those languages
> and their standards that matter.

But how can code that expresses exactly the same thing be fine in one
language and not in another?

> The standards say accessing a "float"
> object through an "int*" pointer is undefined behaviour - therefore it
> is undefined behaviour in C and C++, and if you write such code, it is
> crap C or C++ code. Particular implementations are free to give a
> definition for it - code can be crap C++ code while being valid "MSVC"
> code or valid "gcc -fno-strict-aliasing" code.

And here apparently the same code can also be fine in a particular
dialect of C or C++, or even using a particular set of compiler options,
but be 'crap' when someone changes the compiler or option.


> I mean, how often do you actually want to access float data via an int*
> pointer? In real code, it's very rare that this sort of thing crops up.
> Being rare, there is no problem using valid alternatives like unions or
> memcpy - both of which can be tightly optimised by a compiler.

In my own codebase I seem to use cast-based type-punning 5-10 times per
application, but union-based type-punning is used all the time.
Cast-based type-punning is convenient when there is no struct or union
involved.

It is very frequent that I'm interested in interpreting the bytes of a
float as some integer value, or vice versa.

(Here's an example using that non-C language; this prints the underlying
binary bits of 0.1 which is a 64-bit float:

print int64@(0.1):"b"

Output is:

11111110111001100110011001100110011001100110011001100110011010

No faffing about with unions or *(int64_t*)&x casts (which won't work on
0.1), which apparently have UB anyway, or memcpy.)
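
For comparison, a sketch of how the same thing might look in C++ with memcpy and std::bitset. The function names are made up for this example, and the bit pattern assumes a 64-bit IEEE 754 double.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Copies the object representation of a double into a 64-bit integer.
std::uint64_t double_bits(double d)
{
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Renders all 64 bits as text, most significant bit first.
std::string double_bits_text(double d)
{
    return std::bitset<64>(double_bits(d)).to_string();
}
```

Unlike the output quoted above, double_bits_text keeps the leading zero bits, so it always returns 64 characters. The call works on a literal such as 0.1 because the value is passed by value into a named parameter before memcpy takes its address.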

> And how often is this actually relevant in real code? Or is this just
> another one of your "features" whose only purpose is to let you pretend
> your language is "better" than C?

It's 'better' in that it allows this sort of obvious stuff that people
want to write, while C doesn't.

Chris Vine

May 21, 2019, 11:04:55 AM
On Tue, 21 May 2019 12:35:02 +0100
Bart <b...@freeuk.com> wrote:
> On 21/05/2019 11:49, Chris Vine wrote:
> > On Tue, 21 May 2019 08:26:45 +0000 (UTC)
> > blt_14...@2mu00w.co.uk wrote:
> >> Are you having a slow brain day or something? You said it wouldn't work on
> >> new compilers with optimisation. I just proved it did just as I've said it
> >> works on every compiler I've ever tried it on.
> >
> > The "it" which doesn't work is type punning through casting pointers.
> > Your silly toy code with undefined behaviour proves absolutely
> > nothing.
>
> Undefined behaviour because the language says so.
>
> >> Clang doesn't and clang is a better compiler all round IMO.
> >
> > So the standard says clearly that it is undefined behaviour
>
> Only because the language says so.
>
> but you say
> > "ignore that because although gcc warns that it breaks strict aliasing,
> > clang doesn't". If that is your approach to programming then "crap
> > code" seems like too mild a description.
>
> It's only 'crap code' because the language says so.

Indeed so. It's crap because the standard says it is crap.

> My opinion is that such code can be valid, and it can be well-defined
> (within a range of machines that might be the only ones you're
> interested in). Or the behaviour might be specific to a range of
> machines. But that's OK because we're low-level programmers and we know
> what we're doing, right?

It is apparent from this thread that at least one person does not know
what he is doing.

You are entitled to your _opinion_ if you are going to write your own
language. I think Alf also shares your view, but recognises (sadly)
that it's not how C and C++ in fact are. Our religious spammer, whilst
initially being ignorant of the strict aliasing rules, has said he has
the same view as you about type punning via pointers and now permits it
in his language (at present mainly vapourware I think, but I may be
wrong). Presumably he is only intending to have his language used on a
platform where alignment is not an issue or will have all objects
padded to the most generic alignment. But we have been talking in this
thread about C and C++. The standards for those languages determine
whether the code is valid C or C++, and the standards say it isn't.

Also, your _opinion_ might be valid if you are writing your own
compiler: you could decide to provide type punning extensions, as
indeed gcc has via unions and/or its -fno-strict-aliasing switch. But
then you cannot call the code valid C or C++. Instead it is code
conforming to your opinion.

> My approach is not to use C or C++, partly because all their silly rules
> on UB (which seem to only exist to enable extreme optimisations) make
> life harder.

What makes you say that memcpy() is harder than making a
reinterpret_cast? I don't see it. It seems to me that the main reason
people don't do it correctly is not the difficulty but that they just
can't be bothered to understand the rules.

Incidentally the standards permit any object pointer to be cast to
char* and unsigned char* and dereferenced, if you want byte level
access to an object.
Possibly, but that is your language. It's not C or C++. You are
converting a discussion about whether particular code is valid C or C++
into one about what some new language of yours should permit.

Chris Vine

May 21, 2019, 11:29:17 AM
On Tue, 21 May 2019 15:45:48 +0100
Bart <b...@freeuk.com> wrote:
> On 21/05/2019 14:36, David Brown wrote:
> > On 21/05/2019 13:35, Bart wrote:
>
> >> It's only 'crap code' because the language says so.
> >
> > Yes - but since the discussion is about C or C++, it is those languages
> > and their standards that matter.
>
> But how can code that expresses exactly the same thing be fine in one
> language and not in another?

What a weird thing to say. Code that expresses exactly the same thing
can be fine in one language but not in another because the respective
standards for those languages say so.

In setting a standard, it is a matter of choice for the language
designer, involving trade-offs between, amongst other things, optimization
opportunities, convenience to the programmer, language complexity and
safety.

Paavo Helde

May 21, 2019, 12:21:01 PM
On 21.05.2019 15:29, Jorgen Grahn wrote:
> On Mon, 2019-05-20, Chris Vine wrote:
>
>> The number of people who don't trouble themselves to understand the
>> strict aliasing rules of C and C++ is surprising.
>
> "Don't trouble themselves" is a good way of putting it, because it's
> not /hard/ to grasp. If you think of memory as a store for /typed/
> objects, plus the extra accommodations for char* and unions, you have
> the rough picture.

This page

https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

claims that the following is UB in C++.

void *p = malloc(sizeof(float));
float *fp = p;
*fp = 1.0f;

and one should use placement new instead:

new (p) float {1.0f} ;

Now that I find hard to grasp. If this is true, how is it even possible
to write e.g. a custom memory allocator?

A low-level memory allocator typically does not know anything about
float, how is it possible to convert a memory block pointer to float*
which has to be returned from allocate() (and later e.g. from
std::vector<float>::data())? Do I really need to perform a dummy
placement new in the beginning of the memory block, to obtain a valid
float* pointer?



Bonita Montero

May 21, 2019, 12:23:37 PM
> This page
> https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8
> claims that the following is UB in C++.
> void *p = malloc(sizeof(float));
> float *fp = p;
> *fp = 1.0f;

LOL.

Paavo Helde

May 21, 2019, 12:50:31 PM
On 21.05.2019 16:52, Juha Nieminen wrote:
> blt_8o...@32fkxnh90wz2bllefiqy.co.uk wrote:
>> I didn't even know hexfloat existed. It seems a spectacularly useless
>> manipulator. What on earth is the point of it?
>
> If you save a floating point value in ascii using the normal decimal
> representation, in many (perhaps even most) cases there's a high
> chance of losing accuracy when it's read back, for the simple
> reason that the base-10 representation cannot accurately represent
> every single base-2 floating point value.

With enough digits, a decimal (as well as any other) representation can
get arbitrarily close to any real value, so it can also get arbitrarily
close to any value represented exactly in base-2. It is not needed to
represent the base-2 value exactly, it is just enough to provide any
base-10 value which is rounded to the correct base-2 value.

Historically there were indeed some round-trip bugs when serializing
floating-point values, but AFAIK these bugs got fixed in the C runtime
libraries about 10-20 years ago or so. Plus there are libraries which
ensure the minimum number of decimal digits for perfect round-trip.
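
A minimal sketch of a decimal round trip along those lines, using std::numeric_limits<double>::max_digits10 (17 for an IEEE 754 double) as the output precision. The function names are made up for this example, and it assumes the library's conversions round correctly in both directions.

```cpp
#include <cstdlib>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Writes a double with enough decimal digits that a correctly rounding
// reader recovers the same value.
std::string to_decimal_text(double value)
{
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10)
        << value;
    return out.str();
}

// Reads the decimal text back into a double.
double from_decimal_text(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

This always prints up to max_digits10 digits, which is often more than the shortest string that would round-trip; producing the shortest form needs a dedicated algorithm or library.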

Maybe you wanted to say that ensuring a proper round-trip is trickier in
base-10 than in base-16 and it may easily waste more bytes than strictly
necessary?

Bart

May 21, 2019, 1:25:06 PM
Yes, but in this case what someone may want to do could be completely
reasonable in itself, and could be well-defined on the processors they
know their program will run on.

When you look at the reasons why C and also C++ have made certain things
UB, you could well find they don't apply in your case.

Now, I've frequently written code in one language, which works perfectly
well as native code, but it hits UB if I auto-translate to C. There are
actually more problems generating C source as a target, than generating ASM.

(And actually, when I briefly tried to target C++, there were even more
problems.)

james...@alumni.caltech.edu

May 21, 2019, 1:34:47 PM
On Saturday, May 18, 2019 at 6:14:42 AM UTC-4,
...
> I didn't even know hexfloat existed. It seems a spectacularly useless
> manipulator. What on earth is the point of it?

The C++ standard defines the behavior of hexfloat in terms of the
behavior of std::printf() with a "%a" or "%A" format specifier. The C++
standard does not provide a detailed description of the behavior of
printf(), cross-referencing the C standard for that definition. The C
standard defines that behavior in part by recommending that conversions
between strings and floating point values performed by standard library
functions should be the same as those described for floating point
constants. The less important advantage described below therefore traces
back to things explained only in the C standard's description of
floating point constants. The main advantage of hexfloat described below
is far less subtle. It's an inherent consequence of using hexadecimal
rather than decimal notation.

I'll use pi as an example to demonstrate the issue. The first 17
significant digits of pi are

3.1415926535897932

Like most decimal floating point constants, 3.1415926535897932 cannot be
represented exactly (which is a truth distinct from the fact that pi
itself also cannot be represented exactly). On the system I'm currently
using, three consecutive floating point values that can be represented
exactly are:

3.141592653589792671908753618481568992137908935546875
3.141592653589793115997963468544185161590576171875000
3.141592653589793560087173318606801331043243408203125

I've shown 52 significant digits for those numbers, which is the minimum
needed to display those values exactly. 17 digits are sufficient,
however, to uniquely identify the middle value as the closest
representable value to the exact value of pi. However the C standard
permits 3.1415926535897932 to have any one of those three values. That
would be true even if the constant used all 52 significant digits of the
middle representation shown above.
The corresponding hexadecimal floating point constants for those same
values are

0X1.921FB54442D17P+1
0X1.921FB54442D18P+1
0X1.921FB54442D19P+1

The main advantage of hexadecimal floating point is simply that it takes
only 14 significant hexadecimal "digits" to represent those values
exactly, compared with the 17 significant decimal digits needed to
specify a value that's closer to the best value than to either of the
others, and compared with the 52 digits needed to represent those values
exactly.

However, there's another less important advantage. If FLT_RADIX is a
power of 2 (as it is on almost, but not quite, every implementation of C
or C++ currently in use), then 0X1.921FB54442D18P+1 is only allowed to
be represented by that exact value, the other two values are not
permitted as they would be for a decimal floating point constant. Also,
0X1.921FB54442D187P+1 is required to be rounded correctly to
0X1.921FB54442D18P+1; it's not allowed to round to 0X1.921FB54442D19P+1.
That would not be the case for decimal floating point constants.

These issues are important only for values that have so many significant
digits that they push the limits on what values can be represented
exactly. This almost never comes up with numbers that describe real-
world measured quantities unless you're using "float" for your
calculations, in which case the right solution is to use "double". It's
only in high precision scientific calculations or pure mathematics that
you're likely to run into situations where hexfloat actually becomes
important. Therefore, if you aren't doing that kind of work, hexfloat
probably seems pointless to you. However, it can be of critical
importance to people who are doing that kind of work.
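
The three neighbouring values shown above can be checked with a short sketch like the following. It assumes an IEEE 754 double and a std::strtod that parses hexadecimal floating point text; the helper names are made up for this example.

```cpp
#include <cmath>
#include <cstdlib>
#include <limits>

// Parses hexadecimal (or decimal) floating point text into a double.
double parse_double(const char* text)
{
    return std::strtod(text, nullptr);
}

// Nearest representable double below x.
double next_below(double x)
{
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}

// Nearest representable double above x.
double next_above(double x)
{
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}
```

Under those assumptions, parse_double("0X1.921FB54442D18P+1") is the double closest to pi, and next_below and next_above of that value should compare equal to the results of parsing the D17 and D19 constants respectively.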

Alf P. Steinbach

May 21, 2019, 2:01:46 PM
On 21.05.2019 18:20, Paavo Helde wrote:
> On 21.05.2019 15:29, Jorgen Grahn wrote:
>> On Mon, 2019-05-20, Chris Vine wrote:
>>
>>> The number of people who don't trouble themselves to understand the
>>> strict aliasing rules of C and C++ is surprising.
>>
>> "Don't trouble themselves" is a good way of putting it, because it's
>> not /hard/ to grasp.  If you think of memory as a store for /typed/
>> objects, plus the extra accommodations for char* and unions, you have
>> the rough picture.
>
> This page
>
> https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8
>
> claims that the following is UB in C++.
>
> void *p = malloc(sizeof(float));
> float *fp = p;
> *fp = 1.0f;
>
> and one should use placement new instead:
>
> new (p) float {1.0f} ;

The initialization of `fp` wouldn't compile as C++, but let's assume a
`static_cast` or `reinterpret_cast` there. I favor the latter since it
communicates better to the reader, but I believe for reasons having to
do with a shortcoming of C++03 Herb Sutter and Andrei Alexandrescu
recommended using `static_cast` in their old coding guidelines book.


> Now that I find hard to grasp. If this is true, how is it even possible
> to write e.g. a custom memory allocator?

Magic is indeed performed in a `new` expression: it transforms a `void*`
produced by an allocator function, to a typed pointer.

Another place this magic occurs, is in the member functions of a
`std::allocator`. At least in C++03. I'm not as up-to-date as I should
be to participate in C++ discussions.

Anyway, even placement `new` doesn't save one from UB when there is an
object other than byte in that memory chunk, and one obtains a pointer
to it of an unrelated pointee type. Wham bang, you're formally dead.

On the other hand, when there is no object of type other than bytes,
then `reinterpret_cast` is technically good and so is placement `new`.


> A low-level memory allocator typically does not know anything about
> float, how is it possible to convert a memory block pointer to float*
> which has to be returned from allocate() (and later e.g. from
> std::vector<float>::data())? Do I really need to perform a dummy
> placement new in the beginning of the memory block, to obtain a valid
> float* pointer?

Nah, just FUD.


Cheers!,

- Alf

Paavo Helde

May 21, 2019, 2:02:27 PM
On 21.05.2019 20:24, Bart wrote:
>
> Now, I've frequently written code in one language, which works perfectly
> well as native code, but it hits UB if I auto-translate to C.

Seems like a bug in the auto-translator.


> There are
> actually more problems generating C source as a target, than generating
> ASM.

Which ASM? All the 72 architectures covered by gcc?

>
> (And actually, when I briefly tried to target C++, there were even more
> problems.)

I bet.

Bonita Montero

May 21, 2019, 2:47:12 PM
> The initialization of `fp` wouldn't compile as C++, but let's assume a
> `static_cast` or `reinterpret_cast` there. I favor the latter since it
> communicates better to the reader, but I believe for reasons having to
> do with a shortcoming of C++03 Herb Sutter and Andrei Alexandrescu
> recommended using `static_cast` in their old coding guidelines book.

static_cast, reinterpret_cast or C-style-cast - pure syntactic sugar.

Bart

May 21, 2019, 2:56:47 PM
On 21/05/2019 19:02, Paavo Helde wrote:
> On 21.05.2019 20:24, Bart wrote:
>>
>> Now, I've frequently written code in one language, which works perfectly
>> well as native code, but it hits UB if I auto-translate to C.
>
> Seems like a bug in the auto-translator.

Just a mismatch of languages, even though C in this case superficially
works the same way.

C as an intermediate language, even though it is very frequently used
for that purpose, leaves a lot to be desired.

>
>> There are
>> actually more problems generating C source as a target, than generating
>> ASM.
>
> Which ASM? All the 72 architectures covered by gcc?

The ones I had in mind were x64 and ARM64. I think I decided it would be
simpler to target those two than to try and generate C code which would
always compile warning-free and UB-free.

Are there any others I'm likely to be able to program in consumer
equipment?

I think it is quite common for applications to only need to run on a
small number of architectures, but not want to be inconvenienced by a
language designed to work with every conceivable architecture, past,
present and future, and which therefore have to designate as UB,
behaviour which cannot be guaranteed to work across all of them.

>>
>> (And actually, when I briefly tried to target C++, there were even more
>> problems.)
>
> I bet.

I haven't tried it for a while. If I try it now on a smallish 3200-line
generated-C program, I get the following number of lines of errors and
warnings:

         No options (just -c)    Lots of -W options

gcc          0 lines               2900 lines
g++       1150 lines               2850 lines

Typical error from the 1150-line output is:

jpeg.c:420:5: error: invalid conversion from 'int64 (*)(jpeg_stream*)'
{aka 'long long int (*)(jpeg_stream*)'} to 'void*' [-fpermissive]

(Actually, this line is nothing to do with the application, but this
language generates some metadata which includes an array of pointers to
all functions used in the program. Since every function would have its
own pointer type which depends on its signature, what should be the
array element type?

As far as I'm concerned, any function pointer can be stored within the
same space as a void* pointer on all targets I want this to run on. It
should be a non-issue.)
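
One portable option for such a table is to use a generic function pointer type as the element type rather than void*, since C and C++ both allow a function pointer to be converted to another function pointer type and back again. A minimal sketch, with made-up function and table names:

```cpp
// Generic element type for the table. It is only used for storage and
// is never called through directly.
using generic_fn = void (*)();

long long twice(long long x) { return 2 * x; }
int add(int a, int b) { return a + b; }

// Each entry is converted to the generic type for storage.
generic_fn const function_table[] = {
    reinterpret_cast<generic_fn>(&twice),
    reinterpret_cast<generic_fn>(&add),
};

// Before calling, an entry must be converted back to its original type.
long long call_twice(long long x)
{
    auto fn = reinterpret_cast<long long (*)(long long)>(function_table[0]);
    return fn(x);
}

int call_add(int a, int b)
{
    auto fn = reinterpret_cast<int (*)(int, int)>(function_table[1]);
    return fn(a, b);
}
```

Calling through the generic type itself would be undefined, so the caller still has to know each entry's real signature. In generated C the same idea would use ordinary casts with an element type such as "void (*)(void)".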

Chris Vine

May 21, 2019, 3:04:46 PM
Looking at operator new() rather than malloc() (both of which do pretty
much the same thing), the C++14 standard (§3.7.4.1/2) says

"The pointer returned shall be suitably aligned so that it can be
converted to a pointer of any complete object type with a fundamental
alignment requirement (3.11) and then used to access the object or
array in the storage allocated (until the storage is explicitly
deallocated by a call to a corresponding deallocation function)."

So you can use the storage provided by operator new() to access an
object constructed in that storage. The dynamic type of the allocated
storage is in effect the type of the first object constructed in it
(here, a float). So for trivial types (which do not need to execute a
constructor and destructor) I don't think it is necessary to use
placement new - I think you can just assign or memcpy() into the memory.
That is a feature of trivial types.

It would be a ridiculous interpretation of the standard that malloc()
operates differently. Footnote 36 itself says:

"The intent is to have operator new() implementable by calling
std::malloc() or std::calloc(), so the rules are substantially the
same. C++ differs from C in requiring a zero request to return a
non-null pointer."

The dynamic type of malloc()'ed memory must surely, as in the case of
the effective type in C, arise upon first construction of an object in
that memory, either by placement new (C++) or assignment or memcpy()
(C, and C++ trivial types).
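
For readers who prefer not to depend on that interpretation, the placement new form is short enough to write out. A minimal sketch, with made-up function names:

```cpp
#include <cstdlib>
#include <new>

// Allocates raw storage with malloc and explicitly creates a float in it.
// Returns a null pointer if the allocation fails.
float* make_float(float initial)
{
    void* raw = std::malloc(sizeof(float));
    if (raw == nullptr) {
        return nullptr;
    }
    // Placement new starts the lifetime of a float in the storage and
    // yields a properly typed pointer to it.
    return new (raw) float{initial};
}

// float is trivially destructible, so releasing the storage is enough.
void destroy_float(float* p)
{
    std::free(p);
}
```

Compilers typically generate the same code for this as for a cast followed by an assignment, so the difference is only in what the source formally says.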

Chris

Chris Vine

May 21, 2019, 3:08:32 PM
That depends on what you mean by "syntactic sugar". static_cast will
carry out pointer adjustment so it can be used to navigate an
inheritance graph correctly. reinterpret_cast won't - it is purely a
reinterpretation of a bit pattern.
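
A small sketch of that difference, using a made-up class hierarchy with two non-empty bases. The exact offsets depend on the ABI, but on common implementations the second base does not sit at the start of the derived object.

```cpp
struct Base1 { int a = 1; };
struct Base2 { int b = 2; };
struct Derived : Base1, Base2 { int c = 3; };

// static_cast adjusts the address so that it points at the Base2
// subobject inside the Derived object.
const void* address_via_static_cast(Derived& d)
{
    Base2* p = static_cast<Base2*>(&d);
    return p;
}

// reinterpret_cast keeps the same address and only relabels the type.
// The result must not be dereferenced as a Base2.
const void* address_via_reinterpret_cast(Derived& d)
{
    return reinterpret_cast<Base2*>(&d);
}

// Reading b through the correctly adjusted reference.
int read_b(Derived& d)
{
    return static_cast<Base2&>(d).b;
}
```

On typical ABIs the first function returns an address a few bytes past the start of the object, while the second returns the object's own address.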

blt_1d3...@dxekg86ut76lwln.org

May 21, 2019, 3:28:34 PM
On Tue, 21 May 2019 11:49:13 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>On Tue, 21 May 2019 08:26:45 +0000 (UTC)
>blt_14...@2mu00w.co.uk wrote:
>> Are you having a slow brain day or something? You said it wouldn't work on
>> new compilers with optimisation. I just proved it did just as I've said it
>> works on every compiler I've ever tried it on.
>
>The "it" which doesn't work is type punning through casting pointers.
>Your silly toy code with undefined behaviour proves absolutely
>nothing. An example has already been given up-thread of the

Me: "The code works on every compiler I've ever tried"

You: "But it won't work on modern compilers with optimisation!"

Me: [Shows code doing exactly that with 2 compilers]

You: "Oh well, it's just silly toy code blah blah feeble backpedal blah blah"

>differences which can arise in code emitted, depending on whether the
>-fno-strict-aliasing switch is applied or not.

And someone else explained what happened which was nothing to do with type
punning.

>So the standard says clearly that it is undefined behaviour but you say
>"ignore that because although gcc warns that it breaks strict aliasing,
>clang doesn't". If that is your approach to programming then "crap
>code" seems like too mild a description.

The code works and always has done. A pointer is always the same size for
every type and always will be.

However, feel free to provide a proper example where it fails or you can just
keep on farting out indignant hot air. Your choice.

Chris Vine

May 21, 2019, 3:47:24 PM
On Tue, 21 May 2019 19:28:17 +0000 (UTC)
blt_1d3...@dxekg86ut76lwln.org wrote:
> On Tue, 21 May 2019 11:49:13 +0100
> Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> >On Tue, 21 May 2019 08:26:45 +0000 (UTC)
> >blt_14...@2mu00w.co.uk wrote:
> >> Are you having a slow brain day or something? You said it wouldn't work on
> >> new compilers with optimisation. I just proved it did just as I've said it
> >> works on every compiler I've ever tried it on.
> >
> >The "it" which doesn't work is type punning through casting pointers.
> >Your silly toy code with undefined behaviour proves absolutely
> >nothing. An example has already been given up-thread of the
>
> Me: "The code works on every compiler I've ever tried"
>
> You: "But it won't work on modern compilers with optimisation!"
>
> Me: [Shows code doing exactly that with 2 compilers]
>
> You: "Oh well, it's just silly toy code blah blah feeble backpedal blah blah"
>
> >differences which can arise in code emitted, depending on whether the
> >-fno-strict-aliasing switch is applied or not.
>
> And someone else explained what happened which was nothing to do with type
> punning.

Nonsense - see below.

> >So the standard says clearly that it is undefined behaviour but you say
> >"ignore that because although gcc warns that it breaks strict aliasing,
> >clang doesn't". If that is your approach to programming then "crap
> >code" seems like too mild a description.
>
> The code works and always has done. A pointer is always the same size for
> every type and always will be.
>
> However, feel free to provide a proper example where it fails or you can just
> keep on farting out indignant hot air. Your choice.

No, still wrong I am afraid. I repeat: "The 'it' which doesn't work is
type punning through casting pointers." It doesn't work because it
doesn't work, end of story. Your bluster that when the standard says
incorrect aliasing gives rise to undefined behaviour it really means
that it works fine in your toy programs, is crap.

And the example to which I referred was corrected (20 May 2019 21:49:30
+0200) and did demonstrate the strict aliasing issue.

If you want another example, there is one in the posting of another
person:

#include <iostream>

int foo( float *f, int *i ) {
    *i = 1;
    *f = 0.f;

    return *i;
}

int main() {
    int x = 0;

    std::cout << x << "\n"; // Expect 0
    x = foo(reinterpret_cast<float*>(&x), &x);
    std::cout << x << "\n"; // Expect 0?
}

You will find many similar examples in articles on the internet about
strict aliasing.

Scott Lurndal

May 21, 2019, 3:53:11 PM
blt_1d3...@dxekg86ut76lwln.org writes:
>On Tue, 21 May 2019 11:49:13 +0100

>The code works and always has done. A pointer is always the same size for
>every type and always will be.

Actually at least one major processor vendor has been thinking about
changing this in the future.....

And it certainly wasn't true in the past.

David Brown

May 21, 2019, 5:57:21 PM
On 21/05/2019 16:45, Bart wrote:
> On 21/05/2019 14:36, David Brown wrote:
>> On 21/05/2019 13:35, Bart wrote:
>
>>> It's only 'crap code' because the language says so.
>>>
>>
>> Yes - but since the discussion is about C or C++, it is those languages
>> and their standards that matter.
>
> But how can code that expresses exactly the same thing be fine in one
> language and not in another?

That's a meaningless question. If the code expresses exactly the same
thing in different languages, then it is fine in those languages. Code
that does not have defined behaviour in a language is not expressing
anything at all - totally regardless of whether similar looking code has
a meaning in a different language.


>
>   The standards say accessing a "float"
>> object through an "int*" pointer is undefined behaviour - therefore it
>> is undefined behaviour in C and C++, and if you write such code, it is
>> crap C or C++ code.  Particular implementations are free to give a
>> definition for it - code can be crap C++ code while being valid "MSVC"
>> code or valid "gcc -fno-strict-aliasing" code.
>
> And here apparently the same code can also be fine in a particular
> dialect of C or C++, or even using a particular set of compiler options,
> but be 'crap' when someone changes the compiler or option.
>

Now you are beginning to understand!

Code has to be written in a programming language. If the words and
symbols you use don't make sense in that language, then the code is
wrong - even if the same words and symbols make sense in a different
language or different language variation.


>
>> I mean, how often do you actually want to access float data via an int*
>> pointer?  In real code, it's very rare that this sort of thing crops up.
>>   Being rare, there is no problem using valid alternatives like unions or
>> memcpy - both of which can be tightly optimised by a compiler.
>
> In my own codebase I seem to use cast-based type-punning 5-10 times per
> application, But union-based type-punning is used all the time.
> Cast-based type-punning is convenient when there is no struct or union
> involved.
>
> It is very frequent that I'm interested in interpreting the bytes of a
> float as some integer value, or vice versa.
>
> (Here's an example using that non-C language; this prints the underlying
> binary bits of 0.1 which is a 64-bit float:
>
>     print int64@(0.1):"b"
>
> Output is:
>
>   11111110111001100110011001100110011001100110011001100110011010
>
> No faffing about with unions or *(int64_t*)&x casts (which won't work on
> 0.1), which apparently have UB anyway, or memcpy.)

Your language is not C. In C, you do this differently - as you can't
access the float via a pointer to a 64-bit integer type, and there is no
pre-defined method of printing in binary anyway. Why does it bother you
so much that you have to write different things in different languages?
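For comparison, here is a minimal C++ sketch of the memcpy route mentioned earlier. The helper names bits_of and binary_of are made up for illustration, and the sketch assumes double is a 64-bit IEEE 754 type, which the standard does not guarantee:

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Copy the object representation of a double into a 64-bit integer.
// memcpy has defined behaviour here; no pointer cast is involved.
// (bits_of is a hypothetical helper name, not a standard function.)
std::uint64_t bits_of(double d) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "this sketch assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Render all 64 bits as a string of '0' and '1', most significant bit
// first. Leading zeros are kept, unlike in the output quoted above.
std::string binary_of(double d) {
    return std::bitset<64>(bits_of(d)).to_string();
}
```

Streaming binary_of(0.1) to std::cout gives the same digits as the quoted output, preceded by the two leading zero bits. Compilers typically reduce the memcpy to a single register move when optimising.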

>
>> And how often is this actually relevant in real code?  Or is this just
>> another one of your "features" whose only purpose is to let you pretend
>> your language is "better" than C?
>
> It's 'better' in that it allows this sort of obvious stuff that people
> want to write, while C doesn't.
>

I don't believe that many people /do/ want to write this sort of thing,
and it is certainly not "obvious". (The implementation, and assembly
level, is obvious. But we are not programming in assembly.)

I can appreciate that /you/ want to use concepts like this, and thus it
makes sense to have it as a feature in your personal language. But I
have long ago given up trying to understand /why/ you want to do it or
why you find these things so useful in your coding. Perhaps they just
happen to be helpful in writing compilers and translators.


David Brown

May 21, 2019, 6:13:49 PM
On 21/05/2019 20:56, Bart wrote:
> On 21/05/2019 19:02, Paavo Helde wrote:
>> On 21.05.2019 20:24, Bart wrote:
>>>
>>> Now, I've frequently written code in one language, which works perfectly
>>> well as native code, but it hits UB if I auto-translate to C.
>>
>> Seems like a bug in the auto-translator.
>
> Just a mismatch of languages, even though C in this case superficially
> works the same way.
>

In other words, bugs in the auto-translator. The flaws are in the
design and specification, rather than the implementation, but they are
bugs nonetheless.

In order to translate code from one language to another, you need to
understand both languages. And you need to generate correct and valid
code - not something that looks a bit like what you would have liked the
target language to be.


> C as an intermediate language, even though it is very frequently used
> for that purpose, leaves a lot to be desired.

It is fine as a target language, but you need to generate correct C
code. (Alternatively, you need to generate "C for this compiler, this
target and these options" code - and be honest about it. That is a
perfectly reasonable solution, and the one used by most code generators.)

Other people who write code generators or translators that produce C
manage it. And when their generated code has flaws, they blame their
generators - not the language.

>
>>
>>> There are
>>> actually more problems generating C source as a target, than generating
>>> ASM.
>>
>> Which ASM? All the 72 architectures covered by gcc?
>
> The ones I had in mind were x64 and ARM64. I think I decided it would be
> simpler to target those two than to try and generate C code which would
> always compile warning-free and UB-free.
>

You understand how assembly works. You are willing to use features of
assemblers. You don't understand how C works. You are unwilling to use
many features of the language. It is not surprising that you find
generating assembly easier than generating C code.

> Are there any others I'm likely to be able to program in consumer
> equipment?

Since your languages and tools are for you alone, it is up to you to
answer that one.

>
> I think it is quite common for applications to only need to run on a
> small number of architectures, but not want to be inconvenienced by a
> language designed to work with every conceivable architecture, past,
> present and future, and which therefore have to designate as UB,
> behaviour which cannot be guaranteed to work across all of them.
>

I agree that code rarely has to be very portable. Of course, I disagree
about your characterisation of UB - in particular, it does not make
sense to suggest that code with undefined behaviour could work at all.
By the meaning of the words, code with undefined behaviour does not have
any definition of what it is supposed to do, and therefore cannot be
considered to "work". At best, you mean the code should do what it
looks like you think it should do. That might be okay to a human
reader, but computers are fussy about definitions.

>>>
>>> (And actually, when I briefly tried to target C++, there were even more
>>> problems.)
>>
>> I bet.
>
> I haven't tried it for a while. If I try it now on a smallish 3200-line
> generated-C program, I get the following number of lines of errors and
> warnings:
>
>          No options (just -c)    Lots of -W options
>
> gcc         0 lines              2900 lines
> g++      1150 lines              2850 lines
>
> Typical error from the 1150-line output is:
>
> jpeg.c:420:5: error: invalid conversion from 'int64 (*)(jpeg_stream*)'
> {aka 'long long int (*)(jpeg_stream*)'} to 'void*' [-fpermissive]
>
> (Actually, this line is nothing to do with the application, but this
> language generates some metadata which includes an array of pointers to
> all functions used in the program. Since every function would have its
> own pointer type which depends on its signature, what should be the
> array element type?
>
> As far as I'm concerned, any function pointer can be stored within the
> same space as a void* pointer on all targets I want this to run on. It
> should be a non-issue.)

C and C++ do not share your opinion - and you are asking the compiler to
treat your code as (approximately) standard C or C++. However, gcc (and
all other serious compilers) give you a lot of flexibility about
choosing warnings and other options, precisely to let you tune the
details of the language you want. If you want to generate code that
only works on platforms where you can store a function pointer in a
void* pointer (though I can't imagine why it would be useful), you can
tune your options to suit. Perhaps try with "-fpermissive" ?


David Brown

May 21, 2019, 6:18:21 PM
It is not true at the moment either. There are more processors around
than just x86 and ARM. (I know you, Scott, know this - I am expanding
on your post, not correcting it.)

And of course, the size of pointers has absolutely /nothing/ to do with
the undefined nature of trying to access an object through a pointer to
a different type.

Daniel

May 21, 2019, 7:37:22 PM
On Saturday, May 18, 2019 at 8:28:00 AM UTC-4, Bo Persson wrote:
>
> The hexfloat is a bit odd, as it really only works for output...
>
> For numeric input, the spec for the num_get facet specifically says that
> the only characters allowed are "0123456789abcdefxABCDEFX+-".
>
> https://en.cppreference.com/w/cpp/locale/num_get/get (see Stage 2)
>
> So ss >> y would stop at the 'p' anyway.
>
> The committee didn't want to break old code reading "1.0p" as one double
> and one char, like in ss >> y >> ch.
>
What about this proposal, to expand the number of characters to include p
and P?

https://timsong-cpp.github.io/lwg-issues/2381

It appears to be still open.

Daniel

Bart

May 21, 2019, 8:07:59 PM
On 21/05/2019 23:13, David Brown wrote:
> On 21/05/2019 20:56, Bart wrote:

>> Just a mismatch of languages, even though C in this case superficially
>> works the same way.
>>
>
> In other words, bugs in the auto-translator.  The flaws are in the
> design and specification, rather than the implementation, but they are
> bugs nonetheless.

You can call an unwillingness to expend a huge, disproportionate effort
in overcoming C's many shortcomings for this purpose a bug if you like.

The source language has a simple, orthogonal type model, 64-bit-based,
which is very easy to superimpose on the simple hardware model of the
64-bit target you want to use.

But now introduce C between the two, which has a more complex, unwieldy,
not-quite-so-unorthogonal type system, which is 32-bit-based even when
the final target is 64-bit, with its million and one quirks, and which
doesn't quite match that of the target language.

> It is fine as a target language, but you need to generate correct C
> code.

Which means what? So that there are 0 errors and 0 warnings no matter
what options somebody will apply?

How can something be judged correct or not when that measure depends on
which options - which can be out of your control - are applied?

> Other people who write code generators or translators that produce C
> manage it.  And when their generated code has flaws, they blame their
> generators - not the language.

The fault is very largely with the unsuitability of C for the role. Except
that there isn't really anything else that is as ubiquitous.

> You understand how assembly works.  You are willing to use features of
> assemblers.  You don't understand how C works.  You are unwilling to use
> many features of the language.  It is not surprising that you find
> generating assembly easier than generating C code.

ASM doesn't stop me storing a 64-bit function pointer into a 64-bit
memory location. Source language:

import clib
ref void fnptr
fnptr := puts

Generates native code (could be a one-liner but never mind):

lea D0, [`puts*]
mov [t.fnptr], D0

No problem. Now I get it to generate C:

static void * t_fnptr;
t_fnptr = (void *)(&puts);

gcc (no options):

(Nothing)

gcc (with recommended bunch of options):

t.c:66:5: warning: ISO C forbids initialization between function
pointer and 'void *' [-Wpedantic]

g++ (no options):

t.c:66:5: error: invalid conversion from 'void (*)()' to 'void*'
[-fpermissive]

Three categories of message; which one was right? Because either my
source code is fine, or it isn't.

>> Are there any others I'm likely to be able to program in consumer
>> equipment?
>
> Since your languages and tools are for you alone, it is up to you to
> answer that one.

I meant: which ones am I likely to come across? What can I buy from PC
World that I can program, that will have some fancy processor inside
where function pointers are bigger than 64 bits, or float64 has a
different byte ordering from int64.

> If you want to generate code that
> only works on platforms where you can store a function pointer in a
> void* pointer (though I can't imagine why it would be useful),

I explained why: to produce a list of pointers to disparate functions.
(As to why /that/ might be useful, I'd have to post a link to a longer
explanation.)

> you can
> tune your options to suit.

The compiler will already know where that will work and where it won't,
because it presumably knows what the target is, and can either report an
error if not, or arrange for it to work.

>  Perhaps try with "-fpermissive" ?

How does that magically make it alright? I thought such an option just
suppressed the warning? (For g++, it just changed an error to a warning.)

Is there an actual practical problem in doing such a conversion or not?

And if not, why is it bothering me with it?

Ian Collins

May 22, 2019, 12:33:32 AM
On 22/05/2019 12:07, Bart wrote:
> On 21/05/2019 23:13, David Brown wrote:
>> On 21/05/2019 20:56, Bart wrote:
>
>>> Just a mismatch of languages, even though C in this case superficially
>>> works the same way.
>>>
>>
>> In other words, bugs in the auto-translator.  The flaws are in the
>> design and specification, rather than the implementation, but they are
>> bugs nonetheless.
>
> You can call an unwillingness to expend a huge, disproportionate effort
> in overcoming C's many shortcomings for this purpose a bug if you like.

That's never stopped you expending a huge, disproportionate effort in
whinging about C. That time would have easily been enough to fix your code.

> No problem. Now I get it to generate C:
>
> static void * t_fnptr;
> t_fnptr = (void *)(&puts);
>
> gcc (no options):
>
> (Nothing)
>
> gcc (with recommended bunch of options):
>
> t.c:66:5: warning: ISO C forbids initialization between function
> pointer and 'void *' [-Wpedantic]
>
> g++ (no options):
>
> t.c:66:5: error: invalid conversion from 'void (*)()' to 'void*'
> [-fpermissive]

The conversion is "conditionally-supported" in C++>=11 which makes it a
"program construct that an implementation is not required to support".

Thus:
$ clang++ -std=c++98 -Wall -Werror -Wextra -pedantic /tmp/x.cc

/tmp/x.cc:6:14: error: cast between pointer-to-function and
pointer-to-object is an extension [-Werror,-Wpedantic]
t_fnptr = (void*)(&puts);
^~~~~~~~~~~~~~
1 error generated.

$ clang++ -std=c++11 -Wall -Werror -Wextra -pedantic /tmp/x.cc
$

--
Ian.



Bonita Montero

May 22, 2019, 1:20:41 AM
>>> The initialization of `fp` wouldn't compile as C++, but let's assume a
>>> `static_cast` or `reinterpret_cast` there. I favor the latter since it
>>> communicates better to the reader, but I believe for reasons having to
>>> do with a shortcoming of C++03 Herb Sutter and Andrei Alexandrescu
>>> recommended using `static_cast` in their old coding guidelines book.

>> static_cast, reinterpret_cast or C-style-cast - pure syntactic sugar.

> That depends on what you mean by "syntactic sugar". static_cast
> will carry out pointer adjustment so it can be used to navigate
> an inheritance graph correctly.

"syntactic sugar" was related to the case above.

Juha Nieminen

May 22, 2019, 2:41:02 AM
Paavo Helde <myfir...@osa.pri.ee> wrote:
> With enough digits, a decimal (as well as any other) representation can
> get arbitrarily close to any real value, so it can also get arbitrarily
> close to any value represented exactly in base-2. It is not needed to
> represent the base-2 value exactly, it is just enough to provide any
> base-10 value which is rounded to the correct base-2 value.
>
> Historically there were indeed some round-trip bugs when serializing
> floating-point values, but AFAIK these bugs got fixed in the C runtime
> libraries about 10-20 years ago or so. Plus there are libraries which
> ensure the minimum number of decimal digits for perfect round-trip.

How would you know, using standard C/C++, how many digits do you need to
output in order to ensure no loss of bits when reading the value back?
(And this is assuming that the C or C++ standard library being used has
been implemented such that given enough decimal digits, they will be
rounded to the correct direction as to restore the original value
exactly.)

It is my understanding that hexadecimal floating point representation
*always* outputs the exact amount of digits to represent the value
accurately.

Bart

May 22, 2019, 6:15:01 AM
On 22/05/2019 05:33, Ian Collins wrote:
> On 22/05/2019 12:07, Bart wrote:
>> On 21/05/2019 23:13, David Brown wrote:
>>> On 21/05/2019 20:56, Bart wrote:
>>
>>>> Just a mismatch of languages, even though C in this case superficially
>>>> works the same way.
>>>>
>>>
>>> In other words, bugs in the auto-translator.  The flaws are in the
>>> design and specification, rather than the implementation, but they are
>>> bugs nonetheless.
>>
>> You can call an unwillingness to expend a huge, disproportionate effort
>> in overcoming C's many shortcomings for this purpose a bug if you like.
>
> That's never stopped you expending a huge, disproportionate effort in
> whinging about C.  That time would have easily been enough to fix your
> code.

I prefer to spend the time developing a superior series of alternate
languages. Highlighting the problems in C benefits the process, and
often opens people's eyes to things they didn't know. And also, doing so
in a forum is some light relief from actual work.

But if you want to see some real whinging, watch some Jonathan Blow
videos about his new 'Jai' language that is supposed to wipe the floor
with C++.

>>       static void *  t_fnptr;
>>       t_fnptr = (void *)(&puts);

>> g++ (no options):
>>
>>     t.c:66:5: error: invalid conversion from 'void (*)()' to 'void*'
>>      [-fpermissive]
>
> The conversion is "conditionally-supported" in C++>=11 which makes it a
> "program construct that an implementation is not required to support".
>
> Thus:
> $ clang++ -std=c++98 -Wall -Werror -Wextra -pedantic /tmp/x.cc
>
> /tmp/x.cc:6:14: error: cast between pointer-to-function and
> pointer-to-object is an extension [-Werror,-Wpedantic]
>    t_fnptr = (void*)(&puts);
>              ^~~~~~~~~~~~~~
> 1 error generated.
>
> $ clang++ -std=c++11 -Wall -Werror -Wextra -pedantic /tmp/x.cc

I thought this was some magic incantation to wave away all errors. But I
tried it on my test (a 2900-Loc Linux version of the C file), and the
lines of errors and warnings went up from 1100 to 2100!

(Input was this file: https://github.com/sal55/qx/blob/master/jpeg.c,
generated by an older compiler as new ones have dropped the C target.)

David Brown

May 22, 2019, 7:22:24 AM
On 22/05/2019 02:07, Bart wrote:
> On 21/05/2019 23:13, David Brown wrote:
>> On 21/05/2019 20:56, Bart wrote:
>
>>> Just a mismatch of languages, even though C in this case
>>> superficially works the same way.
>>>
>>
>> In other words, bugs in the auto-translator.  The flaws are in the
>> design and specification, rather than the implementation, but they are
>> bugs nonetheless.
>
> You can call an unwillingness to expend a huge, disproportionate effort
> in overcoming C's many shortcomings for this purpose a bug if you like.

You are happy to classify your wilful and determined ignorance of C as a
bug in yourself? Okay, I suppose.

Certainly the idea that this is all a "huge, disproportionate effort" is
your own personal problem. Undefined behaviours in C are mostly quite
clear and obvious, you rarely meet them in practice, and they are mostly
straightforward to handle. For a language generator, they are peanuts
to deal with. These have been explained to you countless times.

Of course, dealing with them nicely and efficiently involves macros and
the C preprocessor. But it is apparently far better to whine and moan
about deficiencies in C than to use the features of C to get what you need.

>
> The source language has a simple, orthogonal type model, 64-bit-based,
> which is very easy to superimpose on the simple hardware model of the
> 64-bit target you want to use.
>
> But now introduce C between the two, which has a more complex, unwieldy,
> not-quite-so-unorthogonal type system, which is 32-bit-based even when
> the final target is 64-bit, with its million and one quirks, and which
> doesn't quite match that of the target language.

C is not based on any hardware model - it is more abstract. Yes,
putting that in between the two layers that have matching models will
cause complications, and you will have to be careful to get it right.
But as abstract models go, C's is not difficult to comprehend.

>
>> It is fine as a target language, but you need to generate correct C code.
>
> Which means what? So that there are 0 errors and 0 warnings no matter
> what options somebody will apply?

No. It means that there are no errors in the code, based on whatever
restrictions you might want to place on how it is used. If you want to
generate fully portable C code (matching a particular standard), then do
so. If you want to generate code that has limitations on the compiler
or flags needed, then do so - but make sure that you document the
restrictions. Far and away the best choice here is to use conditional
compilation and compiler detection. For example, if you want to allow
casting between different pointer types to work for punning, and you
want wrapping overflow behaviour to match your source language, then try
something like this:

#ifdef __GNUC__
/* Set options needed by gcc and clang for desired C variant */
#pragma GCC optimize "-fno-strict-aliasing"
#pragma GCC optimize "-fwrapv"
#pragma GCC diagnostic ignored "-Wformat"
#elif defined(_MSC_VER)
/* Set options needed by MSVC for desired C variant */
#elif defined(_BART_C)
/* Bart's C compiler already supports Bart C */
#else
#error Untested compiler - remove this and compile at your own risk
#endif



Simple and safe.

Add whatever other tests and compiler-specific options you need to suit
the compilers you have tested - forcing compiler options with pragmas
and disabling any warnings you might want to disable. For example, if
you are confident that your code generator gets all printf formats
correct, then you might like to disable gcc's "-Wformat" warnings. Then
you can use "%lld" for int64_t, without warning, whether the target
typedef's int64_t to "long int", "long long int", or a compiler
extension type.


You can expand this for other assumptions made by your code. For
example, you can include <limits.h> and confirm that INT_MIN is
-2147483648 and that CHAR_BIT is 8.
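A minimal sketch of such checks follows, written as C++ static_assert (C11 offers _Static_assert for the same job). The pointer-size check and the helper name assumptions_hold are additions for illustration; which assumptions a given generator needs is up to its author:

```cpp
#include <climits>

// Compile-time checks for the assumptions the generated code relies on.
// A failed check stops the build with the given message.
static_assert(CHAR_BIT == 8,
              "generated code assumes 8-bit bytes");
static_assert(INT_MIN == -2147483647 - 1,
              "generated code assumes 32-bit two's complement int");
static_assert(sizeof(long long) * CHAR_BIT == 64,
              "generated code assumes 64-bit long long");
// Illustrative extra check for the function-pointer case in this thread.
static_assert(sizeof(void (*)(void)) == sizeof(void *),
              "generated code assumes function and data pointers "
              "have the same size");

// Hypothetical helper: if this translation unit compiles at all,
// every check above has passed.
constexpr bool assumptions_hold() { return true; }
```

INT_MIN is compared against -2147483647 - 1 rather than -2147483648, because the literal 2147483648 does not fit in a 32-bit int.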


I've given such suggestions before. But as usual, you'd prefer to
complain and get things wrong than learn from others. After all, it is
/so/ much better to write thousands of lines of Usenet posts whining
than to put a dozen lines of fixed text into your C generator.


>
> How can something be judged correct or not when that measure depends on
> which options - which can be out of your control - are applied?

Easy - put it under your control. Stop saying "this is C code" unless
it is standards conforming C code that works with any standards
conforming C compiler. If you want to write code that is only correct
in certain circumstances, then say that the code is only useable in
those circumstances. As shown above, the best way to do this is with
compile-time checks.

>
>> Other people who write code generators or translators that produce C
>> manage it.  And when their generated code has flaws, they blame their
>> generators - not the language.
>
> The fault is very largely with the unsuitability of C for the role. Except
> that there isn't really anything else that is as ubiquitous.

First, it is completely unreasonable to blame C for its suitability for
a role /you/ choose for it. That is like buying a Ford Transit and then
blaming Ford when it doesn't fit in your garage.

Secondly, C works perfectly well for this usage. It is no coincidence
that there aren't any serious alternatives - C does a fine job for this
kind of purpose, and there is no demand for anything else. The fact
that you are having trouble is a reflection on you, not on C.

>
>> You understand how assembly works.  You are willing to use features of
>> assemblers.  You don't understand how C works.  You are unwilling to
>> use many features of the language.  It is not surprising that you find
>> generating assembly easier than generating C code.
>
> ASM doesn't stop me storing a 64-bit function pointer into a 64-bit
> memory location. Source language:
>
>     import clib
>     ref void fnptr
>     fnptr := puts
>
> Generates native code (could be a one-liner but never mind):
>
>     lea     D0, [`puts*]
>     mov     [t.fnptr],  D0
>
> No problem. Now I get it to generate C:
>
>     static void *  t_fnptr;
>     t_fnptr = (void *)(&puts);
>
> gcc (no options):
>
>     (Nothing)

It is entirely reasonable to generate no code, because you aren't doing
anything. If t_fnptr is static, and the compiler can see it is never
read, then it will not bother storing anything there (if there is
optimisation enabled). I recommend using "static" at the file-scope
level whenever possible, as it generates more efficient code, reduces
errors, and improves modularity. But it doesn't help your testing in
cases like this.

If you make t_fnptr a non-static file-scope variable, you get the code
you expect:

void * t_fnptr;

void foo(void) {
t_fnptr = (void*) (&puts);
}

gcc -O2 -x c -std=gnu11:

foo:
mov QWORD PTR t_fnptr[rip], OFFSET FLAT:puts
ret

>
> gcc (with recommended bunch of options):
>
>   t.c:66:5: warning: ISO C forbids initialization between function
>    pointer and 'void *' [-Wpedantic]

"Recommended bunch of options" is meaningless unless you say why
they are recommended, and they are only recommended for particular
cases. "-Wpedantic" is recommended when you want to try to ensure that
your code is portable standards compliant C. But what you are writing
here is /not/ standard C - it is C that is supported by many compilers,
but not all. So "-Wpedantic" would not be recommended for this code.
Please try to apply a little common sense here - it is almost as though
you /want/ to give yourself problems rather than to solve your problems.


>
> g++ (no options):
>
>   t.c:66:5: error: invalid conversion from 'void (*)()' to 'void*'
>    [-fpermissive]

I don't get that at all, with my quick godbolt.org testing.

>
> Three categories of message; which one was right? Because either my
> source code is fine, or it isn't.

You are driving yourself bananas with your nonsensical attitude here.
No, it is /not/ the case that your source code is either fine or it is
not fine. You have to ask about the circumstances and conditions. As
standard C, it is /not/ fine. As C suitable for many practical
compilers, it /is/ fine - depending on the options.


You have three choices here, as far as I can see.

1. Make the effort to generate good, portable C code to the greatest
possible extent. This would mean using "void (*)()" as your general
function pointer type, rather than "void *". It would also mean losing
some of the simplicity in a direct translation from your language to C,
as the semantics are different.

2. Adapt the target C variant to suit. The way to do that is given
earlier in this post. This will restrict the compilers and targets that
can use your code, but that is probably absolutely fine.

3. Whine, moan, complain, and blame C, blame me, blame the C committee,
blame the gcc developers, blame the dog across the road and everyone
else for your continued stubborn, wilful ignorance.


Options 1 and 2 work fine for other people, but my money's on your
picking number 3.

>
>>> Are there any others I'm likely to be able to program in consumer
>>> equipment?
>>
>> Since your languages and tools are for you alone, it is up to you to
>> answer that one.
>
> I meant: which ones am I likely to come across? What can I buy from PC
> World that I can program, that will have some fancy processor inside
> where function pointers are bigger than 64 bits, or float64 has a
> different byte ordering from int64.

If your restrictions are computers from PC World, then it is likely that
they will be ARM or x86. The great majority of processors are /not/ ARM
or x86, but you will not be programming them. (And to be fair, the
majority of these other processors have function pointers of the same
size as data pointers, and consistent endianness.)


>
>> If you want to generate code that
>> only works on platforms where you can store a function pointer in a
>> void* pointer (though I can't imagine why it would be useful),
>
> I explained why: to produce a list of pointers to disparate functions.
> (As to why /that/ might be useful, I'd have to post a link to a longer
> explanation.)

But why do you want to store them in "void*" pointers? Why not a "void
(*)(void)" pointer? That is the most sensible type here, and the
standards guarantee it will work. (You need to convert back to the
correct function type pointer before calling it, of course.)
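A minimal C++ sketch of that approach is below. The functions add_one and half, the alias generic_fn, and the table layout are all made up for illustration; a real generator would emit its own names and record each entry's signature somewhere:

```cpp
// Generic function pointer type, used for storage only, never for calls.
using generic_fn = void (*)(void);

// Two hypothetical functions with different signatures.
int add_one(int x) { return x + 1; }
double half(double x) { return x / 2.0; }

// A table of pointers to disparate functions. Converting one function
// pointer type to another and back yields the original pointer value.
generic_fn const fn_table[] = {
    reinterpret_cast<generic_fn>(&add_one),
    reinterpret_cast<generic_fn>(&half),
};

// Each entry is converted back to its real type before it is called.
// Calling through generic_fn directly would be undefined behaviour.
int call_add_one(int x) {
    auto f = reinterpret_cast<int (*)(int)>(fn_table[0]);
    return f(x);
}

double call_half(double x) {
    auto f = reinterpret_cast<double (*)(double)>(fn_table[1]);
    return f(x);
}
```

With these definitions call_add_one(41) returns 42. Some compilers warn about the casts at high warning levels (gcc has -Wcast-function-type), but the round trip itself is allowed by the language.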

>
>> you can tune your options to suit.
>
> The compiler will already know where that will work and where it won't,
> because it presumably knows what the target is, and can either report an
> error if not, or arrange for it to work.
>
>>   Perhaps try with "-fpermissive" ?
>
> How does that magically make it alright? I thought such an option just
> suppressed the warning? (For g++, it just changed an error to a warning.)

I have not needed to try the "-fpermissive" option - I don't write code
like that in the first place. I merely repeated gcc's suggestion.

>
> Is there an actual practical problem in doing such a conversion or not?

Such conversions can be implemented, of course, but that does not mean
that they are allowed by the language.

>
> And if not, why is it bothering me with it?

The compiler gives a warning when it thinks you have made a mistake in
your code.

Ian Collins

May 22, 2019, 7:55:42 AM
C++ has much stricter type conversion rules than C.

> (Input was this file: https://github.com/sal55/qx/blob/master/jpeg.c,
> generated by an older compiler as new ones have dropped the C target.)

Just compile as C and disable the (generally harmless) warnings.

$ gcc -std=c99 jpeg.c -Wno-format -c
$

--
Ian.

Paavo Helde

May 22, 2019, 8:23:25 AM
On 22.05.2019 9:40, Juha Nieminen wrote:
> Paavo Helde <myfir...@osa.pri.ee> wrote:
>> With enough digits, a decimal (as well as any other) representation can
>> get arbitrarily close to any real value, so it can also get arbitrarily
>> close to any value represented exactly in base-2. It is not needed to
>> represent the base-2 value exactly, it is just enough to provide any
>> base-10 value which is rounded to the correct base-2 value.
>>
>> Historically there were indeed some round-trip bugs when serializing
>> floating-point values, but AFAIK these bugs got fixed in the C runtime
>> libraries about 10-20 years ago or so. Plus there are libraries which
>> ensure the minimum number of decimal digits for perfect round-trip.
>
> How would you know, using standard C/C++, how many digits do you need to
> output in order to ensure no loss of bits when reading the value back?
> (And this is assuming that the C or C++ standard library being used has
> been implemented such that given enough decimal digits, they will be
> rounded to the correct direction as to restore the original value
> exactly.)

Yes, it's a bit tricky, but one can use e.g.
std::numeric_limits<T>::max_digits10 (since C++11).

From cppreference.com: "The value of
std::numeric_limits<T>::max_digits10 is the number of base-10 digits
that are necessary to uniquely represent all distinct values of the type
T, such as necessary for serialization/deserialization to text. This
constant is meaningful for all floating-point types."
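
As a rough sketch of that approach (the helper names to_text and from_text are made up for illustration, and it assumes the library's decimal parsing rounds correctly), a double can be written with max_digits10 significant digits and read back:

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with max_digits10 significant digits,
// which is enough decimal digits to identify every distinct double value.
std::string to_text(double value)
{
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}

// Hypothetical helper: parse the decimal text back into a double.
double from_text(const std::string& text)
{
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}
```

With a correctly rounding library, from_text(to_text(x)) == x should hold for every finite double x. The text is not the shortest possible one, though: 0.1 comes out as 0.10000000000000001.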

If I wanted to ensure shortest decimal representation for IEEE
floating-point, I would just use the ToShortest() function from the
double-conversion library (https://github.com/google/double-conversion).

>
> It is my understanding that hexadecimal floating point representation
> *always* outputs the exact amount of digits to represent the value
> accurately.

Yes, hexfloat is defined as an exact representation.

James Kuyper

May 22, 2019, 8:40:43 AM
On 5/22/19 2:40 AM, Juha Nieminen wrote:
...
> How would you know, using standard C/C++, how many digits do you need to
> output in order to ensure no loss of bits when reading the value back?

#include <float.h>, and look at the value of FLT_DECIMAL_DIG,
DBL_DECIMAL_DIG, or LDBL_DECIMAL_DIG, as appropriate. Paavo has already
given you the modern C++ equivalent, but this will also work with C and
with older versions of C++.


...
> It is my understanding that hexadecimal floating point representation
> *always* outputs the exact amount of digits to represent the value
> accurately.

It would be more accurate to say that this is true by default. If you
specify a particular length, it will obey your specification, whether or
not you specify enough digits to meet that requirement.
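
A small sketch of the difference, using the C library's "%a" conversion rather than std::hexfloat (the helper name hex_round_trips is invented for this example):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: print `value` in hexadecimal floating-point form and
// check whether parsing the text gives back the same value.
// A negative `digits` means "no precision given" (plain "%a"); otherwise
// `digits` is the number of hex digits requested after the point.
bool hex_round_trips(double value, int digits)
{
    char buffer[64];
    if (digits < 0)
        std::snprintf(buffer, sizeof buffer, "%a", value);
    else
        std::snprintf(buffer, sizeof buffer, "%.*a", digits, value);

    // strtod accepts the hexadecimal form, including a leading minus sign.
    return std::strtod(buffer, nullptr) == value;
}
```

Here hex_round_trips(0.1, -1) should be true, while hex_round_trips(0.1, 1) should be false, because one hex digit cannot hold the full significand of 0.1. A value such as -10.0 (-0x1.4p+3) still survives with one digit.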

Bart

May 22, 2019, 9:10:31 AM
On 22/05/2019 12:22, David Brown wrote:
> On 22/05/2019 02:07, Bart wrote:
>> On 21/05/2019 23:13, David Brown wrote:
>>> On 21/05/2019 20:56, Bart wrote:

>> You can call an unwillingness to expend a huge, disproportionate effort
>> in overcoming C's many shortcomings for this purpose a bug if you like.
>
> You are happy to classify your wilful and determined ignorance of C as a
> bug in yourself? Okay, I suppose.

OK, so anyone who doesn't want to kowtow must be ignorant of the
language. I suppose it couldn't be that it's just not worth the bother?

> Certainly the idea that this is all a "huge, disproportionate effort" is
> your own personal problem. Undefined behaviours in C are mostly quite
> clear and obvious, you rarely meet them in practice, and they are mostly
> straightforward to handle.

By the time you've finished getting it all perfect, the resulting output
would be a travesty of C source code. It's bad enough as it is. Then you
have to ask whether it is worthwhile.

Then there are newer features I'm using that cannot easily be expressed
in standard C.

Here's one very basic example: if A is signed, and B unsigned, then my
language says that A+B is performed as signed, with overflow
well-defined, and at at least 64 bits.

C says that A+B is performed as unsigned (although even that apparently
depends on their relative widths, but who can be arsed to go and dig all
that up). If I apply casts:

(int64)A+(int64)B

to get the signage right, now there is a risk of UB if there is
overflow. Now I actually have to change the casts to both, or apply
extra ones.

With + it doesn't matter, but what about * or /? And we've only just
tried to translate A+B!

So it is easy to see that it can be a considerably bigger pain to
generate perfectly correct C, than to generate ASM.

(I've decided to stop supporting a C target for my current languages, as
it was holding them back. Some things were going from very messy to
impossible to express in C.)

> For a language generator, they are peanuts
> to deal with. These have been explained to you countless times.

/You/ do it for an ambitious language like mine. Then come back and tell
us if it's still peanuts.

> C is not based on any hardware model - it is more abstract.

Most Cs will use types based around 8, 16, 32 and 64-bit widths. How is
that not a model influenced by typical hardware? And actually, it keeps
its 'char' flexible enough to adapt to any machine word size; that
doesn't sound very abstract.

> Easy - put it under your control. Stop saying "this is C code" unless
> it is standards conforming C code that works with any standards
> conforming C compiler. If you want to write code that is only correct
> in certain circumstances, then say that the code is only useable in
> those circumstances. As shown above, the best way to do this is with
> compile-time checks.

Even easier: stop generating C code. But the downsides are:

* I can't make use of C's optimising compilers. Which means some of my
programs might run slower, some considerably so. (But more typically my
C compiler builds sqlite3.exe in 0.6 seconds instead of 0.45 seconds.)

* I can't share my programs for anyone else to build from 'C' source
(unless they want to use Windows binaries to run my compilers on the
true sources). But AFAIK no one was ever interested except to complain
about how many warnings were generated.

* I can't run on x64 Linux without some extra work, or ARM Linux without
a /lot/ of extra work.

> First, it is completely unreasonable to blame C for its suitability for
> a role /you/ choose for it.

I think if a language actually requires you to define:

#include <stdint.h>
typedef int64_t int64;
typedef unsigned char byte;

at the start (and that int64_t still causes problems because C could
variously define it on top of long int or long long int or something
else), then yes it can take some of the blame.

Paavo Helde

May 22, 2019, 9:42:45 AM
On 22.05.2019 16:10, Bart wrote:
> Then there are newer features I'm using that cannot easily be expressed
> in standard C.
>
> Here's one very basic example: if A is signed, and B unsigned, then my
> language says that A+B is performed as signed, with overflow
> well-defined, and at at least 64 bits.

I don't question your design, I'm just curious: what would be the use
case of 64-bit signed wrapover? I.e. in what situation is it useful to have

9223372036854775807 + 1 == -9223372036854775808

For unsigned wrapover in C and C++ at least there is a use case for
emulating hardware bit registers, or to have an automatic reset for some
generated ID numbers.


>
> C says that A+B is performed as unsigned (although even that apparently
> depends on their relative widths, but who can be arsed to go and dig all
> that up). If I apply casts:
>
> (int64)A+(int64)B
>
> to get the signage right, now there is a risk of UB if there is
> overflow. Now I actually have to change the casts to both, or apply
> extra ones.
>
> With + it doesn't matter, but what about * or /? And we've only just
> tried to translate A+B!

Looks like this should be translated to C++, not to C, with appropriate
C++ number-like classes and custom arithmetic operators.
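
Something along these lines, as a minimal sketch only (Wrap64 and from_bits are names invented here, and a real generator would need the remaining operators, mixed-width operands, and so on):

```cpp
#include <cstdint>

// Hypothetical wrapper: a 64-bit signed integer whose +, - and * wrap
// modulo 2^64 instead of overflowing.
struct Wrap64
{
    std::int64_t value;
};

// Reinterpret an unsigned result as signed. Converting an out-of-range
// value is implementation-defined before C++20 and modular since C++20;
// the usual two's complement compilers wrap in either case.
inline Wrap64 from_bits(std::uint64_t bits)
{
    return Wrap64{static_cast<std::int64_t>(bits)};
}

// The arithmetic itself is done on unsigned operands, where wrapping is
// well-defined.
inline Wrap64 operator+(Wrap64 a, Wrap64 b)
{
    return from_bits(static_cast<std::uint64_t>(a.value) + static_cast<std::uint64_t>(b.value));
}

inline Wrap64 operator-(Wrap64 a, Wrap64 b)
{
    return from_bits(static_cast<std::uint64_t>(a.value) - static_cast<std::uint64_t>(b.value));
}

inline Wrap64 operator*(Wrap64 a, Wrap64 b)
{
    return from_bits(static_cast<std::uint64_t>(a.value) * static_cast<std::uint64_t>(b.value));
}
```

With this, (Wrap64{INT64_MAX} + Wrap64{1}).value comes out as INT64_MIN on such compilers, and the generated code can keep writing plain a + b.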

>
> So it is easy to see that it can be a considerably bigger pain to
> generate perfectly correct C, than to generate ASM.

I personally have found almost *everything* a bigger pain in C than,
say, C++.

David Brown

May 22, 2019, 10:13:48 AM
On 22/05/2019 15:10, Bart wrote:
> On 22/05/2019 12:22, David Brown wrote:
>> On 22/05/2019 02:07, Bart wrote:
>>> On 21/05/2019 23:13, David Brown wrote:
>>>> On 21/05/2019 20:56, Bart wrote:
>
>>> You can call an unwillingness to expend a huge, disproportionate effort
>>> in overcoming C's many shortcomings for this purpose a bug if you like.
>>
>> You are happy to classify your wilful and determined ignorance of C as a
>> bug in yourself?  Okay, I suppose.
>
> OK, so anyone who doesn't want to kowtow must be ignorant of the
> language. I suppose it couldn't be that it's just not worth the bother?

Do you realise how idiotic you sound here? If you want to program in C,
learn C and program in C. It is not "kowtowing" - it is learning to use
the language.

If you don't think that C is worth learning, don't use C.

You sound exactly like "fir" and his prejudice against writing in a
language anyone else can understand.

>
>> Certainly the idea that this is all a "huge, disproportionate effort" is
>> your own personal problem.  Undefined behaviours in C are mostly quite
>> clear and obvious, you rarely meet them in practice, and they are mostly
>> straightforward to handle.
>
> By the time you've finished getting it all perfect, the resulting output
> would be a travesty of C source code. It's bad enough as it is. Then you
> have to ask whether it is worthwhile.

You are /generating/ C code. Except while you are debugging your code
generator, no one particularly cares what the generated code looks like.
Most C code generators produce something horrible, full of labels,
gotos, variables called "t1, t2, t3, ...", and so on. It does not
matter what this code looks like. All that matters is that it has the
behaviour you want, on the compilers and targets you want.

If you don't like C, and you think it is so hideous and
incomprehensible, then why bother with it at all? Crawl into your
hermit cave and use your own little languages that are perfect for you.
If, on the other hand, you want to talk to other people about
programming, then learn to use a language that other people use. If C
is not for you, join a Go group, or Java group, or Pascal group, or
whatever floats your boat. It cannot be good for your mental health and
happiness to surround yourself with programmers using languages you hate
so vehemently. And after all, fun and enjoyment should be the main
motivation here.


>
> Then there are newer features I'm using that cannot easily be expressed
> in standard C.
>
> Here's one very basic example: if A is signed, and B unsigned, then my
> language says that A+B is performed as signed, with overflow
> well-defined, and at at least 64 bits.
>

Assuming by "overflow well-defined" you mean "consistently but
pointlessly defined as wrapping", this is simple in C:

(int64_t) (((uint64_t) A) + ((uint64_t) B))

Who cares if one of the casts here is unnecessary, or there are extra
parentheses? This is generated code - make the generator simple and
consistent.

> C says that A+B is performed as unsigned (although even that apparently
> depends on their relative widths, but who can be arsed to go and dig all
> that up).

Most experienced C programmers know it without looking it up. But sure,
put in the casts to make sure it is independent of sizes and targets.

> If I apply casts:
>
>    (int64)A+(int64)B
>
> to get the signage right, now there is a risk of UB if there is
> overflow. Now I actually have to change the casts to both, or apply
> extra ones.

How difficult do you want to make this for yourself? How thick is your
skull? Surely you are trolling? You are casting from unsigned to
signed, and then complaining that there is UB!

>
> With + it doesn't matter, but what about * or /? And we've only just
> tried to translate A+B!

(int64_t) (((uint64_t) A) + ((uint64_t) B))
(int64_t) (((uint64_t) A) - ((uint64_t) B))
(int64_t) (((uint64_t) A) * ((uint64_t) B))

Do you see a pattern?

Only division is a little more complicated, but it will not overflow or
have undefined behaviour unless you are dividing by 0, or dividing
INT64_MIN by -1. Take care of these special cases as you feel appropriate.
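
One possible way to handle them, as a sketch only: the name wrapping_div and the choice of results for the two special cases are assumptions for illustration, and your language may want a trap or some other value instead.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: 64-bit signed division with the two problematic
// cases given a defined result.
std::int64_t wrapping_div(std::int64_t a, std::int64_t b)
{
    // Assumed policy: dividing by zero yields 0.
    if (b == 0)
        return 0;

    // Assumed policy: INT64_MIN / -1 wraps back to INT64_MIN, matching the
    // wrapping result of negating INT64_MIN.
    if (a == std::numeric_limits<std::int64_t>::min() && b == -1)
        return a;

    // Every other case is ordinary, well-defined truncating division.
    return a / b;
}
```

The generator then emits wrapping_div(A, B) wherever the source language has A / B on 64-bit signed operands.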


>
> So it is easy to see that it can be a considerably bigger pain to
> generate perfectly correct C, than to generate ASM.

It is easy to see that it is easy in C to get the same semantics.

>
> (I've decided to stop supporting a C target for my current languages, as
> it was holding them back. Some things were going from very messy to
> impossible to express in C.)
>
>> For a language generator, they are peanuts
>> to deal with.  These have been explained to you countless times.
>
> /You/ do it for an ambitious language like mine. Then come back and tell
> us if it's still peanuts.

The points under discussion /are/ peanuts.

I am not telling you how to parse and interpret your language, or any of
the other aspects of making a translator - there is plenty of hard
coding there. I am just telling you a bit about the C code you should
be generating, and that bit /is/ simple.

>
>> C is not based on any hardware model - it is more abstract.
>
> Most Cs will use types based around 8, 16, 32 and 64-bit widths. How is
> that not a model influenced by typical hardware? And actually, it keeps
> its 'char' flexible enough to adapt to any machine word size; that
> doesn't sound very abstract.

The C /language/ is not based on any hardware model, it is abstract. C
/implementations/ pick concrete sizes and types that are suitable for
the hardware they target.

>
>> Easy - put it under your control.  Stop saying "this is C code" unless
>> it is standards conforming C code that works with any standards
>> conforming C compiler.  If you want to write code that is only correct
>> in certain circumstances, then say that the code is only useable in
>> those circumstances.  As shown above, the best way to do this is with
>> compile-time checks.
>
> Even easier: stop generating C code. But the downsides are:
>
> * I can't make use of C's optimising compilers. Which means some of my
> programs might run slower, some considerably so. (But more typically my
> C compiler builds sqlite3.exe in 0.6 seconds instead of 0.45 seconds.)
>
> * I can't share my programs for anyone else to build from 'C' source
> (unless they want to use Windows binaries to run my compilers on the
> true sources). But AFAIK no one was ever interested except to complain
> about how many warnings were generated.
>
> * I can't run on x64 Linux without some extra work, or ARM Linux without
> a /lot/ of extra work.
>

So make your decision. Either give up on everything to do with C, or
decide to treat the language and its tools properly.

I welcome the day when you start making posts saying "This is some code
in my language, with these semantics. How can I generate C code that
has the same meaning?" Then I will enjoy giving you help and
suggestions. At the moment, however, all we get are posts saying "This
is some code in my language, with these semantics. A direct copy to C
does not do the same thing - therefore C is a terrible language". Is it
any wonder that so many people have given up trying to help you or reason
with you?

>> First, it is completely unreasonable to blame C for its suitability for
>> a role /you/ choose for it.
>
> I think if a language actually requires you to define:
>
>    #include <stdint.h>
>    typedef int64_t int64;
>    typedef unsigned char byte;
>
> at the start (and that int64_t still causes problems because C could
> variously define it on top of long int or long long int or something
> else), then yes it can take some of the blame.
>

C does not require that, and even if it did, who cares?

blt_u...@a8wevoi6xp.ac.uk

May 22, 2019, 11:46:16 AM
On Tue, 21 May 2019 20:47:07 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>type punning through casting pointers." It doesn't work because it
>doesn't work, end of story.

This is actually becoming laughable. Despite me clearly demonstrating that
it DOES work you're still claiming the opposite. Have you ever thought of
working for the Ministry of Truth?

>If you want another example, there is one in the posting of another
>person:
>
> int foo( float *f, int *i ) {
>     *i = 1;
>     *f = 0.f;
>
>     return *i;
> }
>
> int main() {
>     int x = 0;
>
>     std::cout << x << "\n"; // Expect 0
>     x = foo(reinterpret_cast<float*>(&x), &x);
>     std::cout << x << "\n"; // Expect 0?
> }

Sorry, and this is supposed to prove what? You pass the same address as both
parameters and set the memory there to float zero before returning the
dereferenced value as an int which - SURPRISE! - is also zero!

What the hell else is it going to return?? Did you actually try running it?

>You will find many similar examples in articles on the internet about
>strict aliasing.

How about finding one that actually proves your point?
