#include <stdio.h>

struct s1 { unsigned short x; };
struct s2 { unsigned short x; };
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x = v; }

int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
    if (read_s1x(&p1->v1))
    {
        unsigned short temp;
        temp = p3->v1.x;
        p3->v2.x = temp;
        write_s2x(&p2->v2, 1234);
        temp = p3->v2.x;
        p3->v1.x = temp;
    }
    return read_s1x(&p1->v1);
}

int test2(int x)
{
    union s1s2 q[2];
    q->v1.x = 4321;
    return test(q, q + x, q + x);
}

int main(void)
{
    printf("%d\n", test2(0));
}
On Monday, 25 September 2017 13:41:55 PDT Myriachan wrote:
> Both GCC and Clang in -fstrict-aliasing mode with optimizations are acting
> as if they ran into undefined behavior, and return 4321 instead of the
> expected 1234. This happens in both C and C++ mode. Intel C++ and Visual
> C++ return the expected 1234. All four compilers hardwire the result as a
> constant parameter to printf rather than call test2 or modify memory at
> runtime.
>
> From my reading of the C++ Standard, particularly [class.union]/5,
> assignment expressions through a union member access change the active
> member of the union (if the union member has a trivial default constructor,
> which it does here, being C code). Taking the address of p2->v2 and p1->v1
> ought to be legal because those are the active members of the union at the
> time their pointers are taken.
>
> Is this a well-defined program, or is there subtle undefined behavior
> happening here?
Reading from an inactive member of the union is UB. However, reading a member
of the struct that belongs to the common initial sequence is not; see
12.2 [class.mem]/23.
The only thing I am not so sure of is the read_s1x and write_s2x functions:
since they take pointers to different types, is the compiler allowed to assume
that write_s2x() cannot modify an object of type s1?
> The only thing I am not so sure of is the read_s1x and write_s2x functions:
> since they take pointers to different types, is the compiler allowed to assume
> that write_s2x() cannot modify an object of type s1?

Not via the parameter, anyway (not even with reinterpret_cast and launder).
// active member of q[0] at start is v1.
if (read_s1x(&p1->v1))
{
    unsigned short temp;
    temp = p3->v1.x;   // read of v1, the current active member of q[0].
    p3->v2.x = temp;
    // active member of q[0] is now v2.
    write_s2x(&p2->v2, 1234);
    temp = p3->v2.x;   // read of v2, the current active member of q[0].
    p3->v1.x = temp;
    // active member of q[0] is now v1.
}
// active member of q[0] is v1 regardless of which path the "if" takes.
return read_s1x(&p1->v1);
There is no case within the code that reads an inactive member of the union: the active member is changed by each assignment performed through a union member access expression ([class.union]/5).
On Tuesday, 26 September 2017 07:10:43 PDT Hyman Rosen wrote:
> Unions were always the reinterpret_cast of C. They weren't only used to
> save space. They were used to access data of one type as data of another
> type. (Picking apart the bits of floating-point numbers is the
> paradigmatic example.) Then the optimizationists ruined everything.
That was never officially allowed.
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber wrote:
> What I don't get are his endless rants about this.

He wants everyone to be able to write such code and have it mean the same thing everywhere.
Even if they don't want to.
On Tuesday, September 26, 2017 at 9:56:27 AM UTC-7, Hyman Rosen wrote:
> On Tue, Sep 26, 2017 at 11:26 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> > On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber wrote:
> > > What I don't get are his endless rants about this.
> > He wants everyone to be able to write such code and have it mean the same
> > thing everywhere. Even if they don't want to.
>
> Yes.
>
> <rant>
> As I have said (or ranted) many times before, the purpose of a programming
> language is to control the operation of a computer. It is best when the
> programming language constructs have straightforward and unambiguous meaning,
> because that enhances the ability of everyone involved (the authors, the
> readers, and the programming systems) to agree on what the program does. When
> language constructs are unclear, ambiguous, unspecified, or undefined,
> different parties may understand the meaning of the program differently,
> causing errors to go undetected. In the case of unspecified or undefined
> behavior, the programming system may initially appear to agree with the
> intentions of the programmer, but secretly permit itself to disagree, so that
> future builds of the program, perhaps years later, no longer perform as the
> programmer intended.
>
> The purpose of optimization is to change some aspect of a program (usually
> its speed, sometimes its size) while not changing its meaning. But C and C++
> have allowed optimization opportunities to feed back into the language design,
> resulting in a plethora of unspecified and undefined behavior in the languages
> just so optimizers may make assumptions about the code, assumptions that are
> easily unwarranted because they cover constructs that have been widely used
> and have "worked", precisely because these languages have been used for
> "low-level", close-to-the-machine system development where aliasing,
> bit-fiddling, integer overflow, and wide-ranging pointer manipulation are
> important. Moreover, the details of what behaviors are not allowed are
> themselves difficult to specify clearly, so programmers cannot tell whether
> they are following the rules or not.
>
> We are now in a situation where we supposedly cannot write std::vector in
> standard C++. We are now in a situation where a() += b(), a() << b(), and
> a() <= b() each have different rules for the order of calling a() and b().
> We are in a situation where the standard cannot even specify the function
> prototypes of the classes it defines, but must resort to weasel words like
> "this function does not participate in overload resolution when...". We are
> in a situation where C++ has become overwhelmingly complex, and where traps
> lie in wait for programmers, who cannot even be wary because the dangerous
> areas and the safe areas are fractally intertwined.
> </rant>

I kind of wish this were in a different thread, because the code I copied in the original message appears to me to be well-defined even in the current Standard.
int g() {
    return [i = 0] {
        union { struct { int x; } v1; struct { int x; } v2; } q[2]{{4321}};
        q[0].v2 = { q[0].v1.x };
        [&qv2 = q[i].v2] { qv2.x = 1234; }();
        [&q3 = q[0]] { q3.v1 = { q3.v2.x }; }();
        return [&qv1 = q[0].v1] { return qv1.x; }();
    }();
}
Richard Smith 2017-09-26 15:50:24 PDT

Slightly simpler example:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };
static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}

int test(union s1s2 *p1, union s1s2 *p2)
{
    if (p1->v1.x)
    {
        write_s2x(&p2->v2, 1234);
        return read_s1x(&p1->v1);
    }
    return 0;
}

int test2(int x)
{
    union s1s2 u = {.v2.x = 4321};
    return test(&u, &u);
}

Note that this never even changes the active union member (it's always v2); instead it relies on the "common initial sequence" rule for the two loads through 'v1.x'.
On Wednesday, September 27, 2017 at 12:56:27 AM UTC+8, Hyman Rosen wrote:
> Yes.

I have tried to repeat this again and again, but you always forget the point, so... (Anyway, you'd better remember: your problems here have almost nothing to do with C++.)

> <rant>
> As I have said (or ranted) many times before, the purpose of a programming
> language is to control the operation of a computer.

False. You have multiple misconceptions.

First, a programming language in general is always abstract, because the rules that constitute it cannot be concrete. It can live without a computer. (This is similar to an algorithm.)

Second, a programming language in practice often has nothing to do with any computer. Languages deal with models, for example abstract machines (as C and C++ do), or formal systems. For any programming language that needs to be portable, no computer can be the model, simply because no one can physically manufacture a computer compatible with every other one. Only the implementations of a programming language can target specific computers (and even that is not guaranteed in general).

Third, using a programming language to do work is a matter for programmers, not for programming languages. So where does your "purpose" come from?

> It is best when the programming language constructs have straightforward and
> unambiguous meaning because that enhances the ability of everyone involved -
> the authors, the readers, and the programming systems - to agree what the
> program does.

False. In industry it is often true that we must reach consensus to avoid wasting time on overspecialized things. Clarifying the meaning of a program that no one is interested in does not help. And forbidding such programs from being constructed is in general not feasible, because you have no way to detect the "interest" or "intention"; only a reasonable cost can be paid for it.

> When language constructs are unclear, ambiguous, unspecified, or undefined,
> different parties may understand the meaning of the program differently,
> causing errors to go undetected.

These adjectives are not the same; they specifically serve different purposes with different sets of agreements. Why mix them together? Or is it just because you failed to distinguish them?

> In the case of unspecified or undefined behavior, the programming system may
> initially appear to agree with the intentions of the programmer, but secretly
> permit itself to disagree, so that future builds of the program, perhaps years
> later, no longer perform as the programmer intended.

That's a QoI problem, a fault of the programmer, or both, by design.
On Wednesday, September 27, 2017 at 1:51:27 AM UTC-4, FrankHB1989 wrote:
> First, a programming language in general is always abstract because the rules
> that constitute it cannot be concrete. It can live without a computer. (This
> is similar to an algorithm.) Second, a programming language in practice often
> has nothing to do with any computer.

While yes, programming languages often do define "models", "abstract machines" and the like, those "models" and "abstract machines" are based on actual computers to some degree. Memory in the C++ memory model is laid out as a sequence of "bytes" because we know that's how memory works on computers. Java requires 2's complement integer math because all of the platforms that Java is interested in supporting offer 2's complement integer math natively. C/C++ do not make 2's complement part of their abstract machines, because they want to be able to run fast on non-2's-complement machines.

Programming languages may be written against models, but those models are always designed with an eye to actual machines. The reality of where implementations are expected to run informs the models we use to abstract them.

So while it's wrong to say that programming languages are for computers, it's just as wrong to say that they're purely for models too. Oh, and "in practice", programming languages always have to do with actual computers, because "in practice" means that you're writing code that you intend to run on one or more implementations. "In theory" would be when you care solely about writing against the abstraction.

> Clarifying the meaning of a program that no one is interested in does not
> help.

And yet, people keep writing them, so obviously someone is "interested" in that meaning.

> These adjectives are not the same; they specifically serve different purposes
> with different sets of agreements. Why mix them together?

Because the distinctions are essentially irrelevant to his point. That being that, if you write a program that does certain things, the language does not clearly state what will happen. From a user perspective, the code's behavior is unknown. They have a certain expectation of what "ought" to happen, but the language has (typically esoteric) rules that make the code not do what they believe they have written.

The specific word you use for such circumstances is irrelevant. What matters is that you wrote X, and the code looks like it should do X, but it may not. This creates confusion between the user's intent and the language's definition, which leads to the potential for errors that are not easy to catch, since such circumstances are allowed to appear to work. Compilation failures tell the user that what they tried is non-functional. To allow a program to compile, yet for it to still not be functional as described, creates problems.

> That's a QoI problem, a fault of the programmer, or both, by design.

That's essentially a tautology. You're saying the programming language is right because it's right. If a language, "by design," creates lots of circumstances where useful code looks like it will work one way, but in fact works another way or has unpredictable results, that's a problem with the "design" of the language.

Now for C/C++, this is a tradeoff. By leaving certain things undefined, the language becomes more useful to us. We wouldn't be able to cast things to `void*` and back (a perfectly well-defined operation), if we had to statically ensure that UB was not possible. So we accept the problem in the language because doing so offers us benefits which we could not get another way. But that acceptance should not be used to say that the problem doesn't exist. If a language frequently promotes misunderstanding, that's a fault in the language, not in the programmer. You may still use it anyway, but let's not pretend it's not actually a problem.
If you want your code to follow the same principle, then you too can ignore everything the standard says about UB.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
On Wed, Oct 18, 2017 at 2:57 PM, <inkwizyt...@gmail.com> wrote:
> On Wednesday, October 18, 2017 at 8:41:53 PM UTC+2, Hyman Rosen wrote:
> > On Wed, Oct 18, 2017 at 2:29 PM, <inkwizyt...@gmail.com> wrote:
> > > If you want your code to follow the same principle, then you too can
> > > ignore everything the standard says about UB.
> >
> > No, that's completely wrong. Because of the optimizationists, compilers
> > assume that programs do not execute undefined behavior, and if any code
> > path leads to undefined behavior, the compiler assumes that this path will
> > not be taken and does not translate that path to behave as the programmer
> > expects in the "bag of bits" model. That means that if I do
> >
> >     union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
> >
> > the compiler will notice that it is undefined behavior for me to access u
> > after setting d and it can remove the entire call to printf.
>
> You completely miss the point. IF you ditch portability you can do anything
> you want. You can use a compiler that does not do this, use special flags,
> disable optimization, use intrinsics, etc. You could even write your own
> compiler or change existing ones. The only problem is when you pretend you
> write portable code and compilers assume that you did. If you break the
> contract, do not expect the compiler to follow it either.
You completely miss the point. The David Gay code in question was first written in 1991,
and has worked as expected on a huge variety of machines and compilers. But new compilers
give themselves permission to destroy paths that involve undefined behavior and are more likely
to think they have found such paths, and so this code that's a quarter century old can start breaking
now just by being rebuilt.
And I don't know why you think the code isn't portable.
On Wednesday, October 18, 2017 at 3:31:58 PM UTC-4, Hyman Rosen wrote:
> And I don't know why you think the code isn't portable.

Because the standard is what defines portability.

Sure, a piece of code can happen to work as you expect on some number of compilers. Or indeed all of them. But without the standard explicitly specifying that it behave as you expect, you're just relying on hope that it will continue to behave as you expect.

Indeed, is that not exactly the problem you want "corrected"? That the standard gives compilers the right to make this code not work as you expect, and you want to deny it that right? What other purpose is there for putting such things in the standard except to ensure portability?
The day C++ stops being, ultimately, prettified C is the day it becomes obsolete. There are many high-level languages with abstract memory models to choose from. The attraction of C++ (at least for me) is that it allows high-level constructs and expression while touching the real machine.

The removal of type-punning from unions makes unions useless for the only job they were ever designed to do: to overlay different-shaped views over the same bag of bits. If you can't do that with them, the entire keyword is pointless, as you can get the same behaviour as union by simply reinterpret-casting a std::aligned_storage::type.
> The removal of type-punning from unions makes unions useless for the only job they were ever designed to do - to overlay different-shaped views over the same bag of bits.
> So their use to implement space-efficient sum types was what, an unfortunate accident?

No, it was intended, but this was of course before there were variadic templates. Since we have those plus type traits, we can implement a union with nothing more than std::aligned_storage<max_size<Ts...>::value, max_align<Ts...>::value>::type
> And you can convert bit representations between types using memcpy.

You can indeed - probably the most obtuse expression of "I just want this memory to be treated like an int" one could think of. Please remember, I do understand the rationale for the current standard. I did say that.
There seems to be an obvious mismatch between programmers who want
bit-blasting at all costs
and users who want TBAA. It might perhaps be useful to ask whether
there's any hints/remnants/evidence/even hearsay
of how dmr viewed it when C was developed. There's a non-zero chance
that some people might be able
to provide more than guesses at that.
This isn't an easy problem to solve >.<
On Fri, Oct 20, 2017 at 1:21 PM, Richard Hodges <hodg...@gmail.com> wrote:
> The removal of type-punning from unions makes unions useless for the only job they were ever designed to do - to overlay different-shaped views over the same bag of bits.

That isn't what K&R1 says about unions. What is your reference document which states that was what they were designed to do?

K&R1: "so long as the usage is consistent: the type retrieved must be the type most recently stored"
> Following that, we conclude that K&R never meant for you to read from an
> inactive member of a union.

We could, but that would be missing the most important thing, which is: what do *I* want, and what do other users of C++ want? Mr Kernighan and Mr Ritchie have had their time bashing keys. I expect they are enjoying a profitable retirement.

What do we want? Of course we want it all: awesome optimisation plus the ability to directly address memory bytes through an object-shaped lens. I personally don't think that's difficult to provide, so why not provide it?
All we need is some rule such as "whenever a union is or could be addressed through some lens other than the one that was previously written, all underlying bytes will be deemed to have been written, and the next read object will be *as if* its corresponding bytes had been written".
Then the union would be perfectly type-punnable and perfectly optimisable. This would even allow unions to be used for type punning in constexpr environments, such as for determining endianness.

I have now posted two possible solutions, while the rest of the community seems intent solely on defending a partisan position. Anyone else care to approach this in a positive way?

R

On 24 October 2017 at 06:25, Thiago Macieira <thi...@macieira.org> wrote:
> On Monday, 23 October 2017 21:15:31 PDT Richard Hodges wrote:
> I don't think it's particularly useful to ponder what K&R meant by this or
> that. They weren't holy prophets, just guys trying to make assembler less
> of a pain to write.
Ok, then we mustn't interpret "implementation-defined" in their writing as having
its current meaning. They could have meant what we today understand to be UB.
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Absolutely! It doesn't even have to be provided in the language; a library solution will serve perfectly well.
Just as well you've never looked to see what the *processor* vendors are doing with your code, then.
On Tue, Oct 24, 2017 at 11:30 AM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
> Just as well you've never looked to see what the *processor* vendors are doing with your code, then.
Why do you think I've never looked?
<https://www.theregister.co.uk/2017/06/25/intel_skylake_kaby_lake_hyperthreading/>
But the processor vendors are not trying to destroy the programmer's coding model.
From the programmer's point of view, the machine presents registers and memory,
and those are read and written as the program says, regardless of the machinations
that the processor may be doing under the surface. The compilers, on the other hand,
are looking for any excuse to throw away code that the programmer has written and not
do what the programmer intended. "You couldn't possibly have meant to do that, so I'm
just going to ignore your instructions and not bother to tell you" is something that would
get a human employee fired, and it's no less displeasing in a compiler.
And by the way, a minute after my last post, a colleague called me over with exactly the
floating-point problem I had posted - things that should have been exactly equal were
comparing unequal. I suggested the usual fix, saving values in volatile double variables,
and the problem went away. (Gcc on Intel using x87 rather than SSE for floating-point.)
--
So you don't find out-of-order and speculative execution disturbing?
You aren't concerned with how memory accesses appear to other threads and to signal handlers?
Compile with -mfpmath=sse and stop supporting older than Pentium III (1999).
There's nothing wrong with the language.
On 24 Oct 2017 03:47, "Richard Hodges" <hodg...@gmail.com> wrote:
> All we need is some rule such as "whenever a union is or could be addressed through some other lens other than the one that was previously written, all underlying bytes will be deemed to have been written, and the next read object will be *as if* its corresponding bytes had been written". Then the union would be perfectly type-punnable and perfectly optimisable.

Actually, no, this is not perfectly optimizable. In fact, it invalidates a whole class of profitable optimisations based on type-based alias analysis. It's also harmful to other aspects of the language (e.g., constant expression evaluation cannot respect these rules in general).
--
Melissa
On Tuesday, October 24, 2017 at 6:01:58 PM UTC-7, Myriachan wrote:
> On Tuesday, October 24, 2017 at 10:29:42 AM UTC-7, Richard Smith wrote:
> > On 24 Oct 2017 03:47, "Richard Hodges" <hodg...@gmail.com> wrote:
> > > All we need is some rule such as "whenever a union is or could be addressed through some other lens other than the one that was previously written, all underlying bytes will be deemed to have been written, and the next read object will be *as if* its corresponding bytes had been written". Then the union would be perfectly type-punnable and perfectly optimisable.
> >
> > Actually, no, this is not perfectly optimizable. In fact, it invalidates a whole class of profitable optimisations based on type-based alias analysis. It's also harmful to other aspects of the language (e.g., constant expression evaluation cannot respect these rules in general).
>
> What would be the right solution, then? The proposals from the compiler writers' side so far have been to more or less remove the "common initial sequence" rule from the language in favor of requiring that all such accesses go through a union type. This would break a lot of system APIs and other existing code without providing a good solution.

I just thought of something else: what if the aliasing rules were to ignore classes entirely, and instead only dealt with primitive types? How much would that break type-based alias analysis's ability to optimize? The common initial sequence rule would be implicit, because you're ultimately reading using the correct type for what was written there before.
To me, that's what the rule ought to be for aliasing: if a memory location is written as primitive type X, it must be read back as either cv X or a cv byte type (std::byte, char, unsigned char). (Placement new without initialization would be considered a write of an indeterminate value.) It shouldn't matter whether a class was involved, nor the identity of said classes. The rule would make something like this well-defined:

struct X { ... int i; ... };

alignas(X) unsigned char b[sizeof(X)];
X *x = new (b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);

...which most C++ programmers expect to work, but technically doesn't.
Melissa
> I just thought of something else: what if the aliasing rules were to ignore classes entirely, and instead only dealt with primitive types?
I think that has a ring of sensibility to it.Furthermore in the case of a union of a and b, the compiler already knows that a and b live at the same address. It should treat them as just different views of the same memory. The actual 'object' is the underlying byte array holding the entire union. a and b are not 'objects' at all - just shapes of memory access (or at least should be IMHO).
Noting the above example, the idea that placement new could return an x that differs from &b seems to me to be just daft. If you can't placement-new an X at &b then the compiler/runtime should barf at that point - not just move the object.
> However the equivalent (and barely any more verbose) code using memcpy to load the int from its storage location is guaranteed to work.
Are you saying that, following the assignment x->i = 2,
memcpy(&some_int, &b[offsetof(X, i)], sizeof(int)); will copy the value 2 to some_int?
Is that to say that the presence of memcpy causes the compiler to 'flush' all as-ifs to memory prior to the flow of control going over the memcpy?
> Another opinion is that C++ has and should have a data model

Agreed.

> that the type of an object determines how its storage can be accessed

The type should aid the programmer in making correct and intuitive decisions for the vast majority of cases, which the current C++ data model does.

> Byte-wise storage access is still available as an escape hatch

It should be, but it's not really, is it? The only way to get defined behaviour is to memcpy from one imaginary object to another. The memcpy-is-really-bit-alias paradigm is verbose, difficult to teach, and creates programs whose source code is basically lying.
For example, what would be wrong with this model?

union U {
    int a;
    float b;
} u;

u.a = 1;
auto val = u.b; // get the float whose integer representation is binary 1
The compiler is absolutely in a position to determine that a and b are aliases, and their bitwise configurations are the very same array of N bits. The object is u, a and b are merely views of it.
Similarly,

foo(u); // where foo is declared as extern void foo(U&)

must surely cause the compiler to assume that the write of u.a *must* be visible in the bits of u.b prior to and after the call, otherwise the call might fail and any reads of u after the call might be invalid.
> The issue is that the expression u.b is an lvalue of type float.

And here is the problem. u::b's type should not be 'float'. It should be 'a float-like interface on an array of bytes that represents union { int; float; }'. The 'object' is better viewed as 'u', the bag of bits, not 'a' or 'b'. If we view it that way, all aliasing issues go away; int x = u.a = u.b = 1.0; becomes perfectly legal.

I appreciate we can do this with a custom class that wraps an aligned byte buffer. In which case, back to my previous question: why have the union keyword at all? It's obsolete as of C++11. Kill it and end the argument, since its only valid use is as the storage for a non-template discriminated union. std::/boost:: variant already covers that and, tellingly, cannot be implemented with a union...
> Sure. But once a reference or pointer to u.b is obtained, the compiler loses that information.

At the present time, since compilers today are programmed to (more or less) meet the minimum expectations of the standard. We have already established that I think the standard is short-changing us.

> Yes, but the compiler is entitled to change its mind during link-time optimization.

If the compiler can carry sufficient contextual information to perform link-time optimisations, it can carry the information to know that 'u' can legally represent an int and a float at the same time.
On Tuesday, 24 October 2017 16:25:46 PDT Hyman Rosen wrote:
> A language where a + b == a + b is not required to be true for numbers a
> and b is broken.
This is not required to be true either
a / b * b == a * b / b
So what makes your example special?
Floating point operations can and do produce different results depending on
the order of the operations, due to loss of precision. That has nothing to do
with C++, but with the nature of floating point.
No, it was required because that's how the 8087 co-processor works. Doing it
any other way would be unbearably slow for normal use-cases, pessimising a lot
of people and having them pay for something they don't use.
I think you were advocating recently that the language should not get in the
way of using the processor and co-processors the way they were intended.
Your two expressions are not identical. You're thinking that because math
tells you they should be, since you learned in 1st grade that addition is
commutative. If they were identical, they would produce the same result.
On Wednesday, 25 October 2017 14:13:47 PDT Hyman Rosen wrote:
> The following code fails when built by `g++-4.8.5 -m32` on an x86 machine
> given 0x1000000000000001, say, as an argument.
This number cannot be represented with precision in a double, only in a long
double.
> double d(long long n) { return n; }
What you're seeing is a side-effect of this function. See
https://godbolt.org/g/8oNtXR
The i386 SysV calling convention returns floating point in ST(0), so both
current Clang and GCC 4.8 simply ask the coprocessor to load it, then return
that. ICC in strict mode as well as current GCC store it to memory first then
reload, to force the value to lose precision.
> int test(volatile long long n) { return d(n) == d(n); }
I understand it's not contrived, but it's artificial, because of that volatile.
The parameter to the test function is most definitely not volatile, so this
code is artificial and trying to trick the compiler with something.
And yet I don't see how this code would produce "bad" for any non-NaN value.
It's comparing d(n) to itself, so the result must be true for any value that
is not NaN. That's independent of whether there was rounding or loss of
precision.
> To me, that means that the implementors gave themselves permission to
> produce the above result - it was not an accident.
The absence of the text cannot be attributed to conscious deletion. It can
just as likely be lack of addition. That is to say, it's possible it was added
to C at some point but not to C++.
> It's not "tricky". And I don't know what you mean about the parameter not
> "being" volatile; volatile is defined by the standard as *accesses through
> volatile glvalues are evaluated strictly according to the rules of the
> abstract machine* and that's what I wanted to have happen.
That's what I meant by artificial. You artificially chose to make it volatile,
when the data itself is not. The variable's value cannot change behind the
compiler back: the function parameter is not in MMIO memory range, its address
is not passed to other threads of execution, etc.
And besides, all four compilers DID elide one of the calls to d(n).
That's debug mode. It never occurred to me to try that.
As we've already seen, the requirement was added to C in C99. So it's very
likely that the compilers implemented the current C++ behaviour up until the
point in time when they were forced to lose precision to comply with the C
language. Since C++ did not add the same text, some compiler writers decided
not to apply the same fix to C++.
On Thursday, 26 October 2017 09:22:18 PDT Hyman Rosen wrote:
> Where does the Standard impose any such requirement on things declared
> volatile?
The keyword is you telling the compiler that the data may be changed
asynchronously and therefore every access must be reloaded.
Your data doesn't do that.
I was referring to code in release mode, in the links that I sent. There's
exactly one conversion from integer to FP, with the FILD instruction.
> Compiler writers are the worst of the optimizationists because
> they're the ones trying to come up with every possible trick so that they
> can point at the resulting assembly language and admire its magnificence.
I don't see anything wrong with that.
This is a valid reason to use volatile. But that's not your case, since the
variable in question is a parameter to a function, which most architectures
even pass in registers. A register can't be volatile.
Anyway, if you compile your code with those four compilers and using -O2, all
four produce one single integer-to-double conversion.
On Thursday, 26 October 2017 14:55:19 PDT Hyman Rosen wrote:
> I'm sorry, but you don't get to make up extra reasons and qualifications
> outside of what the Standard describes about volatile.
I'm not.
> the Standard permits taking the address of a parameter and indirecting through
> it just like any other variable.
And at that point, having the variable be volatile would make sense.
Your code didn't do that. Hence it was artificially using the qualification.
Note: volatile is a hint to the implementation to avoid aggressive optimization
involving the object because the value of the object might be changed by means
undetectable by an implementation.
"change their behaviour [...] to what I don't want" -- you're not the only C++
user out there. The language does not conform to your wishes alone, but to the
general needs of the user base at large.
Just because some code "works" today doesn't mean it will work tomorrow, if it
depends on unconfirmed assumptions. You're not about to tell me that thread-
unsafe code should keep the behaviour it had in the early 1990s when run
today on multi-thread, multi-core CPUs, are you?
Not to mention outright bugs in the source code or in the compiler. I hope
you're not suggesting that compiler writers never fix bugs because someone
could be depending on the erroneous outcome.
On Fri, Oct 27, 2017 at 3:00 PM, Yubin Ruan <ablack...@gmail.com> wrote:
> +Cc gcc-list.
>
> Does any gcc developer have any comments?
See PR82224. The code is valid.