x = y = z; Undefined?

Scott Meyers

Jan 6, 2000, 3:00:00 AM

I recently received the following mail about something I wrote in EC++/2E:

A friend of mine was complaining about the standard's standpoint with regard
to RC implementation of strings, and proposed a solution with
proxy-classes that used op=() returning const, something along the lines
of:

class retval
{
    const_reference operator=( ... );
};

I objected (out of the good practice, since op=() normally returns T&, not
const T& (for instance all the containers in the STL)). However, when I
re-read Item 15 of EC++, I didn't quite understand your reasons for
advocating non-const op=():

[snip -- EC++2ed, Item 15, p 66]

int i1, i2, i3;

...

(i1 = i2) = i3; // legal! assigns i2 to i1, then i3 to i1!

[/snip]

AFAIK, parentheses do not introduce a sequence point (only ";", ",",
"&&" and "||" do) and the above code modifies an object (i1) twice before
a seq. point. This is undefined behaviour, according to the standard.

I think he may be right. So I have two questions:

1. Does the above example run afoul of the rule against modifying an
object more than once without an intervening sequence point? (In the
example, it's important that the type in question be a built-in. For
user-defined types, assignment is a function call, and different
rules apply.)

2. Why does assignment for built-in types return an lvalue? My
understanding is that this is a difference from C, and I know I've
read reasons for it before, but I can't remember ever reading a
particularly good reason.
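
For contrast with the built-in case in question 1, here is a minimal sketch of the user-defined case, where each assignment is a function call and the two calls are therefore ordered. Tracked, chain_assign and demo_chain_assign are hypothetical names chosen for illustration; nothing here is taken from EC++ itself.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.
struct Tracked {
    int value;
    int assignments;  // how many times this object has been assigned to

    explicit Tracked(int v) : value(v), assignments(0) {}
    Tracked(const Tracked& other)
        : value(other.value), assignments(other.assignments) {}

    // Returns a non-const reference, the form Item 15 advocates.
    Tracked& operator=(const Tracked& rhs) {
        value = rhs.value;
        ++assignments;
        return *this;
    }
};

// Runs (a = b) = c on a local copy of a and returns the result.
Tracked chain_assign(Tracked a, const Tracked& b, const Tracked& c) {
    (a = b) = c;  // equivalent to a.operator=(b).operator=(c)
    return a;
}

void demo_chain_assign() {
    Tracked result = chain_assign(Tracked(1), Tracked(2), Tracked(3));
    assert(result.value == 3);        // the second assignment wins
    assert(result.assignments == 2);  // both calls ran, one after the other
}
```

Had operator= returned a const reference, the second assignment in chain_assign would not compile, which is the trade-off the e-mail is asking about.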

Thanks,

Scott

--
Scott Meyers, Ph.D. sme...@aristeia.com
Software Development Consultant http://www.aristeia.com/
Visit http://meyerscd.awl.com/ to demo the Effective C++ CD
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html ]

Walter E Brown

Jan 6, 2000, 3:00:00 AM

Scott Meyers opined thusly on 6 Jan 00 at 12:38:54 GMT:

...

| AFAIK, parentheses do not introduce a sequence point (only ";", ",",
| "&&" and "||" do)

...

Well, not quite. The "only" in the above is not true; there are some
other sequence points beyond the few you enumerated.

For example, the ternary "?" mandates a sequence point after evaluation
of its left operand and before evaluation of either of the other two
operands.

Also, ";" causes a sequence point only because it delimits a
"full-expression", yet there are several places where full-expressions
are found without trailing semicolon. For example, the complete
predicate in an if statement has no such semicolon, yet is clearly a
full-expression. Similarly, of course, we have full-expressions via the
predicates in looping structures, via the control expression in a
switch, etc. Indeed, the

A function call, interestingly enough, involves two sequence points. One
of them is after argument evaluation and before entering the function
body, while the second is after the function has returned but before the
caller has resumed.
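
Two of the sequence points listed above can be illustrated with a short sketch. The function names are hypothetical; the code only relies on the conditional operator sequencing its first operand before the selected branch, and on arguments being evaluated before the called function's body runs.

```cpp
#include <cassert>

// The condition, including ++i, is complete before either branch reads i.
int bump_and_clamp(int i, int limit) {
    return (++i > limit) ? limit : i;
}

int twice(int n) { return 2 * n; }

// The argument ++i is fully evaluated before the body of twice() runs.
int increment_then_double(int& i) {
    return twice(++i);
}

void demo_sequence_points() {
    assert(bump_and_clamp(1, 5) == 2);
    assert(bump_and_clamp(7, 5) == 5);
    int i = 3;
    assert(increment_then_double(i) == 8);
    assert(i == 4);
}
```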

Scott Meyers then queried thusly at the same time:

...

| 2. Why does assignment for built-in types return an lvalue? My
| understanding is that this is a difference from C, and I know I've
| read reasons for it before, but I can't remember ever reading a
| particularly good reason.

...

Well, on this issue, "good reason" is likely a matter of taste.

While I have believed for some time that the oft-repeated argument of
the (x=y)=z multiple assignment is specious (because it violates the
rule prohibiting multiple side effects on a single object in the absence
of any intervening sequence points), it took me a while to come up with
a different example. I finally settled on this one:

void f( T & t );
/* ... */
f( x = y );

Here we have a function that likely intends to commit a side effect on
its parameter t, which is therefore declared as a (non-const)
reference. At the same time, we have an assignment expression in the
role of argument to f. If the assignment did not return an lvalue (in
this case, a reference to x), the call would not compile. However,
because there is a sequence point after the evaluation of the argument
and before f begins its execution, the call is both syntactically and
semantically valid even if f does, indeed, modify its parameter.
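
A compilable sketch of the call described above, with int standing in for T. add_one and assign_then_call are hypothetical names; the point is only that x = y binds to a non-const reference and that the store to x is complete before the function body modifies it.

```cpp
#include <cassert>

// Stand-in for f(T&): modifies its argument through the reference.
void add_one(int& t) { ++t; }

int assign_then_call(int y) {
    int x = 0;
    add_one(x = y);  // x = y yields an lvalue for x, so it binds to int&
    return x;
}

void demo_assign_then_call() {
    assert(assign_then_call(41) == 42);
}
```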

Of course, some will certainly prefer to write the above call as a
sequence of two expressions:

x = y;
f( x );

However, modulo issues of programming style, I can't find anything
prohibiting the original version of the call. Because I have no reason
to disallow it, I conclude that a canonical assignment operator (on both
native and non-native types) ought to return a non-const reference. To
do otherwise, it seems to me, would violate the "least astonishment"
principle.

And so it goes.

- WEB

Andrew Koenig

Jan 6, 2000, 3:00:00 AM

In article <MPG.12ddaa925...@news.supernews.com>,
Scott Meyers <sme...@aristeia.com> wrote:

> 2. Why does assignment for built-in types return an lvalue? My
> understanding is that this is a difference from C, and I know I've
> read reasons for it before, but I can't remember ever reading a
> particularly good reason.

The example that motivated the rule was something like this:

class Foo {
public:

    // ...

    Foo& assign(const Foo& newval) {
        return (*this = newval);
    }

    // ...
};

The person who wrote this code expected that

return (*this = newval);

would have the same effect as

*this = newval;
return *this;

because this equivalence is valid, and commonly exploited, in C.
Unfortunately, because = returned an rvalue at the time, the effect
was to quietly return a reference to a temporary that was then
immediately destroyed.
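
Under the rule as it ended up, the two forms are equivalent, which a small self-contained version of the class can check. This is only a sketch: the value_ member, the constructor and assign_returns_self are hypothetical additions, and the compiler-generated copy assignment stands in for the = discussed here.

```cpp
#include <cassert>

class Foo {
public:
    explicit Foo(int v) : value_(v) {}

    // Relies on the implicitly generated copy assignment, which returns Foo&.
    Foo& assign(const Foo& newval) {
        return (*this = newval);
    }

    int value() const { return value_; }

private:
    int value_;  // hypothetical payload so the assignment is observable
};

// True when assign() hands back the object it was called on, not a temporary.
bool assign_returns_self(Foo& target, const Foo& source) {
    Foo& result = target.assign(source);
    return &result == &target;
}

void demo_assign() {
    Foo a(1), b(2);
    assert(assign_returns_self(a, b));
    assert(a.value() == 2);
}
```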

At that point, we realized that in order for

return x = y;

to be equivalent to

x = y;
return x;

or, for that matter, for

f(x = y);

to be equivalent to

x = y;
f(x);

as both are in C, it is necessary for = to yield an lvalue, so as to
cater to functions that (respectively) return an lvalue or take an
lvalue argument.

Thinking about these examples, we also came up with the following:

while ((x = getline(cin)), cin) {
// ...
}

Again, for this to work, the comma operator has to return an lvalue.
Otherwise, we try to copy cin, which fails because there's no public
copy constructor.
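
The same idea in a form that compiles with the standard library as it exists today. This is a sketch under assumptions: it uses operator>> on an int instead of the getline call in the loop above, and read_int and sum_ints are hypothetical helpers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// The built-in comma yields its right operand as an lvalue, so the stream
// is returned by reference and never copied.
std::istream& read_int(std::istream& in, int& value) {
    return (in >> value, in);
}

// Adds up whitespace-separated integers, stopping at the first failed read.
int sum_ints(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    int total = 0;
    while (read_int(in, value)) {
        total += value;
    }
    return total;
}

void demo_read_int() {
    std::istringstream in("7");
    int v = 0;
    assert(&read_int(in, v) == &in);  // same stream object comes back
    assert(v == 7);
    assert(sum_ints("1 2 3") == 6);
}
```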

So we basically concluded that the rule should be: Every built-in
operator that returns one of its operands should do so as an lvalue
if the operand is an lvalue. Thinking about it still more,
we realized that this rule is a strict extension of C, because
there's no way to write a C program to detect it.
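
Two more operators covered by the rule, as a short sketch with hypothetical function names: the conditional operator with two lvalue operands of the same type, and prefix increment. Each expression modifies its object exactly once.

```cpp
#include <cassert>

// Both branches are int lvalues, so the result can be assigned to.
void zero_larger(int& a, int& b) {
    (a > b ? a : b) = 0;
}

// Prefix ++ yields its operand as an lvalue, so its address can be taken.
bool preincrement_is_operand(int& i) {
    int* p = &++i;
    return p == &i;
}

void demo_lvalue_results() {
    int a = 3, b = 5;
    zero_larger(a, b);
    assert(a == 3 && b == 0);

    int i = 1;
    assert(preincrement_is_operand(i));
    assert(i == 2);
}
```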

You may agree or disagree with this reasoning, but it is historically
correct. In other words, it is the actual reasoning that led to the
introduction of this particular feature.
--
Andrew Koenig, a...@research.att.com, http://www.research.att.com/info/ark

Hyman Rosen

Jan 6, 2000, 3:00:00 AM

sme...@aristeia.com (Scott Meyers) writes:
> int i1, i2, i3;
> (i1 = i2) = i3;
>
> 1. Does the above example run afoul of the rule against modifying an
> object more than once without an intervening sequence point?

Yes.

> 2. Why does assignment for built-in types return an lvalue?

void f(int &n);
void g() { int n; f(n = 3); }

ka...@gabi-soft.de

Jan 6, 2000, 3:00:00 AM

sme...@aristeia.com (Scott Meyers) writes:

|> (i1 = i2) = i3; // legal! assigns i2 to i1, then i3 to i1!

|> [/snip]

|> AFAIK, parentheses do not introduce a sequence point (only ";",
|> ",", "&&" and "||" do) and the above code modifies an object (i1)
|> twice before a seq. point. This is undefined behaviour, according
|> to the standard.

|> I think he may be right. So I have two questions:

He's right.

Andy has already given the explanation as to *why* the lvalue. To be
frank, I didn't agree with the reasoning for a long time, but have
gradually changed my mind -- I still don't like the idiom it was
designed to support, but in the end, they created a simple-to-understand
rule as to what is and what isn't an lvalue. It's nice to find a simple
rule in C++ from time to time.

--
James Kanze mailto:James...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in Objekt orientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

John Potter

Jan 9, 2000, 3:00:00 AM

On 6 Jan 2000 19:02:58 GMT, Hyman Rosen <hy...@prolifics.com> wrote:

:

: sme...@aristeia.com (Scott Meyers) writes:
: > int i1, i2, i3;
: > (i1 = i2) = i3;
: >
: > 1. Does the above example run afoul of the rule against modifying an
: > object more than once without an intervening sequence point?
:
: Yes.

Agreed. Returning references has added something new to C++ while the
old C rules about sequence points still hold. The only undefined
behavior that I have managed to observe in these cases was exactly
what I expected. I understand that this is acceptable undefined
behavior. My problem is that I can't seem to imagine a sane
implementation doing anything else. In the interest of understanding
why it is undefined behavior (other than the standard tells me so),
could someone give an example of translation which would produce
something other than expected?

(i1 = i2) += i3;
++ ++ i1;
int* pi = &(i2 = i1); // This one is well behaved

These are syntax errors in C; so, there is no precedent.
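
The third line above, the one marked well behaved, can be wrapped in a function and checked. A minimal sketch with hypothetical names: the object is stored to once, and the pointer is initialized from the resulting lvalue.

```cpp
#include <cassert>

// Stores src into dst and returns the address of the assignment's result.
int* assign_and_point(int& dst, int src) {
    int* p = &(dst = src);  // a single store to dst; p names dst itself
    return p;
}

void demo_assign_and_point() {
    int i1 = 9;
    int i2 = 0;
    int* pi = assign_and_point(i2, i1);
    assert(pi == &i2);
    assert(*pi == 9);
}
```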

John

---

Anthony DeRobertis

Jan 10, 2000, 3:00:00 AM

In article <3876254a...@news.csrlink.net>,
jpo...@falcon.lhup.edu (John Potter) wrote:

>could someone give an example of translation which would produce
>something other than expected?
>
> (i1 = i2) += i3;

Well, on my PowerPC this would generate (unoptimized):
-- sequence point --
or i1,i2,i2
add i1,i1,i3
-- sequence point --

Which could legally be re-arranged:
add i1,i1,i3
or i1, i2, i2

Which could then be optimized:
or i1, i2, i2

Which means:
i1 = i2;

As there are no sequence points. This could conceivably be done for
scheduling purposes. I don't seem to be able to get my compiler to do it
(it does even weirder stuff...)

--
Windows 95 (win-DOH-z), n. A thirty-two bit extension and graphical shell
to a sixteen bit patch to an eight bit operating system originally coded
for a four bit microprocessor which was used in a PC built by a formerly
two bit company that couldn't stand one bit of competition.

John Potter

Jan 11, 2000, 3:00:00 AM

On Mon, 10 Jan 2000 08:28:54 CST, Anthony DeRobertis
<dero...@erols.com> wrote:

: In article <3876254a...@news.csrlink.net>,
: jpo...@falcon.lhup.edu (John Potter) wrote:
:
:
: >could someone give an example of translation which would produce
: >something other than expected?
: >
: > (i1 = i2) += i3;
:
: Well, on my PowerPC this would generate (unoptimized):
: -- sequence point --
: or i1,i2,i2
: add i1,i1,i3
: -- sequence point --
:
: Which could legally be re-arranged:
: add i1,i1,i3
: or i1, i2, i2

I don't think so.

i3 += i1 = i2;

or i1,i2,i2
add i3,i3,i1

Re-arrangement violates the semantics of the language in both cases.
The implementation is allowed to assume that no stored value is
modified more than once. I guess that you could stretch it to the
point of being allowed to detect an expression which invokes
undefined behavior and generate purposely bad code; however, the
compiler writers are in the business of selling compilers not
harassing users. In both cases, the second instruction uses the
value of the first instruction.

: Which could then be optimized:
: or i1, i2, i2
:
: Which means:
: i1 = i2;
:
: As there are no sequence points. This could conceivably be done for
: scheduling purposes. I don't seem to be able to get my compiler to do it

I'm not surprised.

: (it does even weirder stuff...)

Given i2 == 2 and i3 == 3, does it do anything which produces other
than i1 == 5? Assuming, of course, that there is a reason to do
anything.

My compiler nicely optimized
void f () { int i1, i2(2), i3(3); (i1 = i2) += i3; }
to
void f () { }

Changing to int f () and adding return i1, it optimized to return 5.

int f (int i2, int i3) { int i1; (i1 = i2) += i3; return i1; }
optimized to
add r3,r3,r4
which is i1 = i2 + i3.

Stretching my imagination, thanks to James Kanze,
load r1,i2
copy r2,r1
add r2,i3
sto i1,r2
sto i1,r1
is a valid translation. With multiprocessors, the last two
instructions could be executed in parallel opening the door
for many results depending on the hardware.
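
The five pseudo-instructions can be modeled in well-defined C++ to show why the order of the last two stores matters. This is only a simulation with hypothetical names; it is not output from any compiler, and the flag simply picks which store lands last.

```cpp
#include <cassert>

// Models the translation above; stores_as_listed == true keeps the listed
// order (sto i1,r2 then sto i1,r1), false swaps the two stores.
int simulate_translation(int i2, int i3, bool stores_as_listed) {
    int i1 = 0;
    int r1 = i2;   // load r1,i2
    int r2 = r1;   // copy r2,r1
    r2 += i3;      // add  r2,i3
    if (stores_as_listed) {
        i1 = r2;   // sto i1,r2
        i1 = r1;   // sto i1,r1
    } else {
        i1 = r1;   // sto i1,r1
        i1 = r2;   // sto i1,r2
    }
    return i1;
}

void demo_simulate_translation() {
    assert(simulate_translation(2, 3, true) == 2);   // i2 overwrites the sum
    assert(simulate_translation(2, 3, false) == 5);  // the sum survives
}
```

Both orders store only values the expression computes, which is what makes the translation legitimate when no object is modified twice.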

The ++ ++ i example also has strange valid translations.

I'm convinced, finally.

John

Anthony DeRobertis

Jan 11, 2000, 3:00:00 AM

In article <3879ca08...@news.csrlink.net>, jpo...@falcon.lhup.edu
(John Potter) wrote:

>On Mon, 10 Jan 2000 08:28:54 CST, Anthony DeRobertis
><dero...@erols.com> wrote:
>
>: In article <3876254a...@news.csrlink.net>,
>: jpo...@falcon.lhup.edu (John Potter) wrote:
>:
>:
>: >could someone give an example of translation which would produce
>: >something other than expected?
>: >
>: > (i1 = i2) += i3;
>:
>: Well, on my PowerPC this would generate (unoptimized):
>: -- sequence point --
>: or i1,i2,i2
>: add i1,i1,i3
>: -- sequence point --
>:
>: Which could legally be re-arranged:
>: add i1,i1,i3
>: or i1, i2, i2
>
>I don't think so.

<snip>

The standard, by saying that a value can't be modified twice between
sequence points, has disallowed this dependency. The compiler need not
check whether a new value for a variable depends on another new value
assigned between the same sequence points, and there is no reason to
expect that it does check.

One could conceive of a compiler that would decide 'ok, the final
destination is i1' and then add three to it, then manage the assignment
i1=i2 after it. Since there can be no more than one assignment to a
single variable between sequence points, there's nothing wrong with this
implementation, either.

--
Windows 95 (win-DOH-z), n. A thirty-two bit extension and graphical shell
to a sixteen bit patch to an eight bit operating system originally coded
for a four bit microprocessor which was used in a PC built by a formerly
two bit company that couldn't stand one bit of competition.

---

Ron Natalie

Jan 11, 2000, 3:00:00 AM

to John Potter

John Potter wrote:
>
> I disagree. The standard requires that expressions be evaluated as
> written.

Where does it say that? (Hint: it doesn't; absent sequence points, the
order of evaluation is UNSPECIFIED.)

>
> : One could conceive a compiler that would decide 'ok, the final
> : destination is i1' and then add three to it. Then manage the assign
> : i1=i2 after it. Since there can be no more than one assignment to a
> : single variable in a sequencing point, there's nothing wrong with this
> : implementation, either.
>

> No. The compiler could compute the value 5 store it in i1 and then
> store the value 2 from i2 in i1. It might even store the & or | of
> those two values. Adding 3 to the old value of i1 is not present
> in the expression.

Sorry, this is wrong. The order of evaluation is not specified in
C++ absent sequence points. The original poster is correct.
Changing i1 twice between sequence points is wrong.

>
> You seem to interpret undefined behavior as a license for the compiler
> to do stupid things. I interpret it as a license to do reasonable
> things on the assumption that there is no undefined behavior without
> needing to verify that. Any compiler that does anything based upon
> detecting undefined behavior better be issuing a diagnostic.

No. UNDEFINED BEHAVIOR does not require a diagnostic. There are far
too many things that are undefined behavior that are impossible to
check for at compile time. By saying it's undefined behavior, the
standard says the compiler doesn't need to consider handling this sort
of error.

>
> Consider:
> int a[] = { 11, 13 }, i2(2), i3(3);
> int& i1(a[0]);
> int* p;
> *((p = &(i1 = i2)) + 1) += i3;
>
> There is no undefined behavior and a[] == { 2, 16 }.

This is a different case. a[0] is stored once, a[1] is stored once.
The value of a[0] is not used for anything other than the target of
the store, nor is a[1].

> An interesting point is that in this part of the standard (5) where it
> talks about undefined behavior, the examples label it as unspecified
> behavior. I assume that the normative makes the non-normative examples
> errors.
>
As it stands, it's undefined behavior. Even if it were unspecified behavior
it still would not yield the predictable results that you claim.

John Potter

Jan 12, 2000, 3:00:00 AM

On Tue, 11 Jan 2000 20:51:15 CST, Anthony DeRobertis
<dero...@erols.com> wrote:

: In article <3879ca08...@news.csrlink.net>, jpo...@falcon.lhup.edu
: (John Potter) wrote:
:
: >On Mon, 10 Jan 2000 08:28:54 CST, Anthony DeRobertis
: ><dero...@erols.com> wrote:
: >
: >: In article <3876254a...@news.csrlink.net>,
: >: jpo...@falcon.lhup.edu (John Potter) wrote:
: >:
: >: >could someone give an example of translation which would produce
: >: >something other than expected?
: >: >
: >: > (i1 = i2) += i3;
: >:
: >: Well, on my PowerPC this would generate (unoptimized):
: >: -- sequence point --
: >: or i1,i2,i2
: >: add i1,i1,i3
: >: -- sequence point --
: >:
: >: Which could legally be re-arranged:
: >: add i1,i1,i3
: >: or i1, i2, i2
: >
: >I don't think so.
:
: <snip>
:
: The standard, by saying that a value can't be modified twice between
: sequencing points, has disallowed this dependency. The compiler need not
: check if a new value for an variables depends on another new value
: inside the same sequencing point, and there is no reason to expect that
: it does check.

I disagree. The standard requires that expressions be evaluated as
written. The old value of a variable which is modified may only be
used to calculate the new value. The new value may be used.

i1 = i2 calculates the value 2, it is an lvalue for i1

(i1 = i2) += i3 calculates the value 5, it is an lvalue for i1

The undefined behavior involves the final value of i1. The compiler
is not required to check for undefined behavior and there is no
reason to expect that it does check. It is required to compute the
value 5. Your code above does not compute the value 5.

: One could concieve a compiler that would decide 'ok, the final
: destination is i1' and then add three too it. Then manage the assign
: i1=i2 after it. Since there can be no more than one assignment to a
: single variable in a sequencing point, there's nothing wrong with this
: implementation, either.

No. The compiler could compute the value 5, store it in i1, and then
store the value 2 from i2 in i1. It might even store the & or | of
those two values. Adding 3 to the old value of i1 is not present
in the expression.

You seem to interpret undefined behavior as a license for the compiler
to do stupid things. I interpret it as a license to do reasonable
things on the assumption that there is no undefined behavior without
needing to verify that. Any compiler that does anything based upon
detecting undefined behavior better be issuing a diagnostic.

Consider:

int a[] = { 11, 13 }, i2(2), i3(3);
int& i1(a[0]);
int* p;
*((p = &(i1 = i2)) + 1) += i3;

There is no undefined behavior and a[] == { 2, 16 }.
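
The fragment above can be wrapped in a function to check the stated result. A minimal sketch, assuming the claim of no undefined behavior holds: each object (a[0], a[1] and p) is stored to exactly once. The function names are hypothetical.

```cpp
#include <cassert>
#include <utility>

// Runs the fragment and returns the final contents of a[] as a pair.
std::pair<int, int> run_array_example() {
    int a[] = { 11, 13 };
    int i2 = 2, i3 = 3;
    int& i1 = a[0];
    int* p;
    *((p = &(i1 = i2)) + 1) += i3;  // a[0] = 2, p = &a[0], a[1] += 3
    return std::make_pair(a[0], a[1]);
}

void demo_array_example() {
    std::pair<int, int> result = run_array_example();
    assert(result.first == 2);
    assert(result.second == 16);
}
```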

An interesting point is that in this part of the standard (5) where it
talks about undefined behavior, the examples label it as unspecified
behavior. I assume that the normative text makes the non-normative
examples errors.

John

---

Hyman Rosen

Jan 12, 2000, 3:00:00 AM

jpo...@falcon.lhup.edu (John Potter) writes:
> The undefined behavior involves the final value of i1. The compiler
> is not required to check for undefined behavior and there is no
> reason to expect that it does check. It is required to compute the
> value 5. Your code above does not compute the value 5.

No. If a program execution encounters undefined behavior, the Standard
imposes no requirements at all on any portion of the execution, before
or after the undefined behavior is encountered.

This is explicitly stated in 1.9/5.

John Potter

Jan 12, 2000, 3:00:00 AM

On Wed, 12 Jan 2000 05:43:01 CST, Hyman Rosen <hy...@prolifics.com>
wrote:

: jpo...@falcon.lhup.edu (John Potter) writes:
: > The undefined behavior involves the final value of i1. The compiler
: > is not required to check for undefined behavior and there is no
: > reason to expect that it does check. It is required to compute the
: > value 5. Your code above does not compute the value 5.
:
: No. If a program execution encounters undefined behavior, the Standard
: imposes no requirements at all on any portion of the execution, before
: or after the undefined behavior is encountered.
:
: This is explicitly stated in 1.9/5.

Well known rhetoric. If I write code with undefined behavior,
execute it, and something bad happens by accident, it is my problem.

If you write a compiler which detects the presence of undefined
behavior and purposely generates code to damage my computer, you
are in trouble and the standard will not protect you.

The only way that the mentioned reordering of the instructions is
valid is if the compiler detects the undefined behavior. Thus, the
reordering is not valid.

James Kanze gave several examples of reasonable code generation
which would work properly in the absence of undefined behavior
and give strange results in the presence of undefined behavior.
That was in clc++m. I am convinced that a sane compiler can produce
other than expected from the code examples. That was what I asked
for.

I have no interest in fairy tales about optimizers which take
special actions when undefined behavior code is detected or
nasal daemons or universes ending.

John

John Potter

Jan 12, 2000, 3:00:00 AM

On 11 Jan 2000 17:34:35 GMT, Ron Natalie <r...@sensor.com> wrote:

: John Potter wrote:
: >
: > I disagree. The standard requires that expressions be evaluated as
: > written.
:
: Where does it say that (hint: it doesn't, absent sequence points, the
: order of evaluation is UNSPECIFIED).

Wrong. The order of evaluation of expressions is defined by the
grammar. The order of evaluation of subexpressions and parameters
to functions is unspecified.

(a + b) * (c + d)

Either subexpression may be evaluated first, and even in parallel,
but they must both be evaluated prior to the multiply.

(i1 = i2) += i3

Either subexpression may be evaluated first, and even in parallel,
but they must both be evaluated prior to the add-equal. Storing
the result is not part of the evaluation.

Results may be retained wherever until a sequence point at which
time the values must be stored. This freedom allows the compiler
to make optimizations on the assumption that there is no undefined
behavior. It does not allow violating the grammar.
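
The (a + b) * (c + d) case can be made observable by giving each operand a logged side effect. A sketch with hypothetical names: the order in which the four calls run is unspecified, so only the product and the number of log entries are checked, never the order.

```cpp
#include <cassert>
#include <string>

// Records that an operand was evaluated, then yields its value.
int traced(int value, char tag, std::string& log) {
    log += tag;
    return value;
}

// Operands may be evaluated in any order, but both sums exist before the
// multiplication happens.
int product_of_sums(std::string& log) {
    return (traced(1, 'a', log) + traced(2, 'b', log)) *
           (traced(3, 'c', log) + traced(4, 'd', log));
}

void demo_product_of_sums() {
    std::string log;
    assert(product_of_sums(log) == 21);
    assert(log.size() == 4);  // every operand ran once, in some order
    assert(log.find('a') != std::string::npos);
    assert(log.find('d') != std::string::npos);
}
```

Because each operand here is a function call, the calls cannot overlap one another; that makes the behavior unspecified rather than undefined.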

: Sorry, this is wrong. The order of evaluation is not specified in
: C++ absent sequence points. The original poster is correct.

Wrong, see above.

: Changing il twice between sequence points is wrong.

Yes. No argument. So is the attempted reversal of instructions.

: > Any compiler that does anything based upon
: > detecting undefined behavior better be issuing a diagnostic.
:
: No. UNDEFINED BEHAVIOR does not require a diagnostic. There are far
: too many things that are undefined behavior that are impossible to
: check for at compile time. By the standard saying it's undefined behavior
: it says the compiler doesn't need to consider handling this sort of
: error.

You completely missed the point. Read again. Any compiler which
decides to detect the presence of undefined code BETTER generate
a diagnostic about its detection and not generate malicious code
intended to do damage.

The standard does not require the detection of undefined behavior
to allow generation of reasonable optimized code. It does not
grant the right to commit crimes against society when it detects
undefined behavior.

I rejected the reversal of instructions because it could only be
done in the presence of undefined code. The optimizer which did
that must also be capable of nasal daemons. No interest in
fairy tales.

Compiler writers are too busy trying to support the standard and
provide reasonable optimization to have time for the nonsense
suggested.

Francis Glassborow

Jan 12, 2000, 3:00:00 AM

In article <387bc96e...@news.csrlink.net>, John Potter
<jpo...@falcon.lhup.edu> writes

>If you write a compiler which detects the presence of undefined
>behavior and purposely generates code to damage my computer, you
>are in trouble and the standard will not protect you.

Almost certainly true, but it has nothing to do with the C++ Standard
and you would have to prove malicious intent in almost any legal system
that I am aware of.

Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

James Kuyper

Jan 12, 2000, 3:00:00 AM

John Potter wrote:
....

> You completely missed the point. Read again. Any compiler which
> decides to detect the presence of undefined code BETTER generate
> a diagnostic about its detection and not generate malicious code
> intended to do damage.

True, but only as a QoI issue, not as a standard conformance issue. If
the Devil put out a version of MSVC++6.66 that stole your soul any time
you have it compile code that allows undefined behavior, that could
still be a conforming implementation of C++.

> The standard does not require the detection of undefined behavior
> to allow generation of reasonable optimized code. It does not
> grant the right to commit crimes against society when it detects
> undefined behavior.

True; neither does it prohibit committing such crimes; that's the job of
the legal system, not the C++ standard.

John Potter

Jan 12, 2000, 3:00:00 AM

On 12 Jan 2000 17:21:41 GMT, James Kuyper <kuy...@wizard.net> wrote:

:
: John Potter wrote:
:
: > The standard does not require the detection of undefined behavior
: > to allow generation of reasonable optimized code. It does not
: > grant the right to commit crimes against society when it detects
: > undefined behavior.
:
: True; neither does it prohibit committing such crimes; that's the job of
: the legal system, not the C++ standard.

I think we agree. My original question asked for an example of a *sane*
code generation which would not do what I expected. I contend that no
*sane* compiler will use the presence of undefined code to generate
incorrect code. Any *optimization* which would fail in the absence
of undefined behavior is not *sane* code generation. I got some nice
examples elsewhere.

I will hold to my belief that compiler writers are attempting to
produce quality products. I will continue to consider code which
can only be generated when there is known undefined behavior to be
nonsense. The standard allows it, but reality forbids it.

John

Francis Glassborow

Jan 13, 2000, 3:00:00 AM

In article <387bb7c0...@news.csrlink.net>, John Potter
<jpo...@falcon.lhup.edu> writes

>The standard does not require the detection of undefined behavior
>to allow generation of reasonable optimized code. It does not
>grant the right to commit crimes against society when it detects
>undefined behavior.

As it explicitly places no requirements on a compiler faced with code
including undefined behaviour, I can see no reason (from the Standard)
why a stupid implementer should not do something malicious. Of course
he will not have any customers, but that is something else. The Standard
does not grant any rights with regard to code including undefined
behaviour; it simply disowns it and places no requirements on implementers.

Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---

Tom Payne

Jan 13, 2000, 3:00:00 AM

John Potter <jpo...@falcon.lhup.edu> wrote:

> Results may be retained where ever until a sequence point at which
> time the values must be stored.

AFAIK, only volatiles must be updated at the sequence point.

> You completely missed the point. Read again. Any compiler which
> decides to detect the presence of undefined code BETTER generate
                                                   ^^^^^^
> a diagnostic about its detection and not generate malicious code
> intended to do damage.

... at the peril of not being used but not in order to conform.

> The standard does not require the detection of undefined behavior
> to allow generation of reasonable optimized code. It does not
> grant the right to commit crimes against society when it detects
> undefined behavior.

Undefined behavior breaks the contract between programmer and
conforming implementation.

> I rejected the reversal of instructions because it could only be
> done in the presence of undefined code. The optimizer which did
> that must also be capable of nasal daemons. No interest in
> fairy tales.

> Compiler writers are too busy trying to support the standard and
> provide reasonable optimization to have time for the nonsense
> suggested.

Accessing a dangling reference can put bits into registers that send
voltages to actuators on devices that launch missiles. Evil
consequences of undefined behavior do not necessarily involve
evil intent on the part of implementors.

Tom Payne

James Kuyper

Jan 13, 2000, 3:00:00 AM

John Potter wrote:
....

> I will hold to my belief that compiler writers are attempting to
> produce quality products. I will continue to consider code which

That strikes me as a poor assumption. Just like anyone else, C++
implementors often find themselves forced to consider other issues to be
more important than quality. Some of them don't even need any forcing.

> can only be generated when there is known undefined behavior to be
> nonsense. The standard allows it, but reality forbids it.

Reality forbids very little; far less than you're likely to be able to
reliably predict. If you consistently write code that allows undefined
behavior, because you "know" it couldn't possibly be "sanely"
mishandled, I guarantee you'll run into trouble you could have avoided,
and I expect it won't take long. When you run into that trouble, there's
a good chance the implementor will be able to give you a sane explanation
that you hadn't even considered.

John Potter

Jan 13, 2000, 3:00:00 AM

On 13 Jan 2000 01:12:34 GMT, James Kuyper <kuy...@wizard.net> wrote:

:

: John Potter wrote:
: ....
: > I will hold to my belief that compiler writers are attempting to
: > produce quality products. I will continue to consider code which
:
: That strikes me as a poor assumption. Just like anyone else, C++
: implementors often find themselves forced to consider other issues to be
: more important than quality. Some of them don't even need any forcing.

Ok. Beliefs do not need proof. I'll dream on. :)

: > can only be generated when there is known undefined behavior to be
: > nonsense. The standard allows it, but reality forbids it.
:
: Reality forbids very little; far less than you're likely to be able to
: reliably predict. If you consistently write code that allows undefined
: behavior, because you "know" it couldn't possibly be "sanely"
: mishandled, I guarantee you'll run into trouble you could have avoided,
: and I expect it won't take long. When you run into that trouble, there's
: a good chance the implementor will be able to you a sane explanation
: that you hadn't even considered.

I don't write undefined behavior code on purpose. I never said that
the code did not produce undefined behavior. I rejected the example
as invalid and the rhetoric as nonsense.

To summarize:

I believed that the committee had a reason for stating that the forms
of multiple assignment introduced by reference returns in C++ should
still be undefined behavior. I asked for a real example because I
could not think of one. I was given a phoney example which the poster
stated could not be produced by the compiler in question. I rejected
it as fiction. A long list of language lawyers proceeded to lecture
on the words of the standard without giving any concrete examples. I
could conclude that I was wrong about the committee and that they, like your
compiler writers, have no interest in producing a quality product.

Fortunately, I also asked the same question as an aside in a thread
in comp.lang.c++.moderated and received two examples of real compiler
code generation strategies which would cause results other than I
would expect. I now know that the standard makes sense. That was my
desire in this question. Thanks to all of you who avoided answering
it.

Just in case someone interested missed the question, here is a
concrete formulation.

int i1, i2(2), i3(3);
(i1 = i2) += i3;
cout << i1 << endl;

Can anyone find a compiler which produces other than 5 as the output?
Lacking that, can you find a code generation which happens to give 5
but could give something else because of the undefined behavior? I
would accept an adjustment of instruction order which would still
compute the correct values and store them in two different variables.
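For comparison, here is a minimal sketch of the same computation with the two modifications of i1 placed in separate statements, so that a sequence point falls between them. The function name assign_then_add is invented for illustration and does not appear elsewhere in the thread.

```cpp
// Hypothetical helper: same intent as (i1 = i2) += i3, but each
// statement modifies i1 only once.
int assign_then_add(int i2, int i3) {
    int i1;
    i1 = i2;   // first modification; the ';' ends the full expression
    i1 += i3;  // second modification, after the sequence point
    return i1;
}
```

Called with 2 and 3, this version yields 5 regardless of how the compiler schedules the stores.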

John

---

John Potter

Jan 13, 2000

On 13 Jan 2000 01:12:33 GMT, Tom Payne <t...@roam-thp2.cs.ucr.edu> wrote:

:

: John Potter <jpo...@falcon.lhup.edu> wrote:
:
: > Results may be retained wherever until a sequence point at which
: > time the values must be stored.
:
: AFAIK, only volatiles must be updated at the sequence point.

See 1.9/7. Modifying the state of the computation is a side effect.
As-if games are still allowed, but store now is what it says.

Hyman Rosen

Jan 13, 2000

jpo...@falcon.lhup.edu (John Potter) writes:
> The only way that the mentioned reordering of the instructions is
> valid is if the compiler detects the undefined behavior. Thus, the
> reordering is not valid.

You are presuming to omnisciently know the behavior of all compilers
and optimizers, on into the future? Good for you then, but I think I
will avoid undefined behavior nonetheless.

Michiel Salters

Jan 13, 2000

John Potter wrote:

> Just in case someone interested missed the question, here is a
> concrete formulation.

> int i1, i2(2), i3(3);
> (i1 = i2) += i3;
> cout << i1 << endl;

> Can anyone find a compiler which produces other than 5 as the output?
> Lacking that, can you find a code generation which happens to give 5
> but could give something else because of the undefined behavior. I
> would accept an adjustment of instruction order which would still
> compute the correct values and store them in two different variables.

> John

Does anyone have access to Intel's new Merced/Itanium simulator and its
associated compiler? I've heard they are available now.

I'm asking this because, from what I've read, it will execute 3 operations
at the same time, provided that the compiler knows they are independent.
Now, 3 assignments in one statement may be bundled together and executed
at the same moment, since the compiler can assume there is at most one
assignment to any single variable. So the compiler could produce this
bundle:
...
{ LOAD i1, LOAD i2, LOAD i3}
{ i1 = i2, i1 = i1+i3, LOAD cout.internal }
...
probably causing the CPU to choke. Of course, this is but one
form of undefined behavior; expecting i1 to have a value at all
is expecting too much.

Michiel Salters

John Potter

Jan 13, 2000

On Thu, 13 Jan 2000 22:45:28 CST, Hyman Rosen <hy...@prolifics.com>
wrote:

: jpo...@falcon.lhup.edu (John Potter) writes:
: > The only way that the mentioned reordering of the instructions is
: > valid is if the compiler detects the undefined behavior. Thus, the
: > reordering is not valid.
:
: You are presuming to omnisciently know the behavior of all compilers
: and optimizers, on into the future?

For the given two machine instructions, I think I am safe in that. They
both modify the same variable with the second using the result of the
first. Without knowing that there is no sequence point, the
instructions may not be reversed.

: Good for you then, but I think I
: will avoid undefined behavior nonetheless.

Me too. I just asked for a real example. That one was not. It
sometimes helps to understand why the standard makes something
undefined.

John

Francis Glassborow

Jan 14, 2000

In article <387d50e4...@news.csrlink.net>, John Potter
<jpo...@falcon.lhup.edu> writes

> int i1, i2(2), i3(3);
> (i1 = i2) += i3;
> cout << i1 << endl;

Any compiler of reasonable quality should be able to get such code
correct because the potential undefined behaviour has not been hidden
(e.g. by aliasing). I understood that one reason for writing the rules
the way they are was that some systems have serious problems in writing
to the same store with insufficient delay. Note that I am not a
hardware expert and so this is hearsay.

IMO compiler writers could do a much better job at detecting potential
undefined behaviour and at least issuing a warning. Code such as:

int i = 0;
int array[10];
while (i<10) array[i] = i++;

has an easily diagnosed problem; it would be helpful if compilers issued a
warning for such code even if they include an assurance that they will
generate safe code.

OTOH

void fn (int & i, int & j, int max, int * array) {
i=0;
while (i< max) array[i++] = j++;
}

has potentially far worse undefined behaviour, which cannot reasonably be
diagnosed. Yes, I know the code stinks but have a look at some real
code and you will find things just as rotten only masked by other code.
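The trouble in fn arises when i and j refer to the same int, because array[i++] = j++ then modifies that one object twice with no sequence point in between. A minimal sketch of a rewrite that keeps every modification in its own statement follows; fn_safe is a hypothetical name, and the chosen order (index first, then value) is just one of several reasonable choices.

```cpp
// Hypothetical rewrite of fn: each statement modifies at most one
// object once, so the behaviour is defined even when i and j alias.
void fn_safe(int& i, int& j, int max, int* array) {
    i = 0;
    while (i < max) {
        int index = i;  // read the index right after the bounds check
        ++i;            // modify i in its own statement
        int value = j;  // read j (may be the same object as i)
        ++j;            // modify j in its own statement
        array[index] = value;
    }
}
```

With distinct arguments this fills the array exactly as the original was presumably meant to. With aliased arguments the result is unusual (every other element is written) but it is the same on every compiler, which is the point.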

Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Zalman Stern

Jan 14, 2000

Walter E Brown <w...@fnal.gov> wrote:
: void f( T & t );

: /* ... */
: f( x = y );

[...]

I had to deal with such a piece of code a few years ago. Many C++
programmers I asked believed it is required to pass a reference to x as
the argument to f. The Metrowerks compiler of the time disagreed and
Metrowerks defended the decision as draft standard compliant. The standard
at the time said the assignment returned an lvalue of the appropriate
type having the appropriate value. It did not say that the same lvalue that
was assigned to had to be returned. I believe section 5.17 of the actual
standard reads the same way.

A number of smart people agreed that the standard did not make this
explicit and I concluded that you *never* want to write code like that.
Can't see how anything has changed here for the final standard.

Andrew Koenig's post indicates that the intent of the standard is that a
reference to x is required above, but there is certainly some contention
about whether that intent is expressed in the language of the standard.

-Z-

---

Bill Wade

Jan 14, 2000

Zalman Stern wrote in message <85mtg5$ii6$1...@nntp6.atl.mindspring.net>...
>
> [ Argues that the standard does not make it explicit that
> &(x=y) == &x
> ]

>
>A number of smart people agreed that the standard did not make this
>explicit and I concluded that you *never* want to write code like that.
>Can't see how anything has changed here for the final standard.
>
>Andrew Koenig's post indicates that the intent of the standard is that a
>reference to x is required above, but there is certainly some contention
>about whether that intent is expressed in the language of the standard.

I agree that the wording is not explicit. I disagree that any other
interpretation is reasonable. I also note that the same style of wording is
used with both ?: and prefix ++. I hope anyone that argues that
int x=0;
int* px = &x;
assert(&(x=x) == px);
is allowed to fail will also argue that
assert(&(x?x:x) == px);
assert(&(++x) == px);
are both allowed to fail.

5.3.2 (++) "The value is the new value of the operand. It is an lvalue."
5.16/4 (?:) "The result is of that type and is an lvalue."
5.17/1 (=) "The result ... is the value stored in the left operand ... the
result is an lvalue"

The wording is much tighter for parentheses, &(x)==&x, and unary*, &*&x ==
&x. However in all cases the intent is clear.
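The three identities can be written as a small self-contained check. This is only a sketch: whether the 1998 wording guarantees that it succeeds is exactly what is in dispute here, and the function name is invented for illustration. It uses x = y rather than x = x to avoid self-assignment warnings.

```cpp
// Hypothetical check: returns true if the result of each expression
// designates the object x itself rather than some other lvalue.
bool results_refer_to_operand() {
    int x = 0;
    int y = 7;
    int* px = &x;
    bool assign_ok = (&(x = y) == px);    // 5.17: result of =
    bool cond_ok = (&(x ? x : y) == px);  // 5.16: x is nonzero, so x is selected
    bool incr_ok = (&(++x) == px);        // 5.3.2: result of prefix ++
    return assign_ok && cond_ok && incr_ok;
}
```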

Joerg Barfurth

Jan 15, 2000

Zalman Stern <zal...@netcom9.netcom.com> wrote:

> Walter E Brown <w...@fnal.gov> wrote:
> : void f( T & t );
> : /* ... */
> : f( x = y );
> [...]
>
> I had to deal with such a piece of code a few years ago. Many C++
> programmers I asked believed it is required to pass a reference to x as
> the argument to f. The Metrowerks compiler of the time disagreed and
> Metrowerks defended the decision as draft standard compliant. The standard
> at the time said the assignment returned an lvalue of the appropriate
> type having the appropriate value. It did not say that the same lvalue that
> was assigned to had to be returned. I believe section 5.17 of the actual
> standard reads the same way.

It does.

> Andrew Koenig's post indicates that the intent of the standard is that a
> reference to x is required above, but there is certainly some contention
> about whether that intent is expressed in the language of the standard.

This gets even more obscure, if you consider 5.17/p.3:
"If the left operand is not of class type, the expression is implicitly
converted (clause 4) to the cv-unqualified type of the left operand."

How does that fit in with (from 5.17/p.1):
"... the type of an assignment expression is that of its left operand.
The result of the assignment operation ... is an lvalue."

What is the difference between an assignment _expression_ and its
assignment _operation_ ?

How can an expression be converted to a type? How can that type
possibly be different from the type of the expression (by losing
qualifiers)?

And if that means that the result is converted, what does the statement
that the result is an lvalue mean in this case ?
IIRC the only implicit conversion (clause 4) that would support this
loss of qualifiers would be lvalue to rvalue.

Consider an example:

void f(int);
void g(volatile int&);
void h(int& );

int y(42);
volatile int x;

f(x=y); // OK: lvalue-to-rvalue drops qualifiers for non-class
g(x=y); // OK ? parameter bound to x ?
h(x=y); // OK ?? parameter not bound to x (hopefully) !

I presume it is intended that only the call to f() works. Otherwise it
isn't so clear how the value of the expression can be the value assigned
(as 5.17/p.1 also states):
int z = (x=y); // z initialized to 42
g(x=y); // if parameter is bound to x, reading it may not yield 42

The only alternative seems to be that it may be the case that
&x != &(x=y)

Anyone care to elaborate on that...

--
Jörg Barfurth Tel: +49 40 23646 500
Software Engineer mailto:joerg.b...@germany.sun.com
StarOffice GmbH
+jbarfurth @ vossnet.de http://www.sun.com/staroffice

---

Tom Payne

Jan 15, 2000

John Potter <jpo...@falcon.lhup.edu> wrote:
> On 13 Jan 2000 01:12:33 GMT, Tom Payne <t...@roam-thp2.cs.ucr.edu> wrote:
> :
> : John Potter <jpo...@falcon.lhup.edu> wrote:
> :
> : > Results may be retained wherever until a sequence point at which
> : > time the values must be stored.
> :
> : AFAIK, only volatiles must be updated at the sequence point.

> See 1.9/7. Modifying the state of the computation is a side effect.
> As-if games are still allowed, but store now is what it says.

It is exactly in the matter of what can be assumed to be "as if" that
volatile and non-volatile variables differ. AFAIK, omitting a
modification to a dead non-volatile can be assumed to be as if the
variable had been modified, but omitting a modification to a dead
volatile cannot. Similarly, delaying an update of a non-volatile
past a sequence point can be assumed to be as if it had been updated
before the sequence point, but not so for volatiles.

Tom Payne

James Kuyper

Jan 15, 2000

Joerg Barfurth wrote:
....

> This gets even more obscure, if you consider 5.17/p.3:
> "If the left operand is not of class type, the expression is implicitly
> converted (clause 4) to the cv-unqualified type of the left operand."
>
> How does that fit in with (from 5.17/p.1):
> "... the type of an assignment expression is that of its left operand.
> The result of the assignment operation ... is an lvalue."
>
> What is the difference between an assignment _expression_ and its
> assignment _operation_ ?

An assignment expression is a piece of syntax, which has a type that can
be determined at compile time. The assignment operation is the operation
that the assignment expression tells the implementation to perform, and
it has a result that is, in principle, evaluated at run-time.

> How can an expression be converted to a type? How can that type
> possibly be different from the type of the expression (by losing
> qualifiers)?

Exactly.

Anthony DeRobertis

Jan 17, 2000

In article <387dc78f...@news.csrlink.net>, jpo...@falcon.lhup.edu
(John Potter) wrote:

>On Thu, 13 Jan 2000 22:45:28 CST, Hyman Rosen <hy...@prolifics.com>
>wrote:
>
>: jpo...@falcon.lhup.edu (John Potter) writes:
>: > The only way that the mentioned reordering of the instructions is
>: > valid is if the compiler detects the undefined behavior. Thus, the
>: > reordering is not valid.
>:
>: You are presuming to omnisciently know the behavior of all compilers
>: and optimizers, on into the future?
>
>For the given two machine instructions, I think I am safe in that. They
>both modify the same variable with the second using the result of the
>first. Without knowing that there is no sequence point, the
>instructions may not be reversed.

Correct. But if a compiler were to actually use the sequence points to
determine how it can schedule instructions, it would not notice that
dependency.

There is no reason to expect a compiler to check for a store conflict
like that inside of one sequence point.

--
Windows 95 (win-DOH-z), n. A thirty-two bit extension and graphical shell
to a sixteen bit patch to an eight bit operating system originally coded
for a four bit microprocessor which was used in a PC built by a formerly
two bit company that couldn't stand one bit of competition.

---

Zalman Stern

Jan 18, 2000

Francis Glassborow <fra...@robinton.demon.co.uk> wrote:
: Almost certainly true, but it has nothing to do with the C++ Standard
: and you would have to prove malicious intent in almost any legal system
: that I am aware of.

And you really don't want to go to court against any entity capable of
producing nasal demons anyway. Trust me on this.

-Z-

John Potter

Jan 18, 2000

On Mon, 17 Jan 2000 20:57:19 CST, Anthony DeRobertis
<dero...@erols.com> wrote:

: In article <387dc78f...@news.csrlink.net>, jpo...@falcon.lhup.edu
: (John Potter) wrote:
:
: >For the given two machine instructions, I think I am safe in that. They
: >both modify the same variable with the second using the result of the
: >first. Without knowing that there is no sequence point, the
: >instructions may not be reversed.
:
: Correct. But if a compiler were to actually use the sequence points to
: determine how it can schedule instructions, it would not notice that
: dependency.

The dependency is a value dependency. Your example:

(i1 = i2) += i3; // undefined, generates
or i1,i2,i2
add i1,i1,i3

My counter example:

i4 = (i1 = i2) + i3; // well defined, generates
or i1,i2,i2
add i4,i1,i3

Same two instructions with the same value dependency. The second
instruction depends upon the value of the first instruction, not
upon where it was put.

: There is no reason to expect a compiler to check for a store conflict
: like that inside of one sequence point.

We agree. Without checking the store conflict and noting that this is
undefined behavior which allows anything, the two instructions may not
be reversed because of the value dependency.

Going to a simpler machine:

mov bx,i2; i1 is in bx and will be stored later
mov ax,bx
add ax,i3
mov i4,ax
mov i1,bx; sequence point, so store it

This is a valid translation which produces two correct answers. To go
back to the original, change i4 to i1. Wrong answer with code which
could only be noticed by observing the store conflict which is not
required.
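The translation above, with i4 changed to i1, can be mimicked in plain C++ by treating ax and bx as ordinary local variables. This is only a simulation of the hypothetical code generation described here, not output from any real compiler, and the function name is invented for illustration.

```cpp
// Simulates the five-instruction sequence for (i1 = i2) += i3 when the
// compiler keeps i1 in a register (bx) and stores it at the sequence point.
int delayed_store_result(int i2, int i3) {
    int i1 = 0;   // memory location of i1
    int bx = i2;  // mov bx,i2 : i1 lives in bx, store deferred
    int ax = bx;  // mov ax,bx
    ax += i3;     // add ax,i3
    i1 = ax;      // mov i1,ax : the += result (target was i4 before)
    i1 = bx;      // mov i1,bx : deferred store overwrites the sum
    return i1;
}
```

For i2 = 2 and i3 = 3 the simulated sequence leaves 2 in i1 rather than 5, because the deferred store of the plain assignment lands after the store of the sum.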

This is the kind of example I desired. It demonstrates undefined
behavior giving unexpected results with reasonable code. There is
no reversal of unreversable instructions.

I am convinced that there is good reason to make the original code
be undefined behavior. I never doubted that it was, just wanted an
example to help explain why.

John

Zalman Stern

Jan 18, 2000

Bill Wade <bill...@stoner.com> wrote:
: I agree that the wording is not explicit. I disagree that any other
: interpretation is reasonable.

See below.

[...]
: assert(&(x=x) == px);

The Metrowerks compiler case required passing the expression as an argument
to a routine expecting a reference. The compiler issued a warning that the
code probably didn't do what was wanted. (Something like "passing a
temporary to a non-const reference.")

: However in all cases the intent is clear.

I'm pretty sure much of the readership of this newsgroup (including Bill)
knows that Bill's opening sentences and closing sentence are semantic
errors when arguing a standard with a compiler vendor. As in a DOA, Game
Over, "I thought you silenced the guard!" semantic error. We're talkin'
folks who are trained to smell fear and trepidation in bug reports and bury
them as quickly as possible. So up front admitting the standard doesn't
support your case pretty much ends the discussion.

It's also just not a very strong definition for something so fundamental in
the language. I hope it can be improved.

Anyone who wants to be able to depend on this behavior should consider
filing a defect report. I am not sufficiently motivated.

-Z-