How defined is undefined behaviour?

Barney Barumba

Jun 13, 1998

Given:

int a = 0;
int b = a + (a = 1 ? a = 2 : a = 3);

is b 'undefined'? As I understand it, a compiler may evaluate the
lhs and rhs of the addition in either order. Furthermore, if the rhs
is evaluated first, the assignment 'a = 2' may store the value 2 in
'a' before or after evaluating the lhs. This would mean that 'b'
could have the value 2, 3 or 4.

If this is true, then 'b == 3' may be true or false (undefined?),
but 'b < 5' would always be true. So is 'b' really 'undefined', or
does it have a state of 'either 2, 3 or 4'?

--
Barney.

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Pete Becker

Jun 13, 1998

Barney Barumba wrote:
>
> Given:
>
> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);
>
> is b 'undefined'? As I understand it, a compiler may evaluate the
> lhs and rhs of the addition in either order. Further more, if the rhs
> is evaluated first, the assignment 'a = 2' may store the value 2 in
> 'a' before or after evaluating the lhs. This would mean that 'b'
> could have the value 2, 3 or 4.
>
> If this is true, then 'b == 3' may be true or false (undefined?),
> but 'b < 5' would always be true. So is 'b' really 'undefined', or
> does it have a state of 'either 2, 3 or 4'?

The language definition does not impose any requirements on what a C++
compiler does with this statement. That's what it means to say that the
statement's behavior is undefined. The fact that it could be more
tightly constrained does not change what the standard says.

Michael Rubenstein

Jun 13, 1998

On 13 Jun 1998 08:38:25 -0400, Barney Barumba
<barn...@iname.removethis.com> wrote:

>Given:
>
> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);
>
>is b 'undefined'? As I understand it, a compiler may evaluate the
>lhs and rhs of the addition in either order. Further more, if the rhs
>is evaluated first, the assignment 'a = 2' may store the value 2 in
>'a' before or after evaluating the lhs. This would mean that 'b'
>could have the value 2, 3 or 4.
>
>If this is true, then 'b == 3' may be true or false (undefined?),
>but 'b < 5' would always be true. So is 'b' really 'undefined', or
>does it have a state of 'either 2, 3 or 4'?

It's unlikely that an actual implementation would do anything but
assign one of those values to b. However, the [draft] standard does
not require it. As far as the standard is concerned, anything might
happen; there are no requirements.

There are other situations in which the behavior is unspecified.
Unspecified behavior is somewhat constrained. For example, in

int a = 0;
int f() { return ++a; }
// ...
cout << f() << f() << '\n';

the order of evaluation of the two calls to f() is unspecified and
the program may print out 12 or 21. An implementation is not required
to document unspecified behavior, nor is it required to be consistent.
For example, if we reset a to 0 and repeat the last statement, the two
runs might print out

21
12
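
A minimal sketch of the same situation, written so the result can be captured and checked. The names bump, counter and the two helper functions are made up for illustration and are not from the post. Under the rules discussed in this thread the first helper may return either "12" or "21"; as far as I know, C++17 and later fix the order left to right for chained <<, so there it returns "12".

```cpp
#include <cassert>
#include <sstream>
#include <string>

int counter = 0;                     // plays the role of 'a' in the post
int bump() { return ++counter; }     // plays the role of f() in the post

// Two calls in one expression: before C++17 the order of the two
// calls is unspecified, so the text may be "12" or "21".
std::string two_calls_in_one_expression() {
    counter = 0;
    std::ostringstream out;
    out << bump() << bump();
    return out.str();
}

// Each call gets its own full-expression, so the order is fixed.
std::string two_calls_in_order() {
    counter = 0;
    const int first = bump();
    const int second = bump();
    assert(first == 1 && second == 2);
    std::ostringstream out;
    out << first << second;
    return out.str();                // always "12"
}
```

Giving each call its own statement is the usual way to make the order explicit when it matters.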

--
Michael M Rubenstein

Steve Clamage

Jun 13, 1998

Barney Barumba <barn...@iname.removethis.com> writes:

>Given:

> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);

>is b 'undefined'? As I understand it, a compiler may evaluate the
>lhs and rhs of the addition in either order. Further more, if the rhs
>is evaluated first, the assignment 'a = 2' may store the value 2 in
>'a' before or after evaluating the lhs. This would mean that 'b'
>could have the value 2, 3 or 4.

>If this is true, then 'b == 3' may be true or false (undefined?),
>but 'b < 5' would always be true. So is 'b' really 'undefined', or
>does it have a state of 'either 2, 3 or 4'?

"Undefined" means the standard places no requirements on
the implementation. Your example expression has undefined
behavior, so the implementation can do whatever it likes
and still remain in compliance with the standard.

In principle, that means literally anything might happen.
The compiler might refuse to compile the program, for
example, in which case it would be doing you a favor.
It might cause the program to abort at runtime.

Formally speaking, the values of 'a' and 'b' are undefined,
and in addition you can make no assumptions about the
remainder of the program.

Practically speaking, I don't know of any compilers that
attempt this analysis, and I don't know of any compilers
that would do something truly weird. Ordinarily, the
compiler will generate naive code that will produce
some result along the lines you suggest.

--
Steve Clamage, stephen...@sun.com

Alfred Kellner

Jun 13, 1998

Barney Barumba <barn...@iname.removethis.com> wrote:
> Given:
>
> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);
>
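you have to be aware that

(a = 1 ? a = 2 : a = 3)
is parsed in C++ as
(a = (1 ? (a = 2) : (a = 3)))
because in C++ the third operand of op ?: is an assignment-expression,
and the whole conditional is the right-hand side of the first op =.
The condition is just 1, so in short a = (a = 2).

so first take out the 'obfuscation', e.g. if this was meant:
((a = 1) ? (a = 2) : (a = 3))
and then discuss if it's defined or undefined

bad example:
int b,c;
1 ? b : c = 7; // C++: 1 ? b : (c = 7), assigns nothing
0 ? b : c = 5; // C++: 0 ? b : (c = 5), equiv: c=5;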
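
One way to check how a C++ compiler groups these expressions is to run them and see which variables change. The sketch below assumes the standard C++ grammar, in which the third operand of ?: is an assignment-expression (the C grammar differs here). The function names are made up for illustration, and a separate variable x is used as the assignment target so that nothing is modified twice.

```cpp
#include <cassert>

// 1 ? b : c = 7 groups as 1 ? b : (c = 7): the true branch is
// just b, so neither variable is assigned.
bool true_branch_assigns_nothing() {
    int b = 0, c = 0;
    (void)(1 ? b : c = 7);
    return b == 0 && c == 0;
}

// 0 ? b : c = 5 groups as 0 ? b : (c = 5): c receives 5.
int false_branch_value_of_c() {
    int b = 0, c = 0;
    (void)(0 ? b : c = 5);
    return c;
}

// x = 1 ? a = 2 : a = 3 groups as x = (1 ? (a = 2) : (a = 3)).
// If the condition were (x = 1), x would end up 1; if the last
// "= 3" applied to the whole conditional, a would end up 3.
bool condition_is_just_1() {
    int x = 0, a = 0;
    x = 1 ? a = 2 : a = 3;
    return x == 2 && a == 2;
}

void run_checks() {
    assert(true_branch_assigns_nothing());
    assert(false_branch_value_of_c() == 5);
    assert(condition_is_just_1());
}
```

Compilers may warn about these expressions, which is a fair hint that explicit parentheses are worth writing.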

--ALfred

Christopher M. Gurnee

Jun 13, 1998

Barney Barumba wrote in message <6ltrrm$5...@netlab.cs.rpi.edu>...

>Given:
>
> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);
>
>is b 'undefined'? As I understand it, a compiler may evaluate the
>lhs and rhs of the addition in either order. Further more, if the rhs
>is evaluated first, the assignment 'a = 2' may store the value 2 in
>'a' before or after evaluating the lhs. This would mean that 'b'
>could have the value 2, 3 or 4.
>
>If this is true, then 'b == 3' may be true or false (undefined?),
>but 'b < 5' would always be true. So is 'b' really 'undefined', or
>does it have a state of 'either 2, 3 or 4'?

Short answer:
The standard says that not only is the value of both b and a
undefined, the behavior of the program itself is undefined. This
means that the program is completely free to do whatever it wants, be
that crash, or set b = 1000, or whatever, and the compiler that
produced the program is still considered standard conforming.
Practically speaking, my guess is that the program would actually set
b to either 2, 3, or 4, depending on the compiler, and then continue
on its merry way, but you should *never* rely on a guess like this.

Long answer:

Let me start out by saying that sequence points and order of
evaluation in general are among the most commonly misunderstood topics
in C++ (IMHO), and among the topics most often left out of books and
[school] classes (although there certainly are some good books out
there that do cover them). I also think that any good C++ programmer
should be *at least* vaguely familiar with them, so here is a short
tutorial.

The only thing that guarantees order of evaluation in C++ (or in C)
is something called a sequence point. Without getting too technical,
a sequence point guarantees that when it is reached, everything before
it has been evaluated and nothing after it has been evaluated. The
order of evaluation between two sequence points is unspecified. The
most common sequence point is the one that occurs at the end of each
full-expression (which usually means there is a sequence point at each
";"). Between two sequence points, you must follow these two rules:
1. never modify the value stored by an object more than once
2. if you do modify the value stored by an object, you can only use
the previous value stored by that object in order to determine the new
value to be stored by it.
If you break either rule, the behavior of the *entire* program
becomes undefined, so don't do it!

Here are some examples:
i = ++i; // breaks rule #1
cout << i << ++i; // breaks rule #2
i = i + 1; // ok
i ? i=1 : i=5; // ok (guaranteed that only one of those two =
// expressions is actually evaluated)
i=1; i++; // ok (sequence point at end of each full-expression)
i=1, i++; // ok (the comma operator, by definition, introduces a
// sequence point between the two expressions)

Just for reference, there are eight places where sequence points
occur:
1. after a full-expression (an expression that is not a
sub-expression of another expression)
2. right before a called function begins executing
3. after a function returns, right before any code in the caller
continues executing
4-7. after the evaluation of the expression "a" in the following
expressions:
a , b
a ? b : c
a || b
a && b
No other operators introduce sequence points. Note also that a
comma, when not used as an operator (for example in separating
function call arguments), does not introduce a sequence point.
8. While initializing an object of class type, right after the
initialization of each base class object and each member object. Note
that this does not imply that initialization occurs in the order
specified in the member initialization list; this is another issue.
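
The operator cases in items 4-7 can be sketched as small functions. Each one modifies i and then touches it again, which is fine only because the operator puts a sequence point in between. This is an illustrative sketch; the function names are made up and are not from the post.

```cpp
#include <cassert>

// a , b : i = 1 is complete before i++ starts.
int comma_example() {
    int i = 0;
    i = 1, i++;
    return i;                                  // 2
}

// a ? b : c : the condition, including its assignment, is complete
// before the selected branch reads i.
int conditional_example() {
    int i = 0;
    const int r = (i = 1) ? i + 10 : i + 20;
    return r;                                  // 11
}

// a && b : the first ++i is complete before the second one runs.
bool and_example() {
    int i = 0;
    return ++i == 1 && ++i == 2;               // true
}

// a || b : i++ is complete before the right operand reads i.
bool or_example() {
    int i = 0;
    return i++ != 0 || i == 1;                 // true
}

void run_checks() {
    assert(comma_example() == 2);
    assert(conditional_example() == 11);
    assert(and_example());
    assert(or_example());
}
```

The same reads and writes placed around +, or in two arguments of one function call, would have no sequence point between them and would fall under the two rules above.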

Now, let's take a look at your example:


> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);

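Read as ((a = 1) ? a = 2 : a = 3), the parenthesized expression is
actually ok, because there is a sequence point after a = 1 and because
only one of the two remaining expressions is evaluated. (As written,
C++ groups it as a = (1 ? a = 2 : a = 3), which assigns to a twice
with no sequence point in between and so breaks rule #1.) Either way,
you use a on the left of the + operator, violating rule #2. Therefore,
your program contains undefined behavior.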
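
If a particular result is wanted, one way to get it with defined behavior is to split the work into separate full-expressions, so that every read and write of a is separated by a sequence point. The sketch below is only an illustration with made-up function names. It assumes the intent was to add the old or the new value of a to the assigned value 2.

```cpp
#include <cassert>

// Read 'a' first, then assign: the left operand sees the old value.
int read_then_assign() {
    int a = 0;
    const int lhs = a;   // 0
    a = 2;
    return lhs + a;      // 0 + 2 == 2
}

// Assign first, then read: the left operand sees the new value.
int assign_then_read() {
    int a = 0;
    a = 2;
    const int lhs = a;   // 2
    return lhs + a;      // 2 + 2 == 4
}

void run_checks() {
    assert(read_then_assign() == 2);
    assert(assign_then_read() == 4);
}
```

Either version states which order is meant, instead of leaving it to the compiler.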

Just in case you were wondering, it's raining and thundering here in
Mass, which explains my long-windedness :)

-Chris Gurnee

Valentin Bonnard

Jun 13, 1998

Barney Barumba <barn...@iname.removethis.com> writes:

> Given:
>
> int a = 0;
> int b = a + (a = 1 ? a = 2 : a = 3);
>
> is b 'undefined'?

There are two possible evaluation orders, each of them
has undefined behaviour. The program can reformat your
hard disk.

See also the current thread in comp.std.c about undefined
behaviour if you want to know everything about the
dangers of undefined behaviour.

--

Valentin Bonnard mailto:bonn...@pratique.fr
info about C++/a propos du C++: http://pages.pratique.fr/~bonnardv/

John Nagle

Jun 14, 1998

cla...@Eng.Sun.COM (Steve Clamage) writes:
>Barney Barumba <barn...@iname.removethis.com> writes:
>>Given:
>> int a = 0;
>> int b = a + (a = 1 ? a = 2 : a = 3);
>In principle, that means literally anything might happen.
>The compiler might refuse to compile the program, for
>example, in which case it would be doing you a favor.
>It might cause the program to abort at runtime.

>Formally speaking, the values of 'a' and 'b' are undefined,
>and in addition you can make no assumptions about the
>remainder of the program.

>Practically speaking, I don't know of any compilers that
>attempt this analysis, and I don't know of any compilers
>that would do something truly weird. Ordinarily, the
>compiler will generate naive code that will produce
>some result along the lines you suggest.

On some machines, there's the possibility that the lhs and rhs
of an expression might be evaluated simultaneously. On fully-interlocked
superscalar machines, like Pentium Pros and above, the result is still
as if some sequential operation was executed. On non-interlocked
superscalar machines, if there are any left, the compiler could generate
code that does produce unpredictable results.
Compilers for explicit-parallelism machines, like Merced, will actually
have to detect side effects and do something about them, and we may
well see error messages for this sort of thing in the Merced era.

John Nagle
