strange behaviour with a[a[i]]

Dann Corbit

unread,

Jan 23, 2002, 1:21:13 AM1/23/02

to

"Kevin Miller" <erase.1st.19.c...@earthlink.net> wrote in
message news:3C4E3E79...@earthlink.net...
> srinivas reddy wrote:
> >
> > I have the following piece of code and its behaving strangely..please
> > explain why is it happening like that.
>
> No problem.
>
> > int main()
> > {
> > int a[10], i;
> >
> > i = 0;
> > a[i] = 0;
>
> This statement stores 0 in a[0].
> Now a[0] holds 0.
>
> > a[a[i]] = 1;
>
> Since i holds 0, a[i] refers to a[0], which holds 0.
> Hence a[a[i]] refers to a[0]. So this statement
> stores 1 in a[0].

I don't agree that you can assume that this works. Is the location changed
before the assignment or during the assignment or after the assigment?

Considering:
3.8: How can I understand these complex expressions? What's a
"sequence point"?

A: A sequence point is a point in time (at the end of the
evaluation of a full expression, or at the ||, &&, ?:, or comma
operators, or just before a function call) at which the dust
has settled and all side effects are guaranteed to be complete.
The ANSI/ISO C Standard states that

Between the previous and next sequence point an
object shall have its stored value modified at
most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed
only to determine the value to be stored.

The second sentence can be difficult to understand. It says
that if an object is written to within a full expression, any
and all accesses to it within the same expression must be for
the purposes of computing the value to be written. This rule
effectively constrains legal expressions to those in which the
accesses demonstrably precede the modification.

Kevin Miller

unread,

Jan 23, 2002, 2:35:03 AM1/23/02

to

I made my post before I saw yours.

As a practical matter, I think the machine would have to compute
the location before it stored something in it.

But looking at the section of the standard you quoted, I think you
are right that the standard does in fact deem that this would invoke
undefined behavior. I would also like to here from the gurus at
comp.std.c. If this means what it appears to mean, it is a really
insidious trap. Even Emmanuel didn't realize "a[a[i]] = 1;" invoked
undefined behavior.

Pai-Yi HSIAO

unread,

Jan 23, 2002, 4:44:09 AM1/23/02

to

On Wed, 23 Jan 2002, Kevin Miller wrote:
> Dann Corbit wrote:
> > > Since i holds 0, a[i] refers to a[0], which holds 0.
> > > Hence a[a[i]] refers to a[0]. So this statement
> > > stores 1 in a[0].
> >
> > I don't agree that you can assume that this works. Is the location changed
> > before the assignment or during the assignment or after the assigment?

I don't agree with Dann Corbit.
According to the word of the standard:

"the prior value shall be accessed only

to determine the value to be stored",

the assignment has been assumed after the use of the value a[0].

The question is

Why does the standard put the *key word* 'only' in the phrase?

In this case: "a[a[0]]=1;",
the standard has asked an implementation to access the "prior"
value of a[0] for determining the lvalue "a[a[0]]", that is,
the value of the lvalue has been determined before the process
of the assignment being applied.
There is no ambiguity to decide the ordre of the instructions
even if a[0] has eventually a value 0.
It is dislike the case :" (i++)+(i++);",
in which the object i is modified two times before a sequence pointed,
we have the problem to determine what the ordre of the instructions is.

What I can imagine is that an hypothetical implementation has two modes
for accessing an object: read mode and write mode.
When an object is set to 'read mode' and, at the same time, we try to
write something to it, this will cause some undefined behavior.

But... Does this kind of implementation exist?

Even if it exists, a compiler writor can put some "extra" instruction
turning the object mode to "write" while the lvalue expression
has been calculated.

Why does the standard give this constraint to restrict the freedom of
the most people?

> > The ANSI/ISO C Standard states that
> >
> > Between the previous and next sequence point an
> > object shall have its stored value modified at
> > most once by the evaluation of an expression.
> > Furthermore, the prior value shall be accessed
> > only to determine the value to be stored.
> >
> > The second sentence can be difficult to understand. It says
> > that if an object is written to within a full expression, any
> > and all accesses to it within the same expression must be for
> > the purposes of computing the value to be written. This rule
> > effectively constrains legal expressions to those in which the
> > accesses demonstrably precede the modification.

> As a practical matter, I think the machine would have to compute

> the location before it stored something in it.

"using the prior value for computing the location" is what the
standard requirs.

paiyi

Tak-Shing Chan

unread,

Jan 23, 2002, 5:42:25 AM1/23/02

to

On Wed, 23 Jan 2002, Pai-Yi HSIAO wrote:

> "the prior value shall be accessed only
> to determine the value to be stored",
>
> the assignment has been assumed after the use of the value a[0].

You cannot assume that, for the behavior is undefined.

> The question is
>
> Why does the standard put the *key word* 'only' in the phrase?

To prohibit any *other* accesses of the prior value, e.g.
you cannot use it to determine the location to be stored.

> "using the prior value for computing the location" is what the
> standard requirs.

No. See above.

Tak-Shing

t...@cs.ucr.edu

unread,

Jan 23, 2002, 6:09:08 AM1/23/02

to

In comp.std.c Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:

: On Wed, 23 Jan 2002, Kevin Miller wrote:
:> Dann Corbit wrote:
:> > > Since i holds 0, a[i] refers to a[0], which holds 0.
:> > > Hence a[a[i]] refers to a[0]. So this statement
:> > > stores 1 in a[0].
:> >
:> > I don't agree that you can assume that this works. Is the location changed
:> > before the assignment or during the assignment or after the assigment?

: I don't agree with Dann Corbit.
: According to the word of the standard:
:
: "the prior value shall be accessed only
: to determine the value to be stored",

: the assignment has been assumed after the use of the value a[0].

IIRC, that is what Dan said.

: The question is

: Why does the standard put the *key word* 'only' in the phrase?

: In this case: "a[a[0]]=1;",
: the standard has asked an implementation to access the "prior"
: value of a[0] for determining the lvalue "a[a[0]]", that is,
: the value of the lvalue has been determined before the process
: of the assignment being applied.
: There is no ambiguity to decide the ordre of the instructions
: even if a[0] has eventually a value 0.
: It is dislike the case :" (i++)+(i++);",
: in which the object i is modified two times before a sequence pointed,
: we have the problem to determine what the ordre of the instructions is.

: What I can imagine is that an hypothetical implementation has two modes
: for accessing an object: read mode and write mode.
: When an object is set to 'read mode' and, at the same time, we try to
: write something to it, this will cause some undefined behavior.

: But... Does this kind of implementation exist?

: Even if it exists, a compiler writor can put some "extra" instruction
: turning the object mode to "write" while the lvalue expression
: has been calculated.

: Why does the standard give this constraint to restrict the freedom of
: the most people?

This matter came up in this newsgroup two and a half years ago, and at
that time I understood the consensus of the experts who participated
in that thread to be that this clause "only to determine the address
to be stored" was unnecessarily strict and that the prior value could
also be used to compute the lvalue of the object to be modified,
without jeopardizing any reasonable optimizations. AFIK, however, no
one has filed a defect report.

As a result of that earlier thread, I presumed that the authors of
clause in question had simply overlooked the fact that the lvalue of
the object to be modified had to be computed before it could be
modified, so consulting the objects prior value during that
computation could never cause a problem under any optimzation. So I'm
very surprised by the present report that some implementor has found a
way to take advantage of the fact that programmers are not allowed to
use the prior value of an object in determining that object's lvalue
on the left side of an assignment to that object. Obviously I've
missed something.

Tom Payne

Pai-Yi HSIAO

unread,

Jan 23, 2002, 6:30:46 AM1/23/02

to

On Wed, 23 Jan 2002, Tak-Shing Chan wrote:
> On Wed, 23 Jan 2002, Pai-Yi HSIAO wrote:
>
> > "the prior value shall be accessed only
> > to determine the value to be stored",
> >
> > the assignment has been assumed after the use of the value a[0].
> You cannot assume that, for the behavior is undefined.

Which behavior do you indicate?

> > The question is
> > Why does the standard put the *key word* 'only' in the phrase?
>
> To prohibit any *other* accesses of the prior value, e.g.
> you cannot use it to determine the location to be stored.

Your e.g. is wrong.
The word "the value to be stored" in the standard should be regarded as
"lvalue", which is a location to be stored.

The standard has gauranteed the existence of the lvalue (if it's
possible).

> > "using the prior value for computing the location" is what the
> > standard requirs.
>
> No. See above.

How do you think the meaning of "the value to be stored"?

paiyi

Tom St Denis

unread,

Jan 23, 2002, 6:47:03 AM1/23/02

to

"Kevin Miller" <erase.1st.19.c...@earthlink.net> wrote in

message news:3C4E651E...@earthlink.net...

> I made my post before I saw yours.
>
> As a practical matter, I think the machine would have to compute
> the location before it stored something in it.
>
> But looking at the section of the standard you quoted, I think you
> are right that the standard does in fact deem that this would invoke
> undefined behavior. I would also like to here from the gurus at
> comp.std.c. If this means what it appears to mean, it is a really
> insidious trap. Even Emmanuel didn't realize "a[a[i]] = 1;" invoked
> undefined behavior.

That's because in reality there is nothing undefined about it. For a
program to work the a[i] inner expression *MUST* be computed before you can
store the value. There is no way around that. The compiler cannot store
into the array a[] before it knows which address to put it in. See what
would probably be undefined is something like

a[i++] = --i;

But not

a[a[j++]] = --i;

Since again, a[j++] must be computed first [even if --i is computed very
first] before the store can take place. Anyone who claims this is undefined
behaviour is out to lunch [even if the spec says so]. Now on the subject of
whether that code is a "good idea" it clearly isn't.

Tom

Tim Woodall

unread,

Jan 23, 2002, 7:00:09 AM1/23/02

to

How about a multi-processor optimizing compiler.

Assume in one source file:

int main(void)
{
int a[10];
a[0]=0;
do_clever_stuff(a);
}

And in another source file:

void do_clever_stuff(int *b)
{
b[b[0]]=1;
do_other_stuff(b[0])
}

A sufficiently clever compiler could theoretically run do_other_stuff on a
separate processor in parallel to the assignment. The compiler can use the
optimization that b[0] cannot be modified as part of the assignment.

Tim.

--
God said, "div D = rho, div B = 0, curl E = - @B/@t, curl H = J + @D/@t,"
and there was light.

http://locofungus.2y.net/ http://www.locofungus.btinternet.co.uk/

t...@cs.ucr.edu

unread,

Jan 23, 2002, 7:10:23 AM1/23/02

to

In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
[...]
: That's because in reality there is nothing undefined about it. For a

: program to work the a[i] inner expression *MUST* be computed before you can
: store the value.

That's an excellent argument why it should be defined but has nothing
to do with whether it is defined --- a behavior is defined if and only
if the Standard says it's defined. In the case at hand, the Standard
says that the behavior isn't defined. The question is whether that is
a defect in the standard.

Tom Payne

Tom St Denis

unread,

Jan 23, 2002, 7:14:32 AM1/23/02

to

<t...@cs.ucr.edu> wrote in message news:a2m97f$hln$2...@glue.ucr.edu...

Speaking from just random suspicion I'd question whether whoever started
this UB issue actually read the spec properly in the first place?

The actions have a clearly defined behaviour in which case I'd like to think
the spec wouldn't randomly say its UB.

Tom

Tom St Denis

unread,

Jan 23, 2002, 7:19:29 AM1/23/02

to

"Tim Woodall" <t...@pauli.locofungus.org> wrote in message
news:slrna4t95...@pauli.locofungus.org...

> A sufficiently clever compiler could theoretically run do_other_stuff on a
> separate processor in parallel to the assignment. The compiler can use the
> optimization that b[0] cannot be modified as part of the assignment.

Threads are not part of the C spec. It is assumed [most likely] in the spec
that programs execute in a serially dependent fashion.

I.e

int x, a[10];

x = 4;
a[x] = x;

Both statements cannot execute at the same time obviously....

Also the whole things a crock anyways

a[a[j]] = j;

Is just as defined as

a[some_func(a)] = 3;

Tom

t...@cs.ucr.edu

unread,

Jan 23, 2002, 7:51:05 AM1/23/02

to

In comp.std.c Tim Woodall <t...@pauli.locofungus.org> wrote:
[...]
: How about a multi-processor optimizing compiler.

: Assume in one source file:

: int main(void)
: {
: int a[10];
: a[0]=0;
: do_clever_stuff(a);
: }

: And in another source file:

: void do_clever_stuff(int *b)
: {
: b[b[0]]=1;
: do_other_stuff(b[0])
: }

: A sufficiently clever compiler could theoretically run do_other_stuff on a
: separate processor in parallel to the assignment. The compiler can use the
: optimization that b[0] cannot be modified as part of the assignment.

Yes, theoretically. But I doubt that happened in the case reported or
that that was the reason the authors of the Standard disallowed
accessing an objects value to compute its address in an assignment.

Tom Payne

James Kuyper Jr.

unread,

Jan 23, 2002, 7:47:15 AM1/23/02

to

Pai-Yi HSIAO wrote:
...

> The word "the value to be stored" in the standard should be regarded as
> "lvalue", which is a location to be stored.

"the value to be stored" can't refer to the lvalue, because lvalues
can't be stored. An lvalue identifies a storage location. In the case of
a[a[i]] = 1, the value to be stored is '1', and the prior value is the
value stored in a[i].

Pai-Yi HSIAO

unread,

Jan 23, 2002, 8:04:18 AM1/23/02

to

I think I understand the sentence more.
It should be

the prior value shall be accessed only

to determine [WHERE] the value to be stored

Do I misunderstand it?

Thank you.
paiyi

James Kuyper Jr.

unread,

Jan 23, 2002, 8:09:13 AM1/23/02

to

Tom St Denis wrote:
>
> <t...@cs.ucr.edu> wrote in message news:a2m97f$hln$2...@glue.ucr.edu...

...

> > to do with whether it is defined --- a behavior is defined if and only
> > if the Standard says it's defined. In the case at hand, the Standard
> > says that the behavior isn't defined. The question is whether that is
> > a defect in the standard.
>
> Speaking from just random suspicion I'd question whether whoever started
> this UB issue actually read the spec properly in the first place?
>
> The actions have a clearly defined behaviour in which case I'd like to think
> the spec wouldn't randomly say its UB.

It violates a "shall" from section 6.5p2 of the standard, which is not a
constraint section. Section 4p2 says 'If a "shall" or "shall not"
requirement that occurs outside a constraint is violated, the behavior
is undefined." So yes, the standard does declare that the behavior is
undefined.

This particular "shall" in 6.5p2 is not a random requirement; in
general, it is needed to permit efficient implementation of C, by
allowing compilers to use optimizations that could break down if it were
violated. I'm not saying that I see any opportunity for such an
optimization in this particular case. Many of the rules in the standard
are more general than is strictly necessary, in order to make them
easier to state, and correspondingly easier to detect and avoid
violations of them.

Tim Woodall

unread,

Jan 23, 2002, 8:20:19 AM1/23/02

to

On Wed, 23 Jan 2002 12:19:29 GMT,

Tom St Denis <tomst...@yahoo.com> wrote:
>

> "Tim Woodall" <t...@pauli.locofungus.org> wrote in message
> news:slrna4t95...@pauli.locofungus.org...
>> A sufficiently clever compiler could theoretically run do_other_stuff on a
>> separate processor in parallel to the assignment. The compiler can use the
>> optimization that b[0] cannot be modified as part of the assignment.
>
> Threads are not part of the C spec. It is assumed [most likely] in the spec
> that programs execute in a serially dependent fashion.
>

No they are not and therefore any compiler that uses threads to execute a
serial program must get the same results as if it didn't.

The as-if rule says that the compiler can use threads provided that the
result is exactly the same as if it didn't.

In this instance a[a[0]]=1 is undefined behaviour and therefore even if
all single threaded compilers generated identical results, an implementation
that used multiple threads and got a different result wouldn't be
non-conformant.

> Also the whole things a crock anyways
>
> a[a[j]] = j;
>
> Is just as defined as
>
> a[some_func(a)] = 3;
>

Go away and read the standard as to where there are sequence points.

t...@cs.ucr.edu

unread,

Jan 23, 2002, 8:37:49 AM1/23/02

to

In comp.std.c Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:

: Do I misunderstand it?

Yes. If that modification were made the behavior of x = x+1 would
become undefined. The exemption for accessing an object's value to
determine its new value is absolutely necessary. The question is
whether such an exemption could also be made for accessing an object's
value to determine its address (i.e., lvalue) without imposing
additional overhead and/or jeopardizing important optimizations or
requiring. My belief is still that no such implementation burdens
would follow, but that belief is a bit shaken by the report that
some implementation misbehaves on a[a[0]]=1 when a[0] == 0.

Tom Payne

Tim Woodall

unread,

Jan 23, 2002, 8:40:10 AM1/23/02

to

On Wed, 23 Jan 2002 12:51:05 +0000 (UTC),
t...@cs.ucr.edu <t...@cs.ucr.edu> wrote:
> [...]

>
> Yes, theoretically. But I doubt that happened in the case reported or
> that that was the reason the authors of the Standard disallowed
> accessing an objects value to compute its address in an assignment.
>

From what I remember of the original post the behaviour was exactly as
I would have expected.

a[0]=0;
a[a[0]]=1

if(a[0]==1)
printf("Did what I expect\n");
else
printf("Cool - undefined behaviour did something strange\n");

printed "Did what I expect"

t...@cs.ucr.edu

unread,

Jan 23, 2002, 8:54:34 AM1/23/02

to

In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
[...]
: Also the whole things a crock anyways

: a[a[j]] = j;

: Is just as defined as

: a[some_func(a)] = 3;

Your message is being posted to comp.std.c, which is a newsgroup for
discussing the international standard for the C language. That
document specifies what behavior a conforming implementation must
yield for certain C programs. Think of it as a contract between good
programs and conforming implementations. Under certain situations the
Standard imposes no behavioral obligation on the implementation, i.e.,
those situations render the contract null and void. The formal term
used by the Standard in discussing those situations is "underfined
behavior". The Standard gets to define its formal terminology and in
netgroups for discussing standards it is customary to ascribe to the
usage defined in those standard.

In the case at hand, the standard has said that it imposes no
behavioral obligations (i.e., undefined behavior) on conforming
implemetations when certain expressions modify an object and access
its value for any purpose other than determining the new value.
a[a[0]]=1 is such an expresson when a[0] has the value zero.

Tom Payne

t...@cs.ucr.edu

unread,

Jan 23, 2002, 9:00:19 AM1/23/02

to

In comp.std.c Tim Woodall <t...@pauli.locofungus.org> wrote:

: On Wed, 23 Jan 2002 12:51:05 +0000 (UTC),

: t...@cs.ucr.edu <t...@cs.ucr.edu> wrote:
:> [...]
:>
:> Yes, theoretically. But I doubt that happened in the case reported or
:> that that was the reason the authors of the Standard disallowed
:> accessing an objects value to compute its address in an assignment.
:>
: From what I remember of the original post the behaviour was exactly as
: I would have expected.

: a[0]=0;
: a[a[0]]=1

: if(a[0]==1)
: printf("Did what I expect\n");
: else
: printf("Cool - undefined behaviour did something strange\n");

: printed "Did what I expect"

Huh? Then what are we talking about? I admit to coming in late, but
from the title of the thread, I thought that something unexpected had
occurred. Am I missing something?

Tom Payne

Dik T. Winter

unread,

Jan 23, 2002, 9:08:23 AM1/23/02

to

In article <Pine.A41.4.10.102012...@moka.ccr.jussieu.fr> Pai-Yi HSIAO <hs...@ccr.jussieu.fr> writes:
...

> I think I understand the sentence more.
> It should be
> the prior value shall be accessed only
> to determine [WHERE] the value to be stored
> Do I misunderstand it?

No. The prior value shall be accessed only to calculate the *new* value.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Tim Woodall

unread,

Jan 23, 2002, 9:30:11 AM1/23/02

to

On Wed, 23 Jan 2002 14:00:19 +0000 (UTC),

t...@cs.ucr.edu <t...@cs.ucr.edu> wrote:
> In comp.std.c Tim Woodall <t...@pauli.locofungus.org> wrote:
>: On Wed, 23 Jan 2002 12:51:05 +0000 (UTC),
>: t...@cs.ucr.edu <t...@cs.ucr.edu> wrote:
>:> [...]
>:>
>:> Yes, theoretically. But I doubt that happened in the case reported or
>:> that that was the reason the authors of the Standard disallowed
>:> accessing an objects value to compute its address in an assignment.
>:>
>: From what I remember of the original post the behaviour was exactly as
>: I would have expected.
>
>: a[0]=0;
>: a[a[0]]=1
>
>: if(a[0]==1)
>: printf("Did what I expect\n");
>: else
>: printf("Cool - undefined behaviour did something strange\n");
>
>: printed "Did what I expect"
>
> Huh? Then what are we talking about? I admit to coming in late, but
> from the title of the thread, I thought that something unexpected had
> occurred. Am I missing something?
>

I checked back and the original test was

if(a[a[i]]==1) /* i was 0 */

Which could reasonably be expected to evaluate a[1]==1 which was itself
undefined as a was a local and only a[0] had been set IIRC.
(If it was a global then a[1]==0 and so the test should still have failed
-returned false)

Bruce Wheeler

unread,

Jan 23, 2002, 9:50:48 AM1/23/02

to

On Wed, 23 Jan 2002 14:00:19 +0000 (UTC), t...@cs.ucr.edu wrote:

...

>
>Huh? Then what are we talking about? I admit to coming in late, but
>from the title of the thread, I thought that something unexpected had
>occurred. Am I missing something?
>
>Tom Payne
>

Hi Tom!

Here is the original post (in full) from comp.lang.c. You got a garbled
read of what was actually written.

---------------------------
On 21 Jan 2002 23:57:47 -0800, sriniva...@yahoo.com (srinivas
reddy) wrote:

>I have the following piece of code and its behaving strangely..please
>explain why is it happening like that.
>

>int main()
>{
> int a[10], i;
>
> i = 0;
> a[i] = 0;

> a[a[i]] = 1;
>

> if (a[a[i]] == 1)
> printf("equal");

> else
> printf("unequal");
>}
>
>The above code prints "unequal" on execution.why?? and the same code
>prints "equal" when any number other than 0 is assigned to a[i]..why??

---------------------------

Making a naive assumption that an implementation 'might do' what one
'might expect', the following 'might occur'.

i = 0;
a[i] = 0; // ie a[0] = 0;
a[a[i]] = 1; // ie a[0] = 1;

if (a[a[i]] == 1) // ie if (a[1] = 1)

Note, given this interpretation, he would be assigning to a[0], and
testing a[1]. One would not expect a logical conclusion in this case.

So, your belief '... that no such implementation burdens

would follow, but that belief is a bit shaken by the report that

some implementation misbehaves on a[a[0]]=1 when a[0] == 0' does not
need to be shaken in this case.

A DR against the standard to allow an assignment of the type 'a[a[i]] =
1' would not be contradicted by the quoted behavior.

Regards,
Bruce Wheeler

Pai-Yi HSIAO

unread,

Jan 23, 2002, 10:20:00 AM1/23/02

to

On Wed, 23 Jan 2002, Dik T. Winter wrote:
>The prior value shall be accessed only to calculate the *new* value.

On Wed, 23 Jan 2002 t...@cs.ucr.edu wrote:
> In comp.std.c Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:
> : I think I understand the sentence more.
> : It should be
>
> : the prior value shall be accessed only
> : to determine [WHERE] the value to be stored
>
> : Do I misunderstand it?
>
> Yes. If that modification were made the behavior of x = x+1 would
> become undefined.

You are right.

> The exemption for accessing an object's value to
> determine its new value is absolutely necessary.

I think you mean:

The exemption for accessing an object's value to

determine "another" new value is absolutely necessary.

The standard said "... determine 'the value to be stored' ".

An lvalue can not be stored so it should be a rvalue.

In this respect, we can not access an object's value to determine a
lvalue. A possible consequence is that the statement like
*p = 100;
is undefined where we access an int pointer p and use its value to
compute the location of an object.
'*p' is an lvalue and it represents a location.
How can it be stored?

I think the words of the clause is not clear at all.
Regardless of the FAQ's interpretation:

This rule effectively constrains legal expressions to those in
which the accesses demonstrably precede the modification

,

does the standard really say:
"Modifying and evaluating the same object without an intervening
sequence point invokes undefined behavior"
(cited from Gergo Barany's post in thread "a[a[i]] must be defined")
?

Thomas Stegen

unread,

Jan 23, 2002, 10:40:48 AM1/23/02

to

In article <Pine.A41.4.10.102012...@moka.ccr.jussieu.fr>,

Pai-Yi HSIAO wrote:
> On Wed, 23 Jan 2002, Dik T. Winter wrote:
>>The prior value shall be accessed only to calculate the *new* value.
>
> On Wed, 23 Jan 2002 t...@cs.ucr.edu wrote:

[snip]

>
> Regardless of the FAQ's interpretation:
> This rule effectively constrains legal expressions to those in
> which the accesses demonstrably precede the modification
> ,

I understand this as: if the access happens before the modification
then it is ok.

And in the case
i = 0;

a[a[i]] = 1;

the access happens before the modification.

--
Thomas Stegen.

Barry Margolin

unread,

Jan 23, 2002, 11:36:20 AM1/23/02

to

In article <a2m5kk$gjf$1...@glue.ucr.edu>, <t...@cs.ucr.edu> wrote:
>This matter came up in this newsgroup two and a half years ago, and at
>that time I understood the consensus of the experts who participated
>in that thread to be that this clause "only to determine the address
>to be stored" was unnecessarily strict and that the prior value could
>also be used to compute the lvalue of the object to be modified,
>without jeopardizing any reasonable optimizations. AFIK, however, no
>one has filed a defect report.

Rewriting the restriction must be done extremely carefully, lest you allow
things that still aren't desired. I suspect the original authors made it
as narrow as it is because it was really difficult to come up with a
description that more accurately describes all the cases we want to allow.
The existing wording covers the most common cases.

There are some other expressions, like the one in this discussion, where
the behavior seems obvious, but if you want to stick to the letter of the
standard you need to break it into two statements with a temporary. These
cases don't come up too much, so it's probably best to leave the standard
as it is rather than risk breaking it.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Tom St Denis

unread,

Jan 23, 2002, 12:56:26 PM1/23/02

to

"Tim Woodall" <t...@pauli.locofungus.org> wrote in message

news:slrna4tdv...@pauli.locofungus.org...

> On Wed, 23 Jan 2002 12:19:29 GMT,
> Tom St Denis <tomst...@yahoo.com> wrote:
> >
> > "Tim Woodall" <t...@pauli.locofungus.org> wrote in message
> > news:slrna4t95...@pauli.locofungus.org...
> >> A sufficiently clever compiler could theoretically run do_other_stuff
on a
> >> separate processor in parallel to the assignment. The compiler can use
the
> >> optimization that b[0] cannot be modified as part of the assignment.
> >
> > Threads are not part of the C spec. It is assumed [most likely] in the
spec
> > that programs execute in a serially dependent fashion.
> >
> No they are not and therefore any compiler that uses threads to execute a
> serial program must get the same results as if it didn't.
>
> The as-if rule says that the compiler can use threads provided that the
> result is exactly the same as if it didn't.
>
> In this instance a[a[0]]=1 is undefined behaviour and therefore even if
> all single threaded compilers generated identical results, an
implementation
> that used multiple threads and got a different result wouldn't be
> non-conformant.

I still don't see that. And in fact your post makes no sense whatsoever.

In the statement

a[a[j]] = i;

Before the compiler can even think of storing anything in the array "a" it
has to know where to put it. There is no way around that requirement. In
order to know where to store it it must evaluate the index e.g. "a[j]" [in
this case].

That's like saying

a[myfunc(a)] = j;

is undefined. Or similarly

a[strlen(a) - 1] = 0;

Is undefined. Clearly the latter is not since it simply removes one char
from the string [in terms of C functions p.o.v] which is used often to
remove new-lines from a string.

I seriously think you guys are mis-reading the spec.

> > Also the whole things a crock anyways
> >
> > a[a[j]] = j;
> >
> > Is just as defined as
> >
> > a[some_func(a)] = 3;
> >
> Go away and read the standard as to where there are sequence points.

Yeah, common sense dictates that the index must be computed before the
store.

Tom

t...@cs.ucr.edu

unread,

Jan 23, 2002, 2:02:04 PM1/23/02

to

In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
[...]

: In the statement

: a[a[j]] = i;

: Before the compiler can even think of storing anything in the array "a" it
: has to know where to put it. There is no way around that requirement.

Agreed.

: That's like saying
[...]
: Is undefined.

: I seriously think you guys are mis-reading the spec.

We are not. Keep in mind that "undefined behavior" simply means the
that the Standard imposes no requirements on the behavior (of
conforming implementations) in this situation. That has nothing to do
with:

- whether or not the intent of the programmer is clear and/or

- whether or not every reasonable implementation will behave
according to the clear and obvious intent.

So, why on earth would the intelligent and reasonable folks who wrote
the Standard go out of their way to remove all requirements on the
behavior of expressions whose intended behavior is obvious (and
inescapable in any rational implementation)? The answer is that they
wanted to allow certain optimizations the would otherwise need to be
slowed down by tests for special circumstances, and they cast an
overly broad net when they described the class of expressions whose
behavior is undefined.

Consider a slightly different expression, namely x=2*++x. It's
intended behavior obviously should be the same as x=2*(x+1), but in
fact it is undefined since the behaviors of (most) expressions that
update an object twice are undefined. The reason is to allow
optimization of code for expressions like x=(*p)++, which on a modern
architecture will often attempt update *p and x in parallel, thereby,
obtaining unintended behavior in the case where p is &x. The extra
check that would be necessary to make sure that the intended behavior
occurred in that case would unacceptably slow down the common case.

FWIW, there are even more such disclaimers in the FORTRAN standard.

Tom Payne

John Rickard

unread,

Jan 23, 2002, 2:10:39 PM1/23/02

to

In comp.std.c Barry Margolin <bar...@genuity.net> wrote:
: There are some other expressions, like the one in this discussion, where

: the behavior seems obvious, but if you want to stick to the letter of the
: standard you need to break it into two statements with a temporary. These
: cases don't come up too much, so it's probably best to leave the standard
: as it is rather than risk breaking it.

If you want to stick to the letter of the standard, then you need some
kind of creative interpretation to make the previous statement

a[i] = 0;

defined. If I didn't know better, then it would seem to me from the
wording of the standard that the prior(?) value of `i` is being read,
but not only to determine the value to be stored.

--
John Rickard <John.R...@virata.com>

t...@cs.ucr.edu

unread,

Jan 23, 2002, 2:34:14 PM1/23/02

to

In comp.std.c John Rickard <j...@0.0.0.10> wrote:

: a[i] = 0;

You must read the entire paragraph:

Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value [of the modified object (thp)] shall be
accessed only to determine the value to be stored.

In the case at hand, i's value is not being modified.

Admittedly, the phrase "the prior value" has slightly obscure
referent, but the context is sufficient that implemtors have all
seemed to reach the same conclusion as to intended meaning.

Tom Payne

Dann Corbit

unread,

Jan 23, 2002, 2:18:53 PM1/23/02

to

"Bruce Wheeler" <bruce....@siemens.at> wrote in message
news:3c4ec389...@news.siemens.at...

TIME OUT!

Why does everyone seem to think the result should be obvious? This is one
case to me where I cannot see how the standard could *possibly* impose a
behavior. In the case (for this construct) when a[i] == i the assignment
will always *simultaneously* change both the value and the location where
the value should be stored. So (in the case where we are talking of i == 0
and a[i] == 0) when we say

a[a[i]] = 1;

does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.
One thing is clear, that there are no sequence points in this statement
(except the terminating semicolon). Therefore, we do not have an ordering
imposed on the operations. In fact, an expression like this:
a ^= b ^= a ^= b;
while undefined, seems to have a *more* plausible explanation than a[a[i]] =
1 does. In the case of the xor operation, we can *at least* imagine a left
to right ordering (though none is actually imposed). In the case of the
array expression, due to the interchangeability of an object with its index,
such an ordering cannot even be assumed by the uninformed.

What I want to know (more than anything else) is why this expression should
seem clear cut to others. To me, if there ever were a case for what *ought
to be* undefined behavior, this is it.

TIME IN!
--
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
"The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. FAQ: ftp://cap.connx.com/pub/Chess%20Analysis%20Project%20FAQ.htm

Barry Margolin

unread,

Jan 23, 2002, 2:50:31 PM1/23/02

to

In article <ZHC*Rk...@news.cam.virata.com>,

John Rickard <John.R...@virata.com> wrote:
>If you want to stick to the letter of the standard, then you need some
>kind of creative interpretation to make the previous statement
>
> a[i] = 0;
>
>defined. If I didn't know better, then it would seem to me from the
>wording of the standard that the prior(?) value of `i` is being read,
>but not only to determine the value to be stored.

The restriction we're talking about only applies if an object is being read
*and* modified between sequence points. In the above statement, i is only
being read, it's not being modified, so the restriction is irrelevant.

Dave Hansen

unread,

Jan 23, 2002, 4:00:02 PM1/23/02

to

Before going any further, let me say that I recognize the standard
imposes no requirement for the result of a[a[i]] = 1 when a[i]==i.
This is a response specific to Dann's question.

On Wed, 23 Jan 2002 11:18:53 -0800, "Dann Corbit" <dco...@connx.com>
wrote:

>"Bruce Wheeler" <bruce....@siemens.at> wrote in message
>news:3c4ec389...@news.siemens.at...
>> On Wed, 23 Jan 2002 14:00:19 +0000 (UTC), t...@cs.ucr.edu wrote:
>>

[...]

>> A DR against the standard to allow an assignment of the type 'a[a[i]] =
>> 1' would not be contradicted by the quoted behavior.
>
>
>TIME OUT!
>
>Why does everyone seem to think the result should be obvious? This is one
>case to me where I cannot see how the standard could *possibly* impose a
>behavior. In the case (for this construct) when a[i] == i the assignment
>will always *simultaneously* change both the value and the location where
>the value should be stored. So (in the case where we are talking of i == 0
>and a[i] == 0) when we say
> a[a[i]] = 1;
>does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.

Most people think it's obvious because they think

i = a[i];

is obvious. Does your simultaneity argument allow the new value of i
to somehow change the location from which it gets this new value?

How about the sequence:

NODE *p, *q, *r; /* some structure type with a 'next' field */
...
p->next = q; /* assume p, q, and r point to valid NODE objects */
q->next = p;
r->next = q;
p->next->next = r; /* Note: p->next->next == p */

Well-defined? What is the value of p after the assignment?

Regards,

-=Dave
--
Change is inevitable, progress is not.

Minti

unread,

Jan 23, 2002, 8:04:46 PM1/23/02

to

"Dann Corbit" <dco...@connx.com> wrote in message news:<a2n25...@enews4.newsguy.com>...

>
>
> TIME OUT!
>
> Why does everyone seem to think the result should be obvious? This is one
> case to me where I cannot see how the standard could *possibly* impose a
> behavior. In the case (for this construct) when a[i] == i the assignment
> will always *simultaneously* change both the value and the location where
> the value should be stored. So (in the case where we are talking of i == 0
> and a[i] == 0) when we say
> a[a[i]] = 1;
> does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.
> One thing is clear, that there are no sequence points in this statement
> (except the terminating semicolon). Therefore, we do not have an ordering
> imposed on the operations. In fact, an expression like this:
> a ^= b ^= a ^= b;
> while undefined, seems to have a *more* plausible explanation than a[a[i]] =
> 1 does. In the case of the xor operation, we can *at least* imagine a left
> to right ordering (though none is actually imposed). In the case of the
> array expression, due to the interchangeability of an object with its index,
> such an ordering cannot even be assumed by the uninformed.
>
> What I want to know (more than anything else) is why this expression should
> seem clear cut to others. To me, if there ever were a case for what *ought
> to be* undefined behavior, this is it.
>
> TIME IN!

I give up. And I am sorry if this is sounding to be too obvious. But,
I guess I would be provided with the missing link.

The only way I see any *abstract* implementation evaluating this kind
of expression would be -- >
1) Get the value of i
2) Add a to the above value
3) Get the value from the above generated address
4) Add this value to a
5) Move 1 to this address

Is there any other abstract behaviour for the expression?
Besides,
"Completely unrelated. You are modifying the location at the same
time as
the value in a[a[i]] in the case where a[i] == i."
The one way that I seem to visulaize that thing is like

int x, *a=&x, y;
*(a + *( ( a = &y ) + i ) );// a[ ( a = & y) [i] ]
Now this is what is very much undefined.

James Kuyper Jr.

unread,

Jan 23, 2002, 8:34:20 PM1/23/02

to

Dann Corbit wrote:
>
> "Bruce Wheeler" <bruce....@siemens.at> wrote in message

...

> Why does everyone seem to think the result should be obvious? This is one
> case to me where I cannot see how the standard could *possibly* impose a
> behavior. In the case (for this construct) when a[i] == i the assignment
> will always *simultaneously* change both the value and the location where
> the value should be stored. So (in the case where we are talking of i == 0
> and a[i] == 0) when we say
> a[a[i]] = 1;
> does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.
> One thing is clear, that there are no sequence points in this statement
> (except the terminating semicolon). Therefore, we do not have an ordering

...

> to right ordering (though none is actually imposed). In the case of the
> array expression, due to the interchangeability of an object with its index,
> such an ordering cannot even be assumed by the uninformed.
>
> What I want to know (more than anything else) is why this expression should
> seem clear cut to others. To me, if there ever were a case for what *ought
> to be* undefined behavior, this is it.

Let's make the interchangeability you refer to more clear, by rewriting
that as equivalent code, using the definition of indexing:

*(a+*(a+i)) = 1;

This code must do several things:

1. Retrieve the address of a.
2. Retrieve the value of i.
3. Add the values retrieved by steps 1 and 2.
4. Retrieve the value stored at the location identified in step 3.
5. Add the value retrieved in step 1 to the value retrieved in step 4.
6. Store 1 in the location identified in step 5.

Each operator can only be evaluated after it's operands have been
evaluated. In other words, each step must occur after any other steps it
refers to. The only permitted orders are therefore 1,2,3,4,5,6 and
2,1,3,4,5,6. The result is the same either way. If this were legal code,
there would be no ambiguity in the ordering of those steps. You seem to
imply that step 6 can occur before step 4. How?

glen herrmannsfeldt

unread,

Jan 23, 2002, 8:40:38 PM1/23/02

to

mintiSP...@yahoo.com (Minti) writes:

>"Dann Corbit" <dco...@connx.com> wrote:
>>
>>
>> TIME OUT!
>>
>> Why does everyone seem to think the result should be obvious? This is one
>> case to me where I cannot see how the standard could *possibly* impose a
>> behavior. In the case (for this construct) when a[i] == i the assignment
>> will always *simultaneously* change both the value and the location where
>> the value should be stored. So (in the case where we are talking of i == 0
>> and a[i] == 0) when we say
>> a[a[i]] = 1;
>> does the 1 get stored into a[1] or a[0]?

a[i] MUST be calculated before the 1 can be stored. I take this
as part of the "fetch must precede store" claim.

>>It is not at all obvious to me.
>> One thing is clear, that there are no sequence points in this statement
>> (except the terminating semicolon). Therefore, we do not have an ordering
>> imposed on the operations. In fact, an expression like this:
>> a ^= b ^= a ^= b;
>> while undefined, seems to have a *more* plausible explanation

While the operations should be done in that order, the store doesn't
need to be. They could be kept in registers and stored in a different
order.
(snip)
>> TIME IN!

>I give up. And I am sorry if this is sounding to be too obvious. But,
>I guess I would be provided with the missing link.

>The only way I see any *abstract* implementation evaluating this kind
>of expression would be -- >
>1) Get the value of i
>2) Add a to the above value
>3) Get the value from the above generated address
>4) Add this value to a
>5) Move 1 to this address

I agree. I believe that the statements in the standard were
meant to describe other cases. Cases that would take too much
explanation to differentiate from this case.

>Is there any other abstract behaviour for the expression?

(snip).

Consider the related expressions, a[a[i]]++; and a[a[i]++];

Note that in both cases only one store is being done.

I can make arguments why either of these could fail without
a sequence point, yet writing a simple explanation for the
standard to differentiate them from a[a[i]]=1; is hard.

-- glen

James Kuyper Jr.

unread,

Jan 23, 2002, 8:50:08 PM1/23/02

to

Tom St Denis wrote:
...

> Threads are not part of the C spec. It is assumed [most likely] in the spec
> that programs execute in a serially dependent fashion.

The standard says that certain events must occur before other events,
but the rules it provides do not provide a unique ordering of events.
Except where it says otherwise, various events can happen
simultaneously. Finally, the as-if rule means that an implementation is
allowed to have the events actually happen in any order it chooses, so
long as it makes the observable effects match the results that would
result from a sequence that obey's the standard's ordering requirements.

> a[a[j]] = j;
>
> Is just as defined as
>
> a[some_func(a)] = 3;
>

> Tom

Incorrect. There are no sequence points in a[a[j]]=j. There are two
sequence points in a[some_func(a)] = 3, one seperating the call to
some_func() from the evaluation of a, and the second seperating the call
to some_func() from the use of its return value as an index into a. The
relevant rule explicitly states that it applies only between sequence
points, so the first statement has behavior that is not defined by the C
standard, while the behavior of the second one is well-defined
(depending upon the definition of some_func()).

Note: undefined behavior doesn't mean that anything will actually go
wrong. It means precisely this: the C standard does not define the
behavior, so it's legal for your program to behave any way the
implementation chooses to make it behave.

Generally, behavior is undefined by the C standard, because there are
(or could be) implementations where the behavior is something that most
users wouldn't expect. However, there's nothing wrong with the standard
stating that the behavior is undefined, even if there's no plausible
reason why it should cause problems.

t...@cs.ucr.edu

unread,

Jan 23, 2002, 9:10:21 PM1/23/02

to

In comp.std.c James Kuyper Jr. <kuy...@wizard.net> wrote:
: Dann Corbit wrote:
[...]

:> Why does everyone seem to think the result should be obvious? This is one
:> case to me where I cannot see how the standard could *possibly* impose a
:> behavior. In the case (for this construct) when a[i] == i the assignment
:> will always *simultaneously* change both the value and the location where
:> the value should be stored. So (in the case where we are talking of i == 0
:> and a[i] == 0) when we say
:> a[a[i]] = 1;
:> does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.

[...[
: Each operator can only be evaluated after it's operands have been
: evaluated.

... which is to say that subexpressions must be evaluated before their
parents, e.g., that a[i] must be evaluated before a[a[i]], which in
turn must be evaluated before a[a[i]]=1. QED

This general principle has one (irrelevant?) exception, however: when
the result of an operator does not depend on the result of one of its
operands, the evaluation of that operand can be deferred past the
evaluation of the operator. (The deadline for the delivery of side
effects is the next sequence point in any case.)

Tom Payne

Dik T. Winter

unread,

Jan 23, 2002, 10:18:31 PM1/23/02

to

In article <3C4F6517...@wizard.net> "James Kuyper Jr." <kuy...@wizard.net> writes:

There is hardware around that does destructive reads. I.e. if you read the
value of a variable you should also restore it, otherwise it contains
nonsense. Adapting it to such hardware I get:

> 1. Retrieve the address of a.
> 2. Retrieve the value of i.

2a. Restore the value of i.

> 3. Add the values retrieved by steps 1 and 2.
> 4. Retrieve the value stored at the location identified in step 3.

4a. Restore the value stored at the location identified in step 3.

> 5. Add the value retrieved in step 1 to the value retrieved in step 4.
> 6. Store 1 in the location identified in step 5.

If 4 and 6 point to the same location, 4a might interfere. There is no
way to say that 4a should happen before 6.

James Kuyper Jr.

unread,

Jan 23, 2002, 11:03:47 PM1/23/02

to

"Dik T. Winter" wrote:
>
> In article <3C4F6517...@wizard.net> "James Kuyper Jr." <kuy...@wizard.net> writes:
>
> There is hardware around that does destructive reads. I.e. if you read the
> value of a variable you should also restore it, otherwise it contains
> nonsense. Adapting it to such hardware I get:

I'm not familiar with such hardware; either I've never used it, or I was
using software that made it emulate non-destructive reads. Is such
hardware common enough to be the reason why Dann considers this to be
obviously unsafe code?

> > 1. Retrieve the address of a.
> > 2. Retrieve the value of i.
> 2a. Restore the value of i.
> > 3. Add the values retrieved by steps 1 and 2.
> > 4. Retrieve the value stored at the location identified in step 3.
> 4a. Restore the value stored at the location identified in step 3.
> > 5. Add the value retrieved in step 1 to the value retrieved in step 4.
> > 6. Store 1 in the location identified in step 5.
>
> If 4 and 6 point to the same location, 4a might interfere. There is no
> way to say that 4a should happen before 6.

If delaying 4a has any potential optimization benefits, that sounds like
a plausible justification for prohibiting such code. However, I have no
idea whether that's the actual reason for the prohibition.

t...@cs.ucr.edu

unread,

Jan 23, 2002, 11:21:41 PM1/23/02

to

In comp.std.c Dik T. Winter <Dik.W...@cwi.nl> wrote:

: In article <3C4F6517...@wizard.net> "James Kuyper Jr." <kuy...@wizard.net> writes:

: There is hardware around that does destructive reads. I.e. if you read the
: value of a variable you should also restore it, otherwise it contains
: nonsense. Adapting it to such hardware I get:
: > 1. Retrieve the address of a.
: > 2. Retrieve the value of i.
: 2a. Restore the value of i.
: > 3. Add the values retrieved by steps 1 and 2.
: > 4. Retrieve the value stored at the location identified in step 3.
: 4a. Restore the value stored at the location identified in step 3.
: > 5. Add the value retrieved in step 1 to the value retrieved in step 4.
: > 6. Store 1 in the location identified in step 5.

: If 4 and 6 point to the same location, 4a might interfere. There is no
: way to say that 4a should happen before 6.

In such hardware, you get the same problem regardless of whether the
old value is used to compute (a) the new value or (b) the object's
lvalue. The Standard requires that implementations deal with problem
(a) and the same mechanisms could be used to deal with problem (b).

Tom Payne

Christian Bau

unread,

Jan 24, 2002, 2:38:34 AM1/24/02

to

"Dik T. Winter" wrote:
>
> In article <3C4F6517...@wizard.net> "James Kuyper Jr." <kuy...@wizard.net> writes:
>
> There is hardware around that does destructive reads. I.e. if you read the
> value of a variable you should also restore it, otherwise it contains
> nonsense.

You could also have an implementation that actively checks for undefined
behaviour.

As the C standard says, if an object is modified then it may be accessed
without intervening sequence point only to calculate the value being
stored. This rule makes the statement

a [i] = i++;

undefined behaviour. In this case, the compiler can obviously see that
there will be undefined behaviour. In the statement

a [(*p)] = (*q)++;

there is undefined behaviour if p == q. A compiler with hardware support
could actively check for this undefined behaviour in the following way:

Obviously *q is modified.
Compiler calculates address of the object *q, that is (q).
Compiler produces code that puts a hardware lock on the object *q, so
that any access to *q will crash the program.
Compiler evaluates the address of the lvalue a [(*p)]. This will crash
if p == q.
Compiler removes the hardware lock from object *q, then proceeds
normally.

I think that is an absolutely legal implementation. And the undefined
behaviour causes trouble because the compiler actively looks for
undefined behaviour. Now for the statement in question:

a [a [i]] = 1; /* Before execution, i = 0 and a [i] = 0 */

The compiler checking for undefined behaviour does the following:

Obviously a [a [i]] is modified.
Compiler determines the address of the object to be modified. It does
not actually evaluate & a[a[i]] to do this, but it simulates what would
be the result of that expression. Then it puts a hardware lock on the
resulting object, which happens to be a [0].
Compiler evaluates all code that does not determine the new value to be
stored while this hardware lock is active. Therefore the lvalue a [a
[i]] is evaluated; that means the address of a [a [i]] is calculated.
This calculation accesses a [0], which causes your computer to crash.
If the computer hadn't crashed, the next step would be removing the
hardware lock on object a [0] and storing the value 1 into it.

If the C Standard says that something causes undefined behaviour, then a
compiler is allowed to actively look for the situation and crash your
computer if undefined behaviour is detected. And such a compiler would
be very useful because it would find many bugs in everyones code.

On the other hand, I am quite sure that the undefined behaviour in this
case is not intended.

ke...@hplb.hpl.hp.com

unread,

Jan 24, 2002, 4:35:07 AM1/24/02

to

In article <a2n25...@enews4.newsguy.com>,

"Dann Corbit" <dco...@connx.com> writes:
> Why does everyone seem to think the result should be obvious? This is one
> case to me where I cannot see how the standard could *possibly* impose a
> behavior. In the case (for this construct) when a[i] == i the assignment
> will always *simultaneously* change both the value and the location where
> the value should be stored. So (in the case where we are talking of i == 0
> and a[i] == 0) when we say
> a[a[i]] = 1;
> does the 1 get stored into a[1] or a[0]? It is not at all obvious to me.

I think many people treat

A[E] = X

as requiring that the value E be computed before any storing into A[E]
is done. This looks like an entirely intuitive interpretation. The value
of a[i] can't change until the assignment is done; at that point, you know
*where* the assignment was done into, so any change to the index expression
becomes irrelevant.

They'd like a sequence point between the evaluation of E and the assignment
into A[E]. That seems reasonable to me (but then, I'm an avowed advocate
for defining the execution order in E + F as well; calibrate appropriately).

--
Chris "well-ordered" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html
C welcome: http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html

Tim Woodall

unread,

Jan 24, 2002, 7:20:09 AM1/24/02

to

On 24 Jan 2002 01:40:38 GMT,

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
> I can make arguments why either of these could fail without
> a sequence point, yet writing a simple explanation for the
> standard to differentiate them from a[a[i]]=1; is hard.
>

I think everyone is missing the point of undefined behaviour in the standard.

Once undefined behaviour is invoked all bets are off for _ANYTHING_ else
in the code. (There is some argument as to whether all bets are off before -
active dispute as to whether if(0) { i=*(int*)NULL; } is legal and before
anyone jumps in with a yes/no think about why this is and isn't different
from [p=NULL;] if(p) { i=*(int*)p; } and what the optimizer might do. AFAIK
this is unresolved by the standard BICBW.)

But even if we take the less stringent requirement that undefined behaviour
only occurs if the code is actually executed.

i=0; <= This is defined
a[i]=0; <= This is defined
a[a[i]]=1; <= We will assume that this does what we expect
i.e. i=0; a[0]=1 at this point.

i=a[i]; <= WHAM. The optimizer uses the intermediate result it
calculated in the previous statement.
i.e. i=0 after this statement.

Phil Tregoning

unread,

Jan 24, 2002, 8:39:41 AM1/24/02

to

"James Kuyper Jr." <kuy...@wizard.net> wrote in news:3C4F6517...@wizard.net:

>
> *(a+*(a+i)) = 1;
>
> This code must do several things:
>
> 1. Retrieve the address of a.
> 2. Retrieve the value of i.
> 3. Add the values retrieved by steps 1 and 2.
> 4. Retrieve the value stored at the location identified in step 3.
> 5. Add the value retrieved in step 1 to the value retrieved in step 4.
> 6. Store 1 in the location identified in step 5.

You seem to be assuming a RISCy implementation. Imagine a 16 bit
very CISC machine (were steps 4,5 & 6 can be done in one (long)
instruction. Imagine also that i is 32 bit. It could do it like
something like this:

1. Retrieve the address of a.

2. Retrieve the value of i.

3. Add the values retrieved by steps 1 and 2.

4A. Retrieve the value stored at the location identified in step 3.
4B. Add the value retrieved in step 1 to the value retrieved in step 4A.
4C. Store 1 in the location identified in step 4B.

5. Add 2 to the value identified in step 3.

6A. Retrieve the value stored at the location identified in step 5.
6B. Add the value retrieved in step 1 to the value retrieved in step 6A.
6C. Store 0 in the location identified in step 6B.

I bit inefficient maybe, but is it wrong? Some real machines can do
an horrendous amount of work with a single instruction (VAXen have
a single instruction that can do the same as 4A/B/C or 6A/B/C).

> Each operator can only be evaluated after it's operands have been
> evaluated. In other words, each step must occur after any other steps it
> refers to. The only permitted orders are therefore 1,2,3,4,5,6 and
> 2,1,3,4,5,6. The result is the same either way. If this were legal code,
> there would be no ambiguity in the ordering of those steps. You seem to
> imply that step 6 can occur before step 4. How?

By doing step 4, then half of step 6, then step 4 again, then the
other half of step 6. Simple.

Phil T

t...@cs.ucr.edu

unread,

Jan 24, 2002, 9:57:34 AM1/24/02

to

In comp.std.c ke...@hplb.hpl.hp.com wrote:
[...]
: I think many people treat

: A[E] = X

: as requiring that the value E be computed before any storing into A[E]
: is done. This looks like an entirely intuitive interpretation.

It's also possible that the value of the index gets inferred at
compile time; for example, consider A[0*E]. But that makes no
difference to the situation in question, because we are only
interested in getting the old value by whatever means.

: The value

: of a[i] can't change until the assignment is done; at that point, you know
: *where* the assignment was done into, so any change to the index expression
: becomes irrelevant.

Agreed.

: They'd like a sequence point between the evaluation of E and the assignment
: into A[E].

No. A sequence point is a deadline for the delivery of *side effects*
not values. The value of an expression must be obtained in time for
its use by its parent operator in the abstract syntax tree, while the
expression's side effects need only to (seem to be) delivered by its
nearest ancestral sequence point in the abstract syntax tree.

: That seems reasonable to me (but then, I'm an avowed advocate

: for defining the execution order in E + F as well; calibrate appropriately).

Nick Mclaren has (nearly) convinced me that important optimizations
for imporant programs would be lost if order of execution were
specified -- important enough to outweigh the obvious dangers imposed
by this unspecified behavior.

Tom Payne

James Kuyper Jr.

unread,

Jan 24, 2002, 9:55:36 AM1/24/02

to

Phil Tregoning wrote:
>
> "James Kuyper Jr." <kuy...@wizard.net> wrote in news:3C4F6517...@wizard.net:
>
> >
> > *(a+*(a+i)) = 1;
> >
> > This code must do several things:
> >
> > 1. Retrieve the address of a.
> > 2. Retrieve the value of i.
> > 3. Add the values retrieved by steps 1 and 2.
> > 4. Retrieve the value stored at the location identified in step 3.
> > 5. Add the value retrieved in step 1 to the value retrieved in step 4.
> > 6. Store 1 in the location identified in step 5.
>
> You seem to be assuming a RISCy implementation. Imagine a 16 bit

No, I'm trying to avoid making any assumptions, which means breaking it
down into as many seperate pieces as possible, to maximize the
opportunities for rearrangement. The net result looks like RISC, but I'm
really just referring to the abstract machine. You can insert spurious
extra operations, so long as they cancel each other out. That seems to
be what is being attempted below, but the spurious extra operations
never get cancelled out, and the true steps 5 and 6 never get achieved.

> very CISC machine (were steps 4,5 & 6 can be done in one (long)
> instruction. Imagine also that i is 32 bit. It could do it like
> something like this:
>
> 1. Retrieve the address of a.
>
> 2. Retrieve the value of i.
>
> 3. Add the values retrieved by steps 1 and 2.
>
> 4A. Retrieve the value stored at the location identified in step 3.
> 4B. Add the value retrieved in step 1 to the value retrieved in step 4A.
> 4C. Store 1 in the location identified in step 4B.

That's not the location that the C code specifies that the 1 should be
written to, except in the special case where i==a[i]. Are you implying
optimization based upon the compiler predicting the values of 'i' and
'a[i]' ? In that case, steps 1 through 4A would be unnecessary.

> 5. Add 2 to the value identified in step 3.

Which is an address that the C code makes no reference to.

> 6A. Retrieve the value stored at the location identified in step 5.

Which is a value that the C code doesn't ask to be retrieved.

> 6B. Add the value retrieved in step 1 to the value retrieved in step 6A.
> 6C. Store 0 in the location identified in step 6B.

Which writes a value that the C code doesn't ask to be written, to a
location that the C code doesn't refer to.

> I bit inefficient maybe, but is it wrong? Some real machines can do

Yes. Every step after 4B seems to constitute an randomly incorrect
implementation of the code. I can't even see why you think there's any
connection between those instructions and the C code as written. Of
course, the standard says that the behavior of this code is undefined -
however, if the standard required implementations to handle this code,
the meaning of the code is quite clear, and those instructions are
completely unrelated to a correct implementation of it.

Phil Tregoning

unread,

Jan 24, 2002, 11:23:14 AM1/24/02

to

"James Kuyper Jr." <kuy...@wizard.net> wrote in news:3C502101...@wizard.net:

> Phil Tregoning wrote:
>>
>> [A load of garbage]

>>
>
> Every step after 4B seems to constitute an randomly incorrect
> implementation of the code.

Yep.

Phil T

Andreas Schwab

unread,

Jan 24, 2002, 10:57:46 AM1/24/02

to

t...@pauli.locofungus.org (Tim Woodall) writes:

It can't, because you have to take aliasing into account, and a[x] aliases
a[y] if x == y (obviously). So this optimisation would be invalid. And
the sequence point makes sure that all side effects are completed.

Andreas.

--
Andreas Schwab, SuSE Labs, sch...@suse.de
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Pai-Yi HSIAO

unread,

Jan 24, 2002, 1:13:52 PM1/24/02

to

On Thu, 24 Jan 2002, Tim Woodall wrote:
> I think everyone is missing the point of undefined behaviour in the standard.

> i=0; <= This is defined

> a[i]=0; <= This is defined
> a[a[i]]=1; <= We will assume that this does what we expect
> i.e. i=0; a[0]=1 at this point.

C FAQ has given an plausible interpretation for the clause 6.5 #2 that

This rule effectively constrains legal expressions to those

in which the accesses demonstrably precede the modification.

The key word in the interpretation which should be read is "demonstrably".
If that "a[i] needs to be retrived first to decide the lvalue a[a[i]]"
can be demonstrated, the expression "a[a[i]]=1;" is then well defined;
even though a[i] has int value equal to i.

In fact, the way and the only way to access to an object of the form a[x]
is to evaluate x first.

"a[a[i]]=1;" may be transformed to the canonical form as the examples in
annex D of thestandard:

*(a + *(a + i )) =1;

In this respect, there is no ambiguity about the order of the evaluations
in the sequence and hence
"a[a[i]]=1;" is well defined including the case while a[i]==i.

Someones may drop me a question that
"Modifying and evaluating the same object without an intervening
sequence point invokes undefined behavior".

I turn them back another question:
Is the expression "x=x;" defined?

In annex D, the standard gives many examples to show how to decide an
expressin satisfying the requirement of 6.5.

If you want to defend "a[a[i]]=1;" is undefined while a[i]==i,
please show us by the events analysis like the standard does in annex D
which event causes ambiguity?

paiyi

glen herrmannsfeldt

unread,

Jan 24, 2002, 1:48:37 PM1/24/02

to

"James Kuyper Jr." <kuy...@wizard.net> writes:

>result from a sequence that obey's the standard's ordering requirements.

>> a[a[j]] = j;
>>

(snip of function call sequence point discussion)

>Note: undefined behavior doesn't mean that anything will actually go
>wrong. It means precisely this: the C standard does not define the
>behavior, so it's legal for your program to behave any way the
>implementation chooses to make it behave.

>Generally, behavior is undefined by the C standard, because there are
>(or could be) implementations where the behavior is something that most
>users wouldn't expect. However, there's nothing wrong with the standard
>stating that the behavior is undefined, even if there's no plausible
>reason why it should cause problems.

I have not yet seen any suggestions of possible implementations of
this statement that do anything different than the obvious one.

I had to stretch to come up with possible undefined behavior
for a[a[j]]++; but that might be within the standard.

Presumably if a[j] != j when it is executed, and a[j] is well
defined then the statement is legal, and must work. So, is there
a possible machine implementation that this statement won't work
on, that is otherwise legal according to the standard?

-- glen

glen herrmannsfeldt

unread,

Jan 24, 2002, 2:38:04 PM1/24/02

to

t...@cs.ucr.edu writes:

(snip regarding a[a[i]]=1;)

>This general principle has one (irrelevant?) exception, however: when
>the result of an operator does not depend on the result of one of its
>operands, the evaluation of that operand can be deferred past the
>evaluation of the operator. (The deadline for the delivery of side
>effects is the next sequence point in any case.)

Yes. And one of the more common considerations is that stores can
be delayed to the sequence point, even if they don't need to.
Stores can't occur any earlier than the value to be stored is
computed, though. In this case, there is only one fetch from array
a, one store into it, and that store must occur after the fetch.

I wrote before about a[a[j]]++; which I suggested could generate
the wrong answer on some machines, according to this discussion.

There are memory systems where a read is destructive and so the
value must be written back into memory after it is read.
Usually this is done internally, but I could imagine a machine where
the rewrite is done explicitely. The C standard might allow the
rewrite to be moved to the sequence point. I had thought that it
didn't apply to the current case, but maybe it does.

We know that stores can be be delayed to the sequence point.
Could memory write back also be delayed to a sequence point?
It would seem that it would also fail for a[i]+a[j] when
i==j, so maybe that isn't true. Presumably a[i]+a[j] is
defined to work, so the compiler couldn't delay write back
in that case, but it could in a[a[j]]=1; case.

Could the a[j] restore be delayed until after the a[]=1 store?
It seems that this rule might allow it. When people start putting
core memory in RISC machines we can worry about this one.
restore could, wi

-- glen

glen herrmannsfeldt

unread,

Jan 24, 2002, 3:00:43 PM1/24/02

to

ke...@hplb.hpl.hp.com () writes:

>I think many people treat

> A[E] = X

>as requiring that the value E be computed before any storing into A[E]

>is done. This looks like an entirely intuitive interpretation. The value

>of a[i] can't change until the assignment is done; at that point, you know
>*where* the assignment was done into, so any change to the index expression
>becomes irrelevant.

OK, how about this. While we are used to register machines, fetch
A, fetch E, add, fetch X, store X into calculated location,
one could have a four address machine, where a[a[i]]=1 was one
instruction. It might be that the instruction is executed one
byte at a time, and the operands are refetched for each byte.
(a very limited amount of internal memory).

The closest I know to this condition is the decimal multiply and
divide on S/360, where the operands are all in memory (no decimal
registers) and they may be fetched multiple times during instruction
execution. This causes unusual effects if the operands overlap.
This seems not to be true in ESA/390, though.

-- glen

t...@cs.ucr.edu

unread,

Jan 24, 2002, 5:03:55 PM1/24/02

to

In comp.std.c glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
[...]
: one of the more common considerations is that stores can

: be delayed to the sequence point, even if they don't need to.

Agreed.

: Stores can't occur any earlier than the value to be stored is
: computed, though.

Agreed.

: In this case, there is only one fetch from array

: a, one store into it, and that store must occur after the fetch.

Agreed.

: I wrote before about a[a[j]]++; which I suggested could generate

: the wrong answer on some machines, according to this discussion.

By definition, no expression generates "wrong answers" under
conforming implementations. Per the Standard, the a[a[j]] has no
right or wrong results when the value a[j] is j -- in fact, the
Standard explicitly states that it does not attach any behavior to
such expressions.

: We know that stores can be be delayed to the sequence point.

Always.

: Could memory write back also be delayed to a sequence point?

: It would seem that it would also fail for a[i]+a[j] when
: i==j, so maybe that isn't true. Presumably a[i]+a[j] is
: defined to work, so the compiler couldn't delay write back
: in that case, but it could in a[a[j]]=1; case.

Agreed.

: Could the a[j] restore be delayed until after the a[]=1 store?

Yes. It make no difference when a[j] is not equal to j, and the
behavior is undefined (i.e., anything goes) when a[i] equals j.

: It seems that this rule might allow it. When people start putting

: core memory in RISC machines we can worry about this one.

I don't find this example a strongly compelling reason for prohibiting
using the old value of a variable in computing an lvalue that turns
out to be that of the variable. But it's the only reason I've seen
that makes technical sense to me.

Tom Payne

Christian Bau

unread,

Jan 24, 2002, 5:16:32 PM1/24/02

to

glen herrmannsfeldt wrote:
>
> I have not yet seen any suggestions of possible implementations of
> this statement that do anything different than the obvious one.

I think I posted one: Consider an implementation that actively tries to
detect undefined behaviour, and crashes your program as soon as
undefined behaviour is detected.

An example that makes this more clear:

typedef struct { int a [4]; int b; int c; } mystruct;

int main (void) {
mystruct x;
int i;

for (i = 0; i < 5; ++i) x.a [i] = 0;
return 0;
}

There is obviously undefined behaviour when the non-existent x.a[4] is
set to 0. In practice, however, nothing will go wrong when this code is
executed because the program will store to an address that is correctly
aligned and in the middle of the object x. But in this case, a compiler
that implements range checking will detect the store into x.a[4] and
complain about it at runtime.

The only difference is that in a [a [i]] = 1; with i = 0 and a [i] = 0
the undefined behaviour is most likely not intended by the standard but
accidently, by poor wording. In the example that I just gave people will
agree that the behaviour _should_ be undefined. But in both cases the
behaviour _is_ undefined.

Christian Bau

unread,

Jan 24, 2002, 5:32:41 PM1/24/02

to

t...@cs.ucr.edu wrote:
>
> <snipped>

> I don't find this example a strongly compelling reason for prohibiting
> using the old value of a variable in computing an lvalue that turns
> out to be that of the variable. But it's the only reason I've seen
> that makes technical sense to me.

I think the intention of the standard is that any use of an object that
"obviously" must preceede the modification of the object is intended to
be legal, and any use of the object where it cannot be decided whether
it happens before, after, or at the same time as a modification is
illegal.

One could try to express this by stating that any subexpression of an
expression "precedes" the containing expression, for example both the
lvalue a [a [i]] and the rvalue 1 preceede the assignment a [a [i]] = 1.
If I define this relation as transitive, then the expression a [i]
"preceedes" the whole assignment. The standard could then say that if an
object is modified and accessed in the same expression without
intervening sequence point, then the access must either precede the
modification or the access and modification must be caused by the same
operator (like ++, --, +=, -= etc. ).

A similar expression where one _could_ argue that behaviour is
technically undefined is

i = i * 0 + 1;

One _could_ argue that i is accessed, but it is not used to determine
the value that is stored, because that value is always 1, so technically
this would be undefined (I think that argument would be wrong, but it is
not completely unreasonable). The definition that I gave above would
state clearly that i*0 preceedes the assignment i = i*0 + 1, so this
would be definitely defined.

Barry Margolin

unread,

Jan 24, 2002, 6:16:42 PM1/24/02

to

In article <3C5087C1...@cbau.freeserve.co.uk>,

Christian Bau <christ...@cbau.freeserve.co.uk> wrote:
>glen herrmannsfeldt wrote:
>>
>> I have not yet seen any suggestions of possible implementations of
>> this statement that do anything different than the obvious one.
>
>I think I posted one: Consider an implementation that actively tries to
>detect undefined behaviour, and crashes your program as soon as
>undefined behaviour is detected.

That's a circular answer, since this implementation only does that because
the standard arbitrarily defines the statement as undefined. The question
on the table is why should the standard prohibit this statement in the
first place? The usual reasons for making something undefined are:

1) because you can't figure out a valid meaning for it; or
2) to avoid forcing a particular interpretation, thus allowing
implementation flexibility and optimizations.

I think it's pretty clear that #1 doesn't apply in this case, although it
may apply to some other "modify an object twice between sequence points"
cases. But I think the main reason for this restriction is #2. So are
there any possible implementations of a[a[i]]=1 that would be prevented by
defining it to have the obvious interpretation?

Someone posted a possible situation, where elements of a[] are 32 bits, and
the implementation uses 16-bit operations to update it. It updates the
low-order bits first, then does another fetch to get the value of a[i] so
it can do the high-order bits. This second fetch gets the new value rather
than the old one, so it ends up assigning to the high-order bits of the
wrong element.

James Kuyper Jr.

unread,

Jan 24, 2002, 7:12:22 PM1/24/02

to

glen herrmannsfeldt wrote:
...

> >> a[a[j]] = j;

...

> I have not yet seen any suggestions of possible implementations of
> this statement that do anything different than the obvious one.

A suggestion has been posted, involving an implementation with
destructive reads. Since, in the abstract machine, reads are not
destructive, any implementation of C on such a machine must write back
the value that was destroyed. However, a decent optimizing compiler for
such a machine would remove unnecessary write-backs, and would often
delay delayable ones; this can be justified using the as-if rule. The
algorithm it uses to detect delayable write-backs could incorrectly
identify this particular write-back as delayable, and such an algorithm
would still be legal, precisely because this code has undefined
behavior. Without bothering to detail such an algorithm, we can see why
it would be plausible for it to fail in this case. The algorithm could
have built into it somehow a shortcut based upon the assumption that the
"shall" in 6.5p2 will not be violated.

I'm not sure that's the reason why the behavior of this code was made
undefined, but such platforms are a plausible argument for making it so.

James Kuyper Jr.

unread,

Jan 24, 2002, 7:19:52 PM1/24/02

to

Pai-Yi HSIAO wrote:
...

> Someones may drop me a question that
> "Modifying and evaluating the same object without an intervening
> sequence point invokes undefined behavior".

That's not a question; it's an incorrect statement of a rule from the
standard.

> I turn them back another question:
> Is the expression "x=x;" defined?

Yes, because it doesn't violate a correct statement of the standard's
rule. Re-read it; it's been posted in this thread.

> If you want to defend "a[a[i]]=1;" is undefined while a[i]==i,
> please show us by the events analysis like the standard does in annex D
> which event causes ambiguity?

A couple of such examples have been posted now. One involves destructive
reads. The other involves reading 32-bit values using instructions that
retrieve only 16 bits at a time, and rearranging the natural order in
which those instructions are applied in a way that is legal only because
the behavior of this code is undefined.

Dik T. Winter

unread,

Jan 24, 2002, 8:53:54 PM1/24/02

to

Not the same mechanisms. Computation of the new value uses only stuff
on the right-hand side of the assignment. The object's lvalue is on
the left-hand side. With the current standard a compiler *must* do all
the restores needed for retrieves of values on the right hand side before
it can start even computing the left hand side (unless the compiler can
conclude that a restore is not needed).

Douglas A. Gwyn

unread,

Jan 24, 2002, 9:21:05 PM1/24/02

to

t...@cs.ucr.edu wrote:
> In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
> : a[a[j]] = i;
> We are not. Keep in mind that "undefined behavior" simply means the
> that the Standard imposes no requirements on the behavior (of
> conforming implementations) in this situation. That has nothing to do
> with:
> - whether or not the intent of the programmer is clear and/or
> - whether or not every reasonable implementation will behave
> according to the clear and obvious intent.

Even if it is "obvious" by Tom St Denis thought he meant by that
source, in a real implementation one can have register caching,
vector instructions, etc. which all could make the fast code
that almost every user would expect to be generated produce an
effect different from what Tom thought it should be. The license
for implementations to be *efficient* is important in the success
of C as a systems programming language, and in order to prevent
excessive liberties being taken by code optimizers the standard
provides ways for programmers to *portably* and *reliably* express
what they intend; the most notable tool in this regard being the
sequence point guarantees.

Douglas A. Gwyn

unread,

Jan 24, 2002, 9:27:33 PM1/24/02

to

Dann Corbit wrote:
> ... I cannot see how the standard could *possibly* impose a behavior.

Oh, it *could*, e.g. by requiring execution strictly guided by the
syntax tree (bottom-up, left-to-right). Indeed that seems to be how
most people *think* about what the meaning "should" be. We didn't
actually require this because it would force noticeably slower code
to be generated in many cases where a reasonable programmer would
not be getting any benefit from the stricter requirement anyway.
Rather than penalize *everybody* by default, we support the best
code generation for the careful user by default, and when a more
strict order of evaluation is necessary for the correctness of some
algorithms there are ways for those programmers to obtain it.

Tom St Denis

unread,

Jan 24, 2002, 9:28:45 PM1/24/02

to

"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:3C50C19A...@null.net...

I just can't see how any valid compiler could not handle compiling

a[a[j]] = j;

I mean, there is absolutely *no* way to optimize the store in a way such
that a[j] is computed after the store. Provided that "j" is in bounds [i.e
not out in the middle of nowhere] this code should always execute two
fundamental steps

1. Look up the j'th element of a, call that p
2. Store j in p'th element of a

Regardless of how the code is compiled into machine code. Even if step #1
and #2 occur in multiple universes [with a really long LAN cable] step #2
must wait for step #1 to finish.

This whole discussion of "sequence points" is beyond me [I haven't read the
spec] but common sense dictates the order of predence is a key here. I
seriously believe you guys are just mistaken and are interpretting the
standard incorrectly. Otherwise alot of code would be undefined such as

a[func(a)] = 0;
*(a + *(b + i)) = j;

etc...Clearly those two have very defined behaviour [provided the variables
and functions are prototyped..etc..]

Tom

Douglas A. Gwyn

unread,

Jan 24, 2002, 10:10:17 PM1/24/02

to

Tom St Denis wrote:
> I mean, there is absolutely *no* way to optimize the store
> in a way such that a[j] is computed after the store.

But the *value* might be different after the store
(if j == a[j]), and if a swath of a vector is being
operated on it might be that nominally the "same"
element has different values at different places in
the pipeline. It is in order to avoid forcing the
generated code to have to pessimistically flush
pipelines etc. (nor to have to add a preliminary
test of whether j == a[j]) that the default policy is
to not require such pessimism by the implementation.

t...@cs.ucr.edu

unread,

Jan 24, 2002, 11:19:20 PM1/24/02

to

In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:

[...]
: I just can't see how any valid compiler could not handle compiling

: a[a[j]] = j;

: I mean, there is absolutely *no* way to optimize the store in a way such
: that a[j] is computed after the store.

Other posters have offered to scenarios where your conclusion could fail:

- memories with destructive readouts, where the write back of the original
value of the original value of a[j] follows the assignment of j to a[j].
(This causes no problem when the value of a[j] is not j.)

- implementations that look up a[j] twice, once to figure out where to
store the low-order bits of j and a second time to determine where to
store the high-order bits of j.

I don't like those implementations. I don't think they are sufficient
reason to prohibit assignments like those under discussion. In fact,
I don't believe that the standards committee thinks they are
sufficient reason to prohibit such assignments. (Rather, I'm
guessing/hoping that they're prohibited due to an oversight.) But
there is no escaping the fact that the Standard does declare such
assignment expressions to yield behavior that is not defined by the
Standard.

Tom Payne

Tom St Denis

unread,

Jan 24, 2002, 11:21:05 PM1/24/02

to

<t...@cs.ucr.edu> wrote in message news:a2qmc8$1rp$1...@glue.ucr.edu...

> In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
> [...]
> : I just can't see how any valid compiler could not handle compiling
>
> : a[a[j]] = j;
>
> : I mean, there is absolutely *no* way to optimize the store in a way such
> : that a[j] is computed after the store.
>
> Other posters have offered to scenarios where your conclusion could fail:
>
> - memories with destructive readouts, where the write back of the
original
> value of the original value of a[j] follows the assignment of j to
a[j].
> (This causes no problem when the value of a[j] is not j.)
>
> - implementations that look up a[j] twice, once to figure out where to
> store the low-order bits of j and a second time to determine where to
> store the high-order bits of j.

Regardless of how you accomplish a[j], the order of the operations is
clearly defined.

Just like if you compute

unsigned long a, b, c, d;

d = (a * b) / c;

You could compute a*b using a set of 16x16=32 multiplications, and in some
contrived case the low order division could be thought correct to occur
before the set of 16-bit multplications form the [at least] 32-bit result.
But that wouldn't be the case. Clearly the a*b must be completed
[regardless of how you accomplish it] before you divide.

Similarly

a[a[j]] = j

Regardless of how you accomplish "a[j]" you must have the integer
preped-and-ready before the store can take place. Even if you shift bits
into an index register 3 at a time, etc...

Tom

Kaz Kylheku

unread,

Jan 24, 2002, 11:32:10 PM1/24/02

to

In article <a2qmc8$1rp$1...@glue.ucr.edu>, t...@cs.ucr.edu wrote:
>In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:
>[...]
>: I just can't see how any valid compiler could not handle compiling
>
>: a[a[j]] = j;
>
>: I mean, there is absolutely *no* way to optimize the store in a way such
>: that a[j] is computed after the store.
>
>Other posters have offered to scenarios where your conclusion could fail:

Note that a[a[j]] = j is semantically quite similar to

j->a->a = j

Suppose that you are using arrays to represent a linked list. Instead
of having a struct with a member called next, you have a next[] array,
and instead of a node pointer, you have an index. So instead of

node->next->next = node;

you write

next[next[node]] = node;

The only problem with this is that next[] can be all considered one
object. So it appears that the expression next[node] accesses the object
for a purpose other than determining the value to be stored. Namely, it
determines *what* part of that object shall receive the value. However
determining *where* something is stored is just as much of a dependency
as determining *what* to store. A store can't take place until the
where and the what are known, and the where and the what can't be known
until next[node] is evaluated on the left, and node on the right.

It's been a long time since I last worked through some examples using
C99's formal model of sequence points, but the intuition I gained from
doing so leads me to suspect that the model will bear this one out.

So at worst, I suspect that this expression is among those which
are correct according to the non-normative model, but incorrect
according to the normative text.

t...@cs.ucr.edu

unread,

Jan 25, 2002, 12:05:14 AM1/25/02

to

In comp.std.c Dik T. Winter <Dik.W...@cwi.nl> wrote:

: In article <a2o24l$61a$1...@glue.ucr.edu> t...@cs.ucr.edu writes:
[...]
: > : If 4 and 6 point to the same location, 4a might interfere. There is no

: > : way to say that 4a should happen before 6.
: >
: > In such hardware, you get the same problem regardless of whether the
: > old value is used to compute (a) the new value or (b) the object's
: > lvalue. The Standard requires that implementations deal with problem
: > (a) and the same mechanisms could be used to deal with problem (b).

[...]
: With the current standard a compiler *must* do all

: the restores needed for retrieves of values on the right hand side before
: it can start even computing the left hand side (unless the compiler can
: conclude that a restore is not needed).

And, the Standard does not currently impose a similar prohibition on
late restorations when the old value is used on the left-hand side.
My point is that, were such a prohibition adopted, compliance could be
achieved in the same way as compliance with the current prohibition.
In either case, the implementation must make sure that certain writes
precede certain other writes. Either that can be achieved or it
can't.

Tom Payne

t...@cs.ucr.edu

unread,

Jan 25, 2002, 12:20:01 AM1/25/02

to

In comp.std.c Tom St Denis <tomst...@yahoo.com> wrote:

[...]
: Regardless of how you accomplish a[j], the order of the operations is
: clearly defined.

Nonsense! When a[j] == j, the Standard defines the behavior of
a[a[j]]=i to be "undefined", i.e., anything goes.

: Just like if you compute

: unsigned long a, b, c, d;

: d = (a * b) / c;

Again nonsense! When c == 0, the Standard defines the behavior of (a*b)/c
to be "undefined" -- again anything goes.

[...]
: Similarly

: a[a[j]] = j

: Regardless of how you accomplish "a[j]" you must have the integer
: preped-and-ready before the store can take place.

Not when a[j] == j. In that case the Standard defines the resulting
behavior of evaluating a[a[j]]=j to be "undefined", and yet again
anything goes.

Tom Payne

Brian Raiter

unread,

Jan 25, 2002, 1:16:56 AM1/25/02

to

Tom St Denis <tomst...@yahoo.com> wrote:

> d = (a * b) / c;
>
> You could compute a*b using a set of 16x16=32 multiplications, and
> in some contrived case the low order division could be thought
> correct to occur before the set of 16-bit multplications form the
> [at least] 32-bit result. But that wouldn't be the case. Clearly
> the a*b must be completed [regardless of how you accomplish it]
> before you divide.

Imagine that the optimizer knew that it already had (a / c) in a
register, left over from the previous statement. It might be faster to
start calculating the value of (b / c), store the value of the
leftover register in d while the division is being done, and then
multiply d by (b/c), storing the result in d. Here, the multiplication
is done after all division operations are completed.

> Similarly
>
> a[a[j]] = j
>
> Regardless of how you accomplish "a[j]" you must have the integer
> preped-and-ready before the store can take place. Even if you shift
> bits into an index register 3 at a time, etc...

Okay. Imagine an assembly language which has 16-bit general registers
A and B, and 32-bit index registers X and Y. In order to make it
possible to manipulate pointers gracefully, the language has some
doubly-indirect addressing modes. Thus, "a[a[j]] = j" might be
compiled into something like this:

load X <= [addr_j] ; X equals j
calc Y <= addr_a + X << 2 ; Y points to a[j]
load A <= 1sthalf(X) ; A equals 1st half of j
load B <= 2ndhalf(X) ; A equals 2nd half of j
load [addr_a + [Y] << 2], A ; store 1st half in a[a[j]]
load [addr_a + 4 + [Y] << 2], B ; store 2nd half in a[????]

Such a processor might even be programmed to fault if you attempted to
read and write to the same memory location in a single instruction.

b

t...@cs.ucr.edu

unread,

Jan 25, 2002, 1:56:02 AM1/25/02

to

In comp.std.c Kaz Kylheku <k...@accton.shaw.ca> wrote:

: Note that a[a[j]] = j is semantically quite similar to

: j->a->a = j

: Suppose that you are using arrays to represent a linked list. Instead
: of having a struct with a member called next, you have a next[] array,
: and instead of a node pointer, you have an index. So instead of

: node->next->next = node;

: you write

: next[next[node]] = node;

: The only problem with this is that next[] can be all considered one
: object. So it appears that the expression next[node] accesses the object
: for a purpose other than determining the value to be stored. Namely, it
: determines *what* part of that object shall receive the value. However
: determining *where* something is stored is just as much of a dependency
: as determining *what* to store. A store can't take place until the
: where and the what are known, and the where and the what can't be known
: until next[node] is evaluated on the left, and node on the right.

Even if node[] is considered to be composed of many separate say
int's, we still run afoul of the Standard when next[node] == node, for
in that case we are using the old value of next[node] for a purpose
other than computing the new value of next[next[node]], i.e., of
next[node].

: It's been a long time since I last worked through some examples using

: C99's formal model of sequence points, but the intuition I gained from
: doing so leads me to suspect that the model will bear this one out.

: So at worst, I suspect that this expression is among those which
: are correct according to the non-normative model, but incorrect
: according to the normative text.

I would not be surprised if that were the case.

Tom Payne

ke...@hplb.hpl.hp.com

unread,

Jan 25, 2002, 5:12:57 AM1/25/02

to

In article <xp348.30642$I8.58...@news4.rdc1.on.home.com>,

"Tom St Denis" <tomst...@yahoo.com> writes:
>
> This whole discussion of "sequence points" is beyond me [I haven't read the
> spec] but common sense dictates the order of predence is a key here. I
> seriously believe you guys are just mistaken and are interpretting the
> standard incorrectly. Otherwise alot of code would be undefined such as

If you haven't *read* the spec, how on Earth would you know whether other
people have mis-interpreted it?

--
Chris "electric hedgehog" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html
C welcome: http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html

Pai-Yi HSIAO

unread,

Jan 25, 2002, 5:58:37 AM1/25/02

to

On Fri, 25 Jan 2002, James Kuyper Jr. wrote:
> Pai-Yi HSIAO wrote:
> > Someones may drop me a question that
> > "Modifying and evaluating the same object without an intervening
> > sequence point invokes undefined behavior".
>
> That's not a question; it's an incorrect statement of a rule from the
> standard.

You're right.
I misunderstood the standard.

> > If you want to defend "a[a[i]]=1;" is undefined while a[i]==i,
> > please show us by the events analysis like the standard does in annex D
> > which event causes ambiguity?
>
> A couple of such examples have been posted now. One involves destructive
> reads. The other involves reading 32-bit values using instructions that
> retrieve only 16 bits at a time, and rearranging the natural order in
> which those instructions are applied in a way that is legal only because
> the behavior of this code is undefined.

Thank you for your reply.
paiyi

Pai-Yi HSIAO

unread,

Jan 25, 2002, 7:21:10 AM1/25/02

to

On 25 Jan 2002 ke...@hplb.hpl.hp.com wrote:
> In article <xp348.30642$I8.58...@news4.rdc1.on.home.com>,
> "Tom St Denis" <tomst...@yahoo.com> writes:
> >
> > This whole discussion of "sequence points" is beyond me [I haven't read the
> > spec] but common sense dictates the order of predence is a key here. I
> > seriously believe you guys are just mistaken and are interpretting the
> > standard incorrectly. Otherwise alot of code would be undefined such as
>
> If you haven't *read* the spec, how on Earth would you know whether other
> people have mis-interpreted it?

Dear Tom St Denis,

I invite you to read the clause 6.5#2 again

"Furthermore, the prior value shall be accessed
only to determine the value to be stored.".

Christian Bau has given a good interpretation for it in this thread:
"if an object is modified then it may be accessed without intervening
sequence point only to calculate the value being stored."

This clause gives legality to write an expression like "i=i+5;" in the
language C.
Why?
Because we see the object "i" is going to be modified in the expression.
The ONLY way and the MOST way to use "i" in this expression is to
take its (old) value (before modifying) to do some calculation
and the obtained value will be stored to "i".

Now turn back to our discussion:

Is the expression "A[A[i]]=1;" defined in C?

Please forget the intuitive evolution ordre to do the assignement.
We need to check if the expression satifies the requirement of 6.5#2.

Two cases needed to be considered:
(1) A[i] != i and (2) A[i] == i
before the expression is evaluated.

Case(1): suppose A[i]==j which is not equal to i.
The object A[j] is going to be modified.
The old value (beore modifying) of A[j] can be *at most* used to
calculated the value which will store to A[j].
That is, A[j] can be appear *at most* on the right hand side of the
assignment operator "=".
In this case, A[j] do not appear on the right hand side of the assignment
operator. It doesn't violate the requirement of 6.5#2.

For the case(2), said A[i]==i,
the object A[i] is going to be modified.
Unfortunately, there is *another* usage of the object A[i] which happens
in the left hand side of the assignment to the object A[i].
That violates the constraint of 6.5#2.

Now you should recognize that the expression "A[A[i]]=1;" has some
marginal behavior (happening when A[i]==i) which the standard doesn't
allow.

While you accept it is undefined as A[i]==i, we can pass to another
subject why the standard puts this constraint, which has killed
the legality of some "should_be_legal" expression?

Should we loose it?

How to loose it?

Paiyi

t...@cs.ucr.edu

unread,

Jan 25, 2002, 8:15:43 AM1/25/02

to

In comp.std.c Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:
[...]
: Now you should recognize that the expression "A[A[i]]=1;" has some

: marginal behavior (happening when A[i]==i) which the standard doesn't
: allow.

: While you accept it is undefined as A[i]==i, we can pass to another
: subject why the standard puts this constraint, which has killed
: the legality of some "should_be_legal" expression?

I have prefered to think that the committee simply overlooked the facts
that:

1) The intended behavior for a[a[i]]=n is obvious, even when a[i] == i.

2) Object code for a[a[i]]=n that gives that intended behavior when
a[i] != i always gives that obvious and intended behavior when
a[i] == i.

3) The change to the wording of the Standard to remove the current
unnecessary restriction is minor and doesn't impact readability.

Various others who have posted in this thread have, however, disagreed
with each of these three claims. In particular, they pointed out that
claim #2 doesn't hold:

a) in possibly optimal implementations on architectures with destructive
readouts from memory,

b) for implemetations that store the low-order bits of n to a[a[i]] and
then attempt to recompute the address of a[a[i]] to determine where
to store the high-order bits of n.

These (somewhat aberrant) counter-examples refute the "always" part of
claim #2.

Tom Payne

m...@my-deja.com

unread,

Jan 25, 2002, 10:05:51 AM1/25/02

to

b...@drizzle.com (Brian Raiter) wrote:

> Tom St Denis <tomst...@yahoo.com> wrote:
>
> > d = (a * b) / c;
> >
> > You could compute a*b using a set of 16x16=32 multiplications, and
> > in some contrived case the low order division could be thought
> > correct to occur before the set of 16-bit multplications form the
> > [at least] 32-bit result. But that wouldn't be the case. Clearly
> > the a*b must be completed [regardless of how you accomplish it]
> > before you divide.
>
> Imagine that the optimizer knew that it already had (a / c) in a
> register, left over from the previous statement. It might be faster to
> start calculating the value of (b / c), store the value of the
> leftover register in d while the division is being done, and then
> multiply d by (b/c), storing the result in d. Here, the multiplication
> is done after all division operations are completed.

One too many divisons: you probably meant to multiply (a/c) by 'b',
not (b/c).

Since M. St. Denis' example invovled integers, any compiler which did
the above would produce code that would arrive at wrong answer given a
= 2, b = 3, c = 4. Given floating point representations, we are in a
similar situation as there _is_ a difference between the various ways
you can estimate the value of (a*b)/c.

> > Similarly
> >
> > a[a[j]] = j
> >
> > Regardless of how you accomplish "a[j]" you must have the integer
> > preped-and-ready before the store can take place. Even if you shift
> > bits into an index register 3 at a time, etc...
>
> Okay. Imagine an assembly language which has 16-bit general registers
> A and B, and 32-bit index registers X and Y. In order to make it
> possible to manipulate pointers gracefully, the language has some
> doubly-indirect addressing modes. Thus, "a[a[j]] = j" might be
> compiled into something like this:
>
> load X <= [addr_j] ; X equals j
> calc Y <= addr_a + X << 2 ; Y points to a[j]
> load A <= 1sthalf(X) ; A equals 1st half of j
> load B <= 2ndhalf(X) ; A equals 2nd half of j
> load [addr_a + [Y] << 2], A ; store 1st half in a[a[j]]
> load [addr_a + 4 + [Y] << 2], B ; store 2nd half in a[????]

This would not be a very pleasant machine to program, even with such
"graceful" pointer handling. The *very* common case:

a[j] = ...

would require the use of a temporary:

load [0 + [Y] << 2], A
load [4 + [Y] << 2], B

where Y holds the pointer to the temporary pointer. Now given that
one has already expended (considerable!) effort to use a temporary in
the most-common case, it stands to reason that a similar temporary can
be used for the

a[a[j]] = ...

case too. And like the (a*b)/c case above, any optimizer that removes
the temporary in the a[a[j]] case -- because there is a nifty
doubly-indirect mode -- can be said to produce the "wrong answer".

Of course, the Standard doesn't say anything like this, nor does it
offer opinions (such as mine) about the ease (or lack thereof) of
developing a program on such machinery. It only throws up it's hands
and mumbles about "undefined behaviour" -- even in plainly obvious
constructs like a[a[j]].

> Such a processor might even be programmed to fault if you attempted to
> read and write to the same memory location in a single instruction.

But are their any examples of such computers? How common are they?
If they all disappeared tomorrow, would anyone notice?

Even if there were, though, why is the Standard so willing to
accomodate such loathsome hardware? Is such a Standard useful at all?
Well, beyond serving as the Ultimate Authority for nit-picking legal
arguments in newsgroups? ;-)

Barry Margolin

unread,

Jan 25, 2002, 11:54:38 AM1/25/02

to

In article <ed548.1482$jb.1...@news2.calgary.shaw.ca>,

Kaz Kylheku <k...@ashi.footprints.net> wrote:
>Note that a[a[j]] = j is semantically quite similar to
>
> j->a->a = j
>
>Suppose that you are using arrays to represent a linked list. Instead
>of having a struct with a member called next, you have a next[] array,
>and instead of a node pointer, you have an index. So instead of
>
> node->next->next = node;
>
>you write
>
> next[next[node]] = node;

Just as this has undefined behavior only in the case where
next[node]==node, I think your structure assignment would have undefined
behavior if node->next == node.

Can someone confirm or deny this?

Brian Raiter

unread,

Jan 25, 2002, 1:25:08 PM1/25/02

to

>> Imagine that the optimizer knew that it already had (a / c) in a
>> register, left over from the previous statement. It might be faster to
>> start calculating the value of (b / c), store the value of the
>> leftover register in d while the division is being done, and then
>> multiply d by (b/c), storing the result in d. Here, the multiplication
>> is done after all division operations are completed.
>
> One too many divisons: you probably meant to multiply (a/c) by 'b',
> not (b/c).

Sorry; of course you're right. I should've kept silent.

>> Okay. Imagine an assembly language which has 16-bit general registers
>> A and B, and 32-bit index registers X and Y. In order to make it
>> possible to manipulate pointers gracefully, the language has some
>> doubly-indirect addressing modes. Thus, "a[a[j]] = j" might be
>> compiled into something like this:
>>
>> load X <= [addr_j] ; X equals j
>> calc Y <= addr_a + X << 2 ; Y points to a[j]
>> load A <= 1sthalf(X) ; A equals 1st half of j
>> load B <= 2ndhalf(X) ; A equals 2nd half of j
>> load [addr_a + [Y] << 2], A ; store 1st half in a[a[j]]
>> load [addr_a + 4 + [Y] << 2], B ; store 2nd half in a[????]
>
> This would not be a very pleasant machine to program, even with such
> "graceful" pointer handling. The *very* common case:
>
> a[j] = ...
>
> would require the use of a temporary:
>
> load [0 + [Y] << 2], A
> load [4 + [Y] << 2], B

Not necessarily. Use Y to point to j instead:

calc Y <= addr_j

load [addr_a + [Y] << 2], A

load [addr_a + 4 + [Y] << 2], B

> And like the (a*b)/c case above, any optimizer that removes the

> temporary in the a[a[j]] case -- because there is a nifty
> doubly-indirect mode -- can be said to produce the "wrong answer".

I think the requirements of the standard guarantee that use of the
"nifty" doubly-indirect mode should never produce a wrong answer,
(given the assumption that the optimizer doesn't use it across a
sequence point, or anything of that nature).

>> Such a processor might even be programmed to fault if you attempted to
>> read and write to the same memory location in a single instruction.
>
> But are their any examples of such computers? How common are they?
> If they all disappeared tomorrow, would anyone notice?

My point was simply to refute Tom St. Denis's argument that such a
situation could not occur. I was attempting to clarify the points made
by other people.

I offer no opinions about what the standard *ought* to say; even I can
see that I would be out of my depth in doing so.

b

Al Grant

unread,

Jan 25, 2002, 1:39:53 PM1/25/02

to

t...@cs.ucr.edu wrote in message news:<a2rlpv$b4u$1...@glue.ucr.edu>...

> a) in possibly optimal implementations on architectures with destructive
> readouts from memory,

By analogy, C might have had a "Rule X" that said that in

x = *p + *q;

it was illegal for p and q to alias. This would allow a compiler
for an architecture with destructive reads to load from *p and *q,
then rewrite *p and *q. This might be faster than the ordering
such a compiler would have to adopt given the normal C semantics.

However, it doesn't then follow that "C needed a Rule X to allow
optimal implementation on architectures with destructive reads."

> b) for implemetations that store the low-order bits of n to a[a[i]] and
> then attempt to recompute the address of a[a[i]] to determine where
> to store the high-order bits of n.

C gives no guarantee that E in
A[E] = n
can be evaluated multiple times. A compiler might do so, as an
optimization (rematerialization) if it can guarantee that the result
will not change and there are no side-effects. But it must be able
to use a temporary instead, in case E is side-effecting or volatile.
So the case of the write being done "in pieces" is already handled
by the general case. So again the justification for the rule seems
to be that it is important to enable a certain optimization for
certain programs on certain architectures. Is it?

Pai-Yi HSIAO

unread,

Jan 25, 2002, 2:04:36 PM1/25/02

to

On Fri, 25 Jan 2002 t...@cs.ucr.edu wrote:
> I have prefered to think that the committee simply overlooked the facts
> that:
>
> 1) The intended behavior for a[a[i]]=n is obvious, even when a[i] == i.
>
> 2) Object code for a[a[i]]=n that gives that intended behavior when
> a[i] != i always gives that obvious and intended behavior when
> a[i] == i.
>
> 3) The change to the wording of the Standard to remove the current
> unnecessary restriction is minor and doesn't impact readability.
>
> Various others who have posted in this thread have, however, disagreed
> with each of these three claims. In particular, they pointed out that
> claim #2 doesn't hold:
>
> a) in possibly optimal implementations on architectures with destructive
> readouts from memory,
>
> b) for implemetations that store the low-order bits of n to a[a[i]] and
> then attempt to recompute the address of a[a[i]] to determine where
> to store the high-order bits of n.
>
> These (somewhat aberrant) counter-examples refute the "always" part of
> claim #2.

It should be the responsability of the compiler writer for the special
architecture to make the complier comforming to the standard.
It should not be the standard makes compromise with some special
architectures and lets the clauses be written down
although I know many parts of the standard come from there.

It may be convenient to code the program with the increment and
decrement operators.
But it also makes it a very difficult task to define a strict and
logical evaluation order of an expression after having introduced the
operators to the language.
A direct consequence being found is the appearance of the clause 6.5#2.

paiyi

Douglas A. Gwyn

unread,

Jan 25, 2002, 5:41:38 PM1/25/02

to

t...@cs.ucr.edu wrote:
> Nonsense! When a[j] == j, the Standard defines the behavior of
> a[a[j]]=i to be "undefined", i.e., anything goes.

It wasn't made undefined behavior on a whim of the committee;
undefined behavior is identified as such whenever there is a real
possibility that some reasonable implementation on some reasonable
architecture could in fact have a significant problem producing a
specific defined behavior. There is a trade-off between exactly
defined behavior and efficient behavior, and since C's major role
is as a systems implementation language, we often specify the
trade off in the direction of efficient generated code, so long
as there is a reasonable way for programmers to obtain more
controlled behavior when that is what they want.

Barry Margolin

unread,

Jan 25, 2002, 6:12:08 PM1/25/02

to

In article <5765b025.02012...@posting.google.com>,

Al Grant <alg...@myrealbox.com> wrote:
>> b) for implemetations that store the low-order bits of n to a[a[i]] and
>> then attempt to recompute the address of a[a[i]] to determine where
>> to store the high-order bits of n.
>
>C gives no guarantee that E in
> A[E] = n
>can be evaluated multiple times. A compiler might do so, as an
>optimization (rematerialization) if it can guarantee that the result
>will not change and there are no side-effects. But it must be able
>to use a temporary instead, in case E is side-effecting or volatile.
>So the case of the write being done "in pieces" is already handled
>by the general case. So again the justification for the rule seems
>to be that it is important to enable a certain optimization for
>certain programs on certain architectures. Is it?

Right. The rule about not reading and modifying the same location between
sequence points gives it license to assume that the value of E won't change
as a result of the partial assignment. The compiler can often determine
that E has no side effects and doesn't involve any volatile variables. So
in this case it knows it's safe to recompute E rather than saving it in a
temporary, and if that's faster on some unusual hardware it would
presumably be a good idea for the optimizer to generate that code.

Barry Margolin

unread,

Jan 25, 2002, 6:16:32 PM1/25/02

to

In article <Pine.A41.4.10.102012...@moka.ccr.jussieu.fr>,

Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:
>It should be the responsability of the compiler writer for the special
>architecture to make the complier comforming to the standard.
>It should not be the standard makes compromise with some special
>architectures and lets the clauses be written down
>although I know many parts of the standard come from there.

But the standard should not be written in such a way that it prevents
useful optimizations with little benefit to the programmer.

Is 'a[a[i]] = j' really such an important thing to support that you need to
prevent some optimizations that are allowed by that clause of the standard?
If we have to require programmers to write 'temp=a[i];a[temp]=j' in rare
instances, is that such a burden?

The standard was written by compiler writers and users, and it's a
compromise between the needs of each.

t...@cs.ucr.edu

unread,

Jan 25, 2002, 6:54:44 PM1/25/02

to

In comp.std.c Al Grant <alg...@myrealbox.com> wrote:
: t...@cs.ucr.edu wrote in message news:<a2rlpv$b4u$1...@glue.ucr.edu>...

Elsewhere in this thread, Doug Gwyn wrote:

It wasn't made undefined behavior on a whim of the committee;
undefined behavior is identified as such whenever there is a real
possibility that some reasonable implementation on some reasonable
architecture could in fact have a significant problem producing a
specific defined behavior. There is a trade-off between exactly
defined behavior and efficient behavior, and since C's major role
is as a systems implementation language, we often specify the
trade off in the direction of efficient generated code, so long
as there is a reasonable way for programmers to obtain more
controlled behavior when that is what they want.

The key words here are "a real possibility that some reasonable
implementation on some reasonable architecture architecture could in
fact have a real problem". Your analysis of the two counter-examples
offered thus far shows that similar problems (optimizatin
opportunities) for the first the architecture have not been considered
compelling by the committee and that the second counter-example does
not involve a signficant problem.

So, I've still not seen an example of a significant problem with a
reasonable implementation on a reasonable architecture that is fixed
by this prohibition in the current wording.

Tom Payne

James Kuyper Jr.

unread,

Jan 25, 2002, 7:12:27 PM1/25/02

to

t...@cs.ucr.edu wrote:
...

> So, I've still not seen an example of a significant problem with a
> reasonable implementation on a reasonable architecture that is fixed
> by this prohibition in the current wording.

I'd recommend that you propose an alternative wording. The current
wording is intended to prohibit something that definitiely needs to be
prohibited; it incidentally also prohibits this. It might be a good idea
to find out just how much more complicated the wording would have to get
to distinguish the two cases.

Dik T. Winter

unread,

Jan 25, 2002, 7:39:46 PM1/25/02

to

In article <a2sr84$mdn$1...@glue.ucr.edu> t...@cs.ucr.edu writes:
> So, I've still not seen an example of a significant problem with a
> reasonable implementation on a reasonable architecture that is fixed
> by this prohibition in the current wording.

On the other hand, the current wording is simple. I presume that a
wording that would allow a[a[i]] = i but no other aberrations could
be quite complicated.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

NOSPA...@sebastian9.com

unread,

Jan 25, 2002, 8:21:17 PM1/25/02

to

According to James Kuyper Jr. <kuy...@wizard.net>:

Assuming the wording is the same as C89, what about adding the
words "and/or to determine the lvalue into which the value is
to be stored." after "only to determine the value to be stored."?

Or you could add a sequence point after the evaluation of both
sides of an assignment operator but before the side effect of
the assignment takes place (although I understand that some
implementors might object to this last).

I guess one possible complication with the first solution
is what happens with an expression with multiple assignments like:
a[a[i]] = a[a[j]] = k;
if a[j] == i, and all other locations referenced are distinct.
(This sets up a race condition between the update and reference to a[i],
even though this item is only updated once between the sequence points.)

One could argue that this expression would remain U.B. under my proposed
wording, because my wording refers to "the lvalue" and not "a lvalue",
and so applies only to single assignments, but perhaps the wording
needs to be tightened further.

--
Dave Wallace (Remove NOSPAM from my address to email me)
It is quite humbling to realize that the storage occupied by the longest
line from a typical Usenet posting is sufficient to provide a state space
so vast that all the computation power in the world can not conquer it.

NOSPA...@sebastian9.com

unread,

Jan 25, 2002, 9:16:02 PM1/25/02

to

According to Barry Margolin <bar...@genuity.net>:

> In article <Pine.A41.4.10.102012...@moka.ccr.jussieu.fr>,
> Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote:
> >It should be the responsability of the compiler writer for the special
> >architecture to make the complier comforming to the standard.
> >It should not be the standard makes compromise with some special
> >architectures and lets the clauses be written down
> >although I know many parts of the standard come from there.
>
> But the standard should not be written in such a way that it prevents
> useful optimizations with little benefit to the programmer.
>
> Is 'a[a[i]] = j' really such an important thing to support that you need to
> prevent some optimizations that are allowed by that clause of the standard?
> If we have to require programmers to write 'temp=a[i];a[temp]=j' in rare
> instances, is that such a burden?

I think so. The problem is not just the limited set of cases in
which the standard as written actually declares the behavior undefined:
the problem is that in order to *guarantee* the absence of undefined
behavior, the programmer needs to avoid a much larger set of useful
constructs that are probably perfectly well defined, just because
he or she can't *prove* that the U.B. case does not occur.
Many double indirections in computing the LHS of an assignment are
potentially suspect.

For example, you can't write:

void foo(int a[], int b[], int i, int j)
{
a[b[i]] = j; /* Possible U.B. if a[] and b[] overlap and b[i] has the wrong
* value */
}

unless you can analyze all possible callers of foo and determine that
the problem case does not occur.

Also, the indirection may be hidden by macros whose expansions are not
immediately visible to the programmer, but hidden away in some header
file.

I think both:
a[INDEXMACRO(i)] =j;
and
MYMACRO(p)->next = q;
are potentially U.B. under the literal interpretation of the standard
unless you carefully examine the expansion of the
macros in question and determine that no bad indirection can occur
in this context. This involves understanding both the syntax of the code
and also the run-time details of how the data structures in question refer
to themselves.

When you couple the difficulty in recognizing this class of U.B. with
the fact that most implementations will in fact do the correct thing
even if the U.B. case occurs, I think that there is likely to be
a lot of this "potential undefined behavior" lurking in existing
code that will only be discovered if the code is ported to an
implementation that doesn't do the correct thing and the bad
case both occurs and is detected.

> The standard was written by compiler writers and users, and it's a
> compromise between the needs of each.

If it's an actual compromise, that's one thing. If on the other hand,
there are no compiler writers that are actually arguing that they
need this case to be U.B., and it's just an artifact of the existing
wording that no one knew how to improve, that's something
entirely different.

NOSPA...@sebastian9.com

unread,

Jan 25, 2002, 9:47:00 PM1/25/02

to

According to <t...@cs.ucr.edu>:
> In comp.std.c ke...@hplb.hpl.hp.com wrote:
> [...]
> : I think many people treat
>
> : A[E] = X
>
> : as requiring that the value E be computed before any storing into A[E]
> : is done. This looks like an entirely intuitive interpretation.
>
> It's also possible that the value of the index gets inferred at
> compile time; for example, consider A[0*E]. But that makes no
> difference to the situation in question, because we are only
> interested in getting the old value by whatever means.
>
> : The value
> : of a[i] can't change until the assignment is done; at that point, you know
> : *where* the assignment was done into, so any change to the index expression
> : becomes irrelevant.
>
> Agreed.
>
> : They'd like a sequence point between the evaluation of E and the assignment
> : into A[E].
>
> No. A sequence point is a deadline for the delivery of *side effects*
> not values. The value of an expression must be obtained in time for
> its use by its parent operator in the abstract syntax tree, while the
> expression's side effects need only to (seem to be) delivered by its
> nearest ancestral sequence point in the abstract syntax tree.

As I noted elsewhere in this thread, without the sequence point there
is a potential problem with relaxing the current rule to allow use
of the old value in computing the LHS when you have multiple assignments:
A[E] = E = X
has a race condition between the update to E and its use in evaluating A[E].
I think that any relaxation that allows the old value to be used
in the LHS probably needs to be restricted to single assignments,
unless a sequence point is introduced at the point of assignment.

Dik T. Winter

unread,

Jan 25, 2002, 10:13:44 PM1/25/02

to

In article <a2t5b4$if8$1...@news.panix.com> NOSPA...@sebastian9.com writes:
> I think that any relaxation that allows the old value to be used
> in the LHS probably needs to be restricted to single assignments,
> unless a sequence point is introduced at the point of assignment.

Restricting to single assignments might be difficult to word.
What about a[a[i]] = i + (j = i + j)? This would be disallowed?
Assignments do not only occur on the left hand side.

Introducing a sequence point at the assignment would be both
counter-intuitive and inhibit quite a few optimisations.

Douglas A. Gwyn

unread,

Jan 25, 2002, 10:44:09 PM1/25/02

to

NOSPA...@sebastian9.com wrote:
> behavior, the programmer needs to avoid a much larger set of useful

> constructs that are probably perfectly well defined, ...

Except they might in fact malfunction on some decent implementations.
If the programmer is convinced that a construct cannot malfunction on
any implementation his program will ever run on, he is free to code
the construct in a not strictly conforming manner.

> For example, you can't write:
> void foo(int a[], int b[], int i, int j)
> {
> a[b[i]] = j;
> }

> unless you can analyze all possible callers of foo and determine that
> the problem case does not occur.

Which is a fact of life. On the other hand, by using "restrict"
qualification on the pointer parameters, you can get the C99
conforming compiler to automatically generate diagnostics when
the function is used in a dangerous context.

NOSPA...@sebastian9.com

unread,

Jan 26, 2002, 4:17:30 AM1/26/02

to

According to Dik T. Winter <Dik.W...@cwi.nl>:

> In article <a2t5b4$if8$1...@news.panix.com> NOSPA...@sebastian9.com writes:
> > I think that any relaxation that allows the old value to be used
> > in the LHS probably needs to be restricted to single assignments,
> > unless a sequence point is introduced at the point of assignment.
>
> Restricting to single assignments might be difficult to word.
> What about a[a[i]] = i + (j = i + j)? This would be disallowed?
> Assignments do not only occur on the left hand side.
>
> Introducing a sequence point at the assignment would be both
> counter-intuitive and inhibit quite a few optimisations.

After further consideration, I think that the key idea is
that if a value is both read and written between sequence
points, the logic of the expression must force all reads
to occur before the write is initiated (since once the write
is initiated, the value of the object effectively becomes
indeterminate until the next sequence point). This occurs
if and only if the value is used in an expression in some
branch of the parse tree for the operator that initiates
the write. The current wording achieves this by requiring
it to occur in the branch that computes the value to be assigned,
but this is overly restrictive. The real restriction is
that the old value can only be used to determine either
the value to be stored or the location to be stored into by
the write in question, since either of these will compel the old
value to be computed before initiating the write. So your
expression above would be ok, but A[E] = E = X would not,
since it uses the old value of E to compute the address for
a *different* location than the object itself.

Francis Glassborow

unread,

Jan 26, 2002, 7:54:55 AM1/26/02

to

In article <a2t0ad$g7k$1...@news.panix.com>, NOSPA...@sebastian9.com
writes

>Or you could add a sequence point after the evaluation of both
>sides of an assignment operator but before the side effect of
>the assignment takes place (although I understand that some
>implementors might object to this last).

You bet they would. Why should extracting the value of an aggregate be
completed before copying (assigning) it to another object. That could be
extremely inefficient.

--
Francis Glassborow
Check out the ACCU Spring Conference 2002
4 Days, 4 tracks, 4+ languages, World class speakers
For details see: http://www.accu.org/events/public/accu0204.htm

Minti

unread,

Jan 26, 2002, 8:16:15 AM1/26/02

to

"James Kuyper Jr." <kuy...@wizard.net> wrote in message news:<3C50A54A...@wizard.net>...
> Pai-Yi HSIAO wrote:
> ...

> > Someones may drop me a question that
> > "Modifying and evaluating the same object without an intervening
> > sequence point invokes undefined behavior".
>
> That's not a question; it's an incorrect statement of a rule from the
> standard.
>

> > I turn them back another question:
> > Is the expression "x=x;" defined?
>
> Yes, because it doesn't violate a correct statement of the standard's
> rule. Re-read it; it's been posted in this thread.

>
> > If you want to defend "a[a[i]]=1;" is undefined while a[i]==i,
> > please show us by the events analysis like the standard does in annex D
> > which event causes ambiguity?
>
> A couple of such examples have been posted now. One involves destructive
> reads. The other involves reading 32-bit values using instructions that
> retrieve only 16 bits at a time, and rearranging the natural order in
> which those instructions are applied in a way that is legal only because
> the behavior of this code is undefined.

Ok, So I am finding it hard to find the 32/16 bit example from all
this collection. Could you tell me about the Name of the poster.

James Kuyper Jr.

unread,

Jan 26, 2002, 8:59:50 AM1/26/02

to

Minti wrote:
>
> "James Kuyper Jr." <kuy...@wizard.net> wrote in message news:<3C50A54A...@wizard.net>...

...

> > A couple of such examples have been posted now. One involves destructive
> > reads. The other involves reading 32-bit values using instructions that
> > retrieve only 16 bits at a time, and rearranging the natural order in
> > which those instructions are applied in a way that is legal only because
> > the behavior of this code is undefined.
>
> Ok, So I am finding it hard to find the 32/16 bit example from all
> this collection. Could you tell me about the Name of the poster.

The first mention of the idea was in a message by Phil Tregoning with
the header

Date: 24 Jan 2002 13:39:41 GMT

I even responded to that message, without noticing the 16-bit/32-bit
aspect of what he was talking about. Of course, Phil described it
incorrectly; he himself later referred to his own message as "a load of
garbage". The best explanation that's been made was by Brian Raiter, in
a message with the header

Date: 24 Jan 2002 22:16:56 -0800

However, I didn't notice the 16/32 aspect of either of those messages
until just now. I first noticed the idea when Barry Margolin summarized
it; he attributed it to "other posters", and didn't give any details,
but I saw immediately how it would work.

Pai-Yi HSIAO

unread,

Jan 26, 2002, 2:02:42 PM1/26/02

to

"Barry Margolin" <bar...@genuity.net> a écrit :
> Kaz Kylheku <k...@ashi.footprints.net> wrote:

> > node->next->next = node;

> Just as this has undefined behavior only in the case

> where next[node]==node, I think your structure assignment
> would have undefined behavior if node->next == node.
> Can someone confirm or deny this?

What you said is true.
But you only pointed out 'half' of the possible undefined
behaviors.

Please notice that :
If node->next->next == node, the above expression
"node->next->next = node;" is ALSO undefined.

Why?

If 'node->next->next' represents the object 'node',
the access to the prior value of 'node' can ONLY appear
on the right hand side of the assignment.
Unfortunately, the access to the prior value of 'node'
appears also on the left hand side of the assignment.
It violates the constraint of 6.5#2.

A direct consequence is that if we have a cyclic link:
node1, node2, node3,....nodeK, node1,
the standard prohibits us to code an expression like
"node1->next->....->next = node1;"
with '->next' repeating more than K times.

paiyi

Pai-Yi HSIAO

unread,

Jan 26, 2002, 3:13:26 PM1/26/02

to

According to the standard 6.5#2,
the expression
node1->next = node2;
has a marginal udefinded behavior while 'node1->next'
represents the object 'node1'.

It implies that as soon as the 'node1->next' has been set to
'node1', there is no legal way to change its value to
another one. Doesn't it?

paiyi

Dik T. Winter

unread,

Jan 26, 2002, 9:18:28 PM1/26/02

to

In article <a2ts7a$3pu$1...@news.panix.com> NOSPA...@sebastian9.com writes:
...

> The real restriction is
> that the old value can only be used to determine either
> the value to be stored or the location to be stored into by
> the write in question, since either of these will compel the old
> value to be computed before initiating the write.

Strictly reading, this rules out a[i] = i + 1; so the wording is still
not correct.

Micah Cowan

unread,

Jan 27, 2002, 2:36:00 AM1/27/02

to

Pai-Yi HSIAO <hs...@ccr.jussieu.fr> writes:

No. Here's the text you referenced:

Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to

determine the value to be stored.

In your example above, there are two objects to consider (excluding
node2) - one is node1, which is a pointer to some unknown struct or union, and
the other is the 'next' field of the struct or union pointed to. Both
are entirely distinct objects. Most importantly, changing the object
'node1->next' does not comprise a modification to the object 'node1'
(in this case, a pointer) in any way, shape or form. node1 still
contains exactly the same information it did before, and has not been
set; it is the object node1 *points* to that has been altered.

Even if this weren't so (e.g., node1.next), there is only a single
modification happening - no problems here.

Micah

Pai-Yi HSIAO

unread,

Jan 27, 2002, 5:36:15 AM1/27/02

to

On Sun, 27 Jan 2002, Micah Cowan wrote:

>No. Here's the text you referenced: [snip]

>In your example above, there are two objects to consider
>(excluding node2) - one is node1, which is a pointer to some
>unknown struct or union, and the other is the 'next' field of
>the struct or union pointed to.
>Both are entirely distinct objects.

Yes. You are right. I've made the mistake. Thank you.

>Even if this weren't so (e.g., node1.next), there is only a single
>modification happening - no problems here.

Not so easy.

Does the expression
"node1.next = &node2;" has a marginal behavior which violates the
constraint of 6.5#2 while 'node1.next' points to node1?

'node1.next' is going to be modified that implies ALSO the object 'node1'
is going to be modified. 6.5#2 says the prior value of 'node1' can be used
ONLY to calculate some value to be stored (to 'node1'). Should the left
hand side of the assignment expression: 'node1.next' be regarded as an
access to the object 'node1' because 'node1.next' is equal to '*(&node1 +
offsetof(next))'?

paiyi