
What language to address "tricky assignment statement"


Tim Rentsch

Jan 7, 2005, 3:11:26 AM
A recent thread in comp.lang.c discusses the question of whether

p = p->next = q;

evokes undefined behavior or not. There were advocates in each of the
two camps.

A crucial excerpt of language in the standard was this:

"Furthermore, the prior value shall be read only to determine the
value to be stored."

Although it's clear that the statement here is central to resolving
the question, it was not clear what conclusions are appropriate to
draw. The meta-conclusion is that the language in the standard
needs clarifying.

So, here are my questions:

1. What language should be drafted for the case where constructs
like the example given above _do_ cause undefined behavior?

2. What language should be drafted for the case where constructs
like the example given above do _not_ cause undefined behavior
(which presumably means that they have the "obvious" or "naive"
semantics)?

3. Should the next revision of the standard express (1) or (2)?
Or is there perhaps yet another alternative?


I don't mean to re-start here the debate about what the current
standard says about this question (which debate may still be ongoing
in comp.lang.c). I meant only to ask questions about how the language
in the standard might be clarified, and what that language should
express.

Jun Woong

Jan 7, 2005, 9:08:59 AM

"Tim Rentsch" <t...@alumnus.caltech.edu> wrote in message news:kfn6529...@alumnus.caltech.edu...

> A recent thread in comp.lang.c discusses the question of whether
>
> p = p->next = q;
>
> evokes undefined behavior or not. There were advocates in each of the
> two camps.
>
> A crucial excerpt of language in the standard was this:
>
> "Furthermore, the prior value shall be read only to determine the
> value to be stored."
>
> Although it's clear that the statement here is central to resolving
> the question, it was not clear what conclusions are appropriate to
> draw. The meta-conclusion is that the language in the standard
> needs clarifying.
>
> So, here are my questions:
>
> 1. What language should be drafted for the case where constructs
> like the example given above _do_ cause undefined behavior?
>

It does now. What makes you think the above assignment does not invoke
undefined behavior? I can't say whether the formal model for the
sequence point, if the committee succeeds in completing it, would agree
with the current normative wording you cited, but anyway the CURRENT
one makes it invalid. Is the access to p in "p->next" ONLY for
determining the value to be stored in p? If you think "no", then you
admit it's undefined behavior.

Note that the intent of the cited wording is to allow for

i = i + 1;

not to allow for tricky assignments like the one you showed.


--
Jun, Woong (woong at icu.ac.kr)
Information and Communications Univ.

Douglas A. Gwyn

Jan 7, 2005, 5:59:23 PM
Tim Rentsch wrote:
> p = p->next = q;

> "Furthermore, the prior value shall be read only to determine the
> value to be stored."
> Although it's clear that the statement here is central to resolving
> the question, it was not clear what conclusions are appropriate to
> draw. The meta-conclusion is that the language in the standard
> needs clarifying.

It is clear that p->next reads p for a purpose other than
determining the value to be stored, namely to compute an
lvalue to store the value obtained by reading q, which
(after possible type conversion) will also be the value
stored in p.

Presumably the programmer intends that the value be stored
in p->next *before* p receives the value obtained from q,
but without any sequence point there is no guarantee that
that will be what the code actually does; the storage
operations might occur in either order or even concurrently.
For example, the final value of p might be stored from the
register caching the value just loaded from q faster than
the address of p->next can be computed in another of the
concurrent processor pipelines, and it is possible that the
new value of p would be the one used in both pipelines,
especially if p is allocated to a register.

The only reason there is *any* exception to the sequence
point requirement is that otherwise code like i = 2*i + 1
would not be allowed, which would be a great inconvenience.
The phrasing used in the standard allows compilers to limit
the scope of the exception so that better code can be
generated overall.

In more complex situations, programmers are advised to make
use of sequence points whenever a particular sequencing of
operations is needed. As the push accelerates for faster
hardware and compiler-generated code, this will become
increasingly important. I'm not saying that that is a good
thing, just that it is inevitable.

> So, here are my questions:
> 1. What language should be drafted for the case where constructs
> like the example given above _do_ cause undefined behavior?

What do you mean, what language should be drafted? Were
you appointed to some drafting committee, or what?

> 2. What language should be drafted for the case where constructs
> like the example given above do _not_ cause undefined behavior
> (which presumably means that they have the "obvious" or "naive"
> semantics)?
> 3. Should the next revision of the standard express (1) or (2)?
> Or is there perhaps yet another alternative?

I don't follow the logic. Surely a given instance either
causes undefined behavior or it does not.

> I don't mean to re-start here the debate about what the current
> standard says about this question (which debate may still be ongoing
> in comp.lang.c). I meant only to ask questions about how the language
> in the standard might be clarified, and what that language should
> express.

The standard already expresses what was deliberately decided.

WG14 has already made progress on a couple of documents that
are intended to clarify the sequence point rules (but not to
change them). At least one of those may appear in a future
revision of the standard.

Brian Inglis

Jan 8, 2005, 1:30:11 PM
On 07 Jan 2005 00:11:26 -0800 in comp.std.c, Tim Rentsch
<t...@alumnus.caltech.edu> wrote:

>A recent thread in comp.lang.c discusses the question of whether
>
> p = p->next = q;
>
>evokes undefined behavior or not. There were advocates in each of the
>two camps.

To decide whether this kind of construct is valid, consider whether
the alternative execution orders produce the same result:

p->next = q;
p = p->next;

p = p->next;
p->next = q;

What do you think?
If the alternative execution orders may not produce the same result,
the behaviour is undefined, and you need to introduce a sequence point
to ensure that the desired behaviour occurs.

>A crucial excerpt of language in the standard was this:
>
> "Furthermore, the prior value shall be read only to determine the
> value to be stored."

Some value of p is being used to locate another object to be stored.
So what do you think?

>Although it's clear that the statement here is central to resolving
>the question, it was not clear what conclusions are appropriate to
>draw. The meta-conclusion is that the language in the standard
>needs clarifying.
>
>So, here are my questions:
>
> 1. What language should be drafted for the case where constructs
> like the example given above _do_ cause undefined behavior?
>
> 2. What language should be drafted for the case where constructs
> like the example given above do _not_ cause undefined behavior
> (which presumably means that they have the "obvious" or "naive"
> semantics)?
>
> 3. Should the next revision of the standard express (1) or (2)?
> Or is there perhaps yet another alternative?

ISTM the current language is clear enough.

>I don't mean to re-start here the debate about what the current
>standard says about this question (which debate may still be ongoing
>in comp.lang.c). I meant only to ask questions about how the language
>in the standard might be clarified, and what that language should
>express.

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian....@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply

David Hopwood

Jan 8, 2005, 4:39:43 PM
Brian Inglis wrote:
> Tim Rentsch <t...@alumnus.caltech.edu> wrote:
>
>>A recent thread in comp.lang.c discusses the question of whether
>>
>> p = p->next = q;
>>
>>evokes undefined behavior or not. There were advocates in each of the
>>two camps.
>
> To decide whether this kind of construct is valid, consider whether
> the alternative execution orders produce the same result:
>
> p->next = q;
> p = p->next;
>
> p = p->next;
> p->next = q;

This begs the question of whether the latter is a valid ordering. Why
should it be, given that it violates the usual semantics of statements
of the form x = y = z, in which the evaluation of z must causally precede
both assignments? As circumstantial evidence, this ordering is not
consistent with N843 Annex D (although it was dropped from the standard,
that is the nearest thing we have to a formal model).

The following would clearly be valid orderings:

temp = q;
p->next = temp;
p = temp;

and

temp = q;
p = temp;
p->next = temp;

but those produce the same result, assuming p->next and p are not
aliased.

<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n843.pdf>

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Wojtek Lerch

Jan 8, 2005, 8:12:32 PM
"Brian Inglis" <Brian....@SystematicSW.Invalid> wrote in message
news:p590u058uib9akvt4...@4ax.com...

> On 07 Jan 2005 00:11:26 -0800 in comp.std.c, Tim Rentsch
> <t...@alumnus.caltech.edu> wrote:
>
>>A recent thread in comp.lang.c discusses the question of whether
>>
>> p = p->next = q;
>>
>>evokes undefined behavior or not. There were advocates in each of the
>>two camps.
>
> To decide whether this kind of construct is valid, consider whether
> the alternative execution orders produce the same result:
>
> p->next = q;
> p = p->next;
>
> p = p->next;
> p->next = q;

Would you also claim that

p = p_next = q;

is invalid because the results of

p_next = q;
p = p_next;

and

p = p_next;
p_next = q;

are different?


Douglas A. Gwyn

Jan 9, 2005, 5:34:44 AM
It's not that the order of operations can vary,
it's (mainly) that the net result depends on that order.

David Hopwood

Jan 9, 2005, 2:16:58 PM
Douglas A. Gwyn wrote:
> It's not that the order of operations can vary,
> it's (mainly) that the net result depends on that order.

Except that in this example, the only valid orderings are

temp = q;
p->next = temp;
p = temp;

and

temp = q;
p = temp;
p->next = temp;

Provided p and p->next are not aliased, the net result does not
depend on that order.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

lawrenc...@ugs.com

Jan 9, 2005, 6:09:16 PM
David Hopwood <david.nosp...@blueyonder.co.uk> wrote:
>
> Provided p and p->next are not aliased, the net result does not
> depend on that order.

It does if the implicit &(p->next) is done after the assignment to p
rather than before.

-Larry Jones

I don't think that question was very hypothetical at all. -- Calvin

Douglas A. Gwyn

Jan 10, 2005, 12:08:21 AM
David Hopwood wrote:
> Douglas A. Gwyn wrote:
>> It's not that the order of operations can vary,
>> it's (mainly) that the net result depends on that order.
> Except that in this example, the only valid orderings are
> temp = q;
> p->next = temp;
> p = temp;
> and
> temp = q;
> p = temp;
> p->next = temp;
> Provided p and p->next are not aliased, the net result does not
> depend on that order.

There is likely to be quite a difference between
p->next = q;
and
q->next = q;

Also see Larry Jones's response.

David Hopwood

Jan 10, 2005, 8:44:00 PM
lawrenc...@ugs.com wrote:
> David Hopwood <david.nosp...@blueyonder.co.uk> wrote:
>
>>Provided p and p->next are not aliased, the net result does not
>>depend on that order.
>
> It does if the implicit &(p->next) is done after the assignment to p
> rather than before.

I stand corrected. The key clause here is C99 6.5.16 #4,
"The order of evaluation of the operands [of assignment] is unspecified."

In more detail:

"p = p->next = q;" is equivalent to "p = (p->next = q);"
(by right-associativity of =), which comprises the following events:

(a) t1 = q
(b) t2 = &(p->next) [unnecessary to break down further for this example]
(c) *t2 = t1
(d) p = t1

with the following causal ordering:

(a) precedes (c)
(a) precedes (d)
(b) precedes (c)

but it is not possible to infer that (b) precedes (d), because of
6.5.16 #4.

Therefore the behaviour of "p = p->next = q;" and of "p->next = p = q;",
are equivalent, and undefined due to C99 6.5 #2. Specifically, between
two consecutive sequence points, "p->next = p = q;" reads the value p
(in the expression p->next) other than to determine the value to be
stored into p.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Charlie Gordon

Jan 12, 2005, 9:49:01 AM
"Brian Inglis" <Brian....@SystematicSW.Invalid> wrote in message
news:p590u058uib9akvt4...@4ax.com...
> On 07 Jan 2005 00:11:26 -0800 in comp.std.c, Tim Rentsch
> <t...@alumnus.caltech.edu> wrote:
>
> >A recent thread in comp.lang.c discusses the question of whether
> >
> > p = p->next = q;
> >
> >evokes undefined behavior or not. There were advocates in each of the
> >two camps.
>
> To decide whether this kind of construct is valid, consider whether
> the alternative execution orders produce the same result:
>
> p->next = q;
> p = p->next;
>
> p = p->next;
> p->next = q;
>
> What do you think?
> If the alternative execution orders may not produce the same result,
> the behaviour is undefined, and you need to introduce a sequence point
> to ensure that the desired behaviour occurs.

Is anyone aware of validation tools that would detect such subtle examples of
undefined behaviour ?

Chqrlie.


Christian Bau

Jan 12, 2005, 5:40:12 PM
In article <cs3c19$6b3$1...@reader1.imaginet.fr>,
"Charlie Gordon" <ne...@chqrlie.org> wrote:

In this case, the undefined behavior is not the problem. Even if there
was no undefined behavior, of the two obvious possibilities for behavior
one will be a bug in your code, and not a subtle one.

Charlie Gordon

Jan 13, 2005, 7:29:38 AM
"Christian Bau" <christ...@cbau.freeserve.co.uk> wrote in message
news:christian.bau-63E...@slb-newsm1.svr.pol.co.uk...

I don't agree: try this on the people around you, ask fellow programmers what
problem they see in { p = p->next = q; }...
I would be quite surprised if more than a very small minority catch the real
issue.
This *is* a subtle problem.
The problem is related to undefined behaviour, more precisely to the fact that
most compilers will produce code for the first possibility (arguably what the
programmer intended) and very few for the second, causing an obscure bug when
porting the application or in situations where circumstances change potentially
behind the back of the programmer :
- change of compiler
- change of compiler version
- change of target
- change in compilation options : optimisation, debug...
- change in surrounding code
- probably many more...

There are many subtle problems in the C language, there is nothing wrong with
using tools to try and detect some of them. In fact it is stupid not to take
advantage of compiler warnings (gcc -Wall) or more elaborate tools when
available. Whether { p[i++] = i; } is more or less of a subtle issue than
{ strcpy(p, p + 1); } or { p = p->next = q; } can be argued aimlessly, my
question is : are there tools to detect these instances of undefined behaviour ?

Chqrlie.


Thomas Pornin

Jan 13, 2005, 7:56:20 AM
According to Charlie Gordon <ne...@chqrlie.org>:

> Whether { p[i++] = i; } is more or less of a subtle issue than
> { strcpy(p, p + 1); } or { p = p->next = q; } can be argued aimlessly,
> my question is : are there tools to detect these instances of
> undefined behaviour ?

The DEC (now Compaq now HP) C compiler for Alpha machines under OSF/1
(now Tru64) used to be able to detect some of those. If you wrote
"p[i ++] = i", then you would get a warning message (which included
a precise reference to the C standard !).

Note, however, that detecting all those occurrences is not possible in
the general case; all you can get is an approximation for some of those
instances, those that can be detected by an automatic tool.


--Thomas Pornin

Charlie Gordon

Jan 13, 2005, 9:09:38 AM
"Thomas Pornin" <por...@nerim.net> wrote in message
news:cs5r5k$ap7$1...@biggoron.nerim.net...

While tools will only detect a subset of such constructs, I don't see why
detecting undefined behaviour constructs in C source would be impossible in the
general case...This has nothing to do with analysing the semantics of programs
or proving them, except maybe for the case of the dubious restrict keyword.

Chqrlie.


Richard Bos

Jan 13, 2005, 9:57:47 AM
"Charlie Gordon" <ne...@chqrlie.org> wrote:

> "Thomas Pornin" <por...@nerim.net> wrote in message
> news:cs5r5k$ap7$1...@biggoron.nerim.net...

> > The DEC (now Compaq now HP) C compiler for Alpha machines under OSF/1
> > (now Tru64) used to be able to detect some of those. If you wrote
> > "p[i ++] = i", then you would get a warning message (which included
> > a precise reference to the C standard !).
> >
> > Note, however, that detecting all those occurrences is not possible in
> > the general case; all you can get is an approximation for some of those
> > instances, those that can be detected by an automatic tool.
>
> While tools will only detect a subset of such constructs, I don't see why
> detecting undefined behaviour constructs in C source would be impossible in the
> general case...This has nothing to do with analysing the semantics of programs
> or proving them, except maybe for the case of the dubious restrict keyword.

Hohum.

if (statement equivalent to the halting problem)
x=y;
else
x=y+4;
p[*x++] = *y++;

Have fun proving the halting problem.

Richard

Charlie Gordon

Jan 14, 2005, 3:56:58 AM
"Richard Bos" <r...@hoekstra-uitgeverij.nl> wrote in message
news:41e68bf2...@news.individual.net...

You probably mean :

p[(*x)++] = (*y)++;

This is an aliasing issue. I agree that aliasing problems and the use of the
restrict keyword to try and hint the compiler about ignoring them is well beyond
analysing and proof.
Can you show an example of undetectable UB without aliasing ?

Chqrlie.


James Kuyper

Jan 14, 2005, 12:18:16 PM
Charlie Gordon wrote:
> "Richard Bos" <r...@hoekstra-uitgeverij.nl> wrote in message
> news:41e68bf2...@news.individual.net...
>
>>"Charlie Gordon" <ne...@chqrlie.org> wrote:
...

>>>While tools will only detect a subset of such constructs, I don't see why
>>>detecting undefined behaviour constructs in C source would be impossible in
>>>the general case...This has nothing to do with analysing the semantics of
>>>programs or proving them, except maybe for the case of the dubious restrict
>>>keyword.
>>
>>Hohum.
>>
>> if (statement equivalent to the halting problem)
>> x=y;
>> else
>> x=y+4;
>> p[*x++] = *y++;
>>
>>Have fun proving the halting problem.
>
>
> You probably mean :
>
> p[(*x)++] = (*y)++;
>
> This is an aliasing issue. I agree that aliasing problems and the use of the
> restrict keyword to try and hint the compiler about ignoring them is well beyond
> analysing and proof.
> Can you show an example of undetectable UB without aliasing ?

The key feature that makes aliasing a relevant example here, is the fact
that whether or not it has undefined behavior depends upon the value of
the variables involved in the expression. It's always possible to set up
situations where the compiler can't tell in advance whether or not the
value is one that will trigger the problem. That's not a rare thing, in
fact it's quite common:


int *p1 = malloc(sizeof(int));
int i = -1;
unsigned j,k=0xFFFF;
int *p2;

void (*p3)(void);
void f(void) { }
int g(void) { return 0;}
int *p4;
int *p5;
const int ci;
volatile int vi;


if (statement equivalent to halting problem)
{
free(p1);

*(char*)&i = 0;
// For this implementation, creates a trap representation

j = 0x8000; // a value which, xor'd with -1, would produce (on this
// implementation) a representation of negative 0, except that this
// implementation doesn't support negative 0. Also, on this
// implementation, j > SHRT_MAX

p2 = &i; // (uintptr_t)&i happens to be > INT_MAX
// &i happens to be incorrectly aligned for type long*
p3 = (void (*)(void))g;
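p4 = (int *)&ci; // cast needed to discard the const qualifier
p5 = (int *)&vi; // cast needed to discard the volatile qualifier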
}
else
{
j = 0;
p2 = &j; // (uintptr_t)&j happens to be < INT_MAX
// &j happens to be correctly aligned for type long*
p3 = f;
p4 = &i;
p5 = &i;
}

*p1 = 1; // 6.2.4p2
i; // 6.2.6.1p5
k ^ j; // 6.2.6.2p4
(short)j; // 6.3.1.5p2
(int)p2; // 6.3.2.3p6
(long*)p2; // 6.3.2.3p7
(*p3)(); // 6.3.2.3p8
15 >> j; // 6.5.7p3
p2 > &j; // 6.5.8p5
*p4 = 1; // 6.7.3p5
*p5 = 2; // 6.7.3p5

This list is based only upon section 6, and only on those cases where
the standard contains the word "undefined". A whole additional range of
examples could be put together by looking for the word "shall", whenever
it occurs outside of constraints sections, and a few dozen more cases
could be constructed using the standard library.

This code was constructed rather hurriedly, and therefore might contain
a few mistakes, but the principle it demonstrates is unaffected by those
mistakes.

David Hopwood

Jan 14, 2005, 9:03:37 PM
James Kuyper wrote:
> The key feature that makes aliasing a relevant example here, is the fact
> that whether or not it has undefined behavior depends upon the value of
> the variables involved in the expression. It's always possible to set up
> situations where the compiler can't tell in advance whether or not the
> value is one that will trigger the problem.

Note, however, that this does not imply that an implementation could not
define all behaviour (by implementing bounds checking, GC, deterministic
evaluation order, etc.); just that it cannot in general diagnose when
behavior is undefined by the C standard.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Tim Rentsch

Jan 15, 2005, 11:50:12 AM
"Jun Woong" <wo...@icu.ac.kr> writes:

> "Tim Rentsch" <t...@alumnus.caltech.edu> wrote in message news:kfn6529...@alumnus.caltech.edu...
> > A recent thread in comp.lang.c discusses the question of whether
> >
> > p = p->next = q;
> >
> > evokes undefined behavior or not. There were advocates in each of the
> > two camps.
> >
> > A crucial excerpt of language in the standard was this:
> >
> > "Furthermore, the prior value shall be read only to determine the
> > value to be stored."
> >
> > Although it's clear that the statement here is central to resolving
> > the question, it was not clear what conclusions are appropriate to
> > draw. The meta-conclusion is that the language in the standard
> > needs clarifying.
> >
> > So, here are my questions:
> >
> > 1. What language should be drafted for the case where constructs
> > like the example given above _do_ cause undefined behavior?
> >
>
> It does now. What makes you think the above assignment does not invoke
> undefined behavior?

What I think is that the question of whether the assignment evokes
undefined behavior is unclear - which is to say, that reasonable
people may reasonably reach different conclusions about whether the
standard mandates undefined behavior or not.

Having said that, let me temporarily play devil's advocate in response
to the rest of the posting here. But please also consult the thread
in comp.lang.c, eg, comments by Lawrence Kirby.


> I can't say whether the formal model for the
> sequence point, if the committee succeeds in completing it, would agree
> with the current normative wording you cited, but anyway the CURRENT
> one makes it invalid.

A formal model in an earlier draft standard - N843 Annex D - clearly
makes the behavior well defined. The current language is unclear.


> Is the access to p in "p->next" ONLY for
> determining the value to be stored in p? If you think "no", then you
> admit it's undefined behavior.

The statement about reading the prior value is expressed in an
informal language - which is to say English prose - that does not have
precise semantics. The wording around the word 'only' is sufficiently
imprecise that the question is arguable. (Reading ahead a bit, I may
elaborate more on this in responses to subsequent postings.)


> Note that the intent of the cited wording is to allow for
>
> i = i + 1;
>
> not to allow for tricky assignments like the one you showed.

The cited wording is a restriction. Knowing that the cited wording
intends to allow 'i = i + 1;' doesn't really tell us anything; what
matters is what sort of constructs it intends to prohibit. Pretty
clearly, a construct like

j = (i = 1) + i;

violates the restriction intended. But the access of 'i' in this
example is not like the access of 'p' in the statement under
discussion.

Again, my position is that the current language is unclear. I think
reasonable people have made reasonable arguments on both sides. I
wanted to ask what language would more clearly express each of the two
positions. Is this so hard to understand?

Wojtek Lerch

Jan 15, 2005, 1:16:33 PM
"Tim Rentsch" <t...@alumnus.caltech.edu> wrote in message
news:kfnbrbq...@alumnus.caltech.edu...

>> "Tim Rentsch" <t...@alumnus.caltech.edu> wrote in message
>> news:kfn6529...@alumnus.caltech.edu...
>> > A crucial excerpt of language in the standard was this:
>> >
>> > "Furthermore, the prior value shall be read only to determine the
>> > value to be stored."
>
> The cited wording is a restriction. Knowing that the cited wording
> intends to allow 'i = i + 1;' doesn't really tell us anything; what
> matters is what sort of constructs it intends to prohibit. Pretty
> clearly, a construct like
>
> j = (i = 1) + i;
>
> violates the restriction intended. But the access of 'i' in this
> example is not like the access of 'p' in the statement under
> discussion.

Even something seemingly innocent like this violates the restriction:

i = ( j = i ) + 1;

I wonder if that was intentional?...


Wojtek Lerch

Jan 15, 2005, 7:34:18 PM
<lawrenc...@ugs.com> wrote in message
news:g6k8b2-...@jones.homeip.net...

> David Hopwood <david.nosp...@blueyonder.co.uk> wrote:
>>
>> Provided p and p->next are not aliased, the net result does not
>> depend on that order.
>
> It does if the implicit &(p->next) is done after the assignment to p
> rather than before.

This is possible because the new value is stored at some unspecified point
between the previous and the next sequence points, not necessarily after the
evaluation of the right operand of the assignment is complete, right?

If we go one step further, doesn't that also mean that the new value could
sometimes be stored before the evaluation of the right argument even starts?
For instance, couldn't "x = 10-x" store 5 in x without paying attention to
its previous value?

;-)


Tim Rentsch

Jan 15, 2005, 10:30:55 PM
Preface:

Again, my position is that the existing language is unclear - that
reasonable people may reasonably reach different conclusions about
whether the standard mandates undefined behavior or not. I will again
temporarily play devil's advocate for the other side to help the
discussion.

Also I'd like to express thanks to Doug Gwyn for taking the time
to articulate his comments. I hope I've done as good a job here in
formulating my response.


"Douglas A. Gwyn" <DAG...@null.net> writes:

> Tim Rentsch wrote:
> > p = p->next = q;
>
> > "Furthermore, the prior value shall be read only to determine the
> > value to be stored."
> > Although it's clear that the statement here is central to resolving
> > the question, it was not clear what conclusions are appropriate to
> > draw. The meta-conclusion is that the language in the standard
> > needs clarifying.
>
> It is clear that p->next reads p for a purpose other than
> determining the value to be stored, namely to compute an
> lvalue to store the value obtained by reading q, which
> (after possible type conversion) will also be the value
> stored in p.

That might be true, but the cited language doesn't say anything
about "purpose". The precise meaning of the cited language
is slippery. We will take that up below.


> Presumably the programmer intends that the value be stored
> in p->next *before* p receives the value obtained from q,
> but without any sequence point there is no guarantee that
> that will be what the code actually does; the storage
> operations might occur in either order or even concurrently.

What does 6.5.16 p 3 say?

"An assignment expression has the value of the left
operand after the assignment, but is not an lvalue."
        ^^^^^^^^^^^^^^^^^^^^ (my underlining)

and

"The side effect of updating the stored value of the
left operand after the assignment shall occur...."
             ^^^^^^^^^^^^^^^^^^^^ (my underlining)

So even though it's true that the two storage operations can occur in
either order, evaluating the lvalue of where to store in the rightmost
assignment must precede the assignment into 'p', since the value of
the rightmost assignment expression is needed to perform the leftmost
assignment. The side effects can occur in either order, but there is
a partial ordering on the evaluation of the operands, the performing
of the assignment operations, and the side effect of updating the
stored value(s).


> For example, the final value of p might be stored from the
> register caching the value just loaded from q faster than
> the address of p->next can be computed in another of the
> concurrent processor pipelines, and it is possible that the
> new value of p would be the one used in both pipelines,
> especially if p is allocated to a register.

The value to be assigned into 'p' must be the value of the
sub-expression 'p->next' after the assignment into 'p->next'. So
'p->next' must have been computed (and assigned into, even if the side
effect of updating the stored value hasn't happened yet) before an
assignment can be made into 'p'.


> The only reason there is *any* exception to the sequence
> point requirement is that otherwise code like i = 2*i + 1
> would not be allowed, which would be a great inconvenience.

There is no question that expressions like 'i = 2*i + 1' are allowed.
The question is which expressions are prohibited. Another reasonable
view is that the cited language is intended to exclude things like

j = (i = 1) + i;

but the explicit language on partial ordering in assignments gives
well-defined semantics in cases like the compound assignment under
discussion.


> The phrasing used in the standard allows compilers to limit
> the scope of the exception so that better code can be
> generated overall.

Agreed. The question is, What is that limit? The language in the
standard doesn't make that clear.


> In more complex situations, programmers are advised to make
> use of sequence points whenever a particular sequencing of
> operations is needed. As the push accelerates for faster
> hardware and compiler-generated code, this will become
> increasingly important. I'm not saying that that is a good
> thing, just that it is inevitable.

Agreed. Again, the question is, when is that necessary? The language
in the standard doesn't make that clear.


> > So, here are my questions:
> > 1. What language should be drafted for the case where constructs
> > like the example given above _do_ cause undefined behavior?
>
> What do you mean, what language should be drafted? Were
> you appointed to some drafting committee, or what?

I'll ask the question another way. Hypothetically, if asked by the
committee to draft better language, what language might be reasonable
to suggest? What language would be clearer? Let's try a few
examples:

... shall be read only to determine ...
... shall be read only for determining ...
... shall be read only as part of determining ...
... shall not be read except to determine ...
... shall not be read unless used in determining ...

Of course, these examples are meant less as serious proposals than
they are meant to illustrate the nature of the problem. Consider
repositioning the word "only":

... shall only be read to determine ...
... shall be only read to determine ...
... shall be read only to determine ...
... shall be read to only determine ...
... shall be read to determine only ...

These examples have subtly and sometimes not-so-subtly different
meanings. To get back to the original statement, does the word "only"
in

... shall be read only to determine the value to be stored.

modify "to determine" or does it modify a compound phrase including
"read"? To say this another way, if English were parenthesized,
should the fragment above be taken as

... shall be read {only to determine} the value to be stored.

or as

... shall be {read only {to determine the value to be stored}}.

which might be written more clearly in this style of bracketed English
thusly:

... shall be {read to determine the value to be stored} only.

My (devil's advocate) stance is that this latter meaning is more
consistent with other examples and other language in the standard.
Consider

i = a[i]++;

The prior value of 'i' is not "read {only to determine}" the next
value (in particular, it is also read to update the value of 'a[i]').
Should this example cause undefined behavior? Or, how about this:

extern volatile int v;

v = v + 1;

Suppose 'v' is some magic memory-mapped I/O register that when
read causes a light to go on on a circuit board. Here again,
the read of 'v' is not "{only to determine} the value to be stored."
Would anyone expect undefined behavior for this?

Now go back and read the last two examples under the wording

... shall be {read to determine the value to be stored} only.

and see if that seems more consistent with what most people would
expect.
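
To make the 'i = a[i]++;' case concrete, here is a sketch of the steps a reader would probably expect, written as separate statements so that nothing hinges on the wording of 6.5p2. The function name and the bounds check are illustrative additions, and this shows only the "naive" semantics, not what the standard requires:

```c
#include <assert.h>
#include <stddef.h>

/* Spells out the expected effect of
 *     i = a[i]++;
 * one step per statement. Returns the new value of i. */
static int index_then_bump(int *a, size_t n, int i)
{
    assert(a != NULL && i >= 0 && (size_t)i < n);
    int old = a[i];   /* i is read here to locate the element */
    a[i] = old + 1;   /* side effect of ++ on that element */
    i = old;          /* the old element value is what gets stored in i */
    return i;
}
```

Note that the read of i on the first line feeds both the update of a[i] and, indirectly, the value stored in i, which is exactly the situation the two bracketings disagree about.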


> > 2. What language should be drafted for the case where constructs
> > like the example given above do _not_ cause undefined behavior
> > (which presumably means that they have the "obvious" or "naive"
> > semantics)?
> > 3. Should the next revision of the standard express (1) or (2)?
> > Or is there perhaps yet another alternative?
>
> I don't follow the logic. Surely a given instance either
> causes undefined behavior or it does not.

Sorry if this was unclear. Let me try re-phrasing.

1. What language would make cases like 'p = p->next = q;'
(and others like it) *clearly* evoke undefined behavior?

2. What language would make cases like 'p = p->next = q;'
(and others like it) *clearly* have well-defined semantics?

3. (see below)


> > I don't mean to re-start here the debate about what the current
> > standard says about this question (which debate may still be ongoing
> > in comp.lang.c). I meant only to ask questions about how the language
> > in the standard might be clarified, and what that language should
> > express.
>
> The standard already expresses what was deliberately decided.

I don't know what was decided. Regardless of what was decided and
whether it was deliberate or not, the existing language is unclear
both about what is required and about what was intended (which lack of
clarity I hope has been made evident by my comments here).


> WG14 has already made progress on a couple of documents that
> are intended to clarify the sequence point rules (but not to
> change them). At least one of those may appear in a future
> revision of the standard.

That's good. Incidentally, that these documents are being worked
on is at least an implicit admission that the existing language
needs clarification.


Final note on question 3 -

Regardless of whether the current standard is clear or not or
what it expresses, I believe that 'p = p->next = q;' should
have well defined semantics. Here's my reasoning.

1. Client programmer writes program that has 'p = p->next = q;' in it.
Non-expected behavior causes program to go wrong and client programmer
to spend lots of time tracking down what he thinks is a compiler bug.

2. Client programmer files bug report with manufacturer: "Your
optimizer is screwing up code generation".

3. Manufacturer tech support reports back "that's not a bug, the C
standard says that's Undefined Behavior".

4. Client programmer thinks to himself "this guy is suffering from
recto-cranial inversion", and goes up the management chain to get
the problem fixed.

5. Manufacturer management respects the $$$'s spent by client and
tells developers, "standard be d***ed, OUR compiler is going to do
what the client expects."

If you want to test out my reasoning here, try asking your fellow
developers who are not regular comp.std.c or comp.lang.c readers
about what they expect for 'p = p->next = q;' and what they would
do if it "misbehaved."

Richard Bos

Jan 17, 2005, 11:55:24 AM
David Hopwood <david.nosp...@blueyonder.co.uk> wrote:

It would have to be done at run time, though. You may be able to catch a
lot of UB at compile time, if you're lucky, but ultimately, you need to
either use run-time checks, or solve the halting problem. Mr Gordon was
talking about "detecting undefined behaviour constructs in C source",
which I take to mean at compile time.

Richard

Lawrence Kirby

Jan 17, 2005, 2:23:08 PM
On Fri, 07 Jan 2005 22:59:23 +0000, Douglas A. Gwyn wrote:

> Tim Rentsch wrote:
>> p = p->next = q;
>
>> "Furthermore, the prior value shall be read only to determine the
>> value to be stored."
>> Although it's clear that the statement here is central to resolving
>> the question, it was not clear what conclusions are appropriate to
>> draw. The meta-conclusion is that the language in the standard
>> needs clarifying.
>
> It is clear that p->next reads p for a purpose other than
> determining the value to be stored,

That's not at all clear. In fact I believe that it is wrong.

> namely to compute an
> lvalue to store the value obtained by reading q, which
> (after possible type conversion) will also be the value
> stored in p.

What we can say is that p->next = q is the expression that is evaluated in
order to calculate the new value to be stored in p. I believe that is the
intent of the wording of the standard. Consider the following from 5.1.2.3:

"The semantic descriptions in this International Standard describe the
behavior of an abstract machine in which issues of optimization are
irrelevant."

and

"In the abstract machine, all expressions are evaluated as specified by
the semantics."

Now the semantics of = are 6.5.16.1p2

"In simple assignment (=), the value of the right operand is converted to
the type of the assignment expression and replaces the value stored in
the object designated by the left operand."

Here there is a clear sequencing: you get the value from the right, convert
it to the type on the left, and replace the value. Note that this is in the
ABSTRACT MACHINE where issues of optimisation don't exist and expressions
are evaluated as per semantics. It is in the context of the abstract
machine where we determine behaviour (defined or undefined). Now we come
to 6.5p2:

"Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value to
be stored."

This makes a specific statement that expressions that fail to meet
certain conditions are undefined, which would otherwise be defined. The
first sentence is fairly clear, the second is the interesting one; it is
that word "only". If you had an expression like:

y = (x = 2*x) + x;

that you wanted to make undefined how would you word the text? x is used
in the expression that calculates the new value to be stored in x, but it
is also used elsewhere which is the problem. So we say that it can *only*
be used to calculate the new value. This brings us to

p = p->next = q;

Is the second p here being used *only* to calculate the new value? It
certainly doesn't appear in a part of the expression that isn't being used
to calculate the new value. So a very reasonable answer to that is: yes.

But maybe there's another interpretation i.e. as you say above it is
used in the calculation of an lvalue used to store the value of q. But as
soon as we go down that road we're in deep trouble because the standard
defines no framework to determine what does or doesn't constitute other
use. Consider the following expressions, do they fall foul of the other
use concept?

x = x; /* Hopefully not :-) */

x = x; /* x is volatile */

x = abs(x);

x = printf("%d", x);

x = myfunc(x); /* User defined function, may do anything */

Where do you draw the line? If you want all of these to be allowed how can
you justify the "only" requirement, especially for the last two cases?
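
For reference, the function-call cases compile and behave as expected in practice: the argument is evaluated, there is a sequence point before the actual call, and the return value is then stored. A small sketch; `myfunc` keeps the placeholder name from the list above with a made-up body, and `run_examples` is a hypothetical wrapper:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the user-defined function in the list. */
static int myfunc(int x)
{
    return 2 * x + 1;
}

/* Applies three of the listed assignments in turn and returns the
 * final value of x. */
static int run_examples(int x)
{
    assert(x > INT_MIN);    /* abs(INT_MIN) would overflow */
    x = abs(x);             /* x is read to form the argument */
    x = printf("%d", x);    /* printf returns the number of characters written */
    x = myfunc(x);          /* the callee may do anything with its copy */
    return x;
}
```

In each statement the old value of x reaches the new value only by way of a function call, which is why a strict "no other use" reading of "only" is hard to apply here.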

Also consider whether this 2nd interpretation of "only" makes sense on
logical grounds. In p = p->next = q does it make sense to create
issues for p and p->next where they designate distinct objects? Is there
any prior art in this or anything that would make sense for the standard
to favour this interpretation?

Finally, it is easy to be thinking of the following in these discussions:

6.5.2.4p2

"The side effect of updating the stored value of the operand shall occur between
the previous and the next sequence point."

but note that is a semantic of the ++ and -- operators and is not
relevant to assignment. Indeed it is constraining on behaviour of the
implementation, not a source of undefined behaviour. The timing of
side-effects in the abstract machine is not in general as unconstrained
between sequence points as this text might suggest.

> Presumably the programmer intends that the value be stored in p->next
> *before* p receives the value obtained from q, but without any sequence
> point there is no guarantee that that will be what the code actually
> does; the storage operations might occur in either order or even
> concurrently.

It is important to realise that in the abstract machine sequence points
are not the only source of sequencing. Really the only things that aren't
sequenced are the order of evaluation of operands, and then not in all
cases. Side-effects of things like ++ and -- can be considered sequenced in
the abstract machine but can easily fall foul of 6.5p2. The net result of
6.5p2 is that implementations have greater latitude in optimising
sequencing, but again we must focus on the abstract machine.

> For example, the final value of p might be stored from the
> register caching the value just loaded from q faster than the address of
> p->next can be computed in another of the concurrent processor
> pipelines, and it is possible that the new value of p would be the one
> used in both pipelines, especially if p is allocated to a register.

You are talking about optimisation and code generation issues for an
implementation, again think abstract machine. This example is not based
on the wording of the standard. If we conclude from the standard that the
expression is well defined the implementation must simply make sure that
this situation cannot occur in the code it generates.

Lawrence

Charlie Gordon

Jan 18, 2005, 3:47:28 AM
"Richard Bos" <r...@hoekstra-uitgeverij.nl> wrote in message
news:41ebed89....@news.individual.net...

I specifically asked if there were tools to detect such UB constructs from
source code analysis. Throwing in the halting problem to dismiss the issue
entirely is not very constructive. I of course understand that UB cannot be
analysed in the general case, but the discussion about p = p->next = q; does not
regard runtime UB but ambiguous semantics. The problem is independent of the
value of p or q.
If this expression invokes UB -- which remains to be proved -- that could be
detected at compile time. Are you aware of tools that do a better job at
detecting such problems ?

Chqrlie.

PS: I once had an unreal discussion with a renowned professor in natural
language analysis :
me: I think we could vastly improve spelling checkers in English and French...
he: forget it, semantical analysis of natural language is a daunting task, after
20 years at it, I still cannot tell the subject from the object in many
pathological cases.


Brian Inglis

Jan 18, 2005, 2:13:33 PM
On Thu, 13 Jan 2005 13:29:38 +0100 in comp.std.c, "Charlie Gordon"
<ne...@chqrlie.org> wrote:

>There are many subtle problems in the C language, there is nothing wrong with
>using tools to try and detect some of them. In fact it is stupid not to take
>advantage of compiler warnings (gcc -Wall) or more elaborate tools when
>available. Whether { p[i++] = i; } is more or less of a subtle issue than
>{ strcpy(p, p + 1); } or { p = p->next = q; } can be argued aimlessly, my
>question is : are there tools to detect these instances of undefined behaviour ?

gcc (3.4.3) with options -Wall -Wextra -ansi -pedantic reports:
"warning: operation on 'i' may be undefined" for the first
of those statements, but is silent on the other two.
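
Since the compiler is silent on the strcpy case, it may help to show defined replacements for the first two statements. The standard says strcpy has undefined behaviour when source and destination overlap, whereas memmove permits overlap. A sketch with hypothetical helper names; the second function assumes one particular intent for `p[i++] = i`, which the original statement does not pin down:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Defined replacement for strcpy(p, p + 1): shift the string left by
 * one character using memmove, which allows overlapping regions. */
static void drop_first_char(char *p)
{
    assert(p != NULL);
    if (p[0] != '\0')
        memmove(p, p + 1, strlen(p + 1) + 1); /* +1 copies the terminator */
}

/* One possible intent of p[i++] = i: store the old index in the old
 * slot, then advance. Returns the advanced index. */
static size_t store_index_then_advance(int *p, size_t i)
{
    assert(p != NULL);
    p[i] = (int)i;
    return i + 1;
}
```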

David Hopwood

Jan 19, 2005, 12:47:46 PM
Lawrence Kirby wrote:
> On Fri, 07 Jan 2005 22:59:23 +0000, Douglas A. Gwyn wrote:
>>Tim Rentsch wrote:
>>
>>> p = p->next = q;
>>
>>> "Furthermore, the prior value shall be read only to determine the
>>> value to be stored."
>>>Although it's clear that the statement here is central to resolving
>>>the question, it was not clear what conclusions are appropriate to
>>>draw. The meta-conclusion is that the language in the standard
>>>needs clarifying.
>>
>>It is clear that p->next reads p for a purpose other than
>>determining the value to be stored,
>
> That's not at all clear. In fact I believe that it is wrong.

I agree that it is not clear, but it is nevertheless true. See if you
agree with the reasoning in
<http://www.google.co.uk/groups?selm=AbGEd.60741%24C8.5809%40fe3.news.blueyonder.co.uk>.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Hopwood

Jan 19, 2005, 12:53:56 PM

Right, but if a particular implementation defines all behaviour, then
for that implementation, the answer to whether there is any behaviour left
undefined by the implementation for a given C source program is trivial:
"no". It's only the corresponding question for behaviour undefined
*by the standard* that is undecidable.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Lawrence Kirby

Jan 19, 2005, 2:27:38 PM

I don't have that article here, but I've grabbed it from Google:

>lawrenc...@ugs.com wrote:
>> David Hopwood <david.nosp...@blueyonder.co.uk> wrote:
>>

>>>Provided p and p->next are not aliased, the net result does not
>>>depend on that order.
>>
>> It does if the implicit &(p->next) is done after the assignment to p
>> rather than before.

The semantics of the abstract machine do not allow that.

>I stand corrected. The key clause here is C99 6.5.16 #4,
>"The order of evaluation of the operands [of assignment] is unspecified."

However this is not relevant. In the semantics of the abstract machine you
evaluate the operands of an operator and then execute the operator. The
operands may be evaluated in any order (subject to rules for operators
like &&) but the operator itself is executed afterwards. The storing of
the value in p is part of the execution of the operator, not the
evaluation of its operands. To justify this for the specific case of
assignment I quote from my previous article:

>>Consider the following from 5.1.2.3:
>>
>>"The semantic descriptions in this International Standard describe the
>> behavior of an abstract machine in which issues of optimization are
>> irrelevant."
>>
>>and
>>
>>"In the abstract machine, all expressions are evaluated as specified by
>> the semantics."
>>
>>Now the semantics of = are 6.5.16.1p2
>>
>>"In simple assignment (=), the value of the right operand is converted to
>> the type of the assignment expression and replaces the value stored in
>> the object designated by the left operand."
>>
>>Here there is a clear sequencing: you get the value from the right, convert
>>it to the type on the left, and replace the value.

In the abstract machine you CANNOT store the new value in p before it has
been obtained by evaluating the right operand. You can certainly evaluate
the left operand before that, but all that does is produce an lvalue that
designates p's object. Evaluation of the left operand doesn't store a
value or generate a result, execution of the operator does that. So:

>In more detail:
>
>"p = p->next = q;" is equivalent to "p = (p->next = q);"
>(by right-associativity of =), which comprises the following events:
>
> (a) t1 = q
> (b) t2 = &(p->next) [unnecessary to break down further for this example]
> (c) *t2 = t1
> (d) p = t1
>
>with the following causal ordering:
>
> (a) precedes (c)
> (a) precedes (d)
> (b) precedes (c)
>
>but it is not possible to infer that (b) precedes (d), because of
>6.5.16 #4.

Right, 6.5.16p4 is not relevant to the issue because it only considers
evaluation order of the operands, and (d) is not part of operand
evaluation. But this ordering can be inferred from other parts of the
standard.

>Therefore the behaviour of "p = p->next = q;" and of "p->next = p = q;",
>are equivalent, and undefined due to C99 6.5 #2.

So this is an incorrect conclusion. Which is lucky because if a = b = c
was in any sense equivalent to b = a = c then C would have problems,
notably if a, b and c had different types.

>Specifically, between
>two consecutive sequence points, "p->next = p = q;" reads the value p (in
>the expression p->next) other than to determine the value to be stored
>into p.

My quoted discussion above is of course subject to 6.5p2. The discussion
boils down to how you read that. If you read "undefined behaviour" for an
expression there then sequencing discussion based on the rest of the
standard is not relevant. However in my previous article I tried to
explain why my interpretation is reasonable, and why the other
interpretation expressed in this thread is not, or at least has very
unfortunate consequences for reasonable expressions, which I doubt was the
intention.

Lawrence

David Hopwood

Jan 19, 2005, 7:44:48 PM
Lawrence Kirby wrote:
> In the abstract machine you CANNOT store the new value in p before it has
> been obtained by evaluating the right operand. You can certainly evaluate
> the left operand before that, but all that does is produce an lvalue that
> designates p's object. Evaluation of the left operand doesn't store a
> value or generate a result, execution of the operator does that. So:
>
>>In more detail:
>>
>>"p = p->next = q;" is equivalent to "p = (p->next = q);"
>>(by right-associativity of =), which comprises the following events:
>>
>> (a) t1 = q
>> (b) t2 = &(p->next) [unnecessary to break down further for this example]
>> (c) *t2 = t1
>> (d) p = t1
>>
>>with the following causal ordering:
>>
>> (a) precedes (c)
>> (a) precedes (d)
>> (b) precedes (c)
>>
>>but it is not possible to infer that (b) precedes (d), because of
>>6.5.16 #4.
>
> Right, 6.5.16p4 is not relevant to the issue because it only considers
> evaluation order of the operands, and (d) is not part of operand
> evaluation. But this ordering can be inferred from other parts of the
> standard.

Hmm. This can be argued either way, I think:

1. The RHS of "p = (p->next = q);" must be completely evaluated
before the outer assignment (by the *implicit* rule that evaluation
of operands precedes evaluation of an operator). Therefore we have
(c) precedes (d), and also (b) precedes (d) by transitivity.
or
2. The result of "(p->next = q)" needed to perform the outer assignment
is q, so only the evaluation of q precedes the outer assignment,
not the whole of "(p->next = q)".

I think this and similar examples (such as "a[a[i]] = i") demonstrate
clearly that the current situation is untenable. Because there is no
formal (or even semi-formal) model that specifies all the ordering
relations in one place, in principle answering any question like this
requires an exhaustive examination of the whole standard, and some
relations can only be inferred by unreliable application of "common sense",
otherwise known as reading the minds of the standard authors.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Brian Inglis

Jan 19, 2005, 7:47:11 PM
On Wed, 19 Jan 2005 19:27:38 +0000 in comp.std.c, Lawrence Kirby
<lkn...@netactive.co.uk> wrote:

>On Wed, 19 Jan 2005 17:47:46 +0000, David Hopwood wrote:
>
>> Lawrence Kirby wrote:
>>> On Fri, 07 Jan 2005 22:59:23 +0000, Douglas A. Gwyn wrote:
>>>>Tim Rentsch wrote:
>>>>
>>>>> p = p->next = q;
>>>>
>>>>> "Furthermore, the prior value shall be read only to determine the
>>>>> value to be stored."

The inference you may be missing is that lvalue operands l1=&p,
l2=&p->next, or rvalue operand r3=q, may be evaluated in any order,
and either = operator l1<-r3, or l2<-r3, can be performed in any order
once its operands are available.
So the abstract machine operation sequence: l1=&p, r3=q, l1<-r3,
l2=&p->next, l2<-r3, is valid, as are other sequences which satisfy
operand availability.
ISTM any cse or reg allocation is likely to produce an undesirable
result.
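
The two candidate orderings can be simulated with explicit temporaries to see how the outcomes differ. This is only a sketch using the l1, l2 and r3 names from the sequence above; the `struct node` type and the function names are hypothetical, and the code does not settle which ordering the abstract machine permits:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type with just the member the example needs. */
struct node {
    struct node *next;
};

/* The inner assignment completes before the outer store. */
static void inner_store_first(struct node **p, struct node *q)
{
    assert(p != NULL && *p != NULL && q != NULL);
    struct node **l2 = &(*p)->next; /* l2 = &p->next, using the old p */
    struct node *r3 = q;            /* r3 = q */
    *l2 = r3;                       /* l2 <- r3 */
    struct node **l1 = p;           /* l1 = &p */
    *l1 = r3;                       /* l1 <- r3 */
}

/* The sequence listed above: p is stored before &p->next is formed. */
static void outer_store_first(struct node **p, struct node *q)
{
    assert(p != NULL && *p != NULL && q != NULL);
    struct node **l1 = p;           /* l1 = &p */
    struct node *r3 = q;            /* r3 = q */
    *l1 = r3;                       /* l1 <- r3 */
    struct node **l2 = &(*p)->next; /* l2 = &p->next, using the NEW p */
    *l2 = r3;                       /* l2 <- r3 */
}
```

With the first ordering the old node is linked to q; with the second the old node is left untouched and q ends up pointing at itself.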

Brian Inglis

Jan 19, 2005, 7:54:02 PM

Even if the implementation defines all behaviour, the results may not
be what is desired, or expected by the programmer, if UB is invoked.

Lawrence Kirby

Jan 20, 2005, 7:09:11 AM
On Thu, 20 Jan 2005 00:44:48 +0000, David Hopwood wrote:

...

>> Right, 6.5.16p4 is not relevant to the issue because it only considers
>> evaluation order of the operands, and (d) is not part of operand
>> evaluation. But this ordering can be inferred from other parts of the
>> standard.
>
> Hmm. This can be argued either way, I think:
>
> 1. The RHS of "p = (p->next = q);" must be completely evaluated
> before the outer assignment (by the *implicit* rule that evaluation
> of operands precedes evaluation of an operator).

Implicit but deducible from the standard.

> Therefore we have
> (c) precedes (d), and also (b) precedes (d) by transitivity.
> or
> 2. The result of "(p->next = q)" needed to perform the outer assignment
> is q, so only the evaluation of q precedes the outer assignment,
> not the whole of "(p->next = q)".

This argument depends on an optimisation (i.e. getting the result of
an operation before fully performing that operation) and as such directly
violates 5.1.2.3p1

"The semantic descriptions in this International Standard describe the
behavior of an abstract machine in which issues of optimization are
irrelevant."

It also violates 5.1.2.3p3

"In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation ...".

The sequence of events you describe above does not follow the semantics of
assignment as specified by the standard. The abstract machine is an
environment where you apply the rules exactly as written in the standard.
Those rules do allow latitude in some areas such as the order of
operand evaluation, and they also specify undefined behaviour for some
situations which gives an *implementation* added opportunity for
optimisation, but there is no similar license to execute an operator
before its operands are fully evaluated i.e. to override 5.1.2.3 p1 and
p3.

Also consider that interpretation of constructs in the standard is very
much syntax directed. In translation phase 7 you perform a syntax analysis
and for every construct identified by that you test the constraints for
that construct and apply the semantics. This is something that is so
fundamental that AFAIK the standard doesn't specify it. Perhaps it should
because it underpins the whole framework created by the standard.

> I think this and similar examples (such as "a[a[i]] = i") demonstrate
> clearly that the current situation is untenable.

This is undefined if initially i == a[i]. In that case a[i] is being
accessed to find the location of the object being assigned to, which is
something "other" than calculating the new value to be stored. If
you're saying that this is something that should be well defined, perhaps,
but at least there is no room for ambiguity (or I don't see it).
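
If the intent of "a[a[i]] = i" is "read the index, then store", a temporary makes that order explicit and sidesteps the i == a[i] case altogether. A sketch only; the function name and the bounds checks are illustrative additions:

```c
#include <assert.h>
#include <stddef.h>

/* Performs the effect of a[a[i]] = i in two steps: the index is read
 * in one full expression and the store happens in the next. */
static void store_via_index(int *a, size_t n, int i)
{
    assert(a != NULL && i >= 0 && (size_t)i < n);
    int idx = a[i];                          /* read the index first */
    assert(idx >= 0 && (size_t)idx < n);
    a[idx] = i;                              /* then store through it */
}
```

When i == a[i], the store writes the value the element already holds, so the two-step version leaves the array unchanged in exactly the case under dispute.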

> Because there is no
> formal (or even semi-formal) model that specifies all the ordering
> relations in one place, in principle answering any question like this
> requires an exhaustive examination of the whole standard, and some
> relations can only be inferred by unreliable application of "common sense",
> otherwise known as reading the minds of the standard authors.

My interpretation pretty much resolves such ordering relations.

Lawrence

Wojtek Lerch

Jan 20, 2005, 9:02:00 AM
"Lawrence Kirby" <lkn...@netactive.co.uk> wrote in message
news:pan.2005.01.20....@netactive.co.uk...

> On Thu, 20 Jan 2005 00:44:48 +0000, David Hopwood wrote:
>> 1. The RHS of "p = (p->next = q);" must be completely evaluated
>> before the outer assignment (by the *implicit* rule that evaluation
>> of operands precedes evaluation of an operator).
>
> Implicit but deducible from the standard.

How? The standard says explicitly that the side effect of an assignment
happens between the previous and the next sequence points. Obviously, the
previous sequence point is before the arguments have been evaluated. If
there's text somewhere that forbids the side effect to happen between the
previous sequence point and when the evaluation of the arguments completes,
why does 6.5.16p3 bother to mention the previous sequence point at all?

Lawrence Kirby

Jan 20, 2005, 12:39:16 PM
On Thu, 20 Jan 2005 09:02:00 -0500, Wojtek Lerch wrote:

> "Lawrence Kirby" <lkn...@netactive.co.uk> wrote in message
> news:pan.2005.01.20....@netactive.co.uk...
>> On Thu, 20 Jan 2005 00:44:48 +0000, David Hopwood wrote:
>>> 1. The RHS of "p = (p->next = q);" must be completely evaluated
>>> before the outer assignment (by the *implicit* rule that evaluation
>>> of operands precedes evaluation of an operator).
>>
>> Implicit but deducible from the standard.
>
> How?

I've given the references in previous articles.

> The standard says explicitly that the side effect of an assignment
> happens between the previous and the next sequence points.

The actual text is:

"The side effect of updating the stored value of the left operand shall
occur between the previous and the next sequence point."

To me a "shall" clause in the standard is a restriction on permitted
behaviour not a relaxation. So this is saying that the side effect cannot
occur before the previous sequence point or after the next. It is *not*
saying that the side effect can extend to those boundaries even in the
presence of more restrictive considerations. The standard often uses "may"
as a terminology for relaxations.

> Obviously, the
> previous sequence point is before the arguments have been evaluated. If
> there's text somewhere that forbids the side effect to happen between the
> previous sequence point and when the evaluation of the arguments completes,
> why does 6.5.16p3 bother to mention the previous sequence point at all?

Good question, perhaps it is there for the sake of completeness. Also
consider that the concept of sequence points is employed in more than just
para 6.5p2 in the standard, notably in clause 5.1.2.3 which discusses real
implementations, volatile objects and signal handlers in addition to the
abstract machine. In the context of real implementations you can be more
flexible about timing of side-effects, but sequence points still give you
a framework to work from.

Whether you accept that or not does it make any sense to say that in
x = x + 1 that x could be written before it is read (in the abstract
machine)? Can you infer any distinction between this and the case of
p = p->next = q in this respect?

Lawrence


James Kuyper

Jan 20, 2005, 1:04:33 PM
Lawrence Kirby wrote:
> On Thu, 20 Jan 2005 09:02:00 -0500, Wojtek Lerch wrote:
>
>
>>"Lawrence Kirby" <lkn...@netactive.co.uk> wrote in message
>>news:pan.2005.01.20....@netactive.co.uk...
>>
>>>On Thu, 20 Jan 2005 00:44:48 +0000, David Hopwood wrote:
>>>
>>>> 1. The RHS of "p = (p->next = q);" must be completely evaluated
>>>> before the outer assignment (by the *implicit* rule that evaluation
>>>> of operands precedes evaluation of an operator).
>>>
>>>Implicit but deducible from the standard.
>>
>>How?
>
>
> I've given the references in previous articles.

Yes, but you haven't made the argument that connects those references to
this issue.

...


> To me a "shall" clause in the standard is a restriction on permitted
> behaviour not a relaxation. So this is saying that the side effect cannot
> occur before the previous sequence point or after the next. It is *not*
> saying that the side effect can extend to those boundaries even in the
> presence of more restrictive considerations. The standard often uses "may"
> as a terminology for relaxations.

The issue, of course, is whether there's anything in the standard
imposing "more restrictive considerations"

> Whether you accept that or not does it make any sense to say that in
> x = x + 1 that x could be written before it is read (in the abstract
> machine)? Can you infer any distinction between this and the case of
> p = p->next = q in this respect?

Yes. The standard imposes no explicit requirement on the relative order
of the read and write of "x", just as it imposes no explicit requirement
on the relative order of the read and write of "p". The difference is a
result of the fact that there is an implicit requirement on the order
for "x", but not for "p".

The value of "x+1" can't be determined without first determining the
value of "x". "x" can't be assigned the correct new value until the
value of "x+1" has been determined. Therefore, assignment to "x"
necessarily follows the reading of "x", despite the fact that the
standard does not explicitly impose such a timing requirement.

The value of "p->next=q" can be determined without evaluating "p->next".
Therefore, there's no implicit timing requirement on the reading
and writing of "p", and in the absence of an explicit timing
requirement, an implementation is free to do them in either order.

Wojtek Lerch

Jan 20, 2005, 2:07:19 PM
"Lawrence Kirby" <lkn...@netactive.co.uk> wrote in message
news:pan.2005.01.20....@netactive.co.uk...
> On Thu, 20 Jan 2005 09:02:00 -0500, Wojtek Lerch wrote:
>
>> "Lawrence Kirby" <lkn...@netactive.co.uk> wrote in message
>> news:pan.2005.01.20....@netactive.co.uk...
>>> On Thu, 20 Jan 2005 00:44:48 +0000, David Hopwood wrote:
>>>> 1. The RHS of "p = (p->next = q);" must be completely evaluated
>>>> before the outer assignment (by the *implicit* rule that
>>>> evaluation
>>>> of operands precedes evaluation of an operator).
>>>
>>> Implicit but deducible from the standard.
>>
>> How?
>
> I've given the references in previous articles.

You gave some references, but I don't recall any that support your claim
that the operands of an assignment must be fully evaluated before its side
effect can happen. Logic dictates that the side effect can't happen before
the object to be assigned to has been identified and the value to be stored
in it has been determined, but I can't see any text in the standard that
makes any stronger guarantees (except in the presence of additional sequence
points). I do see several places that seem to imply the opposite though --
for instance, in 6.5p3 ("Except as specified later (for the function-call
(), &&, ||, ?:, and comma operators), the order of evaluation of
subexpressions and the order in which side effects take place are both
unspecified").

BTW, do you also believe that if an assignment expression is an operand of a
bigger expression, its side effect must happen before its value is made
available to the operator of which it is an operand? If not, why the
asymmetry?

>> The standard says explicitly that the side effect of an assignment
>> happens between the previous and the next sequence points.
>
> The actual text is:
>
> "The side effect of updating the stored value of the left operand shall
> occur between the previous and the next sequence point."
>
> To me a "shall" clause in the standard is a restriction on permitted
> behaviour not a relaxation. So this is saying that the side effect cannot
> occur before the previous sequence point or after the next. It is *not*
> saying that the side effect can extend to those boundaries even in the
> presence of more restrictive considerations. The standard often uses "may"
> as a terminology for relaxations.

OK, but what are the more restrictive considerations?

>> Obviously, the
>> previous sequence point is before the arguments have been evaluated. If
>> there's text somewhere that forbids the side effect to happen between the
>> previous sequence point and when the evaluation of the arguments
>> completes,
>> why does 6.5.16p3 bother to mention the previous sequence point at all?
>
> Good question, perhaps it is there for the sake of completeness. Also
> consider that the concept of sequence points is employed in more than just
> para 6.5p2 in the standard, notably in clause 5.1.2.3 which discusses real
> implementations, volatile objects and signal handlers in addition to the
> abstract machine. In the context of real implementations you can be more
> flexible about timing of side-effects, but sequence points still give you
> a framework to work from.

But all that 5.1.2.3 promises about side effects is that they don't move
across sequence points. There's no promise there that they can't move
across the boundaries of subexpressions, is there?

> Whether you accept that or not does it make any sense to say that in
> x = x + 1 that x could be written before it is read (in the abstract
> machine)? Can you infer any distinction between this and the case of
> p = p->next = q in this respect?

Of course: the value of "x+1" cannot be determined without reading the value
of "x". The value of "p->next = q" can be determined without reading the
value of "p".


Wojtek Lerch

unread,
Jan 20, 2005, 3:06:19 PM1/20/05
to
James Kuyper wrote:
> The value of "x+1" can't be determined without first determining the
> value of "x". "x" can't be assigned the correct new value until the
> value of "x+1" has been determined. Therefore, assignment to "x"
> necessarily follows the reading of "x", despite the fact that the
> standard does not explicitly impose such a timing requirement.
>
> The value of "p->next=q" can be determined without evaluating "p->next".
> Therefore, there's no implicit timing requirement on the reading and
> writing of "p", and in the absence of an explicit timing requirement, an
> implementation is free to do them in either order.

On the other hand, look at it this way:

You are arguing (and I agree) that there is no guarantee in the standard
that the lvalue-to-value conversion in "p=p->next=q" must produce the
old value of p rather than the new. What makes this case different from
"x=2-x"? If the "x" can refer to the new value of "x", it can be
determined without first determining the old value of "x", too: you just
need to solve the equation "x==2-x". (Of course, you can't apply this
trick to "x=x+1".)

James Kuyper

unread,
Jan 20, 2005, 4:06:13 PM1/20/05
to
Wojtek Lerch wrote:
> James Kuyper wrote:
>
>> The value of "x+1" can't be determined without first determining the
>> value of "x". "x" can't be assigned the correct new value until the
>> value of "x+1" has been determined. Therefore, assignment to "x"
>> necessarily follows the reading of "x", despite the fact that the
>> standard does not explicitly impose such a timing requirement.
>>
>> The value of "p->next=q" can be determined without evaluating
>> "p->next". Therefore, there's no implicit timing requirement on
>> the reading and writing of "p", and in the absence of an explicit
>> timing requirement, an implementation is free to do them in either order.
>
>
> On the other hand, look at it this way:
>
> You are arguing (and I agree) that there is no guarantee in the standard
> that the lvalue-to-value conversion in "p=p->next=q" must produce the
> old value of p rather than the new.

No, I'm saying that the lvalue-to-value conversion must always return
the current value. Other operations can occur either before or after the
conversion, so which value you get can be ambiguous, but its value
can't just spontaneously change.

> ... What makes this case different from
> "x=2-x"? If the "x" can refer to the new value of "x", it can be
> determined without first determining the old value of "x", too: you just
> need to solve the equation "x==2-x". (Of course, you can't apply this
> trick to "x=x+1".)

The difference is that the value of the p->next=q is completely
independent of which value of 'p' you use. The value of "2-x" is not. An
implementation can't simply invent a result that allows it to rearrange
the order of operations; it could only do that if it already knew that x
had the value that allows such a rearrangement. For instance, "x=1; x =
2-x;" is one case where such a rearrangement would be legal.

The expression "x=2-x" is equivalent to the following breakdown:

int *t1 = &x; /* A */
int t2 = *t1; /* B */
int t3 = 2-t2; /* C */
*t1 = t3; /* D */

The way that each step depends on other steps creates the following
constraints (where T(X) indicates the time at which step X is performed):

T(A) < T(B) because B needs the value of t1
T(B) < T(C) because C needs the value of t2
T(C) < T(D) because D needs the value of t3

There is only one order consistent with these constraints: A B C D

p = (p->next=q), in contrast, can be broken down as follows:

struct something **t1 = &p; /* F */
struct something **t2 = &(*t1)->next; /* G: reads p */
*t2 = q; /* H */
*t1 = q; /* I */

The way these steps depend upon each other creates only the following
constraints:

T(F) < T(G) because G needs the value of t1
T(G) < T(H) because H needs the value of t2
T(F) < T(I) because I needs the value of t1

These constraints are consistent with any of the following orders:

F G H I
F G I H
F I G H
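
To make the difference between those orders concrete, here is a minimal
sketch in C. The names "struct node" and "run_order" are hypothetical (they
do not come from the thread), and t1 and t2 are assumed to be pointers to
the pointer objects p and p->next. The function performs steps F, G, H and I
in whatever order the caller spells out.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; };

/* Performs steps F, G, H, I in the order given by `order`, e.g. "FGHI".
   The caller is expected to respect the constraints F<G, G<H and F<I. */
static void run_order(const char *order, struct node **pp, struct node *q)
{
    struct node **t1 = NULL;  /* will hold the address of p       */
    struct node **t2 = NULL;  /* will hold the address of p->next */

    assert(pp != NULL && *pp != NULL && q != NULL);
    for (; *order != '\0'; ++order) {
        switch (*order) {
        case 'F': t1 = pp; break;
        case 'G': assert(t1 != NULL); t2 = &(*t1)->next; break; /* reads p */
        case 'H': assert(t2 != NULL); *t2 = q; break;
        case 'I': assert(t1 != NULL); *t1 = q; break;
        default:  assert(0); break;
        }
    }
}
```

Starting with p pointing at some node a, the orders "FGHI" and "FGIH" both
end with a.next == q and p == q. The order "FIGH" writes p before G reads
it, so a.next is left untouched and q->next ends up pointing at q itself.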

David Hopwood

unread,
Jan 20, 2005, 5:00:35 PM1/20/05
to

For any implementation, the results may not be what is desired, or
expected by the programmer, whether or not UB is invoked.

Deterministic behaviour, however, is generally easier to debug.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Christian Bau

unread,
Jan 20, 2005, 5:31:21 PM1/20/05
to
In article <41EFF2B1...@saicmodis.com>,
James Kuyper <kuy...@saicmodis.com> wrote:

> The value of "p->next=q" can be determined without evaluating "p->next".
> Therefore, there's no implicit timing requirement on the reading
> and writing of "p", and in the absence of an explicit timing
> requirement, an implementation is free to do them in either order.

Instead of

p = p->next = q;

take the statement

(*p1) = (*p2)->next = q;

The rules about undefined behavior usually give permission to the
implementation to read and write values between sequence points in
arbitrary order, except in those cases where logic tells us that a value
must be read in order to determine which value is written (and we have
to pretend that we have to read x in order to determine x*0, for
example).

But making p = p->next = q; or (*p1) = (*p2)->next = q; defined behavior
would mean that *p2 must be read _before_ q is written to *p1. There is
no logical requirement to do this.

Did anyone notice that this interpretation also means that assignment to
a volatile variable requires that the stored value is read immediately
after storing it? An assignment yields a value. The fact that for
example in x = 5; that value isn't used doesn't matter, it has to be
determined. It has to be determined by reading the value of x after
storing the number 5, as we have been told that it is not the value
stored that is the result of an assignment operator, but the value in
the variable, and how can we determine it without reading it? So the
abstract machine has to read x after storing the value 5. Usually, the
compiler can use the as-if rule to avoid reading x, but if x is
volatile, then the read _must_ be performed.

I very much doubt that that was the intention.

Christian Bau

unread,
Jan 20, 2005, 5:37:01 PM1/20/05
to
In article <35adr7F...@individual.net>,
"Wojtek Lerch" <Wojt...@yahoo.ca> wrote:

> Of course: the value of "x+1" cannot be determined without reading the value
> of "x". The value of "p->next = q" can be determined without reading the
> value of "p".

And the result of x = ++x; can be determined without problems (no matter
in which order the side effects happen, the total outcome should be that
x is increased by 1), but it is still undefined behavior, because the C
Standard says so. Logic (like "it must be defined behavior because the
compiler has no choice but to do what I think it must do") doesn't help;
if the C Standard says it is undefined then it is undefined.

You are saying "the logic that side effect X must happen before side
effect Y is broken". I agree completely. But it doesn't even matter,
because p is read not only to determine the result of the assignment
operator, but for another purpose as well, and therefore it is undefined
behavior.

Christian Bau

unread,
Jan 20, 2005, 5:41:16 PM1/20/05
to
In article <7SVHd.183834$48.4...@fe1.news.blueyonder.co.uk>,
David Hopwood <david.nosp...@blueyonder.co.uk> wrote:

Implementation defined behavior doesn't have to be deterministic either.
For example, an implementation could define that any uninitialised
variable of integer type has some unspecified value within the range of
that type, and that reading such a variable could yield a different
unspecified value every time.

int x, i;
for (i = 0; i < 1000; ++i) printf ("%d\n", x);

could print 1000 different numbers on such an implementation.

Douglas A. Gwyn

unread,
Jan 20, 2005, 7:00:10 PM1/20/05
to
Lawrence Kirby wrote:
> "In simple assignment (=), the value of the right operand is converted to
> the type of the assignment expression and replaces the value stored in
> the object designated by the left operand."
> Here there is a clear sequencing, ...

No, there are no "sequencing" words such as "then" in that spec.
And the order of the words in the (necessarily linear) text does
not imply the sequence; we could equivalently have said "the
value stored in the object designated by the left operand is
replaced by the value of the right operand converted to the type
of the assignment expression". The order of evaluation of the
operands is unspecified.

> "Furthermore, the prior value shall be read only to determine the value to
> be stored."

Which is what makes p = p->next = q undefined. The prior value
of the object designated by p is read (between sequence points)
for a second purpose, namely to determine the lvalue for the
target of the "inner" assignment, which usage of p isn't
determining the value to be stored in the "outer" assignment.
If it were p = q->next = p+1 then the requirement would be met.

> You are talking about optimisation and code generation issues for an
> implementation, ...

I was explaining how it could matter in practice.
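
Whichever reading of the "prior value" sentence is correct, a programmer can
sidestep the question by splitting the statement in two, so that a sequence
point separates the read of p from the write to p. A minimal sketch, in
which "struct node" and "link_and_advance" are hypothetical names chosen for
illustration:

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Same intended effect as "p = p->next = q;", written as two full
   expressions so the order of the read and the write is fixed. */
static void link_and_advance(struct node **pp, struct node *q)
{
    assert(pp != NULL && *pp != NULL);
    (*pp)->next = q;  /* reads the old *pp; finished at the ';' */
    *pp = q;          /* only now is *pp modified               */
}
```

With p pointing at node a, link_and_advance(&p, q) leaves a.next == q and
p == q on every conforming implementation.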

Douglas A. Gwyn

unread,
Jan 20, 2005, 7:09:52 PM1/20/05
to
Christian Bau wrote:
> Did anyone notice that this interpretation also means that assignment to
> a volatile variable requires that the stored value is read immediately
> after storing it? An assignment yields a value. The fact that for
> example in x = 5; that value isn't used doesn't matter, it has to be
> determined. It has to be determined by reading the value of x after
> storing the number 5, as we have been told that it is not the value
> stored that is the result of an assignment operator, but the value in
> the variable, and how can we determine it without reading it? So the
> abstract machine has to read x after storing the value 5. Usually, the
> compiler can use the as-if rule to avoid reading x, but if x is
> volatile, then the read _must_ be performed.
> I very much doubt that that was the intention.

Actually the semantics of assignment don't require that the
lvalue be accessed after storing to determine the value of
the assignment expression. The value of the expression is
determined solely by the r.h.s.

Douglas A. Gwyn

unread,
Jan 20, 2005, 7:05:10 PM1/20/05
to
David Hopwood wrote:
> 2. The result of "(p->next = q)" needed to perform the outer assignment
> is q, so only the evaluation of q precedes the outer assignment,
> not the whole of "(p->next = q)".

Bingo! The value to be assigned in the "outer" expression
can be determined at the same time, or even before, the
value to be assigned in the "inner" subexpression, and the
type is known at compile time, so the outer assignment can
proceed concurrently with, or even before, the inner
assignment.

> I think this and similar examples (such as "a[a[i]] = i") demonstrate
> clearly that the current situation is untenable. Because there is no
> formal (or even semi-formal) model that specifies all the ordering
> relations in one place, in principle answering any question like this
> requires an exhaustive examination of the whole standard, and some
> relations can only be inferred by unreliable application of "common sense",
> otherwise known as reading the minds of the standard authors.

The problem isn't so much the spec as it is the desire on
the part of some readers to have it be otherwise.

David Hopwood

unread,
Jan 20, 2005, 8:43:40 PM1/20/05
to
Christian Bau wrote:
> James Kuyper <kuy...@saicmodis.com> wrote:
>
>>The value of "p->next=q" can be determined without evaluating "p->next".
>> Therefore, there's no implicit timing requirement on the reading
>>and writing of "p", and in the absence of an explicit timing
>>requirement, an implementation is free to do them in either order.
>
> Instead of
>
> p = p->next = q;
>
> take the statement
>
> (*p1) = (*p2)->next = q;
>
> The rules about undefined behavior usually give permission to the
> implementation to read and write values between sequence points in
> arbitrary order, except in those cases where logic tells us that a value
> must be read in order to determine which value is written (and we have
> to pretend that we have to read x in order to determine x*0, for
> example).

I'm not sure that there is any substantive difference between how
x*0 and (p->next = q) should be treated. In both cases the result *can*
be calculated without completely evaluating the expression, but there is
an argument that the ordering constraints should be determined as though
each expression had to be completely evaluated in order to calculate its
result.

I'm more and more convinced that the standard is ambiguous on this point,
and both interpretations are defensible.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Francis Glassborow

unread,
Jan 21, 2005, 4:32:52 AM1/21/05
to
In article <41F04850...@null.net>, Douglas A. Gwyn
<DAG...@null.net> writes

>Actually the semantics of assignment don't require that the
>lvalue be accessed after storing to determine the value of
>the assignment expression. The value of the expression is
>determined solely by the r.h.s.

That isn't exactly the way I would have expressed it. I certainly agree
that there is no requirement in C that the lhs be accessed after it has
been assigned to, however the type of the lhs is significant and can
modify the value determined by the rhs.
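
A small sketch of that last point, using a hypothetical helper named
"chained_through_uchar": the value of the inner assignment is the right
operand after conversion to the type of the left operand, so it can differ
from the value the right operand started with.

```c
#include <assert.h>
#include <limits.h>

/* Returns the value of the assignment expression (c = v), where c has
   type unsigned char. Restricted to v >= 0 to keep the example simple. */
static int chained_through_uchar(int v)
{
    unsigned char c;
    int i;

    assert(v >= 0);
    i = c = v;  /* (c = v) yields v converted to unsigned char */
    return i;
}
```

On an implementation with 8-bit char, chained_through_uchar(300) returns
44 rather than 300.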


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

David Hopwood

unread,
Jan 21, 2005, 10:23:38 AM1/21/05
to
Douglas A. Gwyn wrote:
> David Hopwood wrote:
>
>> 2. The result of "(p->next = q)" needed to perform the outer assignment
>> is q, so only the evaluation of q precedes the outer assignment,
>> not the whole of "(p->next = q)".
>
> Bingo! The value to be assigned in the "outer" expression
> can be determined at the same time, or even before, the
> value to be assigned in the "inner" subexpression, and the
> type is known at compile time, so the outer assignment can
> proceed concurrently with, or even before, the inner
> assignment.

Yes, but you've just picked the interpretation you prefer to comment on,
and not addressed the other possible interpretation at all. Consider that
essentially the same argument could be used, *incorrectly*, to conclude
that for "x*0" the evaluation of x need not precede the use of the result
of x*0 in the abstract machine.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Wojtek Lerch

unread,
Jan 21, 2005, 10:49:24 AM1/21/05
to
"James Kuyper" <kuy...@saicmodis.com> wrote in message
news:41F01D45...@saicmodis.com...

> Wojtek Lerch wrote:
>> You are arguing (and I agree) that there is no guarantee in the standard
>> that the lvalue-to-value conversion in "p=p->next=q" must produce the old
>> value of p rather than the new.
>
> No, I'm saying that the lvalue-to-value conversion must always return the
> current value. Other operations can occur either before or after the
> conversion, so which value you get can be ambiguous, but its value can't
> just spontaneously change.

Of course; the conversion produces one value, and once it's been produced,
that's its value. The conversion returns the current value of the object at
the time of the conversion. All I'm saying is that since the side effect of
the assignment is one of the "other operations" and it writes to the object
at an unspecified time between the surrounding sequence points, the object
is modified either before or after the conversion reads the "current" value
from it, and therefore the "current" value that the conversion produces may
be either the value that the object had at the previous sequence point (the
"old" value), or the value that the assignment puts in it (the "new" value).

In other words, which of those two values you get is ambiguous.

In other words, there's no guarantee that the conversion must produce the
old value of p rather than the new.

That's a consequence of allowing the side effect of an assignment to happen
before its right argument has been completely evaluated.

>> ... What makes this case different from
>> "x=2-x"? If the "x" can refer to the new value of "x", it can be
>> determined without first determining the old value of "x", too: you just
>> need to solve the equation "x==2-x". (Of course, you can't apply this
>> trick to "x=x+1".)
>
> The difference is that the value of the p->next=q is completely
> independent of which value of 'p' you use. The value of "2-x" is not. An

But what is the general rule? You're saying that the side effect of the
assignment is allowed to happen before the evaluation of its right argument
is complete (for instance, before "p->next" has read "p"), but not before
certain subexpressions (for instance, the "x" in "2-x") are evaluated. What
is the rule that determines which subexpressions must happen before the side
effect and where is it in the standard?

Is it that the only subexpressions that can be evaluated after the side
effect are the ones whose value isn't affected by the side effect? No,
because the value of "p" in "p->next" does depend on the side effect.

Is it that the only subexpressions that can be evaluated after the side
effect are the ones whose value doesn't affect the value of the assignment?
How exactly would you define "doesn't affect" -- do you think that "i = j*0"
is allowed to store zero in i before reading the value of j, even if i and j
are declared volatile?

Or is it only the ones that read the object being assigned to for a purpose
other than to determine the value assigned to it, making the whole thing
undefined, like "p = p->next = q"? :-)

> implementation can't simply invent a result that allows it to rearrange
> the order of operations; it could only do that if it already knew that x
> had the value that allows such a rearrangement. For instance, "x=1; x =
> 2-x;" is one case where such a rearrangement would be legal.

No: the order of operations is unspecified, and therefore the implementation
is free to rearrange them as it wants to, as long as the result is consistent
with the requirements of the standard.

As far as the expression "x=2-x" is concerned, I find the following
requirements in the standard:

A A new value is written to x at some completely unspecified time (a)
between the surrounding sequence points.
B This new value of x is the same as the value obtained by:
(b) reading the "current" value of x, and then
(c) subtracting it from 2.
C The value of the expression is the same as the new value of x.

I don't see any words in the standard that require that (b) must happen
before (a), or that the value assigned to x must depend on the "old" value
of x.

(Of course, everybody knows that it *means* to require it. I'm not
seriously arguing that it's OK for "x=2-x" to always assign 1 to x, or for
"y=y" to store a random value in y -- only that the words in the standard
are not saying quite clearly what they were meant to say. If they were
meant to say that the evaluation (but not necessarily the side effects) of
the right operand of an assignment must be complete before its side effect
can take place, then I don't understand why the previous sequence point is
mentioned. If the side effect was meant to be allowed to happen before the
evaluation is complete, then I don't see any text that makes certain
subexpressions an exception.)

> The expression "x=2-x" is equivalent to the following breakdown:
>
> int *t1 = &x; /* A */
> int t2 = *t1; /* B */
> int t3 = 2-t2; /* C */
> *t1 = t3; /* D */
>
> The way that each step depends on other steps creates the following
> constraints (where T(X) indicates the time at which step X is performed):
>
> T(A) < T(B) because B needs the value of t1
> T(B) < T(C) because C needs the value of t2
> T(C) < T(D) because D needs the value of t3

No; the value that D stores in *t1 must be the same as the value that C
stores in t3, but there is no requirement that C must happen before D. The
expected way to make sure that those two values are identical is by doing C
first and then using its result in D, but pedantically speaking, no words in
the standard require that. "The order of evaluation of subexpressions and
the order in which side effects take place are both unspecified." If you
can make the two values identical by picking a value that will cause C to
store that value in t3 provided that D stores it in *t1 before B reads it
back, I don't see which requirement of the standard that violates.

James Kuyper

unread,
Jan 21, 2005, 11:40:28 AM1/21/05
to
Wojtek Lerch wrote:
> "James Kuyper" <kuy...@saicmodis.com> wrote in message
> news:41F01D45...@saicmodis.com...
>
>>Wojtek Lerch wrote:
...

>>>... What makes this case different from
>>>"x=2-x"? If the "x" can refer to the new value of "x", it can be
>>>determined without first determining the old value of "x", too: you just
>>>need to solve the equation "x==2-x". (Of course, you can't apply this
>>>trick to "x=x+1".)
>>
>>The difference is that the value of the p->next=q is completely
>>independent of which value of 'p' you use. The value of "2-x" is not. An
>
>
> But what is the general rule? You're saying that the side effect of the
> assignment is allowed to happen before the evaluation of its right argument
> is complete (for instance, before "p->next" has read "p"), but not before
> certain subexpressions (for instance, the "x" in "2-x") are evaluated. What
> is the rule that determines which subexpressions must happen before the side
> effect and where is it in the standard?

If you need certain information before you can determine the value that
must be stored in a given location, then all operations that must be
carried out to retrieve that information must be completed before that
side-effect can occur. In particular, if you need to retrieve a value
from the same location that you're about to store into, in order to
determine the new value to be stored in it, the value you retrieve must
be the one that was in there before the new value was stored.

I agree that the standard does not explicitly specify this. This is,
however, one of the few areas where I agree with those who say that it
is obvious enough from the things that the standard does say, that no
change in the wording is called for. This is a necessary consequence of
the description of what the value of each expression is.

> Is it that the only subexpressions that can be evaluated after the side
> effect are the ones whose value isn't affected by the side effect? No,
> because the value of "p" in "p->next" does depend on the side effect.
>
> Is it that the only subexpressions that can be evaluated after the side
> effect are the ones whose value doesn't affect the value of the assignment?
> How exactly would you define "doesn't affect" -- do you think that "i = j*0"
> is allowed to store zero in i before reading the value of j, even if i and j
> are declared volatile?

It's not clear to me what the standard does or does not require in cases
like j*0. The key difference between p->next=q and j*0 is that the fact
that p->next doesn't need to be evaluated is robust; it's true
regardless of the values of 'p' and 'q', it's an intrinsic feature of
the assignment operator. On the other hand, the fact that 'j' doesn't
need to be evaluated is a special case that depends upon the value of
the second operand. Consider the following code change:

Original code:
// count happens to have a value of 0, for reasons that the compiler
// can't anticipate.
i = j*count;

New code:
#define COUNT 0
i = j*COUNT;

I don't think such a change should change the validity of the code.
However, it's not clear to me that the standard says anything that
allows this distinction to be made.

It's mentioned because it's the only other constraint upon when the
assignment can occur. The primary constraint is the one that is not
explicitly mentioned because it's too obvious: you have to have
determined what the new value is before you can assign it to anything,
and the new value has to be assigned before it can be used to determine
the value of anything else. In one of my few "Doug Gwyn"-ish moments, I
don't think the standard needs to be modified to make something so
obvious more explicit.

>>The expression "x=2-x" is equivalent to the following breakdown:
>>
>>int *t1 = &x; /* A */
>>int t2 = *t1; /* B */
>>int t3 = 2-t2; /* C */
>>*t1 = t3; /* D */
>>
>>The way that each step depends on other steps creates the following
>>constraints (where T(X) indicates the time at which step X is performed):
>>
>>T(A) < T(B) because B needs the value of t1
>>T(B) < T(C) because C needs the value of t2
>>T(C) < T(D) because D needs the value of t3
>
>
> No; the value that D stores in *t1 must be the same as the value that C
> stores in t3, but there is no requirement that C must happen before D.

The sole purpose of t3 is to communicate between C and D. The sole
purpose of C is to set the value of t3, and the sole purpose of D is to
retrieve the set value. If t3 hasn't been set yet, D doesn't have
anything to do. If D has been already done, there's no point in doing C.
I have no idea what it would mean for D to be performed before C. I
think that whatever it is that you actually mean, it's best described by
saying that D' can occur before C', where D' and C' are each different
from D and C, respectively.

> ... The
> expected way to make sure that those two values are identical is by doing C
> first and then using its result in D, but pedantically speaking, no words in
> the standard require that. "The order of evaluation of subexpressions and
> the order in which side effects take place are both unspecified." If you
> can make the two values identical by picking a value that will cause C to
> store that value in t3 provided that D stores it in *t1 before B reads it
> back, I don't see which requirement of the standard that violates.

I have no idea what you're saying there; except that it seems to be
senseless; I assume you didn't mean it to be senseless, so I'm probably
misunderstanding it. Could you provide a breakdown into a sequence of
C-like statements, similar to the ones I wrote, to express what you mean?

Douglas A. Gwyn

unread,
Jan 21, 2005, 2:14:01 PM1/21/05
to
James Kuyper wrote:

> Wojtek Lerch wrote:
> > Is it that the only subexpressions that can be evaluated after the side
> > effect are the ones whose value doesn't affect the value of the assignment?
> > How exactly would you define "doesn't affect" -- do you think that "i = j*0"
> > is allowed to store zero in i before reading the value of j, even if i and j
> > are declared volatile?

In the abstract machine, the (sub)expression j*0 performs
a read access of the value represented in j according to
j's type. Obviously in an actual compiler a very common
optimization will eliminate the actual access, *except*
when j has volatile qualification.

I'm not sure why this isn't considered quite clear.

Douglas A. Gwyn

unread,
Jan 21, 2005, 2:22:24 PM1/21/05
to

I try not to spend a lot of time analyzing what is wrong
with bogus arguments. In the abstract machine, the value
of x*0 is not determined until x has been accessed, but
in the current context it doesn't matter anyway; what
matters is that in x=x*0 the previous value of x is
accessed, if at all, only to determine the result to be
stored into the lvalue x, not for other nefarious purposes.

Douglas A. Gwyn

unread,
Jan 21, 2005, 2:17:07 PM1/21/05
to
Francis Glassborow wrote:
> That isn't exactly the way I would have expressed it. I certainly agree
> that there is no requirement in C that the lhs be accessed after it has
> been assigned to, however the type of the lhs is significant and can
> modify the value determined by the rhs.

The standard says that the value of the assignment expression
is the value of the r.h.s., converted to an appropriate type.
Note that although that value is copied *into* the lvalue, the
value of the expression is not obtained *from* the lvalue,
which is what matters in the present context.

James Kuyper

unread,
Jan 21, 2005, 3:08:47 PM1/21/05
to
Douglas A. Gwyn wrote:
...

> In the abstract machine, the (sub)expression j*0 performs
> a read access of the value represented in j according to
> j's type. Obviously in an actual compiler a very common
> optimization will eliminate the actual access, *except*
> when j has volatile qualification.
>
> I'm not sure why this isn't considered quite clear.

What isn't clear is the timing of the read of "j" relative to the store
to "i". The same argument I'm using for p->next=q, if applicable to j*0,
would imply that the reading of "j" could be deferred until after the
write to "i" - which matters, of course, only if "i" is also volatile. I
think the argument I'm using is sound, but I'm not sure it's a good idea
for it to be applicable to j*0.

Wojtek Lerch

unread,
Jan 21, 2005, 3:56:18 PM1/21/05
to
"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:41F15479...@null.net...

It is. What is unclear is whether the write access to j can happen before
the read access to i, and why.


Dave Hansen

unread,
Jan 21, 2005, 4:22:52 PM1/21/05
to
On Fri, 21 Jan 2005 19:17:07 GMT, "Douglas A. Gwyn" <DAG...@null.net>
wrote:

[...]


>The standard says that the value of the assignment expression
>is the value of the r.h.s., converted to an appropriate type.

Actually, it says "An assignment expression has the value of the left
operand after the assignment, but is not an lvalue." (6.5.16p3)

>Note that although that value is copied *into* the lvalue, the
>value of the expression is not obtained *from* the lvalue,
>which is what matters in the present context.

The words of the standard seem to contradict you. The value is not
itself an lvalue, but gets its value from the lhs.

Regards,

-=Dave
--
Change is inevitable, progress is not.

Dave Hansen

Jan 21, 2005, 4:26:29 PM

I presume you're talking about the expression "i = j*0;"

From my reading of the standard, the abstract machine will read j,
multiply it by zero, and store the result in i. Causality requires
the abstract machine to read j before storing 0 in i. If i and j are
both volatile, that order must hold. If only j is volatile, the read
and write may occur in either order (but both must occur). If j is
not volatile, the read need not occur at all. However, if j is an
expression with side effects, that expression must be evaluated.

But I also hold that, given "a = b = 0;" and volatile b, that zero
must be written to b, b must be read, and the result of the read
stored in a. I know I'm in the minority (of maybe, oh, one ;-) on
that, and I don't really like the conclusion anyway, so I won't quote
the standard to support my position, and it's not really worth a
fight. But it seems to me to be what the standard _says_, regardless
of the intent.

Douglas A. Gwyn

Jan 21, 2005, 4:40:23 PM
Wojtek Lerch wrote:
> It is. What is unclear is whether the write access to j can happen before
> the read access to i, and why.

(I think you mean the other way around.) The order of those
accesses is unspecified (no s.p.s in that tree, and no other
sequence-implying wording in the specs). Thus if both
variables are volatile-qualified there shall be one access
of each, but in an unspecified order.

I'm still not sure why anybody would be bothered by this.
i=i*0 works. i=j*0 works.

James Kuyper

Jan 21, 2005, 5:43:54 PM

Within the C standard, there's no obvious reason to care. The C standard
provides rules for volatile, but doesn't actually define anything to
usefully be volatile. However, the typical uses for volatile are such
that the relative order of the read and write in i=j*0 might be
important for reasons that are outside the scope of the C standard.
Therefore, it can be important to know whether or not the C standard
guarantees that order, and it might be inconvenient if it doesn't.

David Hopwood

Jan 21, 2005, 6:07:22 PM

What is unclear is whether the read of j must occur before the write to i,
and why. The answer to "why" has a bearing on the "p = p->next = q;" example
(especially if the reason is that evaluation of the operands of an
expression must precede evaluation of the expression in the abstract
machine).

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Hopwood

Jan 21, 2005, 6:19:20 PM
Douglas A. Gwyn wrote:
> Lawrence Kirby wrote:
>
>>"In simple assignment (=), the value of the right operand is converted to
>> the type of the assignment expression and replaces the value stored in
>> the object designated by the left operand."
>>Here there is a clear sequencing, ...
>
> No, there are no "sequencing" words such as "then" in that spec.

Now I see why having a formal specification for a programming language is
essential, rather than just a good idea.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Wojtek Lerch

Jan 21, 2005, 9:42:58 PM
"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:41F176C7...@null.net...

> Wojtek Lerch wrote:
>> It is. What is unclear is whether the write access to j can happen
>> before
>> the read access to i, and why.
>
> (I think you mean the other way around.) The order of those

Um, yes, sorry. The write access to i, before the read access to j.

> accesses is unspecified (no s.p.s in that tree, and no other
> sequence-implying wording in the specs). Thus if both
> variables are volatile-qualified there shall be one access
> of each, but in an unspecified order.

Now compare the two cases:

In "i = j * 0", the compiler is allowed to guess the value that the
evaluation of the right operand of the assignment will produce, and store it
in i before the evaluation of the operand even begins. The compiler can do
that despite the fact that writing to i may affect the value of a certain
subexpression in "j*0" (since i and j are volatile, they may be some sort of
hardware registers, and the value read from j may depend on whether j has
been written to yet).

In "i = 2 - i", the compiler could also guess the value that the evaluation
of the right operand of the assignment will produce, provided that it's
stored in i first. But it is not allowed to do that.

Apparently, there is a rule that dictates that the lvalue-to-value
conversion in the second example must happen before "i" is written to, or,
in other words, that the "i" is necessarily referring to the value of "i"
before the object is modified. Why doesn't the same rule disallow writing
to "i" before "j" is read in the first example? What exactly is the rule
that decides which subexpressions of the assignment must be evaluated before
the side effect of the assignment happens?


lawrenc...@ugs.com

Jan 22, 2005, 6:09:27 PM
Christian Bau <christ...@cbau.freeserve.co.uk> wrote:
>
> Did anyone notice that this interpretation also means that assignment to
> a volatile variable requires that the stored value is read immediately
> after storing it?

Yes, most notably, the developers of GCC who dutifully implemented it
that way. But it's not true. The intent is that it's the value
*immediately* after the assignment -- before anything has an opportunity
to change it -- that's the result, so there's no need to reread the
variable. The somewhat obscure wording was to deal with things like
assigning to a bit field where the resulting value depends on more than
just the overt type of the destination.

-Larry Jones

I think your train of thought is a runaway. -- Calvin's Mom

Christian Bau

Jan 22, 2005, 7:12:42 PM
In article <9t0bc2-...@jones.homeip.net>, lawrenc...@ugs.com
wrote:

Now that is one really helpful post. The "somewhat obscure wording" is
the whole source of the confusion - the words "An assignment expression
has the value of the left operand after the assignment."

Since in an assignment lhs = rhs; the rhs is converted to the type of
the lhs, it would have seemed much easier to say "The value of an
assignment expression is the value stored". As a consequence, the result
of the expression would have been available without evaluating lhs at
all, which means in "p->next = q; " p would _not_ be evaluated in order
to determine the value of the expression (p->next = q). So why did the C
Standard say something different?

The argument here was that because of the wording, the C Standard
indicates that rhs is converted to the type of the lhs, then stored,
then read again to find the value of the assignment statement. However,
you just gave a completely different reason for the strange wording: For
example, in an assignment p->i = 15; where i is a bitfield with 3 bits,
the value stored is 15. But because i has only storage for 3 bits, the
new value in p->i is actually 7. So the strange wording in the C
Standard is meant to state that in this case the value of the statement
is not 15 but 7. It is _not_ meant as an indication that the store must
actually happen and the value must be taken from the stored value to
determine the value of the expression; I can figure out that the new
value in p->i = 15; is 7 by just looking at the type of p->i and the
type of *p, without evaluating p at all.

So in "p->next = q;" p is _not_ evaluated to determine the value of the
expression (but in order to determine which object is modified by the
side effect of the expression). Consequently, "p = p->next = q; " has
undefined behavior, as the object p is both modified, and evaluated for
a different purpose than to determine the new value.
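
The bit-field case can be tried directly. This is a minimal sketch, assuming an unsigned 3-bit field (the struct and function names are hypothetical); unsigned is used because storing an out-of-range value into a signed bit-field is implementation-defined:

```c
/* Illustration only: struct flags and store_in_bitfield are
 * hypothetical names, not taken from the thread. */

struct flags {
    unsigned int i : 3;   /* can hold 0..7 */
};

/* Returns the value of the assignment expression p->i = v. */
unsigned int store_in_bitfield(struct flags *p, unsigned int v)
{
    /* v is converted to the 3-bit unsigned type of p->i, i.e. reduced
     * modulo 8, and that converted value is what the expression yields. */
    return p->i = v;
}
```

With v equal to 15 the expression yields 7, not 15. That result follows from the declared width of the field alone, which is the point being made about not needing to read the stored object back.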

Wojtek Lerch

Jan 24, 2005, 12:23:37 PM
"James Kuyper" <kuy...@saicmodis.com> wrote in message
news:41F1307C...@saicmodis.com...

Yes, IF you indeed need to retrieve it before the new value is stored. But,
as far as I can tell, all that the standard says about the order of those
two operations is that it is unspecified. I don't see a requirement that
the value you store in "x" must be produced by, and stored after, evaluating
the expression "2-x", only that they must be the same value. I can't see
any words in the standard saying that the "x" in "2-x" must refer to the
value that x had before the assignment, either.

A lot of programming languages have a simple general rule: if there are
subexpressions on the right side of an assignment that name the same object
as the left side, they refer to the old value of the object. Every
programmer's brain is trained to recognize that rule as natural and
intuitive. But if you believe that C allows the "p" in "p->next" from our
example to refer to the new value of p, that means that the above rule is
not part of C, and therefore you can't use it to prove that the "x" in "2-x"
necessarily refers to the old value of x.

C, apparently, has a weaker rule, one that treats the "p" in "p->next"
differently from the "x" in "2-x". Problem is, I don't see any words in the
standard that explain how much weaker the rule is exactly, except the part
about the order of side effects being completely unspecified. The example
of "x=2-x" demonstrates that it wasn't really meant to be *completely*
unspecified, and that there must be a rule that draws a line somewhere
between p->next and 2-p. What makes me uncomfortable is that I could make
up a few slightly different variants of such a rule that draw the line in
slightly different places, and the standard doesn't seem to help me guess
which one is the correct one.

> I agree that the standard does not explicitly specify this. This is,
> however, one of the few areas where I agree with those who say that it is
> obvious enough from the things that the standard does say, that no change
> in the wording is called for. This is a necessary consequence of the
> description of what the value of each expression is.

The description of what the value of each expression is includes, among
other things, the facts that the lvalue-to-value conversion takes the
*current* value of the object and that the order of side effects and
evaluation of subexpressions is unspecified. What the standard *does* say
implies that it's OK to guess and store the value before the expression that
calculates it is evaluated. In order to make "x=2-x" do the expected thing,
you need to *add* an assumption that the standard forgets to mention,
namely, that the lvalue-to-value conversion in "2-x" must produce the old
value of x, and therefore it must be evaluated before the new value is
stored. You can argue that this extra assumption is so obvious that it
doesn't need to be mentioned; I agree in the case of 2-x, but have doubts in
such cases as p->next or i=j*0.

My point is that since the unspecified order of operations sometimes
produces results that are counter-intuitive to many programmers, it doesn't
sound like a good idea to rely on people's feeling about what is obvious
enough to determine exactly how unspecified the order was really meant to
be.

>> Is it that the only subexpressions that can be evaluated after the side
>> effect are the ones whose value isn't affected by the side effect? No,
>> because the value of "p" in "p->next" does depend on the side effect.
>>
>> Is it that the only subexpressions that can be evaluated after the side
>> effect are the ones whose value doesn't affect the value of the
>> assignment? How exactly would you define "doesn't affect" -- do you think
>> that "i = j*0" is allowed to store zero in i before reading the value of
>> j, even if i and j are declared volatile?
>
> It's not clear to me what the standard does or does not require in cases
> like j*0. The key difference between p->next=q and j*0 is that the fact
> that p->next doesn't need to be evaluated is robust; it's true regardless
> of the values of 'p' and 'q', it's an intrinsic feature of the assignment
> operator. On the other hand, the fact that 'j' doesn't need to be
> evaluated is a special case that depends upon the value of the second
> operand. Consider the following code change:

How would you translate "robust" and "intrinsic" to standardese? ;-)

> Original code:
> // count happens to have a value of 0, for reasons that the compiler
> // can't anticipate.
> i = j*count;
>
> New code:
> #define COUNT 0
> i = j*COUNT;
>
> I don't think such a change should change the validity of the code.

No, but it could change the order of operations. It's not unusual that
replacing a variable with a constant may cause a compiler to change the
evaluation order of subexpressions.

...


>> (Of course, everybody knows that it *means* to require it. I'm not
>> seriously arguing that it's OK for "x=2-x" to always assign 1 to x, or
>> for "y=y" to store a random value in y -- only that the words in the
>> standard are not saying quite clearly what they were meant to say. If
>> they were meant to say that the evaluation (but not necessarily the side
>> effects) of the right operand of an assignment must be complete before
>> its side effect can take place, then I don't understand why the previous
>> sequence point is mentioned.
>
> It's mentioned because it's the only other constraint upon when the
> assignment can occur.

No, read it again: IF the standard was meant to say that the evaluation of
the right operand must be complete before its value is stored in the left
operand, THEN the requirement about the previous sequence point is
redundant. But we do agree that in some cases the store can happen before
the evaluation is complete, don't we?

> The primary constraint is the one that is not explicitly mentioned
> because it's too obvious: you have to have determined what the new value
> is before you can assign it to anything, and the new value has to be
> assigned before it can be used to determine the value of anything else. In
> one of my few "Doug Gwyn"-ish moments, I don't think the standard needs to be
> modified to make something so obvious more explicit.

But if you can determine the value of the right operand without completely
evaluating it, you can store it in the object before the operand is
completely evaluated, even if that may change the results of some
subexpressions of the operand, like it does in "p = p->next = q" or in "i =
j * 0".

You seem to be saying that it's OK to do that, but only if the changed
subexpressions don't affect the final value of the assignment. And only if
the reason they can't affect it is "robust" and based on "intrinsic"
properties of the context they're in. That's the part that I find less than
obvious.

>>>The expression "x=2-x" is equivalent to the following breakdown:
>>>
>>>int *t1 = &x; /* A */
>>>int t2 = *t1; /* B */
>>>int t3 = 2-t2; /* C */
>>>*t1 = t3; /* D */
>>>
>>>The way that each step depends on other steps creates the following
>>>constraints (where T(X) indicates the time at which step X is performed):
>>>
>>>T(A) < T(B) because B needs the value of t1
>>>T(B) < T(C) because C needs the value of t2
>>>T(C) < T(D) because D needs the value of t3
>>
>> No; the value that D stores in *t1 must be the same as the value that C
>> stores in t3, but there is no requirement that C must happen before D.
>
> The sole purpose of t3 is to communicate between C and D. The sole purpose
> of C is to set the value of t3, and the sole purpose of D is to retrieve
> the set value. If t3 hasn't been set yet, D doesn't have anything to do.
> If D has been already done, there's no point in doing C. I have no idea
> what it would mean for D to be performed before C. I think that whatever
> it is that you actually mean, it's best described by saying that D' can
> occur before C', where D' and C' are each different from D and C,
> respectively.

int t4 = what_C_assigns_to_t3(); *t1 = t4; /* D' */
ASSERT( t4 == t3 ); /* E */

If the compiler can come up with a what_C_assigns_to_t3() function that
doesn't access t3, step D' can be executed before C. If not, the default
that's guaranteed to always work is

int what_C_assigns_to_t3() { return t3; }
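
For reference, the A/B/C/D breakdown quoted earlier in this post can be written out as runnable C next to the one-line form. This is only a sketch (the function names are made up here), and it shows just that the two forms agree under the usual reading; it does not settle which reorderings the wording of the standard permits:

```c
/* Illustration only: both function names are hypothetical. */

/* The expression under discussion, as normally written. */
int assign_direct(int x)
{
    x = 2 - x;
    return x;
}

/* The same computation split into the quoted steps A through D,
 * each in its own statement and therefore separated by sequence points. */
int assign_stepwise(int x)
{
    int *t1 = &x;      /* A: locate the object            */
    int t2 = *t1;      /* B: read the old value           */
    int t3 = 2 - t2;   /* C: compute the new value        */
    *t1 = t3;          /* D: store the new value          */
    return x;
}
```

Both functions return 2 minus the value passed in, for example -3 for an argument of 5.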


Tim Rentsch

Jan 25, 2005, 5:09:24 PM
"Douglas A. Gwyn" <DAG...@null.net> writes:

> Lawrence Kirby wrote:
> > "In simple assignment (=), the value of the right operand is converted to
> > the type of the assignment expression and replaces the value stored in
> > the object designated by the left operand."
> > Here there is a clear sequencing, ...
>
> No, there are no "sequencing" words such as "then" in that spec.

In 6.5.16 p 3 there is the statement

An assignment expression has the value of the left operand
after the assignment, but is not an lvalue.

Are you claiming that "after" is not a sequencing word?

Note the wording: not "the value that is to be assigned to the left
operand" but "the value of the left operand".

The semantics required here seem clear and unambiguous: a value is
assigned into the left operand, and *then* the value of an assignment
expression is the value of the left operand. That's what "after"
means.

The side effect of updating the stored value of the left operand might
not have occurred before the value of the assignment expression is
delivered; updating is guaranteed only before the next sequence
point. But the assignment - which determines what value is to be
stored and where - happens before the value is produced.

Tim Rentsch

Jan 25, 2005, 5:11:40 PM
"Douglas A. Gwyn" <DAG...@null.net> writes:

> David Hopwood wrote:
>
> > I think this and similar examples (such as "a[a[i]] = i") demonstrate
> > clearly that the current situation is untenable. Because there is no
> > formal (or even semi-formal) model that specifies all the ordering
> > relations in one place, in principle answering any question like this
> > requires an exhaustive examination of the whole standard, and some
> > relations can only be inferred by unreliable application of "common sense",
> > otherwise known as reading the minds of the standard authors.
>
> The problem isn't so much the spec as it is the desire on
> the part of some readers to have it be otherwise.

No disrespect intended, but the non-convergence of opinion
through this thread seems to demonstrate otherwise.

Christian Bau

Jan 25, 2005, 6:03:58 PM
In article <kfnllah...@alumnus.caltech.edu>,
Tim Rentsch <t...@alumnus.caltech.edu> wrote:

> "Douglas A. Gwyn" <DAG...@null.net> writes:
>
> > Lawrence Kirby wrote:
> > > "In simple assignment (=), the value of the right operand is converted to
> > > the type of the assignment expression and replaces the value stored in
> > > the object designated by the left operand."
> > > Here there is a clear sequencing, ...
> >
> > No, there are no "sequencing" words such as "then" in that spec.
>
> In 6.5.16 p 3 there is the statement
>
> An assignment expression has the value of the left operand
> after the assignment, but is not an lvalue.
>
> Are you claiming that "after" is not a sequencing word?

An assignment expression has the same value (at the time when it yields
its value, which may be at any time before the value is used) that the
left operand will have after the side effect of the assignment will take
place, which is at some point between the previous and next sequence
point, and therefore may quite well be after the assignment expression
yields its value.


> Note the wording: not "the value that is to be assigned to the left
> operand" but "the value of the left operand".

And there is a good reason for that: When storing a value into a bit
field, which may not have enough space to hold the value, the value of
the assignment expression is not the value that the assignment operator
attempts to store, but the value reduced to the size of the bit field,
which may be smaller.

> The semantics required here seem clear and unambiguous: a value is
> assigned into the left operand, and *then* the value of an assignment
> expression is the value of the left operand. That's what "after"
> means.

Not at all. It is the value that the left hand sign will have at some
point in the future after the side effect takes place, not the value
that it had before. That means x = x+1; behaves like ++x; and not like
x++; . This is not at all obvious; the C Standard could instead have
said that the value of the assignment expression would be the value
_before_ the assignment, which would be quite useful in some situations.
Like previous_value = current_value = new_value; , which would copy
current_value to previous_value and then store new_value into
current_value if the C Standard was written in a different way. Instead
the authors of the C Standard decided that the result is the new value.
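
A short sketch of what C actually does with these two forms (the function names are hypothetical and the pointers passed in are assumed to refer to distinct objects):

```c
/* Illustration only: chain_assign and assign_increment are
 * hypothetical names. */

/* Chained assignment: both objects end up holding new_value, because
 * the inner assignment yields the new value, not the old one. */
int chain_assign(int *previous_value, int *current_value, int new_value)
{
    *previous_value = *current_value = new_value;
    return *previous_value;
}

/* x = x + 1 used as an expression yields the incremented value,
 * matching ++x rather than x++. */
int assign_increment(int x)
{
    return x = x + 1;
}
```

So after chain_assign the old contents of *current_value are lost; saving them first takes a separate statement.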

> The side effect of updating the stored value of the left operand might
> not have occurred before the value of the assignment expression is
> delivered; updating is guaranteed only before the next sequence
> point. But the assignment - which determines what value is to be
> stored and where - happens before the value is produced.

There is no need at all to evaluate the left hand side of an assignment
operator and determine _where_ a value is stored to figure out what the
result of the assignment expression is.

Douglas A. Gwyn

Feb 1, 2005, 3:39:36 PM
James Kuyper wrote:
> ... However, the typical uses for volatile are such
> that the relative order of the read and write in i=j*0 might be
> important for reasons that are outside the scope of the C standard.
> Therefore, it can be important to know whether or not the C standard
> guarantees that order, and it might be inconvenient if it doesn't.

The programmer should always think "sequence point" in
such a situation, then provide one.
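
One way to provide that sequence point, as a minimal sketch (the function name and the use of pointer parameters are assumptions made for the example), is to split the read and the write into separate full expressions:

```c
/* Illustration only: store_zero_after_read is a hypothetical name. */

/* Reads *j, then writes zero to *i, with a sequence point in between. */
void store_zero_after_read(volatile int *i, volatile int *j)
{
    /* The initializer is a full expression, so the volatile read of *j
     * is complete at the sequence point that ends this declaration. */
    int t = *j;

    /* The volatile write to *i belongs to a later full expression and
     * so cannot precede the read above in the abstract machine. */
    *i = t * 0;
}
```

The temporary t is not volatile, so a compiler remains free to fold t * 0 to zero; only the two volatile accesses and their relative order are pinned down.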

Douglas A. Gwyn

Feb 1, 2005, 3:41:49 PM
David Hopwood wrote:
> What is unclear is whether the read of j must occur before the write to i,

No, there is no such requirement.
No s.p. is interposed, and the standard does not use sequential
terminology such as "then" for the semantics of assignment.

Douglas A. Gwyn

Feb 1, 2005, 3:45:04 PM
David Hopwood wrote:
> Now I see why having a formal specification for a programming language is
> essential, rather than just a good idea.

What would that gain? The decision to not specify the sequence
was deliberate, and thus would have to be reflected in the
formal semantics.

Douglas A. Gwyn

Feb 1, 2005, 3:50:28 PM
Tim Rentsch wrote:
> An assignment expression has the value of the left operand
> after the assignment, but is not an lvalue.
> Are you claiming that "after" is not a sequencing word?

It is meant to be read as "the value ... after the assignment",
i.e. an adjectival clause with the value as its object, for
reasons that Larry explained in a nearby posting (the value
can be modified according to the kind of target). There is
no intent that the value be fetched from the target object
after [in the time-sequencing sense] it is stored. The
"value after the assignment" can be, and normally is,
determined before any storage operation occurs.

David Hopwood

Feb 1, 2005, 7:15:17 PM

Precisely. It would make the decision clear.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Tim Rentsch

Feb 2, 2005, 7:26:17 PM
Christian Bau <christ...@cbau.freeserve.co.uk> writes:

> In article <kfnllah...@alumnus.caltech.edu>,
> Tim Rentsch <t...@alumnus.caltech.edu> wrote:
>
> > "Douglas A. Gwyn" <DAG...@null.net> writes:
> >
> > > Lawrence Kirby wrote:
> > > > "In simple assignment (=), the value of the right operand is converted to
> > > > the type of the assignment expression and replaces the value stored in
> > > > the object designated by the left operand."
> > > > Here there is a clear sequencing, ...
> > >
> > > No, there are no "sequencing" words such as "then" in that spec.
> >
> > In 6.5.16 p 3 there is the statement
> >
> > An assignment expression has the value of the left operand
> > after the assignment, but is not an lvalue.
> >
> > Are you claiming that "after" is not a sequencing word?
>
> An assignment expression has the same value (at the time when it yields
> its value, which may be at any time before the value is used) that the
> left operand will have after the side effect of the assignment will take
> place, which is at some point between the previous and next sequence
> point, and therefore may quite well be after the assignment expression
> yields its value.

Yes, the side effect may happen later. But the assignment operator
has already been performed. See below.


> > The semantics required here seem clear and unambiguous: a value is
> > assigned into the left operand, and *then* the value of an assignment
> > expression is the value of the left operand. That's what "after"
> > means.
>
> Not at all. It is the value that the left hand sign will have at some
> point in the future after the side effect takes place, not the value
> that it had before. [snip]

The important thing is that the value is produced after (and is a
result of) the evaluation of the assignment operator. The side effect
of updating the stored value of the left operand (also a result of the
evaluation of the assignment operator) is a separate event. It's not
necessary for the side effect of updating the stored value to have
occurred before producing a value; but it is necessary to perform the
operation (of assignment) before producing a value, since that's what
yields the value. See below.

(Of course Christian meant "left hand side" and not "left hand sign".)


> > The side effect of updating the stored value of the left operand might
> > not have occurred before the value of the assignment expression is
> > delivered; updating is guaranteed only before the next sequence
> > point. But the assignment - which determines what value is to be
> > stored and where - happens before the value is produced.
>
> There is no need at all to evaluate the left hand side of an assignment
> operator and determine _where_ a value is stored to figure out what the
> result of the assignment expression is.

A careful reading of language in the standard shows otherwise.
Relevant language appears in:

3.4.4 (unspecified behavior) p 1
3.17 (value) p 1
5.1.2.3 Program Execution p 1-3
6.4.6 Punctuators p 1-2
6.5 Expressions p 1-3
6.5.16 Assignment Expressions p 1-4
6.5.16.1 Simple Assignment p 1-3

(The complete language of these paragraphs is also appended below.)

So, how is the assignment 'p = p->next = q;' to be performed?
That starts in section 5.1.2.3 p1 and p3:

The semantic descriptions in this International Standard describe
the behavior of an abstract machine ...

In the abstract machine, all expressions are evaluated as specified
by the semantics.

To change the value of 'p', the leftmost assignment is evaluated per
6.5.16.1 p2:


In <i>simple assignment</i> (=), the value of the right operand
is converted to the type of the assignment expression and replaces
the value stored in the object designated by the left operand.

and 6.5.16 p3:

An assignment operator stores a value in the object designated
by the left operand.

What value is assigned? The value of the right operand. The value
of the right operand is the result of evaluating the assignment
expression on the right ('p->next = q'), 6.5.16 p3:

An assignment expression has the value of the left operand
after the assignment,

which is to say, after the assignment operator has been performed.
Not necessarily after updating the stored value - that is identified
with separate language, also in 6.5.16 p3:

The side effect of updating the stored value of the left operand
shall occur between the previous and the next sequence point.

Is there other support for this view? Yes there is, in discussing
'Punctuators', 6.4.6 p2:

Depending on context, [a punctuator] may specify an operation
to be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof)
in which case it is known as an <i>operator</i>

It is performing the operation of assignment (during the evaluation of
the assignment expression) that supplies "in turn" the consequential
events of yielding a value or producing a side effect. Since the
value comes from performing the assignment operation, the evaluation
of the assignment expression has already been done.

What about language regarding ordering? 6.5 p3:

the order of evaluation of subexpressions and the order in
which side effects take place are both unspecified.

and 6.5.16 p4:

The order of evaluation of the operands is unspecified.

The word "unspecified" is not the same as "totally unconstrained".
Note 3.4.4 p1:

<b>unspecified behavior</b>

behavior where this International Standard provides two or
more possibilities and imposes no further requirements on which
is chosen in any instance

It is only the possible orderings (as implied by the rest of the
standard) that need be considered. As per previous discussion, the
value being used is produced after the assignment operation has been
performed, so all possible orderings have those events happening in
that order; and that's entirely consistent with the citations about
unspecified ordering.

Incidentally, note that 6.5 p3 reinforces the view that performing an
operation (which is necessary to evaluate a subexpression with that
operator) and side effects are distinct events.

Of course there is the famous 6.5 p2:

Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored.

I discussed the key second sentence at length in another posting
(and have seen no followups to that posting). How that sentence
should be interpreted is at best debatable; but I think it's fair
to say none of those arguments are as convincing as the constructive
reasoning presented above.

Final comment: in framing my statements, I referenced language of the
standard itself, and deliberately did not appeal to motivations,
intentions, or statements of authority (for example, of committee
members). The language of the standard should stand on its own; if
to understand it there is a need to appeal to motivations or
intentions (not expressed in the standard itself), that's a clear sign
that the language of the standard is not yet clear enough.

(None of which is meant to detract from the efforts of committee
members or others who may have had a hand in writing the standard.
It's a remarkable document and a tremendous accomplishment.)
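
Whichever reading of 6.5 p2 prevails, code that must not depend on the outcome can sidestep the question by splitting the statement in two. A minimal sketch, assuming a singly linked node type (struct node and link_and_advance are hypothetical names):

```c
/* Illustration only: struct node and link_and_advance are
 * hypothetical names. */

struct node {
    struct node *next;
};

/* Links q after p, then advances to q; returns the new current node. */
struct node *link_and_advance(struct node *p, struct node *q)
{
    p->next = q;   /* p is only read here; sequence point at the ';' */
    p = q;         /* p is modified in a separate full expression    */
    return p;
}
```

This has the "obvious" effect that 'p = p->next = q;' is usually written to achieve, with no object both read and modified between two sequence points.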


======================================================================
Cited sections:

3.4.4 p 1

<b>unspecified behavior</b>

behavior where this International Standard provides two or more
possibilities and imposes no further requirements on which is chosen
in any instance


3.17 p 1

<b>value</b>

precise meaning of the contents of an object when interpreted as
having a specific type


5.1.2.3 Program Execution

The semantic descriptions in this International Standard describe the
behavior of an abstract machine in which issues of optimization are
irrelevant.

Accessing a volatile object, modifying an object, modifying a file, or
calling a function that does any of these operations are all <i>side
effects</i>, which are changes in the state of the execution
environment. Evaluation of an expression may produce side effects.
At certain specified points in the execution sequence called
<i>sequence points</i>, all side effects of previous evaluations shall
be complete and no side effects of subsequent evaluations shall have
taken place. (A summary of the sequence points is given in
[informative] annex C.)

In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).


6.4.6 Punctuators

Syntax

<i>punctuator:</i> one of
[ ] ( ) { } . ->
++ -- & * + - ~ !
/ % << >> < > <= >= == != ^ | && ||
? : ; ...
= *= /= %= += -= <<= >>= &= ^= |=
, # ##
<: :> <% %> %: %:%:

Semantics

A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an <i>operator</i> (other forms of operator
also exist in some contexts). An <i>operand</i> is an entity on which
an operator acts.


6.5 Expressions

An <i>expression</i> is a sequence of operators and operands that
specifies the computation of a value, or that designates an object or
a function, or that generates side effects, or that performs a
combination thereof.

Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored.

The grouping of operators and operands is indicated by syntax. Except
as specified later (for the function-call (), &&, ||, ?:, and comma
operators), the order of evaluation of subexpressions and the order in
which side effects take place are both unspecified.


6.5.16 Assignment operators

Syntax

<i>assignment-expression:</i>
<i>conditional-expression</i>
<i>unary-expression</i> <i>assignment-operator</i> <i>assignment-expression</i>

<i>assignment-operator:</i> one of
= *= /= %= += -= <<= >>= &= ^= |=

Constraints

An assignment operator shall have a modifiable lvalue as its left
operand.

Semantics

An assignment operator stores a value in the object designated by the
left operand. An assignment expression has the value of the left
operand after the assignment, but is not an lvalue. The type of an
assignment expression is the type of the left operand unless the left
operand has qualified type, in which case it is the unqualified
version of the type of the left operand. The side effect of updating
the stored value of the left operand shall occur between the previous
and the next sequence point.

The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior is undefined.


6.5.16.1 Simple assignment

Constraints

One of the following shall hold

-- the left operand has qualified or unqualified arithmetic type
and the right has arithmetic type;

-- the left operand has a qualified or unqualified version of a
structure or union type compatible with the type of the right;

-- both operands are pointer to qualified or unqualified versions
of compatible types, and the type pointed to by the left has all
the qualifiers of the type pointed to by the right;

-- one operand is a pointer to an object or incomplete type and the
other is a pointer to void, and the type pointed to by the left
has all the qualifiers of the type pointed to by the right;

-- the left operand is a pointer and the right is a null pointer
constant; or

-- the left operand has type _Bool and the right is a pointer.

Semantics

In <i>simple assignment</i> (=), the value of the right operand is
converted to the type of the assignment expression and replaces the
value stored in the object designated by the left operand.

If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior is
undefined.

Lawrence Kirby

Feb 3, 2005, 12:10:47 PM
On Fri, 21 Jan 2005 00:00:10 +0000, Douglas A. Gwyn wrote:

> Lawrence Kirby wrote:
>> "In simple assignment (=), the value of the right operand is converted to
>> the type of the assignment expression and replaces the value stored in
>> the object designated by the left operand."
>> Here there is a clear sequencing, ...
>
> No, there are no "sequencing" words such as "then" in that spec.

I think it is more fundamental than that. We have an abstract machine in
which from 5.1.2.3 "issues of optimisation are irrelevant" and "all
expressions are evaluated as specified by the semantics." So take the
expression

a + b + c

where a, b and c are all ints. After syntax analysis we have a parse
equivalent to

(a + b) + c

At the top level we have an additive-expression of the + form with
operands (a + b) and c. The semantics of additive-expression state "The
result of the binary + operator is the sum of the operands." Before we can
calculate "the sum of the operands" we have to know what the values of the
operands are. According to 6.5p3 the order of evaluation of the operands
is unspecified. I'll evaluate the RHS first which gives me the value of
the object designated by c and has type int. Evaluating the LHS means I
have to evaluate a + b. This is another + additive-expression so I apply
the same rules: I get the sum of the operands but to do that I have to get
the values of the operands first which are just the values of the
objects a and b. I now have the values of the operands of a + b,
constraints are met (I should probably do constraints checking as a first
pass but it doesn't matter for my purposes), so now I can execute the
+ operator and I get a result, also of type int. I now have both operand
values for the top level + operator, constraints are met, and I execute it
to get the result.
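The walk-through above can be made observable with a small sketch. It is only an illustration under stated assumptions: the plain ints a, b and c are replaced by calls to a hypothetical helper operand() that records when each operand was evaluated (the names operand, sum_with_trace and last_trace are invented here, not taken from the standard or this thread).

```c
/* Hypothetical helpers: each "operand" records when it was evaluated. */
static char trace[4];        /* names of operands, in evaluation order */
static int  trace_len = 0;

static int operand(char name, int value)
{
    if (trace_len < 3)
        trace[trace_len++] = name;
    trace[trace_len] = '\0';
    return value;
}

/* Evaluates a + b + c, which parses as (a + b) + c. */
int sum_with_trace(void)
{
    trace_len = 0;
    trace[0] = '\0';
    return operand('a', 1) + operand('b', 2) + operand('c', 3);
}

/* Returns the order in which the operands were evaluated last time. */
const char *last_trace(void)
{
    return trace;
}
```

The sum does not depend on the order chosen, but the trace may be any ordering of a, b and c that the implementation picks. What cannot happen is that a + produces its result before both of its operands have been evaluated.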

Fundamentally the abstract machine is a sequenced machine. You have to
apply the rules by rote like this because this is the abstract machine
where "issues of optimisation are irrelevant" and "all expressions are
evaluated as specified by the semantics". "The result of the binary +
operator is the sum of the operands"; as such it can have no result
until the values of its operands are established. Its behaviour may even
turn out to be undefined once the values of its operands are established
and the summation is actually performed. The abstract machine is not an
oracle, it gets results by applying the rules, not plucking them out of
the air.

> And the order of the words in the (necessarily linear) text does
> not imply the sequence; we could equivalently have said "the
> value stored in the object designated by the left operand is
> replaced by the value of the right operand converted to the type
> of the assignment expression". The order of evaluation of the
> operands is unspecified.

What I'm saying is that in the abstract machine it is fundamentally the
case that you must evaluate an operator's operands before you can execute
the operator to produce a result (noting explicit non-evaluation semantics
for && || and ?: ). To do otherwise is an optimisation, which is forbidden
in the abstract machine. Also consider for example that in

i*0 + j

I still have to evaluate i to find the result of i*0 even if I have
already spotted that the result must be 0. To do otherwise would again be
an optimisation.
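With a plain int i there is no way for a program to observe whether i was actually read, so the following sketch makes an assumption: it replaces i by a call to a hypothetical function read_i() whose side effect (a counter) makes the evaluation visible. Since 5.1.2.3 only lets an implementation skip evaluation when no needed side effects are produced, the call has to happen even though the product is known to be 0.

```c
static int reads_of_i = 0;   /* counts evaluations of the "i" operand */

/* Stand-in for reading i; the counter makes the evaluation observable. */
static int read_i(void)
{
    ++reads_of_i;
    return 42;
}

/* Computes i*0 + j, with i supplied by read_i(). */
int times_zero_plus(int j)
{
    return read_i() * 0 + j;
}

/* Reports how many times the "i" operand has been evaluated so far. */
int reads_so_far(void)
{
    return reads_of_i;
}
```

After one call to times_zero_plus() the counter has advanced by one, which is the behaviour the abstract machine requires; for an ordinary int i an implementation is free to drop the read because nothing can tell the difference.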

I guess you can see where I'm going to go with this, but for the moment
I'll leave it here because if we can't agree on this there's no point in
going further. 6.5p2 will come into it later but I've chosen an example
here that isn't affected by it.

>> "Furthermore, the prior value shall be read only to determine the value to
>> be stored."
>
> Which is what makes p = p->next = q undefined. The prior value
> of the object designated by p is read (between sequence points)
> for a second purpose,

Jumping track here

You call that a second purpose, but to me it is still part of the
evaluation of the value to be stored in p so is just a part of the same
purpose. There is nothing I can find in the standard to warrant such
a distinction of "purpose".

Lawrence

Douglas A. Gwyn

Feb 4, 2005, 5:27:28 PM
Lawrence Kirby wrote:
> >> "Furthermore, the prior value shall be read only to determine the value to
> >> be stored."
> > Which is what makes p = p->next = q undefined. The prior value
> > of the object designated by p is read (between sequence points)
> > for a second purpose,
> You call that a second purpose, but to me it is still part of the
> evaluation of the value to be stored in p so is just a part of the same
> purpose. There is nothing I can find in the standard to warrant such
> a distinction of "purpose".

The first "to" pertains to purpose.
The "only" is all-important.
The question to be decided is whether reading p for
the purpose of selecting the "next" member is for
the purpose of determining the value to be stored
via the leftmost p.
It doesn't seem to me that determining the value
involves reading p at run time, just examining its
type at compile time (to determine the type of the
left operand of the right-hand assignment expression,
which enters into determining the value of the right-
hand assignment expression).
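Whichever reading of 6.5p2 prevails, code that does not want to depend on the answer can split the statement at a sequence point. The following is a minimal sketch, assuming a hypothetical struct node with a next member and assuming the intended effect is "link q after p, then advance p to q"; the function name link_and_advance is invented for illustration.

```c
struct node {
    int value;
    struct node *next;
};

/* Links q after p and returns the advanced pointer, using two full
   expressions instead of  p = p->next = q;  */
struct node *link_and_advance(struct node *p, struct node *q)
{
    p->next = q;   /* p is read here; sequence point at the ';' */
    p = q;         /* p is modified only after that sequence point */
    return p;
}
```

Here the read of p and the store to p are separated by a sequence point, so neither interpretation of "only to determine the value to be stored" is needed to show the code is well defined.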

Douglas A. Gwyn

Feb 4, 2005, 5:17:06 PM
Tim Rentsch wrote:
> In the abstract machine, all expressions are evaluated as specified
> by the semantics.

The relevance of that is twofold: it provides the "as if" that
constrains optimization; and it guides the access behavior for
volatile qualification.

If the semantics do not specify a time ordering, then it is
unspecified for the abstract machine.

> An assignment expression has the value of the left operand
> after the assignment,
> which is to say, after the assignment operator has been performed.

This has been addressed before in this thread, but: it means
that the value of the whole expression is the value that the
left operand *would have* after the assignment. As you noted,
storage into the left-operand object is a separate matter.

> Between the previous and next sequence point an object shall have
> its stored value modified at most once by the evaluation of an
> expression. Furthermore, the prior value shall be read only to
> determine the value to be stored.

Indeed, that's the most relevant part. If the program violates
that requirement, it has undefined behavior, and in particular
there are no guarantees that can be inferred from any amount of
interpretation of the other parts of the specification.

Tim Rentsch

Feb 9, 2005, 7:19:12 AM
(I'm actually responding to two separate postings by Doug Gwyn, but
one of them has slipped out of my news server's cache so the replies
are collected into one here.)


"Douglas A. Gwyn" <DAG...@null.net> writes:

> Tim Rentsch wrote:
> > An assignment expression has the value of the left operand
> > after the assignment, but is not an lvalue.
> > Are you claiming that "after" is not a sequencing word?
>
> It is meant to be read as "the value ... after the assignment",
> i.e. an adjectival clause with the value as its object, for
> reasons that Larry explained in a nearby posting (the value
> can be modified according to the kind of target). There is
> no intent that the value be fetched from the target object
> after [in the time-sequencing sense] it is stored. The
> "value after the assignment" can be, and normally is,
> determined before any storage operation occurs.

When read in isolation, this proposed explanation sounds plausible.
The problem is that it doesn't mesh with language used elsewhere in
the standard. There seems to be (in the explanation paragraph above)
an implicit assumption that an "assignment" is the same as a "storage
operation". But the standard doesn't use the term "storage operation"
(at least, not that I could find). A reading that matches language in
the standard much better is "An assignment expression has the value of
the left operand after the assignment operator has been evaluated."

The reason for mentioning "the value of the left operand" is that
evaluating the assignment operator (optionally) modifies the value of
the right operand to conform to the type of the left operand. The
side effect of updating the stored value of the left operand is
*caused* by evaluating the assignment operator, but that's incidental.
What matters is the value of the assignment expression; and, it is
the evaluation of the assignment operator that yields that value.
Confer sections 6.4.6 p 2 and 5.1.2.3 p 3.
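The conversion mentioned above is easy to see in a small sketch. The function name assign_through_int is hypothetical, and the example takes no position on when the store happens; it only shows that the value of the assignment expression is the converted value, with the type of the left operand.

```c
/* Stores x into *i and returns the value of the assignment expression.
   That value has type int (the type of the left operand), so the
   fractional part of x is already gone before the result is converted
   back to double for the return. */
double assign_through_int(int *i, double x)
{
    return *i = x;
}
```

Calling assign_through_int(&i, 3.7) yields 3.0 rather than 3.7 and leaves i holding 3: what the expression yields is the value after conversion to the left operand's type, not the original value of the right operand.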


> Tim Rentsch wrote:
> > In the abstract machine, all expressions are evaluated as specified
> > by the semantics.
>
> The relevance of that is twofold: it provides the "as if" that
> constrains optimization; and it guides the access behavior for
> volatile qualification.

It also constrains the set of possible evaluation orderings, which may
limit what behaviors are possible, even if they are unspecified.
Confer section 3.4.4 p 1.

> If the semantics do not specify a time ordering, then it is
> unspecified for the abstract machine.

First, as noted previously, "unspecified" doesn't mean "completely
unconstrained"; it is only necessary that the semantics limit the set
of possible evaluations to produce implications about sequencing.

Second, the standard does contain language that (partially) specifies
evaluation order (meaning of course beyond the obvious specifications
for operators that have sequence points in them). Besides the sections
previously cited, consider for example the notes:

10) EXAMPLE 2 In executing the fragment

char c1, c2;
/* ... */
c1 = c1 + c2;

the "integer promotions" require that the abstract machine
promote the value of each variable to int size *and then*
add the two ints ... [emphasis mine]

and

71) The syntax specifies the *precedence* of operators in the
*evaluation* of an expression ... [emphasis mine]

Granted, these are just notes, but notes contain statements that are
implied by the main text. Now for that word, "precedence", what does
that mean? Some definitions:

precedence - The act or state of preceding or going before in order of
time; priority; as, one event has precedence of another.

operator precedence - An ordering rule defining the sequence of the
application of operators within an expression.

The second of these definitions is from ISO/IEC 2382-15 (second
edition).
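Returning to EXAMPLE 2 quoted above, the "promote, and then add" ordering can be checked with a short sketch. The function names are invented for illustration, and the values are chosen so that the sum (200) would not fit in a signed 8-bit char but is always representable as an int.

```c
/* Returns 1 if the expression c1 + c2 has the size of an int, i.e. the
   operands were promoted before the addition took place. */
int sum_is_int_sized(void)
{
    char c1 = 100, c2 = 100;
    return sizeof(c1 + c2) == sizeof(int);
}

/* Adds two chars; the addition is carried out on the promoted values,
   so the result is computed at int width. */
int promoted_sum(void)
{
    char c1 = 100, c2 = 100;
    return c1 + c2;
}
```

The addition cannot be performed on char values at all: its operands only exist once the promotions have been applied, which is one place where the text does impose an order on the steps of an evaluation.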


> > An assignment expression has the value of the left operand
> > after the assignment,
> > which is to say, after the assignment operator has been performed.
>
> This has been addressed before in this thread, but: it means
> that the value of the whole expression is the value that the
> left operand *would have* after the assignment. As you noted,
> storage into the left-operand object is a separate matter.

The value that *is yielded* by evaluating the assignment operator,
which is what matters.


> > Between the previous and next sequence point an object shall have
> > its stored value modified at most once by the evaluation of an
> > expression. Furthermore, the prior value shall be read only to
> > determine the value to be stored.
>
> Indeed, that's the most relevant part. If the program violates
> that requirement, it has undefined behavior, and in particular
> there are no guarantees that can be inferred from any amount of
> interpretation of the other parts of the specification.

I agree that *if* the requirement is violated then the program has
undefined behavior. But I don't see a compelling argument that the
requirement has been violated here. There's been debate around the
meaning of the language in the paragraph above in this thread; I've
read through those arguments and am still not convinced. On top of
that, I posted a rather long message myself trying to bring out the
real meaning here, and as far as I know no one responded to it.

Again, I'm not intending to speak to the question of intent. I'm
discussing the language of the C standard document (or documents, as
other published documents may be relevant). If the document can't be
explained without needing to refer to some not-generally-available
"this is what we decided" discussion, then pretty clearly the
language of the document needs clarifying.

[Personal note: thanks again to Doug Gwyn for articulating the
comments he gave here. Looking back using Google groups, I read
through a long thread (over 300 messages) from about four or five
years ago on a related topic, and many of those were also from Doug
Gwyn...]

Lawrence Kirby

Feb 9, 2005, 9:43:49 AM
On Fri, 04 Feb 2005 22:27:28 +0000, Douglas A. Gwyn wrote:

> Lawrence Kirby wrote:
>> >> "Furthermore, the prior value shall be read only to determine the value to
>> >> be stored."
>> > Which is what makes p = p->next = q undefined. The prior value
>> > of the object designated by p is read (between sequence points)
>> > for a second purpose,
>> You call that a second purpose, but to me it is still part of the
>> evaluation of the value to be stored in p so is just a part of the same
>> purpose. There is nothing I can find in the standard to warrant such
>> a distinction of "purpose".
>
> The first "to" pertains to purpose.

The problem with that is that the standard provides no definition of what
constitutes a "purpose". It is not at all obvious, although there appears
to be some connection with side-effects.

My interpretation of "purpose" is anything that needs to be done in the
abstract machine in order to determine the new value to be assigned, which
boils down to evaluation of the RHS of the assignment. I don't see
anything unreasonable in this interpretation, and it shows up clear
cases of undefined behaviour e.g. in

j = (i = k) + i;

> The "only" is all-important.

Indeed, but what does it mean? My example above shows a clear case of
other usage of i. If you also consider

j = (i = 2*i) + i;

then the need for the word "only" to rule out this expression is obvious.
Consider the following under your interpretation:

j = i = 2*i;

Is i being accessed *only* to determine the new value to be assigned to i,
or also to determine the new value to be assigned to j? Is this expression
well-defined?
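For comparison, here is a sketch of how the expressions above might be written so that their meaning does not depend on how 6.5p2 is read. It rests on an assumption about intent: that j = (i = k) + i was meant to use the new value of i twice. The function names are hypothetical.

```c
/* One unambiguous reading of  j = (i = k) + i;
   i is updated first, then read after the sequence point. */
int assign_then_add(int *i, int k)
{
    *i = k;
    return *i + *i;
}

/* Unambiguous form of  j = i = 2*i;
   the read of i that supplies j's value follows a sequence point. */
int double_then_copy(int *i)
{
    *i = 2 * *i;
    return *i;
}
```

In both functions every read of i that is not part of computing i's own new value is separated from the store by a sequence point, so the word "only" never has to be interpreted.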

> The question to be decided is whether reading p for the purpose of
> selecting the "next" member is for the purpose of determining the value
> to be stored via the leftmost p.
> It doesn't seem to me that determining the value involves reading p at
> run time, just examining its type at compile time (to determine the type
> of the left operand of the right-hand assignment expression, which
> enters into determining the value of the right- hand assignment
> expression).

We're back to the difference between the abstract machine and an
implementation. IMO you are applying optimisation principles that are
appropriate for an implementation to the abstract machine where they are
inappropriate and indeed prohibited. What is and isn't correct and
well-defined behaviour is determined in the abstract machine.
Optimisations in an implementation do not change that, they are simply
ways of producing the required behaviour more "efficiently" in some sense.
It is certainly true that areas of undefined behaviour in the
standard/abstract machine can provide greater scope for optimisation in
an implementation, but the abstract machine defines the required
behaviour, not the optimisations.

Lawrence
