
HELP: Comma operator


Bradford Chamberlain

Jul 23, 1996

I'm trying to implement a double-precision squaring function as a macro,
using the comma operator as follows:

static double tmp;

#define sqr(x) ((tmp=(x)),(tmp*tmp))


This works fine when I use it once per statement:

x = sqr(2);
y = sqr(x);


However, if I have multiple non-nested uses in a statement, it doesn't:

x = sqr(2) + sqr(3);

returns 8 or 18, depending on the compiler. Putting a print statement
between the two expressions in the sqr() macro indicates that in each
invocation, all the values are being used correctly. Therefore, I can
only assume that the right-hand expression of each invocation isn't
being evaluated until the left expression of the other invocation has
been (giving 2*sqr(2) or 2*sqr(3), depending on the compiler's order
of evaluation, presumably).


From what I've read in K&R, this seems incorrect. Since each
expression is fully parenthesized, I'd expect one of the "calls" to
sqr() to evaluate to its double result before starting the other,
giving the right answer. Is this a bug in my compilers, or my way
of thinking?

Any help would be appreciated. I'm compiling on a DEC alpha using gcc
version 2.6.3 and cc from OSF/1.


Thanks,
-Brad

@#$%!?!

Jul 23, 1996

: #define sqr(x) ((tmp=(x)),(tmp*tmp))


: x = sqr(2) + sqr(3);

: returns 8 or 18, depending on the compiler. Putting a print statement

The problem is you are expecting the assignments and multiplies to appear
in textual order. While it is true that an assignment to tmp will complete
before a use of tmp, you have no guarantee of how two collateral
assignments will be ordered.

Don't assign the same variable twice in one statement.
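
For example, a plain function avoids the shared temporary entirely
(just a sketch; any ordinary function will do):

/* Each call evaluates its argument exactly once and needs no
   static variable, so sqr(2) + sqr(3) is well defined. */
static double sqr(double x)
{
    return x * x;
}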

: being evaluated until the left expression of the other invocation has
: been (giving 2*sqr(2) or 2*sqr(3), depending on the compiler's order
: of evaluation, presumably).

You could get 2*2+3*3 or 2*2+2*2 or 3*3+3*3 or 2*3+3*2 or 2*2+2*3 or ...

: expression is fully parenthesized, I'd expect one of the "calls" to

No. Parenthesisation defeats reassociation: (a+b)+c is not assumed
equivalent to a+(b+c). There is some ordering because (a+b) has to
complete before (a+b)+c can, but there are no constraints on when
(a+b) or c is evaluated. They can even be evaluated simultaneously.
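
A quick way to see the non-equivalence with floats (an illustrative
sketch; the exact output depends on the platform, but the two groupings
typically differ):

#include <stdio.h>

int main(void)
{
    float a = 1e20f, b = -1e20f, c = 1.0f;

    /* (a+b)+c is typically 1, a+(b+c) is typically 0: the
       parenthesisation changes the value, so the compiler may
       not reassociate across it. */
    printf("%g %g\n", (a + b) + c, a + (b + c));
    return 0;
}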
--
In mirrored maze he met the Mother, | smr...@netcom.com PO Box 1563
the lost and breathless, lonely brother. | Cupertino, California
Both crone and child, now crying wild, | (xxx)xxx-xxxx 95015
her clinging clay will clothe and smother. | I don't use no smileys

Tim Behrendsen

Jul 23, 1996

Bradford Chamberlain <br...@rosalyn.cs.washington.edu> wrote in article
<BRAD.96Ju...@rosalyn.cs.washington.edu>...

> I'm trying to implement a double-precision squaring function as a macro,
> using the comma operator as follows:
> static double tmp;
> #define sqr(x) ((tmp=(x)),(tmp*tmp))
> This works fine when I use it once per statement:
> x = sqr(2);
> y = sqr(x);
> However, if I have multiple non-nested uses in a statement, it doesn't:
> x = sqr(2) + sqr(3);
> returns 8 or 18, depending on the compiler. Putting a print statement
> between the two expressions in the sqr() macro indicates that in each
> invocation, all the values are being used correctly. Therefore, I can
> only assume that the right-hand expression of each invocation isn't
> being evaluated until the left expression of the other invocation has
> been (giving 2*sqr(2) or 2*sqr(3), depending on the compiler's order
> of evaluation, presumably).
>
> From what I've read in K&R, this seems incorrect. Since each
> expression is fully parenthesized, I'd expect one of the "calls" to
> sqr() to evaluate to its double result before starting the other,
> giving the right answer. Is this a bug in my compilers, or my way
> of thinking?
>
> Any help would be appreciated. I'm compiling on a DEC alpha using gcc
> version 2.6.3 and cc from OSF/1.
> Thanks,
> -Brad

Man, that was a tricky one ... but I think I know the answer.

If you do this ...

printf("%d",(tmp=3,3*3) + (tmp=2,2*2))
printf(" tmp=%d", tmp);

I get "18 tmp=3".

Which means that the assignment to 3 is done after the
assignment to 2. Since we got a result of 18, that means
the assignments were done before any of the other expressions.
That's the clue.

The reason this makes sense is that '=' has higher precedence than
any of the other operators, and thus the C compiler did them
first before the mathematical expressions.

So, the entire expression was evaluated with 'tmp=3'.

-- Tim Behrendsen (t...@airshields.com)


Keith Edward O'hara

Jul 24, 1996

In article <TANMOY.96J...@qcd.lanl.gov> (Tanmoy Bhattacharya) writes:
>In article <BRAD.96Ju...@rosalyn.cs.washington.edu>
>(Bradford Chamberlain) writes:
>BC:
>BC: I'm trying to implement a double-squaring function using a macro using
>BC: the comma operator as follows:
>BC:
>BC: static double tmp;
>BC: #define sqr(x) ((tmp=(x)),(tmp*tmp))
>BC:
>BC: x = sqr(2) + sqr(3);
>
I'm glad Brad posted this question, as Tanmoy Bhattacharya's response
prompts me to ask some questions.

After defining sequence points, Tanmoy notes
>
>There is also one just before every function call.
>
I take this to mean that all of the arguments are evaluated before the
function begins execution (which one might take for granted) and nothing
more.

>Now, let us look at the expression (tmp=(x),(tmp*tmp)) +
>(tmp=(y),(tmp*tmp)). The + operator does not define a sequence point,
>so execution can proceed on both branches of the expression
>simultaneously. tmp=(x) is followed by a sequence point, so the store
>of x has to be completed by then: similarly tmp=(y) is followed by a
>sequence point, so store of y is to be completed by then. However,
>there is nothing which says that the store of y cannot come before the
>fetch required to calculate tmp*tmp in the first expression.
>
Then the use of the phrase 'sequence point' does not imply that there is
one and only one sequence of operations. Is it true that there can be
arbitrarily many (sub)expressions, each with its own 'sequence', in a C
statement?

>As I said, the best help is the FAQ. It has an extensive discussion on
>this.
Actually, not as extensive as what you've just given. Also, understanding
how the FAQ section on Expressions relates to this problem (with its macro
invocations and comma operators) requires more sophistication than I had
upon finishing K&R.
-----
keith

Tim Behrendsen

Jul 24, 1996

Tanmoy Bhattacharya <tan...@qcd.lanl.gov> wrote in article
<TANMOY.96J...@qcd.lanl.gov>...
> In article <01bb78bb$62355360$87ee...@timpent.airshields.com>
> "Tim Behrendsen" <t...@airshields.com> writes:
>
> B: The reason this makes sense is that '=' has higher precedence than
> B: any of the other operators, and thus the C compiler did them
> B: first before the mathematical expressions.
>
> This is incorrect: side effects have nothing to do with precedence
> and/or parentheses. Read the FAQ for more details.
>
> More care is necessary when posting answers: people might assume that
> you know what you are saying. At the minimum, please cross-check with
> the FAQ.

I think I'm going to have to disagree with you; what do you think
side effects are? The assignment operator is just an operator with
its own precedence. Side effects come into play based on the order
of evaluation, which is *purely* defined by the precedence rules.

I read your previous post, which I admit was much more detailed than
mine. But, your "sequence points" are defined by the precedence of
the operators. My post looked at the problem a little differently
than yours, but mine was essentially correct (and easier to
understand).

More care is necessary when criticizing others' posts: people might
assume that you know what you are saying.

Regards, -- Tim Behrendsen (t...@airshields.com)

Tim Behrendsen

Jul 24, 1996

Ken Nicolson <ke...@owl.co.uk> wrote:
> "Tim Behrendsen" <t...@airshields.com> wrote:
>
> [snip!]

>
> >I think I'm going to have to disagree with you; what do you think
> >side effects are? The assignment operator is just an operator with
> >its own precedence. Side effects come into play based on the order
> >of evaluation, which is *purely* defined by the precedence rules.
> >
> >I read your previous post, which I admit was much more detailed than
> ^^^^^^^^
> You've mis-spelt "correct" here.

OK, I'm willing to be wrong here; I went back and re-read his original
post. It is much more detailed as to why side-effects happen, but it
seems to me that it is simply not more accurate. I would be sincerely
interested in knowing where I am mistaken on this.

> >mine. But, your "sequence points" are defined by the precedence of
> >the operators. My post looked at the problem a little differently
> >than yours, but mine was essentially correct (and easier to
> >understand).
>
> From Tanmoy's post:
>
> >>Now, let us look at the expression (tmp=(x),(tmp*tmp)) +
> >>(tmp=(y),(tmp*tmp)). The + operator does not define a sequence point,
> >>so execution can proceed on both branches of the expression
> >>simultaneously. tmp=(x) is followed by a sequence point, so the store
> >>of x has to be completed by then: similarly tmp=(y) is followed by a
> >>sequence point, so store of y is to be completed by then. However,
> >>there is nothing which says that the store of y cannot come before the
> >>fetch required to calculate tmp*tmp in the first expression.
>

> This may be "harder to understand", but it is (AFAIK) correct and
logically
> sound.

"Sequence point" is another name for "precedence rules", unless there is
something in the ANSI standard that I'm not familiar with. If this really
is a separate concept, I would appreciate being enlightened.

> From your post:
>
> >:printf("%d",(tmp=3,3*3) + (tmp=2,2*2))
> >:printf(" tmp=%d", tmp);
> >:
> >:I get "18 tmp=3".
> >:
> >:Which means that the assignment to 3 is done after the
> >:assignment to 2. Since we got a result of 18, that means
> >:the assignments were done before any of the other expressions.
> >:That's the clue.
>

> Eh? Because your compiler prints 18, it must be correct. My compiler
> prints 13 - how does your "essentially correct (and easier to
> understand)" post explain that?

Well, first of all, I had a typo in my post. It should have been

printf("%d",(tmp=3,tmp*tmp) + (tmp=2,tmp*tmp));
printf(" tmp=%d", tmp);

You will note the "I get"; I am referring to my compiler for purposes
of example. I was not only trying to help the fellow understand why
he got side effects, I was trying to show the process by which he
could have figured out what was going on.

Yes, I probably could have waxed much more rhetorically on the
subject of precedence and sub-expressions, but my post gave the
essence of the problem in far fewer words.

> >More care is necessary when criticizing others' posts: people might
> >assume that you know what you are saying.
>
> I don't think you've been here very long. Tanmoy is one of the regulars
> who knows what he is talking about.
>
> Ken

It's obvious he does know what he is talking about. The reason I added
that was in response to the arrogance of his original post. I didn't
imply his post was inaccurate (in fact, I said the opposite). I am
making the statement that he should be a bit more careful in his
criticism, particularly when it is arrogantly phrased.

If he is going to make the bald statement that "side effects have
nothing to do with precedence and/or parentheses", he is technically
correct in the sense that assignments are one issue, and order of
evaluation is another. But to understand why the original poster
got the result he did, the poster must understand precedence.

There is only one thing that is relevant when you are talking about
expression evaluation, and that is the precedence rules. Everything else
falls out from those.

Regards, -- Tim Behrendsen


Ken Nicolson

Jul 24, 1996

"Tim Behrendsen" <t...@airshields.com> wrote:

[snip!]

>I think I'm going to have to disagree with you; what do you think
>side effects are? The assignment operator is just an operator with
>its own precedence. Side effects come into play based on the order
>of evaluation, which is *purely* defined by the precedence rules.
>
>I read your previous post, which I admit was much more detailed than
^^^^^^^^
You've mis-spelt "correct" here.

>mine. But, your "sequence points" are defined by the precedence of


>the operators. My post looked at the problem a little differently
>than yours, but mine was essentially correct (and easier to
>understand).

From Tanmoy's post:

>>Now, let us look at the expression (tmp=(x),(tmp*tmp)) +
>>(tmp=(y),(tmp*tmp)). The + operator does not define a sequence point,
>>so execution can proceed on both branches of the expression
>>simultaneously. tmp=(x) is followed by a sequence point, so the store
>>of x has to be completed by then: similarly tmp=(y) is followed by a
>>sequence point, so store of y is to be completed by then. However,
>>there is nothing which says that the store of y cannot come before the
>>fetch required to calculate tmp*tmp in the first expression.

This may be "harder to understand", but it is (AFAIK) correct and logically
sound.

From your post:

>:printf("%d",(tmp=3,3*3) + (tmp=2,2*2))
>:printf(" tmp=%d", tmp);
>:
>:I get "18 tmp=3".
>:
>:Which means that the assignment to 3 is done after the
>:assignment to 2. Since we got a result of 18, that means
>:the assignments were done before any of the other expressions.
>:That's the clue.

Eh? Because your compiler prints 18, it must be correct. My compiler prints
13 - how does your "essentially correct (and easier to understand)" post
explain that?

>


>More care is necessary when criticizing others' posts: people might
>assume that you know what you are saying.

I don't think you've been here very long. Tanmoy is one of the regulars who
knows what he is talking about.

>


>Regards, -- Tim Behrendsen (t...@airshields.com)
>

Ken


Christian Bau

Jul 25, 1996

In article <01bb798a$11913f80$87ee...@timpent.airshields.com>, "Tim
Behrendsen" <t...@airshields.com> wrote:

> printf("%d",(tmp=3,tmp*tmp) + (tmp=2,tmp*tmp));
> printf(" tmp=%d", tmp);
>

> There is only one thing that is relevant when you are talking about
> expression evaluation, and that is the precedence rules. Everything else
> falls out from those.

This is C (C++ is the same for this discussion) and not FORTRAN or BASIC
or Pascal. Sequence points are extremely important in C. A sequence point
is a point where all preceding side effects are guaranteed to have
happened completely, and later side effects are guaranteed not to have
happened yet.

Operator precedence and brackets only tell you how the expression is
parsed, that is which operator is applied to which operands. An expression
will be evaluated before it is used, but its side effects are only
guaranteed to happen _before_ the next sequence point.

A C compiler must assign 3 to tmp before evaluating the first tmp*tmp, and
must assign 2 to tmp before evaluating the second tmp*tmp. However there
is no sequence point between tmp=3 and tmp=2, so these assignments can
happen in any order or at the same time.

Now let's say the values are not 2 and 3 but 255 and 256 and you run on an
8-bit processor, so the processor needs two instructions to store values
into tmp. Then it can do the following pseudo-code for (tmp=255,tmp*tmp) +
(tmp=256,tmp*tmp):

tmp.hi = 0;            /* Part of side effect of tmp=255 */
tmp.lo = 255;          /* Completes side effect of tmp=255 */
tmp.hi = 1;            /* Part of side effect of tmp=256. Now tmp = 511 */
somevalue = tmp*tmp;   /* Evaluate the first half */
tmp.lo = 0;            /* Completes side effect of tmp=256 */
othervalue = tmp*tmp;
completeresult = somevalue + othervalue;

The result would be 511*511 + 256*256. This code would be a legal and
reasonable translation of the expression.

(Also the standard says that two assignments to tmp without a sequence
point between these assignments cause "undefined behavior" which means
anything can happen when you execute this code without violating the ANSI
C standard)
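
For anyone who wants to try this on their own compilers, here is a
minimal complete version of the original example (just a sketch; the
8, 13 and 18 reported in this thread are all permitted outcomes, as is
anything else):

#include <stdio.h>

static double tmp;
#define sqr(x) ((tmp=(x)),(tmp*tmp))

int main(void)
{
    /* tmp is assigned twice with no sequence point in between,
       so the behaviour of this statement is undefined. */
    double x = sqr(2) + sqr(3);
    printf("%g\n", x);
    return 0;
}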

Mike McCarty

Jul 25, 1996

In article <01bb798a$11913f80$87ee...@timpent.airshields.com>,
Tim Behrendsen <t...@airshields.com> wrote:
)"Sequence point" is another name for "precedence rules", unless there is
)something in the ANSI standard that I'm not familiar with. If this really
)is a separate concept, I would appreciate being enlightened.

Ok, I will enlighten you. A sequence point is not another name for
precedence rules. A sequence point is (I'm paraphrasing here) a place in
your source where you -know- that certain actions have taken place. For
example, in

x = 5;
y = 4;

the first semicolon is a place where you -know- that x is now set to 5.

In

x = 5, y = x, z = y + x;

each comma is a sequence point. When y is assigned, you -know- that x
has already been assigned. Likewise, you -know- that y has been assigned
before z is assigned. In

x = f(y = g(z),p = g(w))

you do -not- know whether y has been assigned before p is assigned. You
also do -not- know whether g(.) is even called with z as argument before
g(.) is called with w as argument, because no sequence point occurs.

Also, in

x = (y = g(z)) || (w = g(p))

you -know- that y is assigned before w is assigned. You don't know that
w -will- be assigned, but if it is, then it will be -after- y is
assigned.

In

x = (y = g(z)) | (w = g(p))

you -know- that y and w will both be assigned, but you don't know in
what order, nor do you know which argument will be passed to g(.) first,
z or p. Note that precedence does not enter into it. Parens have higher
precedence than either || or |.
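
To make the last two examples concrete, here is a small sketch you can
compile; g is just a placeholder that prints when it is called:

#include <stdio.h>

static int g(int v)
{
    printf("g(%d)\n", v);   /* shows when (and whether) each call happens */
    return v;
}

int main(void)
{
    int x, y, w, z = 1, p = 2;

    x = (y = g(z)) || (w = g(p));  /* g(z) first; g(p) not called, since g(z) != 0 */
    x = (y = g(z)) | (w = g(p));   /* both calls made, but in an unspecified order */
    printf("x=%d y=%d w=%d\n", x, y, w);
    return 0;
}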

Mike
--
----
char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}

I don't speak for DSC. <- They make me say that.

!@?*$%

Jul 25, 1996

> "Sequence point" is another name for "precedence rules", unless there is
> something in the ANSI standard that I'm not familiar with. If this really
> is a separate concept, I would appreciate being enlightened.

No, precedence is a purely syntactic notion that has nothing to do with
evaluation. C uses strict evaluation and side effects, and they are what
determine the order of expression evaluation.

I suspect you are confusing strict evaluation with precedence. In an
expression such as a+b*c, * has higher precedence and so the expression is
parsed as
+ ( a, * ( b, c ) )

Now the strict evaluation comes into play: the function (or operator) name
and the function arguments must be evaluated before the function is
called. So *, b, and c, and maybe + and a, are evaluated in some order,
and then the multiplication b*c is evaluated, along with, after, or
before, + and a. And then the addition is evaluated.

Had the expression been, (a+b)*c, even though * still has a higher
precedence, the addition must be evaluated before the multiplication
because the addition is an argument to the multiplication.

(And there are such things as nonstrict evaluation, but not in C.)

In this case, the function names * and + are essentially constants, but
that need not always be the case: ((eta.alpha)(beta))(gamma)

Steve Summit

Jul 25, 1996

Tim Behrendsen (t...@airshields.com) wrote:
>>> I think I'm going to have to disagree with you; what do you think
>>> side effects are? The assignment operator is just an operator with
>>> its own precedence. Side effects come into play based on the order
>>> of evaluation, which is *purely* defined by the precedence rules.

This is a common misconception, but it's quite false. Order of
evaluation is affected by much more than "precedence rules";
precedence affects order of evaluation much less than many people
think.

> There is only one thing that is relevant when you are talking about
> expression evaluation, and that is the precedence rules. Everything else
> falls out from those.

There's the misconception again. Order of evaluation does
*not* "fall completely out of" precedence rules. It's true
that precedence seems to have something to do with expression
evaluation; in the expression

a + b * c

we frequently say things like, "The multiplication happens before
the addition, because * has higher precedence than +." But it
turns out this is a risky thing to say, because it gives the
listener the impression that precedence dictates order of
evaluation, which as we'll see it does not. In my classes, I try
desperately not to use the words "happens before" when explaining
operator precedence, although I'm afraid I rarely succeed.

Let's imagine that we had some kind of omniscient processor, which
could look at an expression and instantly give its value, without
doing step-by-step evaluations. If we gave it the expression

1 + 2 * 3

it would give us the result 7. Precedence tells us why it gave
us that answer and not 9. That is, even if thoughts of "order of
evaluation" never enter our heads, precedence is an important and
independent concept, because it tells us what an expression
*means*.

Another way to think about precedence is that it controls how
the compiler parses expressions. The expression

1 + 2 * 3

results in the "parse tree"

        +
       / \
      1   *
         / \
        2   3

Once it has determined the parse tree, the compiler sets about
emitting instructions (in some order) which will implement that
parse tree. The shape of the tree may determine the ordering of
some of the instructions, and precedence affects the shape of the
tree. But the order of some operations may not even be strictly
determined by the shape of the tree, let alone the precedence.
So while precedence certainly has an influence on order of
evaluation, precedence *does* *not* "determine" order of
evaluation. I'd say that precedence is about 40% related to
order of evaluation -- order of evaluation depends about 40% on
precedence, and 60% on other things. (This 40% figure is of
course meaningless; it's just a number I pulled out of the air
that "sounds right" [Barry, 1994]. The point I'm trying to make
is that precedence is not entirely disconnected from order of
evaluation, but the connection is nowhere near 100%.)

We've seen how precedence can affect or influence order of
evaluation. Now let's start looking at all of the ways it does
*not*, or, stated another way, at all of the ways that order of
evaluation is determined by things other than precedence.
(As we'll see, many of the "other things" turn out to be the whim
of the compiler.)

Most of you have seen this example before, but I'll drag it out
again. Suppose we have an expression containing three function
calls:

f() + g() * h()

Now, the result of calling g() will be multiplied by the result
of calling h() "before" that product is added to the result of
calling f(), and indeed, precedence tells us that. But, unless
we have multiple parallel processors, those three functions f(),
g(), and h() are going to be called in some order. What order
will they be called in?

I don't know what order they'll be called in, and you don't know
what order they'll be called in. Precedence doesn't tell us, and
in fact *nothing* in K&R, the ANSI/ISO C Standard, or Dan
Streetmentioner's Boffo C Primer Triple Plus will tell us.
(C: The Complete Reference might tell us, but that's another
story.) There's absolutely nothing preventing the compiler from
calling f() first, even though its result will be needed "last."

If you have any doubts about this, I encourage you to compile and
run this little program:

#include <stdio.h>
main() { printf("%d\n", f() + g() * h()); return 0; }
f() { printf("f\n"); return 1; }
g() { printf("g\n"); return 2; }
h() { printf("h\n"); return 3; }

For the very first two compilers I tried this program with, one
printed:                        But the other printed:

    g                               f
    h                               g
    f                               h
    7                               7

Even if your compiler(s) prints g h f 7 "as expected,"
bear with me while we look at this from yet another angle.

What do you do when the precedence is "wrong," that is, when
you want to override the default precedence? You use explicit
parentheses, of course. Parentheses force the operators and
operands within an expression to be grouped in a certain way, and
if you believe that precedence dictates order of evaluation, you
might believe that parentheses dictate order of evaluation, too.
But they do not.

Let's go back to the f() + g() * h() example. Suppose your
compiler were emitting code which called f() first, and for some
reason you wanted it to call f() last. How could you force this?
Could you write

f() + (g() * h())

where the parentheses are supposed to force the stuff inside them
to happen first? You could try it, but I doubt it would make any
difference. Those parentheses would tell the compiler that the
result of g() was supposed to be multipled by the result of h()
before the product was added to the result of f(), but the
compiler was already going to do it that way, anyway, based on
the default precedence. It would still be free to call f()
first. (Indeed, for the compiler of mine that calls f() first,
it still calls f() first even if I add the extra parentheses.)

Suppose, on the other hand, that your compiler is calling g() and
h() first, and f() last. What if, for some reason, you want it
to call f() first? How can you force this? If you belong to the
hit-or-miss, aimless meandering, or drug-induced hallucination
schools of programming, you might try

(f()) + g() * h()

but I'll pay you (or anyone) $100 if you can show me a compiler
(other than one you wrote for the purpose) that can be forced to
call f() first by putting a pair of parentheses around it like
that.

In ANSI/ISO Standard C, when you care about the order in which
things happen, you must take care to ensure that order by using
"sequence points." Sequence points have nothing to do with
precedence, and only partly to do with expressions; in my head
they're up there next to statements. In general, when you want
to control the order in which two things happen, you make the two
things separate statements, and you put one statement after the
other in the order you desire, perhaps using control flow
constructs (e.g. if statements, loops, etc.) to control whether a
statement gets executed at all, or how many times. Indeed, "the
end of an expression statement" is essentially one of the defined
sequence points in Standard C.

K&R C didn't have "sequence points"; Kernighan and Ritchie said
that "explicit temporary variables can be used to force a
particular order of evaluation" and "intermediate results can be
stored in temporary variables to ensure a particular sequence"
[K&R1 p. 49; K&R2 p. 53]. In fact, although Standard C gives us
sequence points, it doesn't give us a tool to get a hold on f()
in f() + g() * h(); if we needed f() called first, we'd have to
write

t = f(); t + g() * h()
or
t = f(), t + g() * h()

The most obvious sequence points in Standard C are the ones at
the ends of full expressions in expression statements, that is,
as exemplified by the semicolon in the first line just above.
Other sequence points are at the comma operator (as in the second
line just above), at the && and || operators, in the ?: operator,
and just before a function call. (I do not claim that this is an
exhaustive list.) On those (hopefully rare) occasions when you
care about the ordering of operations within a single expression,
you have to make sure that the expression contains one or more
sequence points, and in places which will in fact constrain the
order to the one you want. Most of the time, though, when you
have a troublesome expression with too many ordering
dependencies, the right response is the same as that of the
doctor in the old joke about the patient who complains that his
hand hurts if he shakes it in a certain way: "Well, don't do
that, then." Break the expression up into separate statements,
using temporary variables if you have to, and you'll be much
happier.

(I confess that this advice may be easier to give than to apply.
In the article that started this thread, the poster thought that
the insertion of a sequence point or two, in the form of comma
operators, should have constrained the order of evaluation
sufficiently, but unfortunately it did not, because there still
weren't *enough* sequence points. It didn't help that the poster
thought that full parenthesization should have resolved any
remaining evaluation-order problems.)

If the f() + g() * h() example seemed artificial and hence
unconvincing, let's look at a morphologically identical but
eminently realistic example. Suppose we're reading a two-byte
integer from a binary data file in big-endian (least significant
byte first) order. We could call fread() and then swap bytes if
necessary, but that's a cheesy solution, because if (as is
conventional) we implement the "if necessary" test with an
#ifdef, we've condemned everyone who ever compiles our program to
choose an appropriate setting for the #ifdef macro. Instead,
let's call getc() twice, to read the first and then the second
byte, and combine them like this:

i = (firstbyte << 8) | secondbyte;

(Remember, the first byte we read is the most-significant byte,
and the second byte is the least-significant.) So could we write
that as

i = (getc(ifp) << 8) | getc(ifp); /* WRONG */

? No! We could not! We *do* *not* *know* which call to getc()
will happen first, and we very definitely care, because the order
of those two calls is precisely what will determine whether we
read the integer in big-endian or little-endian order. So,
instead, we should write

i = getc(ifp) << 8; /* MSB */
i |= getc(ifp); /* LSB */

By writing this as two separate statements, we get a sequence
point, which ensures the order of evaluation we need. If, on the
other hand, we wanted to read in little-endian order, we'd write

i = getc(ifp); /* LSB */
i |= getc(ifp) << 8; /* MSB */

But, to repeat, we could not write

i = (getc(ifp) << 8) | getc(ifp); /* "big-endian", but WRONG */
or
i = getc(ifp) | (getc(ifp) << 8); /* "little-endian", WRONG */

Both of these seem to depend (and, in fact, would depend, if we
wrote them) on a left-to-right ordering of the actual calls to
getc(), which is *not* guaranteed. (And yes, I know that getc()
is usually implemented as a macro, which means that both of these
last two expressions would in fact expand to big, complicated
expressions without any necessary function calls, but they would
still not have any internal sequence points which would force the
two getc's to happen in a well-defined order.)
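
(For instance, the two-statement version wraps up naturally into a
little helper; the name read_be16 is just illustrative, and I've left
out EOF handling to keep it short:

#include <stdio.h>

static int read_be16(FILE *ifp)
{
    int i;
    i  = getc(ifp) << 8;   /* MSB; the statement boundary is a sequence point */
    i |= getc(ifp);        /* LSB; guaranteed to be the second call */
    return i;
}

The two statements inside the helper give us exactly the sequence
point we were missing in the single-expression versions.)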

* * *

Unless you want to be a language lawyer, you don't have to
memorize the definition of a sequence point or the list of
sequence points which Standard C guarantees, or even use the
words "sequence point" in casual conversation. (As you saw
above, I can't remember the list off the top of my head with
confidence, either, and that's because I'm not usually interested
in being a language lawyer.) What I recommend you do, instead,
is develop a sense of expressions that are "clean" versus
expressions that are "ugly," and shy away from the ugly ones.
("Well, don't do that, then.") Ugly expressions are those that
have multiple assignments in them, or comma operators, or
multiple modifications of the same object, or which are so
complicated and hard to understand that all the king's horses and
all of comp.std.c require at least a week to figure out what
they're guaranteed to do (or not). The cleanest expressions are
those that calculate a single value, and perhaps assign it to a
single location, without caring about the order in which various
sub-operations (such as interior function calls) take place.

In between the cleanest expressions and those that are
unremittingly ugly are several classes of mildly-tricky
expressions which are well-defined and are useful and are
commonly used and which I'm not trying to discourage you from
using. It's easier to present these by example, as exceptions to
my qualifications above of ugly expressions as those containing
comma operators, ordering dependencies, or multiple assignments.

1. It's perfectly okay to depend on precedence, as long as
you understand what it does and doesn't guarantee you.
Precedence tells you that in the expression a + b * c,
the right-hand operand of the * operator is b, not b + c.
If you insist, precedence tells you that in that
expression, "the multiplication happens before the
addition." But precedence does not tell you what order
the function calls would happen in in f() + g() * h(),
and precedence would not give meaning to the expression

i++ + j * i++ /* XXX WRONG */

(In particular, you *cannot* say that since the
multiplication happens "first," the second i++ must
happen before the first.) This last expression is, you
guessed it, unremittingly ugly.

2. You can use the && and || operators to guarantee that
thing B happens after thing A, and not at all if thing A
tells the whole story. Both

n > 0 && sum / n > 0

and

p == NULL || *p == '\0'

are perfectly valid, perfectly acceptable expressions.

3. You can use the comma operator when you have two (or
more) things to do in the first or third controlling
expressions of a for() loop. There are precious few
other realistic opportunities for using comma operators;
chances are, if you find yourself using comma operators
anywhere else, you're doing something tricky and are
skirting the edge of ugliness. (It's no coincidence that
Java, I gather, doesn't allow comma operators anywhere
but the first and third expressions of a for loop.)
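(A small sketch of this for-loop use appears right after this list.)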

4. You can use multiple ++ and -- operators in a single
expression, perhaps in conjunction with an assignment
operator, as long as you're certain that the several
things being modified are all distinct, and that you
aren't ever in a position of using something that you've
already modified. The expressions

a[i++] = 0
and
*p++ = 0

are perfectly acceptable, because the things being
modified, i and a[i] in the first expression and p and
*p in the second, are probably distinct. For the same
reason, a[0] = i++ is okay, too. Expressions like

a[i++] = b[j++]
and
*p++ = *q++

are also okay, because i, j, a[i], and b[j],
and p, q, *p, and *q, are probably all distinct.
Finally, the old standby

i = i + 1

is perfectly acceptable, because although it both uses
and modifies i, it demonstrably uses the old value to
compute the new value.

On the other hand, the expressions

i++ * i++ /* XXX WRONG */
and
i = i++ /* XXX WRONG */

are no good, among other things because they modify i
twice. And

a[i] = i++ /* XXX WRONG */
and
a[i++] = i /* XXX WRONG */

are both wrong, because there's no telling whether i is
used before it's modified, or vice versa. (If you think
that either of these somehow *does* guarantee whether the
plain i on one side means the value of i before or after
the incrementation performed by the i++ on the other,
think again.)

5. Finally, it's perfectly acceptable to write something
like

i = j = 0

even though it contains multiple assignments, because
again, it's an unambiguous idiom and it's clear that two
different variables are being assigned to.
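
Here, as promised, is a sketch of the for-loop use from guideline 3;
the reverse() function is just an illustrative example:

#include <stdio.h>

/* The two-index for loop is the one conventional home
   for the comma operator. */
static void reverse(int *a, int n)
{
    int i, j;
    for (i = 0, j = n - 1; i < j; i++, j--) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

int main(void)
{
    int a[] = { 1, 2, 3, 4, 5 };
    int k;
    reverse(a, 5);
    for (k = 0; k < 5; k++)
        printf("%d ", a[k]);
    printf("\n");               /* prints: 5 4 3 2 1 */
    return 0;
}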

Now, I'm acutely aware that these guidelines (or any like them)
can never be complete, because there are an infinite number of
expressions out there, and some of them are clean and legal even
though my "rules" might seem to disallow them and my exceptions
don't cover them, and others of them are unremittingly ugly even
though my "rules" don't disallow them or my exceptions seem to
allow them. Somehow, successful programmers learn to calibrate
their own inner aesthetic such that the expressions which aren't
"ugly" are the ones that (a) are well-defined, and (b) are
sufficient for writing real programs. I encourage all of you to
nurture such an aesthetic as well, and to learn to write programs
without "ugly" expressions, rather than wasting time trying to
figure out what they do (and whether they're guaranteed to do it).
If you've got an example of a well-defined expression which you
feel is acceptable but which the guidelines I've presented seem
to discourage, or of an ill-defined expression which these
guidelines don't seem to discourage, I encourage you to post it
here or mail it to me; I'd love to expand and refine these
guidelines. (Equally importantly, I'd love to figure out how
to word them so that they're useful and meaningful to someone
who *hasn't* already developed a sense for good vs. ugly
expressions.)

Also, if you're still with me, I'd like to repeat the main moral
of this article, which is that "precedence" is not the same thing
as "order of evaluation." The one certainly has something to do
with the other (sort of like pointers and arrays), but it's a
delicate relationship to describe precisely, so unless you care
to figure out the whole story (and figure out what number I
should have used above instead of 40%), you might actually be
better off if you remember the statement that "Precedence has
nothing to do with order of evaluation." This statement isn't
true, of course, but it's much closer to the truth than
"Precedence has everything to do with order of evaluation," /* XXX WRONG */
which is not only false, but quite misleading.

Finally, for anyone who is still unsure on any of this, I urge
you to read section 3 of the FAQ list, or chapter 3 of the FAQ
list book, because there's more on this in there. (Of course,
it's written by the same mope you're reading now, so if you don't
like something I've written here, I can't guarantee that you'll
find relief in the FAQ list.)

Steve Summit
s...@eskimo.com


Michael Kluev

Jul 26, 1996

In article <1996Jul25.1...@eskimo.com>, s...@eskimo.com (Steve
Summit) wrote:

..


> f() + g() * h()
>

>we have multiple parallel processors, those three functions f(),
>g(), and h() are going to be called in some order. What order
>will they be called in?

..


>I don't know what order they'll be called in, and you don't know
>what order they'll be called in. Precedence doesn't tell us, and
>in fact *nothing* in K&R, the ANSI/ISO C Standard, or Dan
>Streetmentioner's Boffo C Primer Triple Plus will tell us.
>(C: The Complete Reference might tell us, but that's another
>story.) There's absolutely nothing preventing the compiler from
>calling f() first, even though its result will be needed "last."

..


>In ANSI/ISO Standard C, when you care about the order in which
>things happen, you must take care to ensure that order by using
>"sequence points." Sequence points have nothing to do with
>precedence, and only partly to do with expressions; in my head
>they're up there next to statements. In general, when you want
>to control the order in which two things happen, you make the two
>things separate statements, and you put one statement after the
>other in the order you desire, perhaps using control flow

..


>in f() + g() * h(); if we needed f() called first, we'd have to
>write
>
> t = f(); t + g() * h()
>or
> t = f(), t + g() * h()
>

>that, then." Break the expression up into separate statements,
>using temporary variables if you have to, and you'll be much
>happier.
>

Many valid points skipped.

That is all right in terms of the current language standards.
I do not get one thing: why is there such a fundamental difference
between statements and expressions? Why is the evaluation order of
sub-expressions undefined while the evaluation order of statements
within a statement sequence is defined? Why such a fundamental
difference between the ';' and the '+' or ',' symbols?

Why is the following "bad": f() + g() * h()

while this one is "good": t = f(); t + g() * h()

Sure, I know the answer: "this is the C standard" or "this is how
the language is defined", but I'm not looking for such an answer.
What I am looking for is the answer to the following question:
"Why was the language defined that way?"

If your answer is a form of "this way compilers can do a better
job of optimising expressions", then remember that a compiler must
optimise not only expressions, but statements also. And they do in
fact optimise statement sequences, provided that the statements do
not have side effects. E.g. a compiler could reorder

a = b; c = d;
into c = d; a = b;

if a, b, c, d are simple variables. But it can't do this
if a, b, c, d are functions that have (or might have) side effects.
Why not the same story for expressions? E.g. something like: "The
order of sub-expression evaluation within an expression is left to
right, but the compiler is free to reorder them if the sub-expressions
do not have side effects".

Anyway, that "good" way is too hard-coded into the minds of programmers,
so it is too late to change the rules.

If (by chance) you want to answer this, e-mail me a copy of your
answer.

Michael.

----------------------------------------------------------------
Michael Kluev kl...@macsimum.gamma.ru
Macintosh Programmer Physics Grad, MSU
MACsimum Ltd. Moscow, Russia
----------------------------------------------------------------

Gabor Egressy

Jul 26, 1996

Thanks, Steve. An excellent, albeit rare, article that is well written and
informative. I wish I had taken your intro C class when I was learning C.
There aren't many instructors out there who really know what they are
talking about.

: There's the misconception again. Order of evaluation does
: *not* "fall completely out of" precedence rules. It's true
: that precedence seems to have something to do with expression
: evaluation; in the expression

: a + b * c

: we frequently say things like, "The multiplication happens before
: the addition, because * has higher precedence than +." But it
: turns out this is a risky thing to say; because it gives the
: listener the impression that precedence dictates order of
: evaluation, which as we'll see it does not. In my classes, I try
: desperately not to use the words "happens before" when explaining
: operator precedence, although I'm afraid I rarely succeed.

[snip for brevity, you should still read the original though.]

: Finally, for anyone who is still unsure on any of this, I urge
: you to read section 3 of the FAQ list, or chapter 3 of the FAQ
: list book, because there's more on this in there. (Of course,
: it's written by the same mope you're reading now, so if you don't
: like something I've written here, I can't guarantee that you'll
: find relief in the FAQ list.)

: Steve Summit
: s...@eskimo.com

---------------------------------------------------------------------
Gabor Egressy gegr...@uoguelph.ca
Guelph, Ontario ga...@snowhite.cis.uoguelph.ca
Canada
---------------------------------------------------------------------

Michael Kluev

Jul 28, 1996

In article <TANMOY.96J...@qcd.lanl.gov>, tan...@qcd.lanl.gov
(Tanmoy Bhattacharya) wrote:

..
>All that remains to explain now is why one did not just make a blanket
>rules: never reorder expressions (this assumes the standard specify an
>order of course) if they have side-effects. This was felt too rigid:
>partly because it would disallow optimizations in many useful
>cases. Thus, suppose I write
>
>(a[i] + (*p)++) + a[i]
>
>is the compiler allowed to reorder the statement and make it a[i] << 1 +
>(*p)++ (assume a[i] is unsigned int and p is unsigned int*) which may
>well be faster? What if p is the same as &a[i]? Should the compiler
>have to put in an extra check for p!=&a[i] and thereby lose the
>advantage it got by reordering? Or should such reorderings always be
>disallowed?

Personally, I don't see why the same questions couldn't be asked
about the following semi-equivalents:

s = a[i]; s += (*p)++; s += a[i]; // 2*a[i] ? or
s = a; s += b(); s += a; // 2*a ? or just
s = b(); s += b(); // 2*b() ?

But remember, "+" is as good a point as ";" for me. I've just taken off
the "standard language paradigm glasses" :-)

If only the compiler (maybe with the help of the language) could determine
whether or not an expression has side effects in the current context...

(BTW, languages like Pascal that don't have expressions like a++
come closer to the above dream.)

Steve Summit

Jul 28, 1996

In article <kluev-26079...@kluev.macsimum.gamma.ru>,
kl...@macsimum.gamma.ru (Michael Kluev) writes:
> I do not get one thing: why is there such a fundamental difference
> between statements and expressions? Why is the evaluation order of
> sub-expressions undefined while the evaluation order of statements
> within a statement sequence is defined?

I always have a hard time with questions like these. Sometimes,
they hardly make sense: after all, the language is what it is,
and we simply have to...

> Sure, I know the answer: "this is the C standard" or "this is how
> the language is defined", but I'm not looking for such an answer.

Oh. Then I guess I'm not allowed to use that answer, then.

> What I am looking for is the answer to the following question:
> "Why was the language defined that way?"

The usual answer is that it's to avoid unnecessarily
constraining the compiler's ability to...

> If your answer is a form of "this way compilers can do a better
> job of optimising expressions", then remember that a compiler must
> optimise not only expressions, but statements also.

Oh. So I guess I'm not allowed to use that answer, either.
(Now it seems as if you're really constraining *me*!)
So I'm afraid the only truly factual answer I'm left with is
"I don't know."

I posted a long article a year or two ago exploring some reasons
why C might leave certain aspects of evaluation order undefined.
I'd re-post it now, but it would take me too long to find it in
my magtape archives, and anyway the oxide is starting to flake
off (I'm not sure if it's the fault of the drive or the tapes),
and I'm thinking that the next time I spin those tapes should
really be to transfer them to some new media, which I haven't
selected yet. (If I ever track the article down, I'll try to
remember to mail you a copy.)

Instead, I'll explore a couple of different reasons. Bear in
mind that these are only my own speculations, so they absolutely
will not answer your question. If you simply must know why C
*is* defined the way it is on this point, you'll have to ask
Dennis Ritchie. The speculations I'll give you are some of the
ones which might now lead me to design a language the same way,
but I freely admit that my thinking on language design has been
*very* heavily influenced by C, which I somehow find pretty
congenial and easy to get along with.

A language specification is a huge set of tradeoffs. No, don't
worry, I won't talk about the forbidden tradeoff between the
programmer's freedom of expression and the compiler's license to
optimize. The tradeoff I'm invoking now is one of documentation:
how much time and how many words can we afford to spend defining
the language? As in so many other areas, the law of diminishing
returns sets in here, too. We have to ask ourselves whether an
attempt to nail down some aspect of the language more tightly is
worth it in terms of the number of programs (or programmers)
that absolutely need the extra precision.

A language specification (like any detailed specification of a
complex system) is also staggeringly difficult to write. The
harder you try, the more questions you end up inviting from
devious folk who take your previous round of ever-more-detailed
specifications as a grand puzzle, the challenge being to find
some loophole or ambiguous case or fascinating question which
remains unanswerable in the system as constructed.

Therefore, the standard makes certain simplifications. It makes
these not just to make compilers easier to write, not just to
make the standard easier to write, but to make it easier for the
rest of us to *read*, to wrap our brains around it and understand
it. The more exceptions it contains, or overly complex
explanations of overly complex devices inserted just to placate
the nitpickers and puzzlemongers, the more likely it is to be
unreadable by mere mortals, or unimplementable by mere mortals,
and so ultimately to fail.

A largely forgotten aspect of the "Unix philosophy," and an
aspect which, like many others, is as responsible for the
design of C as of Unix, and an aspect which is responsible both for
the success of the operating system and the language *and* for
the heaping truckloads of acrimonious criticism which both
incessantly receive, is that neither system was ever intended to
satisfy everybody. The designers were shooting for about a 90%
solution, and were unapologetically willing to call that "good
enough." They knew, if they tried to satisfy everybody, that the
first 90% of the requirements would take 90% of their time, and
that 9% of the remaining requirements would take the other 90% of
their time, and that 0.9% of the remaining requirements would
take *another* 90% of their time, and so on [footnote 1].
Discretion being the better part of valor, they decided -- with
remarkable restraint, which I for one could never manage -- to
nip that infinite regression in the bud. And in one of X3J11's
admirable successes at preserving the "spirit of C," that
minimalist attitude largely pervades the C standard, as well.

Returning to the subject of expression evaluation, the
simplification of interest to us here is, in a nutshell, that how
you specify the order that things happen in is with *statements*,
and how you compute values where the order doesn't matter much
is with *expressions*. If you care about the order, you need
separate statements. (There were always a few exceptions to this
simplistic rule, of course; ANSI added a few more). The modern,
more precise statement is that if you need to be sure that side
effects have taken effect, you need to have a sequence point, but
to keep everybody's life simple, there are still relatively few
defined sequence points. If you have a complex expression with
complicated sequencing requirements, you may simply have to break
it up into several statements, and maybe use a temporary
variable. That was true in K&R (I've already quoted the relevant
sentences from section 2.12), and it's true today. It's a simple
rule to state, it's a simple rule to implement, and at least 90%
of expressions (and in fact far more, probably closer to 99.9%)
can in fact be written as single, simple expressions, because
they *are* simple and *don't* have complicated sequencing
requirements.

I honestly believe that the existing rules are good enough, and
that the excruciating discussions which we have about the issue
here from time to time tend to overstate its importance.
I probably break an expression up into two statements to keep
its sequence correct about twice a year [footnote 2]. The vast
majority of the time, you don't have to worry about the order
of multiple side effects (most of the time, because there aren't
multiple side effects), and when you do, I claim (though I
realize how patronizing this sounds to the people I most wish
would think about it) that the expression is probably too
complicated and that it should probably be simplified if for
no other reason than so that people could understand it, and
incidentally so that it would be well-defined to the compiler.

In closing, let me offer another way of thinking about order of
evaluation, which I came up with a day or two ago and refined
during a brief e-mail exchange with James Robinson. As I
mentioned, compilers tend to build parse trees, and precedence
has some influence on the shape of parse trees which has some
influence on the order of evaluation. What's a good way of
thinking about the ways in which the shape of a parse tree does
and doesn't necessarily affect order of evaluation?

If it weren't for side effects -- assignment operators, ++, --,
function calls, and (for those of you writing device drivers)
fetches from volatile locations -- [footnote 3], the entire
meaning of a parse tree would be the computation of a single
value. Each node computes a new value from the values of its
children, so values percolate up the tree. We have to begin
evaluation at the bottom, of course, and we can therefore say
that there's some ordering of the evaluation from bottom to top,
but we don't care about the relative ordering of parallel
branches of the tree; in fact for all we care they could be
evaluated in parallel. The reason we don't care is precisely
that (for the moment) we are thinking about pure expression
evaluation, without side effects.

If, as I claim here, the primary purpose of a parse tree is to
generate the single value that pops out of the top of it, then a
good way to think about an expression (which is the basis for a
parse tree) is that its primary purpose is to generate a single
value, too. We should think about side effects as in some sense
secondary, because it turns out that the Standard (and, hence,
the compiler) accords them a good deal less respect, at least
with respect to their scheduling. We should imagine that the
compiler goes about the business of evaluating an expression by
working its way through the parse tree, applying no more ordering
constraints than the blatantly obvious one that you can't (in
general) evaluate a node until you've evaluated its children.
We should imagine that, whenever the compiler encounters a node
within the tree mandating a side effect, which would require it
to store some value in some location, it makes a little note to
itself: "I must remember to write that value to that location,
sometime", and then goes back to its true love, which is
evaluating away on that parse tree. Finally, we should imagine
that when the compiler reaches a point which the Standard labels
as a sequence point, the compiler says to itself: "Oh, well, I
guess I can't play all day, now I'll have to get down to business
and attack that `to do' list."

Of course, I'm not saying that you *have* to think about the
situation in this way. But if you're looking for a model which
will let you think about expression evaluation in a way that
matches the Standard's, I think this is a pretty good one.
Even though we sometimes evaluate expressions only for their side
effects (that is, the statement
i + 1;
does nothing), the right way to think about expression evaluation
is that we are, after all, *evaluating* an *expression*, or
figuring out what its value is. Its *one* value, singular.
The only ordering dependencies are those which must apply in
order to ensure that we compute the correct value. If there are
any intermediate values that we care about, because we expect
them to be stashed away via side effects, we must not care what
order they occur in. Therefore, if there are multiple side
effects, all of them had better write to different locations.
Also (again because we can't be sure when they'll happen) none of
them had better write to locations which we might later try to
read from within the same expression.


This may have seemed like a roundabout set of explanations, and
I'm sure it's still unsatisfying to those who insist on knowing
*why* C is as it is. To summarize the arguments I've tried to
present here, C specifies expression evaluation as it does
because it's simple and good enough for most purposes while still
allowing (taboo answer alert!) for decently optimized code
generation, and it does not provide more guarantees about
expression evaluation because few expressions would need them.

> If (by chance) you want to answer this, e-mail me a copy of your
> answer.

Perhaps someone will.

Steve Summit
s...@eskimo.com

P.S. There was an error in my earlier article in this thread
(<1996Jul25.1...@eskimo.com>); "Suppose we're reading a
two-byte integer from a binary data file in big-endian (least
significant byte first)" should have said "most significant byte
first" in the parentheses.

Footnote 1. Melanie has dubbed this "Zeno's 90/90 rule."

Footnote 2. The number of times I've had to break expressions up
into separate statements decreased by about 2/3 when I realized
that the compiler does *not* have (and has probably never had)
license to rearrange

int i, isc, fac1, fac2;
...
isc = (long)i * fac1 / fac2;

which at one time I wrote as

long tmp = (long)i * fac1;
isc = tmp / fac2;

because I was worried about underflow in case the compiler
evaluated it as (long)i / fac2 * fac1 .

Footnote 3. I apologize for the two entirely different uses
of -- in this sentence.

Dik T. Winter

Jul 29, 1996

In article <TANMOY.96J...@qcd.lanl.gov> tan...@qcd.lanl.gov (Tanmoy Bhattacharya) writes:
> MK: Personally, I don't see why the same questions coudn't be asked
> MK: about the following semi-equivalents:
> MK:
> MK: s = a[i]; s += (*p)++; s += a[i]; // 2*a[i] ? or
...
> Because a language to be useful has to provide a delicate balance
> between possibility for optimization, and the ability of the user to
> write code segments that should not be optimized. In C (and most other
> usable high level languages) this goal is very neatly accomplished by
> having the concept of `expression' versus statement: expressions are
> to be optimized aggressively, statements are to be optimized
> carefully. C, in addition, stops optimization in certain expressions:
> but, then, every language has idiosyncracies.

I do not agree with the "most other usable high level languages" above,
unless you have another definition than I am using. C is the only
language I know that makes a distinction between "statements" and
"expressions" with respect to optimization (and some usable languages
I know do not even make a distinction between expressions and
statements). The basic Algol like languages (Algol 60, Algol 68 and
Pascal) clearly define an order of evaluation in expressions like
a + b + c
while a language like Fortran does allow some very strong cross-statement
optimizations but restricts quite a number of in-expression optimizations
like:
a + (b + c)
where 'b' *must* be added to 'c' first (honoring the parentheses for
floating-point operands was only a late addition to the C standard: the
"when it makes no difference" rule).
On the other hand, the following Fortran code fragment:
      FUNCTION F(A, B, N)
      DIMENSION A(N), B(N)
      Z = B(1)
      A(1) = - A(1)
      IF (Z .NE. B(1)) THEN
         ...
which attempts to check whether the actual array parameters A and B are
identical will fail because the compiler *can* assume they are not
identical and *may* assume the test always fails (and some compilers do
so). Moreover, if RANDOM is a function, the statement
Z = RANDOM() * RANDOM()
requires only *one* function call in Fortran while C requires *two*
function calls.
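
(For the C side of that comparison, a sketch -- the function name
my_random is made up here and assumed to have side effects:

    extern double my_random(void);
    ...
    z = my_random() * my_random();

A C compiler must issue both calls, though in an unspecified order,
because the "as if" rule cannot remove a call whose side effects it
cannot see.)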

What we find is that different languages allow different kinds of
optimizations and assumptions of the compiler, and they are simply
defined by the language and we have to live with them. Whether to make
some kinds of optimizations and assumptions possible is just a matter
of taste. In general it makes a language neither more nor less usable.
If you are accustomed to the Pascal-like requirements (as most first-time
users of C are), some of the "undefineds" of C come as a surprise. If
you are accustomed to the Fortran-like requirements (which, I admit, many
Fortran programmers are not), there is nothing surprising in most of the
C undefinedness.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924098
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Nathan Sidwell

unread,
Jul 29, 1996, 3:00:00 AM7/29/96
to

Michael Kluev (kl...@macsimum.gamma.ru) wrote:
: In article <TANMOY.96J...@qcd.lanl.gov>, tan...@qcd.lanl.gov
: (Tanmoy Bhattacharya) wrote:
: >(a[i] + (*p)++) + a[i]

: >
: >is the compiler allowed to reorder the statement and make it a[i] << 1 +
: >(*p)++ (assume a[i] is unsigned int and p is unsigned int*) which may
: >well be faster? What if p is the same as &a[i]? Should the compiler
: >have to put in an extra check for p!=&a[i] and thereby lose the
: >advantage it got by reordering? Or should such reorderings always be
: >disallowed?

: Personally, I don't see why the same questions coudn't be asked
: about the following semi-equivalents:

: s = a[i]; s += (*p)++; s += a[i]; // 2*a[i] ? or

: s = a; s += b(); s += a; // 2*a ? or just


: s = b(); s += b(); // 2*b() ?

: But remember, "+" is as good point as ";" for me. I've just took off
: "standard language paradigm glasses" :-)

You can ask the same question, and compilers are allowed to reorder such
statements _provided_ they can prove that the semantics of the program
are unchanged (the 'as if' rule).

Now the difference between


s = a[i]; s += (*p)++; s += a[i];

and
s = (a[i] + (*p)++) + a[i];

is that for the latter, the compiler can _assume_ that (*p)++ does not
alter a[i], whereas it must _prove_ that to be the case in the former.
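
To put the same contrast into code (the two wrapper functions and
their names are mine, only so that each form stands alone):

    unsigned by_statements(unsigned a[], unsigned i, unsigned *p)
    {
        unsigned s;
        s  = a[i];      /* well defined even if p == &a[i]: each    */
        s += (*p)++;    /* ';' ends a full expression, so the       */
        s += a[i];      /* compiler must prove p != &a[i] before    */
        return s;       /* folding the two reads of a[i]            */
    }

    unsigned by_expression(unsigned a[], unsigned i, unsigned *p)
    {
        /* if p == &a[i] this is undefined (a[i] is modified and
           read with no intervening sequence point), so the
           compiler may assume the two reads of a[i] agree          */
        return (a[i] + (*p)++) + a[i];
    }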

: If only compiler (may be with the help of language) could determine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
you seem to be objecting to the help the language provides!
: whether or not the expression have side effects in terms of current context...

nathan

--
Nathan Sidwell Holder of the Xmris home page
http://www.pact.srf.ac.uk/~nathan/ Tel 0117 9707182
nat...@inmos.co.uk or nat...@bristol.st.com or nat...@pact.srf.ac.uk

Zefram

unread,
Jul 29, 1996, 3:00:00 AM7/29/96
to

Bradford Chamberlain <br...@rosalyn.cs.washington.edu> wrote:
>From what I've read in K&R, this seems incorrect. Since each
>expression is fully parenthesized, I'd expect one of the "calls" to
>sqr() to evaluate to its double result before starting the other,
>giving the right answer. Is this a bug in my compilers, or my way
>of thinking?

In your way of thinking. Parentheses affect parsing, overriding
precedence; they have no effect on order of evaluation. The only
operators that do affect the order of evaluation are the comma
operator, the logical operators && and ||, and the ternary ? :
operator.

There is no portable way to have a macro do what you want in C. The
closest you can get is a function with internal linkage (declared with
the static keyword), which good compilers will inline. Some compilers
have a keyword that allows you to hint that inlining would be a good
idea; it might be spelled inline, __inline__ or something else, so you
can't rely on it in general, but configuration code such as that
generated by autoconf can, in a reasonably portable manner, give you
access to it if available.
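
For instance, a portable version of the macro in question might be
sketched (leaving out any inline hint, since its spelling varies) as:

    static double sqr(double x)
    {
        return x * x;   /* the argument is evaluated exactly once */
    }

after which x = sqr(2) + sqr(3); is perfectly well defined and yields
13, whichever order the two calls are made in.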

-zefram

H.S. Prasad 4-8398

unread,
Jul 30, 1996, 3:00:00 AM7/30/96
to br...@rosalyn.cs.washington.edu

Brad,

Here is what I got running the same piece of code:

--- program code ---

#include <stdio.h>
#include <stdlib.h>

#define sqr(x) ((tmp=(x)),(tmp*tmp))

int main(void)
{
    int tmp, x, y;

    printf("x:%d\n", (x=sqr(2)));
    printf("y:%d\n", (y=sqr(3)));

    /* both operands of + assign to the shared tmp */
    printf("x:%d\n", (x=((sqr(2))+(sqr(3)))));

    exit(0);
}


--- end program code ---


--- results ---
x:4
y:9
x:13

--- end results ---

- Prasad


br...@rosalyn.cs.washington.edu (Bradford Chamberlain) wrote:
>
>I'm trying to implement a double-squaring function using a macro using
>the comma operator as follows:
>
> static double tmp;
>
> #define sqr(x) ((tmp=(x)),(tmp*tmp))
>
>
>This works fine when I use it once per statement:
>
> x = sqr(2);
> y = sqr(x);
>
>
>However, if I have multiple non-nested uses in a statement, it doesn't:
>
> x = sqr(2) + sqr(3);
>
>returns 8 or 18, depending on the compiler. Putting a print statement
>between the two expressions in the sqr() macro indicates that in each
>invocation, all the values are being used correctly. Therefore, I can
>only assume that the right-hand expression of each invocation isn't
>being evaluated until the left expression of the other invocation has
>been (giving 2*sqr(2) or 2*sqr(3), depending on the compiler's order
>of evaluation, presumably).
>
>
>From what I've read in K&R, this seems incorrect. Since each
>expression is fully parenthesized, I'd expect one of the "calls" to
>sqr() to evaluate to its double result before starting the other,
>giving the right answer. Is this a bug in my compilers, or my way
>of thinking?
>
>Any help would be appreciated. I'm compiling on a DEC alpha using gcc
>version 2.6.3 and cc from OSF/1.
>
>
>Thanks,
>-Brad

--
/***********************************************************************/
H.S. Prasad Standard Disclaimer
Email: pras...@jpmorgan.com
Phone: (302)634-8398
Fax : (302)634-8563
/***********************************************************************/


Bob Cousins

unread,
Jul 31, 1996, 3:00:00 AM7/31/96
to

s...@eskimo.com (Steve Summit) wrote:

>In article <kluev-26079...@kluev.macsimum.gamma.ru>,
>kl...@macsimum.gamma.ru (Michael Kluev) writes:
>> I do not get one thing: Why there is so fundamental difference
>> between the statements and expressions? Why the evaluation order of
>> sub-expressions is undefined while the evaluation order of statements
>> within the statement sequence is defined?

>I always have a hard time with questions like these. Sometimes,
>they hardly make sense: after all, the language is what it is,
>and we simply have to...

> [other fine stuff]

I have enjoyed reading your illuminating posts on this, Steve. They
prompted me to go and look up language specs for Pascal, C, FORTRAN
and Algol 68, as well as a few books on compilers I have. [I think '68
is an example of a language spec that is very well (formally even)
defined, but I just can't make any sense of it!]

Your reasons are very sound practical reasons, but is there a more
fundamental theoretical reason? I got the clue from reading "Compiler
Construction" [1]. Here they say that in the mathematical evaluation
of an expression, side-effects are *not allowed*. i.e. a mathematical
function is always deterministic. Therefore in any language that
allows side-effects in functions, this condition is potentially
violated, which leads to all the difficulty in defining semantics.

It is noted that FORTRAN is closest to the goal of mathematical
equivalence [hopefully for obvious reasons]. It is also noted that in
the language Euclid, an attempt was made to restrict side-effects by
prohibiting assignments to result parameters and global variables and
use of i/o in functions.

So what we end up with is a trade-off between functional purity (cf
Prolog as an extreme) and procedural convenience and efficiency for
conventional 4GLs.

I may have over-paraphrased Waite and Goos, I would refer you to the
original source.

Regards

[1] M.W. Waite and G. Goos, "Compiler Construction". Springer-Verlag,
New York, 1984. Section 2.3

--
Bob Cousins, Software Engineer.
http://www.demon.co.uk/sirius-cybernetics/


Steve Summit

unread,
Jul 31, 1996, 3:00:00 AM7/31/96
to

In article <kluev-26079...@kluev.macsimum.gamma.ru>,
kl...@macsimum.gamma.ru (Michael Kluev) wrote:
> I do not get one thing: Why there is so fundamental difference
> between the statements and expressions? Why the evaluation order of
> sub-expressions is undefined while the evaluation order of statements
> within the statement sequence is defined?

And in article <Dv9v5...@eskimo.com>, I wrote:
> I posted a long article a year or two ago exploring some reasons
> why C might leave certain aspects of evaluation order undefined.
> I'd re-post it now, but it would take me too long to find it in
> my magtape archives...

It turned out I had it on-line, after all, which I was reminded
of when someone forwarded me a copy, thinking it was the one I
was desperately seeking the other day. I append it below. Of
its four answers, the one I was most thinking of in response to
Michael's question is its fourth one (in particular, the three
paragraphs beginning with "When people talk about why" and ending
with "so handle them, and we're done"), although since that
answer talks about optimizing compilers I suppose it is taboo
under Michael's requirements.

* * *

Newsgroups: comp.lang.c
From: s...@eskimo.com (Steve Summit)
Subject: Re: Quick C test - whats the correct answer???
Message-ID: <D5swG...@eskimo.com>
Summary: why is undefined behavior undefined?
References: <fjm.58....@ti.com> <D5JLG...@tigadmin.ml.com>
Date: Tue, 21 Mar 1995 17:26:59 GMT

In article <D5JLG...@tigadmin.ml.com>, Jim Frohnhofer
(ji...@nottingham.ml.com) writes:
> In article 000B...@ti.com, f...@ti.com (Fred J. McCall) writes:
>> That's right, and that's why you should listen to him (and to me, and to
>> everyone else telling you this). The REASON it is undefined is BECAUSE THE
>> STANDARD SAYS SO.
>
> I may regret this, but I'll jump in anyway. I accept that the behaviour
> of such a construct is undefined simply because the Standard says so. But
> isn't it legitimate for me to ask why the Standard leaves it undefined?

It is, but you should probably be careful how you ask it.
If you're not careful (you were careful, but most people aren't),
it ends up sounding like you don't believe that something is
undefined, or that you believe -- and your compiler happens
to agree -- that it has a sensible meaning. But as long as
we're very clear in our heads that we're asking the abstract,
intellectual question "Why are these things undefined?", and not
anything at all practical like "How will these things behave in
practice?", we may proceed.

The first few answers I give won't be the ones you're looking
for, but towards the end I may present one which you'll find more
satisfying.

First of all, you might want to take a look at who's saying what.
I'll grant that statements such as Fred's above can be annoying.
I agree completely that it's usually very good to know the
reasons behind things. But if the people who have been posting
to comp.lang.c the longest, and who have been programming in C
for the longest and with the most success, keep saying "it's
undefined, it's undefined, quit worrying about why, just don't do
it, it's undefined", they might have a good reason, even if they
don't -- or can't -- say what it is, and that might be a good enough
reason for you, too.

I've been programming in C for, oh, 15 years now. For at least
10 of those years, I've been regarded as somewhat of an expert.
For going on 5 of those years, I've been maintaining the
comp.lang.c FAQ list. I am someone who is usually incapable of
learning abstract facts unless I understand the reasons behind
them. When I used to study for math tests, I wouldn't memorize
formulas, I'd just make sure that I could rederive them if I
needed them. Yet for most of the 15 years I've been programming
in C, I simply could not have told you why i=i++ is undefined.
It's an ugly expression; it's meaningless to me; I don't know
what it does; I don't want to know what it does; I'd never write
it; I don't understand why anyone would write it; it's a mystery
to me why anyone cares what it does; if I ever encountered it in
code I was maintaining, I'd rewrite it. When I was learning C
from K&R1 in 1980 or whenever it was, one of their nice little
sentences, which they only say once, and which if you miss you're
sunk, leaped up off the page and wrapped itself around my brain
and has never let go:

Naturally, it is necessary to know what things to avoid,
but if you don't know *how* they are done on various
machines, that innocence may help to protect you.

As it happens, it's possible to read K&R *too* carefully here:
the discussion at the end of chapter 2 about a[i] = i++ suggests
that it's implementation-defined, and the word "unspecified"
appears in K&R2, while ANSI says that the behavior is undefined.
The sentence I've quoted above suggests not knowing how things
are done on various machines, while in fact what we really want
to know is that maybe they *can't* be done on various machines.
Nevertheless, the message -- that a bit of innocence may do you
good -- is in this case a good one.

That's my first answer. I realize that it's wholly unsatisfying
to anyone who's still interested in this question. On to answer
number two.

> As far as I know, it was created by a committee not brought down by Moses
> from the mountain top. If I want to become a better C programmer, won't
> it help me to know why the committee acted as it did.

Perhaps, but again, only if you're very careful.

Let's say you're wondering why i++ * i++ is undefined. Someone
explains that it's because no one felt like defining which order
the side effects happen in. That's a nice answer: it's a bit
incomplete and perhaps a bit incorrect, but it's certainly easy
to understand, and since you insisted on an answer you could
understand (as opposed to something inscrutable like "it's just
undefined; don't do it"), it's the kind of answer you're likely
to get.

So next week, you find yourself writing something like

i = 7;
printf("%d\n", i++ * i++);

and you say to yourself, "Whoah, that might be undefined.
How did that explanation go again? It's undefined... because
nobody felt like saying... which order the side effects happened
in. So either the first i gets incremented first, and it's 7
times 8 is 56, or the second i gets incremented first, and it's
8 times 7 is 56. Hey! It may be undefined, but I get a
predictable answer, so I can use the code after all!"

So in this case, knowing a reason why has not made you a better
programmer, it has made you a worse programmer, because some day
when you're not around to defend it (or when you are around, but
you don't have time to debug it), that code is going to print 49,
or maybe 42, or maybe "Floating point exception -- core dumped".

If, on the other hand, you didn't know why it was undefined,
just that it was undefined, you would have instead written

printf("%d\n", i * (i+1));
i += 2;

(or whatever it was that you meant), and your code would have
been portable, robust, and well-defined.

Now, some of you may be thinking that

printf("%d\n", i++ * i++);

is a ridiculous example which no one would ever write. That may
be true, but it's the example I use in the FAQ list (and it's the
oldest of the undefined-evaluation-order questions in the FAQ
list) because it illustrates, I hope better than a "real" example
would, the kind of contortions that people *do* get into when
they start thinking too hard (but not too carefully) about
undefined expressions. There was an actual question posted to
comp.lang.c years ago (which I could probably find in my archives
if I looked hard enough) in which the poster used essentially the
same argument: "the evaluation order may be undefined, but no
matter which order the compiler picks, I'll get the answer I
want", and that was the inspiration for the i++ * i++ question
(4.2 on the current list). (Remember, it's not the evaluation
order that's undefined, it's the entire expression that's
undefined.)

Two paragraphs back, I suggested that the programmer who did not
know why something was undefined might have been better off than
the programmer who did. I am not stating (not in the current
climate, anyway) that you should not know why things are
undefined. But if you insist on knowing, you are going to have
to get the full story, not just some simplistic justifications.
And you are going to have to be excruciatingly careful that you
don't use your knowledge of why something is undefined to try to
second-guess how a certain undefined construct (which you've
decided you simply have to use in your program) is going to
behave. Undefined behavior is slippery stuff: it really, truly
is undefined; it really, truly can do anything at all; yet some
people (particularly those who are always clamoring to know *why*
things are undefined) are always trying to salvage some remnants
of definedness out of an undefined situation, and are always
getting themselves in trouble, and end up convincing the people
who have to come along and pick up the mess that yes, it really
was a mistake trying to explain why it was undefined, and we
probably should have just said it was undefined because we said
it was, after all.

This has been answer number two, and I realize it's still not
satisfying, because I'm still suggesting that maybe you don't
want to know why.

For answer number three, I'll quit beating around the bush and
try to explain why, although I have to admit that, for me, the
answers from here on are going to get less satisfying, and less
nicely worded, because we get down into some realms that I don't
usually think about (because I really have taken to heart K&R's
advice about maintaining some innocence).

An international Standard like X3.159/ISO-9899 is a tremendously
difficult document to write, even for a relatively simple
programming language like C. When a Standard says something,
it must say it definitively and unambiguously. (When it is
inaccurate, it must be definitively inaccurate, and when it
contains any areas of doubt or uncertainty, they must be rigidly
defined areas of doubt or uncertainty.) The Standard must
withstand intense scrutiny, for many years, from all sorts of
observers, including language lawyers, professional nitpickers,
grudging ex-users of other languages, semicompetent implementors
of spiffola new compilers for scruffola old computers, and
xenophobic adherents of other sociopolitical systems. (If you
think that abstract constructions like programming languages are
inherently removed from sociopolitical concerns, you haven't
followed the crafting of any international Standards.)

Since it's so hard to specify things precisely enough for a
Standard like this, the wise Standard-writer (especially for a
Standard that hews to existing practice) won't specify anything
more than is necessary. Features that everyone will use (or that
someone might be reasonably expected to use) must be specified
precisely, but features that no sane person could be expected to
use in 6,000 years might be relegated to the dustbin, instead.
(Naturally, since we're being precise, we'll have to precisely
define the boundaries of the parts we've decided to be imprecise
about; the paraphrase above of Vroomfondel's demand from The
Hitchhiker's Guide to the Galaxy is not facetious.)

In practice, you can only tell what a computer program has done
when it causes a side effect. In a nutshell, then, what a
Standard for a programming language does is to define the mapping
between the programs you write and the side effects they produce.
Consequently, we are very concerned with side effects: how
they're expressed in our programs, and how the Standard defines
them to behave.

The fragment

int i;
i = 7;
printf("%d\n", i);

contains two obvious side effects, and it is blatantly obvious
what their effect should be, and what the Standard says agrees
with what you think they should do.

The fragment

int a[10] = {0}, i = 1;
i = i++;
a[i] = i++;
printf("%d %d %d %d\n", i, a[1], a[2], a[3]);

contains a few more side effects, but (I assert) it is *not*
obvious how they should behave. What does the second line do?
In the third line, do we decide which cell of a[] we assign to
before or after we increment i again? Plenty of people can
probably come up with plenty of opinions of how this fragment
ought to work, but they probably won't all agree with each other,
as the situation is not nearly so clear-cut.

Furthermore, a Standard obviously cannot talk about individual
program fragments like i = i++ and a[i] = i++, because there are
an infinite number of those. It must make general prescriptions
about how the various primitive elements of the language behave,
which we then use in combination to determine how real programs
behave. If you think you know what a[i] = i++ should do, you
can't just say that; instead you have to come up with a general
statement which says how *all* expressions which modify and then
use the same object should act, and you have to convince yourself
that this rule is appropriate not only for the a[i] = i++ case
you thought about but also for all the other expressions
(infinitely many of them, remember) that you did *not* think
about, *and* you have to convince a bunch of other people of your
conviction.

The people writing the C Standard decided that they could not do
that. Instead, they decided that expressions such as a[i] = i++
and i++ * i++ and i = i++ would be undefined. They decided this
because these expressions are too hard to define and too stupid
to be worth defining. They came up with some definitions (no
small feat in itself) of which expressions this undefinedness
applies to. You've seen these definitions; they're the ones
that say

Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior
value shall be accessed only to determine the value to
be stored.
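
To see those two sentences in action (reusing the i and a[] from the
fragment above; the annotations are mine, not the Standard's):

    i = i + 1;   /* fine: i is modified once, and its prior value
                    is read only to determine the value stored    */
    i = i++;     /* undefined: i is modified twice between
                    sequence points                               */
    a[i] = i++;  /* undefined: i is modified once, but its prior
                    value is also read to choose the array cell,
                    not to determine the value stored             */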

Now, I'll grant that these definitions are at least as hard to
understand as expressions such as a[i] = i++. But there are
plenty of more complicated expressions which we can't begin to
make sense of (or even decide if they make any sense) without
these definitions. Perhaps I'll say more about them later, but
for the moment we're still trying to answer the question of why
some things (such as the expressions not defined by the two
definitions quoted above) are undefined, and my opinion is, as I
stated above: because they're too hard to define, and too stupid
to be worth defining. (If you're still counting, call this
answer number three.)

Now, you may think that I'm being overly pessimistic. You may
think that it's easy to define what a[i] = i++ or i++ * i++ or
i = i++ should do. Perhaps you think they should be evaluated
in exactly the order suggested by precedence and associativity.
Perhaps you think that i++ should increment i immediately after
giving up i's value and before evaluating any of the rest of the
expression. (These rules would say that the incremented value of
i is used to decide which cell of a[] to assign to, and that the
left-hand i++ in i++ * i++ happens first, and that i = i++ ends
up -- here's a surprise -- being exactly equivalent to i = i,
except perhaps if i is volatile.) But -- and you're going to
have to take my word for it here, because I'm taking other
people's word for it -- these hypothetical "well-defined"
expression rules, though they're easy enough to state and
probably precise enough and probably comprehensive for the
cases we're interested in, would result in significantly poorer
performance than pre-ANSI C traditionally had and that we're
used to. Optimizing compilers would have to generate code which
evaluated expressions in little itsy bitsy steps, in lockstep
with the precedence- and associativity-based parse. They would
not be able to rearrange parts of the expression to make the best
use of the target machine's instruction repertoire or available
registers or pipelining or parallelization or whatnot.

When people talk about why it's good that an expression like
i++ * i++ is undefined, they usually speak of modern, parallel
machines which might get really confused if they try to do, not
the left-hand i++ first or the right-hand one first, but instead
both at once somehow. Instead of that example, let's imagine a
very simple CPU, analogous to a four-function calculator with
some memory registers. Let's play compiler, and imagine what
buttons on the calculator we'll push, operating under the
"well-defined" rules of the previous paragraph.

Since the (hypothetical) "well-defined" rules tell us exactly
which order to evaluate the expression in, our task is
straightforward. First, the left-hand side of the multiplication
operator: the value we want to multiply is i's previous value.
Whoops, before we can think about doing the multiplication, the
rules say we have to do the increment. Whoops, once we do the
increment, we'll have lost the previous value. So we recall
from i, store the value in a temporary register, add one to
the value, and store it back in i. Now we can recall from the
temporary, and do the multiplication... no. Because first we
have to do the same thing on the right-hand side: recall from i,
store it in a second temporary, add one to it, and store it back
in i. Now, finally, we can recall the two values from the two
temporaries and do the multiplication.

If we remove the (still hypothetical, just for the purposes of
this article, not part of Standard C) "well-defined" rules, and
revert to ANSI's rules, under which we only have to make sure
that the increments to i happen sometime before the next sequence
point, look how much easier things become: Recall from i. Make a
note to increment its value later. Multiply it by: recall from i
again, make a note to increment it later. Now we've got the
product, do whatever we have to with it. We've still got these
two notes to increment i, so handle them, and we're done.

I'd never actually gone through the analysis in this level of
detail until just now, but this is exactly how the example

int i = 7;
printf("%d\n", i++ * i++);

from question 4.2 of the FAQ list could print 49 instead of 56.
(No, I'm not going to come up with a scenario under which it
could print 42.)
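
Written out as straight-line C (the temporaries t1 and t2 are mine),
the deferred-increment ordering above amounts to:

    int i = 7, t1, t2, product;

    t1 = i;              /* recall i            (7)  */
    t2 = i;              /* recall i again      (7)  */
    product = t1 * t2;   /* do the multiply     (49) */
    i = i + 1;           /* handle the first note    */
    i = i + 1;           /* handle the second note   */

which is one perfectly legitimate reading of the original expression.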

So, if you're still with me (and after 300 lines, I can see how
you might not be, but if you're one of the people who's been
insisting on a reason why, you better be :-) ), here is my
fourth, last answer: these things are undefined because if you
made them defined, compilers would have to be paranoid and
generate lockstep code which would be bulkier or slower or both.

Having come this far, I have to repeat that this last answer
isn't one I'm particularly pleased with, partly because I'm not a
code generation expert, and partly because I don't usually worry
about efficiency very much. (On the other hand, it doesn't
matter whether I'm pleased with it, or whether you're pleased
with it either; it *is* one of the reasons.) You're probably
also still harboring some doubts; you may be thinking that a
compiler would only have to use the slow, bulky, lockstep code
for expressions with multiple side effects, which many people
(even some of the Doubting Thomases) agree that we shouldn't be
writing anyway. Perhaps you're right. Perhaps we could have our
cake and eat it too; perhaps we could have blazingly efficient
code generated for polite little expressions and still have
defined behavior for rogue ones. Perhaps we could, but not
under the current Standard: it still *does* say that the rogue
expressions are undefined. But the C Standard is under revision:
perhaps, if this is important enough to you, you can convince the
committee to pronounce defined behavior upon the rogue
expressions, too. Good luck.

Finally, if you've read this far, do me a favor. I've spent some
time writing this up, not because I had nothing better to do
today, but because I do like to try to come up with ways of
answering these questions. Please (all of you, not just Jim) let
me know if this explanation worked for you, or not; or if some of
it worked and some didn't, which bits did and which bits didn't.
Thanks.

Steve Summit
s...@eskimo.com

William Clodius

unread,
Jul 31, 1996, 3:00:00 AM7/31/96
to

Even when Waite and Goos wrote their book, 1984, there were languages
that were closer than Fortran to being side effect free. Prolog, and
Backus's FP come to mind, and I believe the first implementation of
Sisal was released about that time. Nowadays there are a number of
languages, Clean and Haskell come to mind, that have extremely limited
side effects, no assignment of any kind, or single assignment, special
semantics for I/O, etc. Surprisingly, code in Clean and Sisal can be
simpler than that of most procedural languages, and very similar in
efficiency.

Note Fortran does not forbid side effects in functions, but the use of
side effects results in undefined behavior under more circumstances
than most people, even very knowledgeable ones, believe. Few, if any,
compilers exercise the full freedom allowed by the standard except,
sometimes, at optimization levels marked as unsafe. Such unsafe levels
typically also involve non-standard conforming optimizations. For
example, in many situations the standard does not forbid making
function calls out of statement order, but very few compilers
exercise that option for other than the intrinsic functions.
--

William B. Clodius Phone: (505)-665-9370
Los Alamos National Laboratory Email: wclo...@lanl.gov
Los Alamos, NM 87545

Zefram

unread,
Aug 6, 1996, 3:00:00 AM8/6/96
to

Tanmoy Bhattacharya <tan...@qcd.lanl.gov> wrote:
>Possibly. I do not know Pascal well enough when it comes to what the
>standard requires for functions with side-effects.

As I understand it, in the Pascal expression "a and b", b may or may
not be evaluated if its value is not required to determine the value of
the whole expression. Consequently, if b is an expression containing a
call of a function with side effects, those side effects may or may not
occur.

This allows the greatest possible freedom to the implementor for
optimisation. In the case of the logical operators, it is also the
worst possible semantic from the user's point of view: one isn't
guaranteed that side effects will occur, nor that they won't, and it's
not safe, as it is in C, to use an expression that can only be evaluated
safely depending on the left-hand operand.
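
The C idiom in question is the familiar guard (the names here are
purely illustrative):

    if (p != NULL && p->count > 0)   /* the right operand runs only */
        use(p);                      /* once p is known non-null    */

There is no way to write that as a single standard Pascal expression
and be sure it is safe.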

>On the other hand a language which allows arbitrary interstatement
>rearrangements is clearly useless. If I calculate a value and print it
>out, I do want to be assured that the calculation precedes the
>output.

Obviously what we really need is a language in which one can explicitly
indicate which statements have sequence points between them. (1/2 :-))

>DTW: If you are accustomed to the Pascal-like requirements (as most first-time
>DTW: users of C are), some of the "undefineds" of C come as a surprise. If

When I learned C, from knowing Pascal, I found the *definedness* of
certain language features refreshing, particularly the logical
operators. Multiple side effects of a single non-compound statement,
other than by function call, are impossible in Pascal, so that being
undefined in C was really not surprising. (The sensible number of
precedence levels was also a pleasant change, but that's a separate
issue.)

-zefram

Chris Engebretson

unread,
Aug 6, 1996, 3:00:00 AM8/6/96
to

|> As I understand it, in the Pascal expression "a and b", b may or may
|> not be evaluated if its value is not required to determine the value of
|> the whole expression. Consequently, if b is an expression containing a
|> call of a function with side effects, those side effects may or may not
|> occur.

AFAIK this is correct, although I'm by no means a reliable source. :-)
One of the consequences is that a statement such as

if ((i > UPPER_ARRAY_BOUND) || (array[i] == SOME_VALUE))
{ /* ... */ }

is not "safe" in Pascal, because you're not guaranteed such things as
short-circuited left-to-right evaluation. Since Pascal may very well go
ahead and evaluate _both_ sides of the "or", the right expression introduces
a "range check error" if the left-hand expression is true.

As in most respects, C outdistances Pascal in this way; it allows you to say
what you mean, instead of forcing you to write it less clearly than it
probably ought to be.
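
In C the test above is safe exactly as written, because || is
guaranteed to short-circuit left to right. Without that guarantee
one is pushed toward something clumsier, along the lines of this
sketch (same illustrative names as above):

    if (i > UPPER_ARRAY_BOUND)
        { /* ... */ }
    else if (array[i] == SOME_VALUE)
        { /* ... */ }    /* the body now has to be duplicated */

which is exactly the loss of clarity being complained about.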

--
/*-------------------------------------------------------------
Chris Engebretson // CSB - SED - Scientific Systems Development
United States Geological Survey - National Mapping Division
Mundt Federal Building, USGS EROS Data Center
Sioux Falls, SD 57198 http://edcwww.cr.usgs.gov/eros-home.html
email: enge...@edcserver1.cr.usgs.gov phone: 605-594-6829
-------------------------------------------------------------*/

@#$%!?!

unread,
Aug 7, 1996, 3:00:00 AM8/7/96
to

: worst possible semantic from the user's point of view: one isn't

Only because pascal doesn't have conditional expressions.

: not safe, as it is in C, to use an expression that can only be evaluated
: safely depending on the left-hand operand.

Excessively constrained since you can control order of evaluation with
a conditional expression, and it makes optimising booleans for riscs
very difficult.

Zefram

unread,
Aug 15, 1996, 3:00:00 AM8/15/96
to

@#$%!?! <smr...@netcom.com> wrote:
>Only because pascal doesn't have conditional expressions.

Exactly my point.

>Excessively constrained since you can control order of evaluation with
>a conditional expression,

That's the point of the operators in question. It's a very useful ability.

> and it makes optimising booleans for riscs
>very difficult.

Not really. In a case where you don't need the short-circuiting[1],
such as "a < b || a < c", that is easily detected by the compiler,
which is then free to evaluate both subexpressions -- simultaneously if
the capability exists -- because it makes no difference to the
program.

[1] In that example it *can* be needed, but only if c might have an
undefined value that would cause a trap if evaluated. On most
architectures, where overflow, underflow and so on don't trap, the
class of boolean operations that don't actually require short-circuited
evaluation is very large.
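
A small example of the point (the names are mine): in

    in_range = (a < b) || (a < c);

neither comparison has side effects or can trap, so under the "as if"
rule the compiler is free to evaluate both of them -- even
branchlessly, as if the || were a plain | -- and no strictly
conforming program can tell the difference.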

-zefram
--
Andrew Main <zef...@dcs.warwick.ac.uk>
