Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

**argv -v- *argv[]


Warren Dale

May 15, 2001, 11:52:35 PM
I have been programming in C for over ten years.
But I am still confused between "char **argv" and "char *argv[]"

Please:
* are they interchangeable?
* what is the real difference?
* is either right or wrong?
* which is preferred?
* does it really matter?

Warren.

Greg Comeau

May 16, 2001, 12:02:48 AM
In article <3b01f6ce....@news.ozemail.com.au>,

Warren Dale <wd...@ozemail.com.au> wrote:
>I have been programming in C for over ten years.
>But I am still confused between "char **argv" and "char *argv[]"
>
>Please:
>* are they interchangeable?

If we're talking about argv as in the second argument to main,
then yes, they are interchangeable. In other contexts they
are not interchangeable.

>* what is the real difference?

There is none. The first dimension of array parameters
collapses into pointers, so the compiler is going to internally
rewrite char *argv[] as char **argv anyway.

>* is either right or wrong?

Neither is wrong.

>* which is preferred?

I think that you're going to get into a six of one vs
1/2 dozen of the other on this kind of thing.

>* does it really matter?

Not to the compiler. Again, we're just talking about this
one context of the 2nd arg to main.
--
Greg Comeau Comeau C/C++ 4.2.45.2
ONLINE COMPILER ==> http://www.comeaucomputing.com/tryitout
NEW: Try out libcomo! NEW: Try out our C99 mode!
com...@comeaucomputing.com http://www.comeaucomputing.com

Warren Dale

May 16, 2001, 12:27:17 AM
On 16 May 2001 00:02:48 -0400, com...@panix.com (Greg Comeau) wrote:

>In article <3b01f6ce....@news.ozemail.com.au>,
>Warren Dale <wd...@ozemail.com.au> wrote:
>>I have been programming in C for over ten years.
>>But I am still confused between "char **argv" and "char *argv[]"
>>
>>Please:
>>* are they interchangeable?
>
>If we're talking about argv as in the second argument to main,
>then yes, they are interchangeable. In other contexts they
>are not interchangeable.
>
>>* what is the real difference?
>
>There is none. The first dimension of array parameters
>collapses into pointers, so the compiler is going to internally
>rewrite char *argv[] as char **argv anyway.
>
>>* is either right or wrong?
>
>Neither is wrong.
>
>>* which is preferred?
>
>I think that you're going to get into a six of one vs
>1/2 dozen of the other on this kind of thing.
>
>>* does it really matter?
>
>Not to the compiler. Again, we're just talking about this
>one context of the 2nd arg to main.

Thank you very much for your reply.

You write: "If we're talking about argv as in the second argument to
main, then yes, they are interchangeable. In other contexts they are not
interchangeable." -and- "Again we're just talking about this one context
of the 2nd arg to main." I understand.

Please would you elaborate in relation to the 'general' case? For data
specifications and for function specifications.

Thanks,
Warren.

Nick Maclaren

May 16, 2001, 4:20:33 AM
In article <9dsu58$8k4$1...@panix3.panix.com>,

Greg Comeau <com...@comeaucomputing.com> wrote:
>In article <3b01f6ce....@news.ozemail.com.au>,
>Warren Dale <wd...@ozemail.com.au> wrote:
>>I have been programming in C for over ten years.
>>But I am still confused between "char **argv" and "char *argv[]"

With good reason. I tried to get this clarified in both C90 and
C99, and failed dismally both times.

>>Please:
>>* are they interchangeable?
>
>If we're talking about argv as in the second argument to main,
>then yes, they are interchangeable. In other contexts they
>are not interchangeable.

Um. Not quite. When used in precisely that way, which is the one
standard-specified way of providing arguments to main, they are
interchangeable. But those reservations are necessary.

>>* what is the real difference?
>
>There is none. The first dimension of array parameters
>collapses into pointers, so the compiler is going to internally
>rewrite char *argv[] as char **argv anyway.

That is definitely misleading. C90 is seriously ambiguous on the
matter, and relies on the "but everyone knows" clause. C99 has
added an explanatory footnote, which does not completely eliminate
the ambiguity. For example, it is unclear whether the following
declarations are all valid and compatible:

extern int main (int argc, char **argv);
extern int main (int argc, char *argv[]);
extern int main (int argc, char *argv[5]);
extern int main (int argc, char *argv[0]);
extern int main (int argc, char *argv[0/0]);

These have been debated at length, on the reflector and here, and
most people have taken up positions, but I cannot remember ever
having seen a clear statement that is backed up by unambiguous
wording from the standard.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

James Kuyper Jr.

May 16, 2001, 9:41:20 AM
Nick Maclaren wrote:
>
> In article <9dsu58$8k4$1...@panix3.panix.com>,
> Greg Comeau <com...@comeaucomputing.com> wrote:
...

> >There is none. The first dimension of array parameters
> >collapses into pointers, so the compiler is going to internally
> >rewrite char *argv[] as char **argv anyway.
>
> That is definitely misleading. C90 is seriously ambiguous on the
> matter, and relies on the "but everyone knows" clause. C99 has
> added an explanatory footnote, which does not completely eliminate
> the ambiguity. For example, it is unclear whether the following
> declarations are all valid and compatible:
>
> extern int main (int argc, char **argv);
> extern int main (int argc, char *argv[]);
> extern int main (int argc, char *argv[5]);
...

Those first three would all be equivalent declarations for ordinary
functions. The footnote you refer to is on a normative sentence saying
that equivalent declarations for main() are allowed. What's the issue
that makes them unclear?

Greg Comeau

May 16, 2001, 9:43:17 AM
In article <9dtd8h$1m1$1...@pegasus.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>In article <9dsu58$8k4$1...@panix3.panix.com>,
>Greg Comeau <com...@comeaucomputing.com> wrote:
>>In article <3b01f6ce....@news.ozemail.com.au>,
>>Warren Dale <wd...@ozemail.com.au> wrote:
>>>I have been programming in C for over ten years.
>>>But I am still confused between "char **argv" and "char *argv[]"
>
>With good reason. I tried to get this clarified in both C90 and
>C99, and failed dismally both times.
>
>>>Please:
>>>* are they interchangeable?
>>
>>If we're talking about argv as in the second argument to main,
>>then yes, they are interchangeable. In other contexts they
>>are not interchangeable.
>
>Um. Not quite. When used in precisely that way, which is the one
>standard-specified way of providing arguments to main, they are
>interchangeable. But those reservations are necessary.

Then it is quite. That's the question he asked.

>>>* what is the real difference?
>>
>>There is none. The first dimension of array parameters
>>collapses into pointers, so the compiler is going to internally
>>rewrite char *argv[] as char **argv anyway.
>
>That is definitely misleading. C90 is seriously ambiguous on the
>matter, and relies on the "but everyone knows" clause. C99 has
>added an explanatory footnote, which does not completely eliminate
>the ambiguity. For example, it is unclear whether the following
>declarations are all valid and compatible:
>
> extern int main (int argc, char **argv);
> extern int main (int argc, char *argv[]);
> extern int main (int argc, char *argv[5]);
> extern int main (int argc, char *argv[0]);
> extern int main (int argc, char *argv[0/0]);
>
>These have been debated at length, on the reflector and here, and
>most people have taken up positions, but I cannot remember ever
>having seen a clear statement that is backed up by unambiguous
>wording from the standard.

I can see possible suspicion about [0] and [0/0], but I don't
see how [5] is debatable. Anyway, I for some reason never
saw the debate. Is the debate about main specifically or
for any function parameter that's an array?

Jun Woong

May 16, 2001, 10:24:22 AM
In article <9dtd8h$1m1$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren says...

>
>In article <9dsu58$8k4$1...@panix3.panix.com>,
>Greg Comeau <com...@comeaucomputing.com> wrote:
>>In article <3b01f6ce....@news.ozemail.com.au>,
>>Warren Dale <wd...@ozemail.com.au> wrote:
>>>I have been programming in C for over ten years.
>>>But I am still confused between "char **argv" and "char *argv[]"
>
>With good reason. I tried to get this clarified in both C90 and
>C99, and failed dismally both times.
>
>>>Please:
>>>* are they interchangeable?
>>
>>If we're talking about argv as in the second argument to main,
>>then yes, they are interchangeable. In other contexts they
>>are not interchangeable.
>
>Um. Not quite. When used in precisely that way, which is the one
>standard-specified way of providing arguments to main, they are
>interchangeable. But those reservations are necessary.
>
>>>* what is the real difference?
>>
>>There is none. The first dimension of array parameters
>>collapses into pointers, so the compiler is going to internally
>>rewrite char *argv[] as char **argv anyway.
>
>That is definitely misleading. C90 is seriously ambiguous on the
>matter, and relies on the "but everyone knows" clause.

The Committee addressed C90's problem in a DR.

>C99 has
>added an explanatory footnote, which does not completely eliminate
>the ambiguity. For example, it is unclear whether the following
>declarations are all valid and compatible:
>
> extern int main (int argc, char **argv);
> extern int main (int argc, char *argv[]);
> extern int main (int argc, char *argv[5]);

These three declarations are equivalent.

> extern int main (int argc, char *argv[0]);
> extern int main (int argc, char *argv[0/0]);
>

some_type argv[0]; or
some_type argv[0/0];

argv has a not-strictly-conforming array type. Thus, these two
declarations result in undefined behavior, I think. In a relevant DR
(of C90), the Committee says:
"there is nothing to suggest that a not-strictly-conforming array type
can magically be transformed into a strictly conforming pointer
parameter via this rule"
("this rule" here refers to the rule that rewrites array type parameters)

And the Standard (at least C90) does not define when the
conversion from array type to pointer type (the adjustment of
parameter types) takes place.

[...]


--
Jun Woong (myco...@hanmail.net)
Dept. of Physics, Univ. of Seoul

Nick Maclaren

May 16, 2001, 10:46:37 AM

In article <9du05l$f01$1...@panix3.panix.com>,

com...@panix.com (Greg Comeau) writes:
|> >>
|> >>If we're talking about argv as in the second argument to main,
|> >>then yes, they are interchangeable. In other contexts they
|> >>are not interchangeable.
|> >
|> >Um. Not quite. When used in precisely that way, which is the one
|> >standard-specified way of providing arguments to main, they are
|> >interchangeable. But those reservations are necessary.
|>
|> Then it is quite. That's the question he asked.

I was nitpicking about your first sentence. Remember that
implementations may define other interfaces to main - and it is
quite likely that some of them have a second argument argv.
Neither he nor you said that you were talking SOLELY about the
standard-specified argv - you both were probably assuming it,
but you didn't SAY so.

This IS comp.std.c and therefore a newsgroup where relevant
pedantry and nitpicking is acceptable :-)

|> > extern int main (int argc, char **argv);
|> > extern int main (int argc, char *argv[]);
|> > extern int main (int argc, char *argv[5]);
|> > extern int main (int argc, char *argv[0]);
|> > extern int main (int argc, char *argv[0/0]);
|> >
|> >These have been debated at length, on the reflector and here, and
|> >most people have taken up positions, but I cannot remember ever
|> >having seen a clear statement that is backed up by unambiguous
|> >wording from the standard.
|>
|> I can see possible suspicion about [0] and [0/0], but I don't
|> see how [5] is debatable. Anyway, I for some reason never
|> saw the debate. Is the debate about main specifically or
|> for any function parameter that's an array?

Any function. The reason that [5] is debatable is that both
parameter adjustment and type matching are translation phase 7
activities, and there is no statement anywhere in the standard
that says in which order they occur. There are at least three
possibilities that make sense - "but everyone knows" which one
is used. Let us add another few nasties:

extern int main (int argc, char *argv[fred]); /* fred not defined */
extern int main (int argc, char *argv[++++++]);
extern int main (int argc, char *argv[$]);

Translation phase 7 says "The resulting tokens are syntactically
and semantically analyzed and translated as a translation unit."
That is a fat lot of help. In fact, "but everyone knows" that
it is done in textually sequential order. Well, except for C99
inline functions, of course. And the resolution of certain
linkage problems and the detection of related constraints, but
they can be done in the single pass, provided that there is an
extra sub-pass of detection at the end.

However, NOWHERE does the standard say this, and a bit of further
thought indicates that things are not so simple. "Everyone knows"
that syntactic and type errors within the operand of sizeof are
faults, but evaluation errors are not. Which is NOT what you would
get from the single pass model. So you might be excused for using
the phase model for the individual analysis operations. If you try
that, you get:

Parsing, textually serialised at the end of declarators.
Name definition, textually serialised at the end of declarators.
[ I am not 100% sure whether the standard states the ordering
of the above two, or just implies it. ]
Constant expression calculation (in the strict sense).
Parameter adjustment.
Type matching and analysis, including sizeof evaluation.
Optional (i.e. not constant) expression optimisation.
Code generation.

Most of the above can be deduced from the standard, though little
is stated, but the location of parameter adjustment cannot. It
is, however, somewhat clarified by (informative) footnote 84 in
C99. It STILL doesn't deal with cases like sizeof(*(char[][])),
which can relate to the parameter adjustment issue in sufficiently
contorted code.

Greg Comeau

May 16, 2001, 11:17:32 AM
In article <9du3sd$q7f$1...@pegasus.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>In article <9du05l$f01$1...@panix3.panix.com>,
>com...@panix.com (Greg Comeau) writes:

I didn't know about this problem, so I'll take your word on it for now.
However, you said that you brought this to the C committee twice.
Did they not agree there was a problem? Did they not know how to handle it?
What?

Also, a footnote is not normative, so it in and of itself should
not be the sole statement of something. Though often they are used
for things that are implied. But you are saying that that is not
even the case on this one. So, since the committee knew/knows this,
why wouldn't they fix it?

It sounds like there is a definite disagreement on this?

Nick Maclaren

May 16, 2001, 11:39:58 AM

In article <9du5mc$9d$1...@panix3.panix.com>,
com...@panix.com (Greg Comeau) writes:

[ On the ordering of actions within translation phase 7. ]

|> I didn't know this problem, so I'll take your words on it for now then.
|> However, you said that you brought this to the C committe twice.
|> Did they not agree there was a problem too? Not know how to handle it?
|> What?

The official response was "The standard is clear enough as it is."
When this is in response to a comment that says that there are
two possible interpretations, spells them out and says that there
seems to be no wording that specifies which, it isn't very good.

I STILL don't know whether tan(0.0) is allowed to set errno to
ERANGE, and got that answer three times in response to my query
asking whether or not it could.

|> Also, a footnote is not normative, so it in and of itself should
|> not be the sole statement of something. Though often they are use
|> for things that are implied. But you are saying that that is not
|> even the case on this one. So, since the committee knew/knows this,
|> why wouldn't they fix it?

I don't think that they thought that I was right, but couldn't
find any wording to prove me wrong, so ducked the issue.

|> It sounds like there is a definite disagreement on this?

Some. Not all that much, because this is one case where "but
everyone knows" is genuinely correct - I have never seen much
variation, and it has mostly been in constructions that no
right-minded programmer would use.

It did cause a couple of flurries in the context of C99, which
introduces a few more cases where the ambiguities are visible.
But even then, it got skipped over. For example, C99 introduces
the possibility of:

static const int n = 0;

extern void fred (double a[0]);

extern void joe (double a[n]);

Which are likely to be treated differently by compilers. I am
not 100% certain whether the following uses the standard form
of main's declaration in C99, for example:

int main (int argc, char *argv[argc]);

int main (int argc, char *argv[1/(argc-argc)]);

Nick Maclaren

May 16, 2001, 12:09:21 PM

In article <q4wM6.3926$6j3.3...@www.newsranger.com>,

Jun Woong<myco...@hanmail.net> writes:
|>
|> some_type argv[0]; or
|> some_type argv[0/0];
|>
|> argv has not-strictly-conforming array type. Thus, these two
|> declaration result in undefined behavior, I think. In a relevant DR
|> (of C90), the Committee says:
|> "there is nothing to suggest that a not-strictly-conforming array type
|> can magically be transformed into a strictly conforming pointer
|> parameter via this rule"
|> ("this rule" here refers the rule to rewrite array type parameters)

Which implies that they felt that the parameter adjustment should
come after the type validity checking but before the identifier
is associated with a type. But they still don't say so.

However, that DR is superseded by C99, for reasons that I give in
another posting on this thread.

|> And the Standard (at least C90) does not define the time when
|> converting array type to pointer type (adjusting parameter types)
|> takes place.

Quite. And it can be significant.

Clive D. W. Feather

May 25, 2001, 8:22:28 AM
In article <9dtd8h$1m1$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes

> extern int main (int argc, char **argv);
> extern int main (int argc, char *argv[]);
> extern int main (int argc, char *argv[5]);

You know that I think these are equivalent.

> extern int main (int argc, char *argv[0]);

This violates the constraint in 6.7.5.2#2. This constraint is *not*
qualified by "after adjustment", or anything like that; it's simply
violated.

> extern int main (int argc, char *argv[0/0]);

This one is actually quite interesting. Is "0/0" an integer constant
expression ? If it isn't, then that line is equivalent to:

extern int main (int argc, char *argv [*]);

(that is, a variable length array). If it is, then does it have a value
greater than zero ? If not, 6.7.5.2#2 is violated. If it does, then it's
equivalent to the case with "5" as the size. So, I would argue, if no
diagnostic is produced by an implementation then it is valid.

Or does 0/0 always violate the constraint of 6.6#4 ?

--
Clive D.W. Feather, writing for himself | Home: <cl...@davros.org>
Tel: +44 20 8371 1138 (work) | Web: <http://www.davros.org>
Fax: +44 20 8371 4037 (D-fax) | Work: <cl...@demon.net>
Written on my laptop; please observe the Reply-To address

Clive D. W. Feather

May 25, 2001, 8:37:05 AM
In article <9du3sd$q7f$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes

> extern int main (int argc, char *argv[fred]); /* fred not defined */
> extern int main (int argc, char *argv[++++++]);
> extern int main (int argc, char *argv[$]);
>
>Translation phase 7 says "The resulting tokens are syntactically
>and semantically analyzed and translated as a translation unit."
>That is a fat lot of help. In fact, "but everyone knows" that
>it is done in textually sequential order.

Nothing to do with textually sequential order. The last two violate a
syntax rule, and no amount of "adjustment" can solve that.

TP7 says "... as a translation unit". What is a translation unit ? Well,
it's the syntactic term defined in 6.9#1. So the program must conform to
the syntax no matter what. I don't think you'll find anywhere that the
Standard says you can ignore a syntax rule under certain circumstances.

Furthermore, 5.1.1.3#1 requires the diagnostic no matter what.

Your first example violates a syntax rule as well, though people often
misunderstand this. In 6.5.1, the production "identifier" only applies
when a declaration of that identifier is in scope, as noted in #2.

>"Everyone knows"
>that syntactic and type errors within the operand of sizeof are
>faults,

Same reason.

> but evaluation errors are not.

When the operand is not evaluated, that is.

>It STILL doesn't deal with cases like sizeof(*(char[][])),
>which can relate to the parameter adjustment issue in sufficiently
>contorted code.

Um, you want to explain how ? 6.7.5.2#1 applies - you cannot
array-qualify an incomplete type. Therefore this is not a declaration of
an array and therefore adjustment doesn't apply.

Greg Comeau

May 25, 2001, 4:37:13 PM
In article <DS6yXfxE...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9dtd8h$1m1$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>> extern int main (int argc, char **argv);
>> extern int main (int argc, char *argv[]);
>> extern int main (int argc, char *argv[5]);
>
>You know that I think these are equivalent.
>
>> extern int main (int argc, char *argv[0]);
>
>This violates the constraint in 6.7.5.2#2. This constraint is *not*
>qualified by "after adjustment", or anything like that; it's simply
>violated.

As in my previous post on this, I agree with you up to here.

>> extern int main (int argc, char *argv[0/0]);
>
>This one is actually quite interesting. Is "0/0" an integer constant
>expression ? If it isn't, then that line is equivalent to:
>
> extern int main (int argc, char *argv [*]);
>
>(that is, a variable length array).

I may be having a bad day, so why is it equivalent to that?

>If it is, then does it have a value
>greater than zero ? If not, 6.7.5.2#2 is violated. If it does, then it's
>equivalent to the case with "5" as the size. So, I would argue, if no
>diagnostic is produced by an implementation then it is valid.
>
>Or does 0/0 always violate the constraint of 6.6#4 ?

Somebody somewhere's gotta know for sure :)

Greg Comeau

May 25, 2001, 4:40:25 PM
In article <ADHiDkyx...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9du3sd$q7f$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>> extern int main (int argc, char *argv[fred]); /* fred not defined */
>> extern int main (int argc, char *argv[++++++]);
>> extern int main (int argc, char *argv[$]);
>>
>>Translation phase 7 says "The resulting tokens are syntactically
>>and semantically analyzed and translated as a translation unit."
>>That is a fat lot of help. In fact, "but everyone knows" that
>>it is done in textually sequential order.
>
>Nothing to do with textually sequential order. The last two violate a
>syntax rule, and no amount of "adjustment" can solve that.
>
>TP7 says "... as a translation unit". What is a translation unit ? Well,
>it's the syntactic term defined in 6.9#1. So the program must conform to
>the syntax no matter what. I don't think you'll find anywhere that the
>Standard says you can ignore a syntax rule under certain circumstances.
>
>Furthermore, 5.1.1.3#1 requires the diagnostic no matter what.
>
>Your first example violates a syntax rule as well, though people often
>misunderstand this. In 6.5.1, the production "identifier" only applies
>when a declaration of that identifier is in scope, as noted in #2.

I had hoped for somebody to say something like this.
Nick, why is this not acceptable?

Nick Maclaren

May 25, 2001, 6:50:28 PM
In article <ADHiDkyx...@romana.davros.org>,
Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9du3sd$q7f$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>> extern int main (int argc, char *argv[fred]); /* fred not defined */
>> extern int main (int argc, char *argv[++++++]);
>> extern int main (int argc, char *argv[$]);
>>
>>Translation phase 7 says "The resulting tokens are syntactically
>>and semantically analyzed and translated as a translation unit."
>>That is a fat lot of help. In fact, "but everyone knows" that
>>it is done in textually sequential order.
>
>Nothing to do with textually sequential order. The last two violate a
>syntax rule, and no amount of "adjustment" can solve that.

Why? A syntax rule is a phase 7 activity, as is adjustment. Why
should the application of syntax rules take precedence over
adjustment? Chapter and verse, please :-)

>TP7 says "... as a translation unit". What is a translation unit ? Well,
>it's the syntactic term defined in 6.9#1. So the program must conform to
>the syntax no matter what. I don't think you'll find anywhere that the
>Standard says you can ignore a syntax rule under certain circumstances.

And "adjustment" is a syntactic operation. After all, you yourself
have said that several aspects of syntax are specified in the
"Semantics" sections and not in the grammar.

>Furthermore, 5.1.1.3#1 requires the diagnostic no matter what.

Why? If the adjustment takes precedence over the syntactic
evaluation of the operand, there IS no violation!

>Your first example violates a syntax rule as well, though people often
>misunderstand this. In 6.5.1, the production "identifier" only applies
>when a declaration of that identifier is in scope, as noted in #2.

See above.

>>"Everyone knows"
>>that syntactic and type errors within the operand of sizeof are
>>faults,
>
>Same reason.

See above.

>> but evaluation errors are not.
>
>When the operand is not evaluated, that is.

Precisely. And why not? The decision of whether to evaluate the
operands is a phase 7 activity, just like adjustment. Please show
me EXPLICIT WORDING that states that they have different 'priorities'.

>>It STILL doesn't deal with cases like sizeof(*(char[][])),
>>which can relate to the parameter adjustment issue in sufficiently
>>contorted code.
>
>Um, you want to explain how ? 6.7.5.2#1 applies - you cannot
>array-qualify an incomplete type. Therefore this is not a declaration of
>an array and therefore adjustment doesn't apply.

Because the adjustment can take place before that rule is applied.
Therefore that rule is not applied. No problem. Well, ....

Nick Maclaren

May 25, 2001, 6:53:29 PM
In article <9emfvp$nr5$1...@panix3.panix.com>,

Greg Comeau <com...@comeaucomputing.com> wrote:
>In article <ADHiDkyx...@romana.davros.org>,
>Clive D. W. Feather <cl...@davros.org> wrote:
>>In article <9du3sd$q7f$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
>><nm...@cus.cam.ac.uk> writes
>>> extern int main (int argc, char *argv[fred]); /* fred not defined */
>>> extern int main (int argc, char *argv[++++++]);
>>> extern int main (int argc, char *argv[$]);
>>>
>>>Translation phase 7 says "The resulting tokens are syntactically
>>>and semantically analyzed and translated as a translation unit."
>>>That is a fat lot of help. In fact, "but everyone knows" that
>>>it is done in textually sequential order.
>>
>>Nothing to do with textually sequential order. The last two violate a
>>syntax rule, and no amount of "adjustment" can solve that.
>>
>>TP7 says "... as a translation unit". What is a translation unit ? Well,
>>it's the syntactic term defined in 6.9#1. So the program must conform to
>>the syntax no matter what. I don't think you'll find anywhere that the
>>Standard says you can ignore a syntax rule under certain circumstances.
>>
>>Furthermore, 5.1.1.3#1 requires the diagnostic no matter what.
>>
>>Your first example violates a syntax rule as well, though people often
>>misunderstand this. In 6.5.1, the production "identifier" only applies
>>when a declaration of that identifier is in scope, as noted in #2.
>
>I had hope for somebody saying something like this.
>Nick, why is this not acceptable?

Since you ask :-)

Where, in the wording of the standard, does it say that 6.5.1 should
be applied 'before' parameter adjustment is? If a parameter is
adjusted, its size becomes irrelevant much like the operand of sizeof.
Now, where in the wording of the standard, does it say that such
operands are analysed syntactically before being ignored?

Max TenEyck Woodbury

May 29, 2001, 6:54:09 PM
to com...@comeaucomputing.com
Greg Comeau wrote:
> Clive D. W. Feather <cl...@davros.org> wrote:
>> Nick Maclaren <nm...@cus.cam.ac.uk> writes
>>> extern int main (int argc, char **argv);
>>> extern int main (int argc, char *argv[]);
>>> extern int main (int argc, char *argv[5]);
>>
>>You know that I think these are equivalent.

I can see that all three are valid and equally useful,
but they are not exactly equivalent. If, for some reason,
sizeof(argv) and sizeof(*argv) are needed, they produce
different results.

#1 sizeof(argv) = sizeof(char * *)
#2 sizeof(argv) = undefined
#3 sizeof(argv) = 5 * sizeof(char *)

I believe sizeof(*argv) is sizeof(char *) in all three.

Is my memory correct that sizeof(char * *) need not be
(but usually is) == sizeof(char *)?

mt...@cds.duke.edu

James Kuyper Jr.

May 29, 2001, 7:34:03 PM
Max TenEyck Woodbury wrote:
>
> Greg Comeau wrote:
> > Clive D. W. Feather <cl...@davros.org> wrote:
> >> Nick Maclaren <nm...@cus.cam.ac.uk> writes
> >>> extern int main (int argc, char **argv);
> >>> extern int main (int argc, char *argv[]);
> >>> extern int main (int argc, char *argv[5]);
> >>
> >>You know that I think these are equivalent.
>
> I can see that all three are valid and equally useful,
> but they are not exactly equivalent. If for some reason,
> sizeof(argv) and sizeof(*argv) is needed, they produce
> different results.
>
> #1 sizeof(argv) = sizeof(char * *)
> #2 sizeof(argv) = undefined
> #3 sizeof(argv) = 5 * sizeof(char *)

According to section 6.7.5.3p7: '... A declaration of a parameter as
"array of _type_" shall be adjusted to "qualified pointer to _type_"
...'. Are you suggesting that this adjustment doesn't happen until after
the sizeof() is evaluated?

Max TenEyck Woodbury

May 29, 2001, 8:47:21 PM
to James Kuyper Jr.

Excellent point and a good question!

On the one hand, how would you specify the 'sizeof' of the
entry in the parameter list {== sizeof(char * *)}, and on
the other hand, how would you specify the 'sizeof' of the
array of 'char *'s {== sizeof(char *), undefined, and
sizeof(char *) * 5 in the three sample lines respectively}.

On yet another hand, how do you specify the 'sizeof' of
the 'char *'s that make up the array? And finally, if you
used some other type besides char (say wchar), how do you
reference its size, all starting only with the name 'argv'?

sizeof(argv[0][0]) should get you the last one.
sizeof(argv[0]) should get you the next to last one.

But how do you distinguish between the first two?

This is not a trivial problem.

You need the parameter size as part of scanning a variadic
argument list. The fact that the standard hides the use of
this information inside a macro does not mean that the
information is not available.

You need the array size if you want to make a copy of it.
This is obviously not the right thing to do with the argv
argument to main, but it might well be needed as part of
some other procedure's interface. The fact that the
dimension of this array might be lost is a bit surprising.

mt...@cds.duke.edu

James Kuyper Jr.

unread,
May 30, 2001, 2:22:30 AM5/30/01
to

Since they're different ways of saying exactly the same thing,
distinction between the first two is not necessary. It's only the third
case that represents a potentially meaningful difference, but one that
gets ignored, due to 6.7.5.3p7.

> This is not a trivial problem.
>
> You need the parameter size as part of scanning a variadic
> argument list. The fact that the standard hides the use of
> this information inside a macro does not mean that the
> information is not available.
>
> You need the array size if you want to make a copy of it.
> This is obviously not the right thing to do with the argv
> argument to main, but it might well be needed as part of
> some other procedure's interface. The fact that the
> dimension of this array might be lost is a bit surprising.

That is, nonetheless, the way that every implementation of C I've ever
heard of interprets that clause. It's not the obvious way for newbies,
and many have been tripped up by it, but the reason it works that way is
normally considered pretty clear once the relevant section is quoted.
There's been no debate over the intent of that clause, only a debate
over whether the intent has been successfully translated into the actual
words of the standard.

Nick Maclaren

unread,
May 30, 2001, 4:48:08 AM5/30/01
to
In article <3B1491A6...@wizard.net>,

There isn't much doubt about that - it hasn't :-( The debate is
about whether the NORMATIVE wording specifies the intent AT ALL, and
whether even the informative wording specifies enough of the intent
to make it clear EXACTLY when diagnostics are required.

In the case of sizeof and related constructions in C90, I agree that
there is no doubt about the intent - but the information used to
know that intent comes from using existing C compilers or a knowledge
of BCPL, and not the C standard. And there are doubts about
EXACTLY how much syntax and constraint checking is required in 'code'
that is thrown away by the parameter adjustment rule. Several people
have posted examples where they HAD read the relevant section and had
interpreted it differently.

The area has become slightly more confused by the addition of variably
modified arrays in C99, but this is more a matter of bringing a
previous ambiguity to the fore rather than introducing a new one.

Max TenEyck Woodbury

unread,
May 30, 2001, 12:55:35 PM5/30/01
to
"James Kuyper Jr." wrote:
>
> Max TenEyck Woodbury wrote:
>>
>> "James Kuyper Jr." wrote:
>>>
>>> Max TenEyck Woodbury wrote:
>>>> Greg Comeau wrote:
>>>>> Clive D. W. Feather <cl...@davros.org> wrote:
>>>>>> Nick Maclaren <nm...@cus.cam.ac.uk> writes
>>>>>>> extern int main (int argc, char **argv);
>>>>>>> extern int main (int argc, char *argv[]);
>>>>>>> extern int main (int argc, char *argv[5]);
>>>>
>>>> ...

>>>
>>> According to section 6.7.5.3p7: '... A declaration of a parameter as
>>> "array of _type_" shall be adjusted to "qualified pointer to _type_"
>>> ...'. Are you suggesting that this adjustment doesn't happen until after
>>> the sizeof() is evaluated?
>>
>> Excellent point and a good question!
>>
>> On the one hand, how would you specify the 'sizeof' the
>> entry in the parameter list {==sizeof(char * *)}, and on
>> the other hand, how would you specify the 'sizeof' the
>> array of 'char *'s {== sizeof(char *), undefined and
>> sizeof(char *) * 5 in the three sample lines respectively}.
>>
>> ...

>>
>> But how do you distinguish between the first two?
>
> Since they're different ways of saying exactly the same thing,
> distinction between the first two is not necessary. It's only the third
> case that represents a potentially meaningful difference, but one that
> gets ignored, due to 6.7.5.3p7.

The first two are NOT exactly equivalent. char * * argv is
effectively char * argv[1] and assures only one sizeof(char *)
element, while char * argv[] explicitly says that the array size
is not specified. If I am not mistaken, one is a complete type
and the other is an incomplete type.

If an implementation extended the language to do array bounds
checking (i.e. it is VERY strict about the definition of
'undefined behavior'), #1 would only allow an array index
of 0 for access, #2 could not check the array bound (which is
how the standard says this parameter should be treated) and #3
would only allow indexes in the 0...4 range for actual access
and 0...5 for address calculations.

>> ...


>>
>> You need the array size if you want to make a copy of it.
>> This is obviously not the right thing to do with the argv
>> argument to main, but it might well be needed as part of
>> some other procedure's interface. The fact that the
>> dimension of this array might be lost is a bit surprising.
>
> That is, nonetheless, the way that every implementation of C I've ever
> heard of interprets that clause. It's not the obvious way for newbies,
> and many have been tripped up by it, but the reason it works that way is
> normally considered pretty clear once the relevant section is quoted.
> There's been no debate over the intent of that clause, only a debate
> over whether the intent has been successfully translated into the actual
> words of the standard.

OK. The standard almost certainly can not be faulted on its
wording, but the question remains - How do you access the
'sizeof' information that 6.7.5.3p7 hides?

mt...@cds.duke.edu

James Kuyper

unread,
May 30, 2001, 4:11:53 PM5/30/01
to

None of them defines an array, because 6.7.5.3p7 converts what looks
like an array declaration into a pointer declaration. Therefore, none of
them declares an incomplete type. All three declarations define a
pointer to a pointer, which might or might not be the first pointer in
an arbitrarily long array of pointers (argc tells you how long it is).

Modulo, of course, the objections that Nick has been raising.

...


> > That is, nonetheless, the way that every implementation of C I've ever
> > heard of interprets that clause. It's not the obvious way for newbies,
> > and many have been tripped up by it, but the reason it works that way is
> > normally considered pretty clear once the relevant section is quoted.
> > There's been no debate over the intent of that clause, only a debate
> > over whether the intent has been successfully translated into the actual
> > words of the standard.
>
> OK. The standard almost certainly can not be faulted on its
> wording, but the question remains - How do you access the
> 'sizeof' information that 6.7.5.3p7 hides?

You can't access that information on any implementation of C that I've
ever used. You'll have to transmit that information by some other means.
That's what argc is for. You can argue this is inconvenient, and you'd
be right, but it is the way C works, the way a lot of legacy code
expects it to work, the way it was intended to work, and according to
some people, the way the standard actually requires it to work. Don't
expect any change any time soon.

In particular, don't expect Nick to win his argument that it requires
clarification; I think he may be right, but I've seen nothing to
indicate that anyone with the authority to do so is willing to make such
a clarification.

Nick Maclaren

unread,
May 30, 2001, 6:33:48 PM5/30/01
to
In article <3B155409...@wizard.net>,

James Kuyper <kuy...@wizard.net> wrote:
>
>In particular, don't expect Nick to win his argument that it requires
>clarification; I think he may be right, but I've seen nothing to
>indicate that anyone with the authority to do so is willing to make such
>a clarification.

Nor have I :-(

Greg Comeau

unread,
May 30, 2001, 10:20:45 PM5/30/01
to
In article <3B142891...@cds.duke.edu>,

Max TenEyck Woodbury <mt...@cds.duke.edu> wrote:
>Greg Comeau wrote:
>> Clive D. W. Feather <cl...@davros.org> wrote:
>>> Nick Maclaren <nm...@cus.cam.ac.uk> writes
>>>> extern int main (int argc, char **argv);
>>>> extern int main (int argc, char *argv[]);
>>>> extern int main (int argc, char *argv[5]);
>>>
>>>You know that I think these are equivalent.
>
>I can see that all three are valid and equally useful,
>but they are not exactly equivalent.

Syntactically no, but semantically they are.

>If for some reason,
>sizeof(argv) and sizeof(*argv) is needed, they produce
>different results.

If they do, then, IMO, your compiler is broken. In each
of the above, argv is a ptr to ptr to char, therefore
sizeof(argv) and sizeof(*argv) had all better be the
same values, respectively.

>#1 sizeof(argv) = sizeof(char * *)
>#2 sizeof(argv) = undefined
>#3 sizeof(argv) = 5 * sizeof(char *)

This is wrong. sizeof(argv) is sizeof(char**) in all 3.

>I believe sizeof(*argv) is sizeof(char *) in all three.

Correct, because argv is char ** in all 3 cases.

>Is my memory correct that sizeof(char * *) need not be
>(but usually is) == sizeof(char *)?

Right, it need not be.

Greg Comeau

unread,
May 30, 2001, 10:29:12 PM5/30/01
to
In article <3B144319...@cds.duke.edu>,

Max TenEyck Woodbury <mt...@cds.duke.edu> wrote:

But they are not those.

>On yet another hand how do you specify the 'sizeof'
>the 'char *'s that make up the array and finally if you
>used some other type beside char (say wchar), how do you
>reference its size all starting only with the name 'argv'?
>
>sizeof(argv[0][0]) should get you the last one.
>sizeof(argv[0]) should get you the next to last one.
>
>But how do you distinguish between the first two?
>
>This is not a trivial problem.

It seems to be, to me. The parameter is adjusted, so there
are no other hands.

>You need the parameter size as part of scanning a variadic
>argument list. The fact that the standard hides the use of
>this information inside a macro does not mean that the
>information is not available.

Array arguments are adjusted too. This is "as usual."

>You need the array size if you want to make a copy of it.
>This is obviously not the right thing to do with the argv
>argument to main, but it might well be needed as part of
>some other procedure's interface. The fact that the
>dimension of this array might be lost is a bit surprising.

It's quite surprising, and is a killer for newbies.
One could argue that it's a fundamental flaw, as much
as being able to do this:

char c;
char array[99];
char *p;
p = &c;
p = &array[9];

Anyway, that's the behavior. Arrays are not passed "by copy"
so raising that seems neither here nor there.

Greg Comeau

unread,
May 30, 2001, 10:40:37 PM5/30/01
to
In article <3B152607...@cds.duke.edu>,

Max TenEyck Woodbury <mt...@cds.duke.edu> wrote:
>>>>>>>> extern int main (int argc, char **argv);....

>
>The first two are NOT exactly equivalent. char * * argv is
>effectively char * argv[1] and assures only one sizeof(char *)
>element,

You may want to _think_ of it this way in some cases if you want,
but otherwise: What?

>while char * argv[] explicitly says that the array size
>is not specified. If I am not mistaken, one is a complete type
>and the other is an incomplete type.

What?

>If an implementation extended the language to do array bounds
>checking (i.e. it is VERY strict about the definition of
>'undefined behavior'), #1 would only allow an array index
>of 0 for access, #2 could not check the array bound (which is
>how the standard says this parameter should be treated) and #3
>would only allow indexes in the 0...4 range for actual access
>and 0...5 for address calculations.

If an implementation extended the language, then it can
do whatever it wants.

James Kuyper Jr.

unread,
May 31, 2001, 1:57:17 AM5/31/01
to
Greg Comeau wrote:
>
> In article <3B152607...@cds.duke.edu>,
> Max TenEyck Woodbury <mt...@cds.duke.edu> wrote:
...

> >If an implementation extended the language to do array bounds
> >checking (i.e. it is VERY strict about the definition of
> >'undefined behavior'), #1 would only allow an array index
> >of 0 for access, #2 could not check the array bound (which is
> >how the standard says this parameter should be treated) and #3
> >would only allow indexes in the 0...4 range for actual access
> >and 0...5 for address calculations.
>
> If an implementation extended the language, then it can
> do whatever it wants.

Array bounds checking is allowed, without violating conformance, but
it's not allowed in the sense that he mentioned. It's the actual size of
the array argument that matters, not the declared size of the parameter
"array". For all three cases, the limit on offsets from argv is set by
argc.

Nick Maclaren

unread,
May 31, 2001, 4:47:35 AM5/31/01
to
In article <3B15DD3D...@wizard.net>,

James Kuyper Jr. <kuy...@wizard.net> wrote:

There was a thread on this on the reflector, which indicated that even
the Committee was uncertain about whether bounds in array parameters
had any effect. This all relates to what an object is and when a part
of one object becomes a new object in its own right, and the now
well-thrashed issue of when adjustment takes place.

This isn't TOO serious in C90, because it is clear in most contexts.
The main ones where it gets murky are array parameters with actual
bounds (rarely used) and memcpy etc. in C90 before C95 (the missing
sequence point problem). In a simple case (such as the one that
started this thread), there is no problem anyway.

For reasons that I regretfully have to agree were good, the restrict
qualifier bypassed the whole mess and defined a new meaning of what
an object is. This is also unclear but, again, only in circumstances
that will rarely occur.

However, this creates HAVOC in standards that provide asynchronous
or threaded support based on the C model - POSIX, MPI, OpenMP etc.
I have pointed this out to several of those, and said that they need
to provide their own 'C extension' document, but all have said that
the C standard forbids it (which is incorrect), it is inappropriate
and C should do it (which is arguable) or denied the problem (which
is negligent).

What is more, there have been and are architectures out there which
make it IMPOSSIBLE to implement any of those parallel systems based
on C and achieve ANY performance improvement. So what the compilers
do is to assume that programmers don't write the constructions that
will fail or, rather, will never be able to pin the failure on the
vendor (which is essentially true).

Mutter.

Greg Comeau

unread,
May 31, 2001, 9:01:57 PM5/31/01
to
In article <3B15DD3D...@wizard.net>,
James Kuyper Jr. <kuy...@wizard.net> wrote:

And since it is not allowed, then it would be an extension,
upon which the vendor can do whatever they want (clearly violating
conformance).

James Kuyper Jr.

unread,
May 31, 2001, 10:45:48 PM5/31/01
to
Greg Comeau wrote:
>
> In article <3B15DD3D...@wizard.net>,
> James Kuyper Jr. <kuy...@wizard.net> wrote:
> >Greg Comeau wrote:
> >> In article <3B152607...@cds.duke.edu>,
> >> Max TenEyck Woodbury <mt...@cds.duke.edu> wrote:
> >...
> >> >If an implementation extended the language to do array bounds
> >> >checking (i.e. it is VERY strict about the definition of
> >> >'undefined behavior'), #1 would only allow an array index
> >> >of 0 for access, #2 could not check the array bound (which is
> >> >how the standard says this parameter should be treated) and #3
> >> >would only allow indexes in the 0...4 range for actual access
> >> >and 0...5 for address calculations.
> >>
> >> If an implementation extended the language, then it can
> >> do whatever it wants.
> >
> >Array bounds checking is allowed, without violating conformance, but
> >it's not allowed in the sense that he mentioned. It's the actual size of
> >the array argument that matters, not the declared size of the parameter
> >"array". For all three cases, the limit on offsets from argv is set by
> >argc.
>
> And since it is not allowed, then it would be an extension,
> upon which the vendor can do whatever they want (clearly violating
> conformance).

I'm getting confused about what you're saying. Consider the following
case:

#include <header.h>
// header.h defines L, M, and N as integer constant expressions
// with positive values, that I'm deliberately not specifying,
// and has no other relevant effect.

void func(int parameter[N])
{
    for (int i = 0; i < L; i++)
        parameter[i]++;
}

int main(void)
{
    int argument[M] = {0};

    func(argument);
    return 0;
}

I contend that under a perfectly conforming implementation of C, that
program could abort due to performing a run-time array-bounds check if
M<L<N, but that a compiler which made it abort for N<L<M would be
non-conforming. Max TenEyck Woodbury seems to be arguing that the second
case would be conforming. I can't tell which of us you're agreeing with.

The relevant section is 6.5.6p8, which defines legal pointer offsets in
terms of the limits of an array object. 'argument' is the relevant array
object. Thanks to 6.7.5.3p7, 'parameter' is NOT an array object, but a
pointer object. Therefore, M<L<N allows undefined behavior - a
conforming implementation of C can have the program abort due to a
bounds-check failure. However, N<L<M does NOT allow undefined behavior,
so a bounds-check which caused that program to abort would render the
compiler non-conforming.

In either case, the run-time array bounds check could be called an
extension; in the first case, it would be a legal extension; in the
second case that extension would render the compiler non-conforming.

Personally, I wouldn't use the term "extension" for this feature. The
standard doesn't define what an extension is. To me, it seems that an
extension should be something that allows a program to do something that
un-extended C wouldn't allow. The list of common extensions in Annex J.5
all fit this definition. Run-time array-bounds checking may be a useful
feature for some purposes, (I'd love to have it available for debugging
runs!) but it doesn't actually allow anything that wasn't already
allowed; what it does do is make fatal something that the standard only
makes "undefined".

Larry Jones

unread,
May 31, 2001, 3:25:49 PM5/31/01
to
James Kuyper (kuy...@wizard.net) wrote:
>
> In particular, don't expect Nick to win his argument that it requires
> clarification; I think he may be right, but I've seen nothing to
> indicate that anyone with the authority to do so is willing to make such
> a clarification.

Only because we don't think it's a serious problem and no one has made a
concrete proposal that is clearly better than the status quo. The
wording *was* changed in C99 to address Nick's point, and I think he
agrees that it's better than it was (I'm sure he'll correct me if I'm
wrong), but he still doesn't think it's sufficient.

-Larry Jones

Aw Mom, you act like I'm not even wearing a bungee cord! -- Calvin

Nick Maclaren

unread,
Jun 1, 2001, 12:37:46 PM6/1/01
to

In article <9f65rt$7...@nfs0.sdrc.com>, scj...@thor.sdrc.com (Larry Jones) writes:
|> James Kuyper (kuy...@wizard.net) wrote:
|> >
|> > In particular, don't expect Nick to win his argument that it requires
|> > clarification; I think he may be right, but I've seen nothing to
|> > indicate that anyone with the authority to do so is willing to make such
|> > a clarification.
|>
|> Only because we don't think it's a serious problem and no one has made a
|> concrete proposal that is clearly better than the status quo. The
|> wording *was* changed in C99 to address Nick's point, and I think he
|> agrees that it's better than it was (I'm sure he'll correct me if I'm
|> wrong), but he still doesn't think it's sufficient.

Yes, that is all correct. The two main reasons that I don't regard
it as sufficient are (a) that it is only informative and (b) that it
addresses only one aspect, and still leaves some ambiguities. It
does make the intent clear, as far as the OPERATION of the program
goes, but not as far as diagnostics and error detection are
concerned.

However, I do agree that it is not a serious problem, in itself,
because this is a case where "but everybody knows" is close enough
to the truth. I have never heard of an implementation or even tool
that did anything different.

Where it really does matter is in the aspects that I have referred
to in the thread entitled "Object is as object does". That really
IS a serious matter and really DOES cause foully obscure failures
in real programs. And there is some evidence that they are starting
to hit safety critical areas :-(

A related, but pure C, matter is the question of whether a debugging
compiler is ENTITLED to check array bounds. This has been asked
three times on the reflector, to my memory, and got different answers
each time. The presence of such evilly obscure areas is one of the
main reasons that nobody has written (and, in my view, nobody ever
will write) a debugging C implementation fit to use on real code.

James Kuyper

unread,
Jun 1, 2001, 12:15:37 PM6/1/01
to
Nick Maclaren wrote:
...

> A related, but pure C, matter is the question of whether a debugging
> compiler is ENTITLED to check array bounds. This has been asked
> three times on the reflector, to my memory, and got different answers
> each time. The presence of such evilly obscure areas is one of the

A yes/no question got THREE different answers? :-)

Seriously, would anyone care to present an argument for any answer other
than "yes"? I've already indicated why I think it's permitted, in a
different branch of this same thread.

Nick Maclaren

unread,
Jun 1, 2001, 1:59:18 PM6/1/01
to
In article <3B17BFA9...@wizard.net>,

James Kuyper <kuy...@wizard.net> wrote:
>Nick Maclaren wrote:
>...
>> A related, but pure C, matter is the question of whether a debugging
>> compiler is ENTITLED to check array bounds. This has been asked
>> three times on the reflector, to my memory, and got different answers
>> each time. The presence of such evilly obscure areas is one of the
>
>A yes/no question got THREE different answers? :-)

About half a dozen, actually, but only three "authoritative" ones :-)

The answers from people I recognised as being long-standing members
of the committee included:

No. An array parameter is adjusted to a pointer, and the object
that it points to is the original object from which the pointer was
derived (recursively).

Yes. An array declarator defines an object, and it is undefined
if either that object extends outside its storage area or if you try
to access outside that object.

This aspect of the standard was deliberately left unspecified.
I am still uncertain whether the author of THAT one realised the
logical consequences, but he was definitely serious.

And, of course, there were permutations and combinations of these for
all of the various forms of array parameter, cast and other ways in
which you can derive one pointer expression from another.

>Seriously, would anyone care to present an argument for any answer other
>than "yes"? I've already indicated why I think it's permitted, in a
>different branch of this same thread.

No, but I will :-( If you permit array parameter (and cast) bound
breaches to be undefined behaviour, a lot of popular programs, coding
practices and implementations will break. You can probably imagine
my level of sympathy for the perpetrators of such things, but ....

Markus E Leypold

unread,
Jun 1, 2001, 3:53:37 PM6/1/01
to
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <9f65rt$7...@nfs0.sdrc.com>, scj...@thor.sdrc.com (Larry Jones) writes:

<snipped>

> A related, but pure C, matter is the question of whether a debugging
> compiler is ENTITLED to check array bounds. This has been asked
> three times on the reflector, to my memory, and got different answers
> each time. The presence of such evilly obscure areas is one of the
> main reasons that nobody has written (and, in my view, nobody ever
> will write) a debugging C implementation fit to use on real code.

Isn't there a gcc 2.7.x variant which actually did bounds checking?

Regards -- Markus

James Kuyper

unread,
Jun 1, 2001, 3:47:09 PM6/1/01
to
Nick Maclaren wrote:
>
> In article <3B17BFA9...@wizard.net>,
> James Kuyper <kuy...@wizard.net> wrote:
> >Nick Maclaren wrote:
> >...
> >> A related, but pure C, matter is the question of whether a debugging
> >> compiler is ENTITLED to check array bounds. This has been asked
> >> three times on the reflector, to my memory, and got different answers
> >> each time. The presence of such evilly obscure areas is one of the
> >
> >A yes/no question got THREE different answers? :-)
>
> About half a dozen, actually, but only three "authoritative" ones :-)
>
> The answers from people I recognised as being long-standing members
> of the committee included:
>
> No. An array parameter is adjusted to a pointer, and the object
> that it points to is the original object from which the pointer was
> derived (recursively).

So that's in reference to bounds checking based upon the size of a
pre-adjustment array parameter? I have to agree - that kind of bounds
checking would be non-conforming, as I've argued on the other branch of
this thread.

...


> >Seriously, would anyone care to present an argument for any answer other
> >than "yes"? I've already indicated why I think it's permitted, in a
> >different branch of this same thread.
>
> No, but I will :-( If you permit array parameter (and cast) bound
> breaches to be undefined behaviour, a lot of popular programs, coding
> practices and implementations will break. You can probably imagine
> my level of sympathy for the perpetrators of such things, but ....

Properly, that's an argument why it shouldn't be permitted; it's not
relevant to the question of whether it IS permitted. If we're talking
about how the standard should be changed, I'm in partial agreement: I'd
favor relaxing the current rules to allow pointers to be offset to any
location within the same multi-dimensional array.

Greg Comeau

unread,
Jun 1, 2001, 6:11:47 PM6/1/01
to
In article <9f65rt$7...@nfs0.sdrc.com>,

Larry Jones <larry...@sdrc.com> wrote:
>James Kuyper (kuy...@wizard.net) wrote:
>> In particular, don't expect Nick to win his argument that it requires
>> clarification; I think he may be right, but I've seen nothing to
>> indicate that anyone with the authority to do so is willing to make such
>> a clarification.
>
>Only because we don't think it's a serious problem

I'm confused. That is to say, if I understand, it seems
like you are saying that it is the policy of the C committee
to ignore a defect brought to their attention if it's not
deemed serious? Please correct me.

>and no one has made a
>concrete proposal that is clearly better than the status quo. The
>wording *was* changed in C99 to address Nick's point, and I think he
>agrees that it's better than it was (I'm sure he'll correct me if I'm
>wrong), but he still doesn't think it's sufficient.

What makes the status quo better, or at least not worse?

I'd love to see Nick make such a proposal and have this put
to rest. BTW, Nick, do you believe that Standard C++ has the same
problem?

Nick Maclaren

unread,
Jun 2, 2001, 5:34:42 AM6/2/01
to
In article <xlfg0dk...@neuromancer.informatik.uni-tuebingen.de>,

No. There was a variant that would detect a few of the more obvious
bounds errors, but nothing that was thorough enough to be worth the
effort of using. A system that picks up only the most obvious errors
merely lulls the kiddies into a false sense of security, and misses
any error that a competent programmer doesn't spot at the keyboard.

The same applies to memory allocation checkers, for much the same
reasons.

Nick Maclaren

unread,
Jun 2, 2001, 5:52:07 AM6/2/01
to
In article <3B17F13D...@wizard.net>,

James Kuyper <kuy...@wizard.net> wrote:
>>
>> No. An array parameter is adjusted to a pointer, and the object
>> that it points to is the original object from which the pointer was
>> derived (recursively).
>
>So that's in reference to bounds checking based upon the size of a
>pre-adjustment array parameter? I have to agree - that kind of bounds
>checking would be non-conforming, as I've argued on the other branch of
>this thread.

And many forms of casting, actually, though there is no consensus on
which.

>> >Seriously, would anyone care to present an argument for any answer other
>> >than "yes"? I've already indicated why I think it's permitted, in a
>> >different branch of this same thread.
>>
>> No, but I will :-( If you permit array parameter (and cast) bound
>> breaches to be undefined behaviour, a lot of popular programs, coding
>> practices and implementations will break. You can probably imagine
>> my level of sympathy for the perpetrators of such things, but ....
>
>Properly, that's an argument why it shouldn't be permitted; it not
>relevant to the question of whether it IS permitted. If we're talking
>about how the standard should be changed, I'm in partial agreement: I'd
>favor relaxing the current rules to allow pointers to be offset to any
>location within the same multi-dimensional array.

I can certainly provide some arguments that it IS permitted! In
particular, by not permitting it, you end up with a logical
inconsistency somewhere else. This problem was at the heart of the
cast confusion. For example:

extern double (*a)[9][3];
double (*b)[5][5] = (double (*)[5][5])a;
(*a)[4][4] = 0.0;
(*b)[4][4] = 0.0;

The point is that the standard REQUIRES a linear storage vector of
base element objects for all arrays, and relies on that in a good
many places. Not always obviously, of course. But there is an
attempt (some would say pretence) that the multidimensional array
structure has an object meaning beyond its use in address and size
calculation. In my view, this is a lost cause.

Similar remarks apply to the horrible introduction of effective type
(6.5 #6 in C99). Several of us tried to get that section dropped,
as it had clearly not been thought out, but we failed. For example,
think about partial object copies using memcpy - especially in the
context of structures, where it is possible to copy all of the data
but none of the padding bits. And I produced examples where a single
object could have THREE separate types (chosen from FOUR combinations)
at a single time. Wubba, wubba, wubba ....

It's a nightmare. And the trouble is that the wording has been
introduced because it really does make real differences to optimising
compilers and leading-edge code.

Nick Maclaren

unread,
Jun 2, 2001, 6:01:43 AM6/2/01
to
In article <9f93v3$lb$1...@panix3.panix.com>,

Greg Comeau <com...@comeaucomputing.com> wrote:
>In article <9f65rt$7...@nfs0.sdrc.com>,
>Larry Jones <larry...@sdrc.com> wrote:
>
>>and no one has made a
>>concrete proposal that is clearly better than the status quo. The
>>wording *was* changed in C99 to address Nick's point, and I think he
>>agrees that it's better than it was (I'm sure he'll correct me if I'm
>>wrong), but he still doesn't think it's sufficient.
>
>What makes the status quo better, or at least not worse?
>
>I'd love to see Nick do such a proposal and have this put this
>to rest. BTW, Nick, do you believe that Standard C++ has the same
>problem?

I don't know C++. But I am afraid that Larry Jones has a point.
I did propose explicit wording for C90, and early on in C99, but
since then I have learnt a lot. In particular, I have discovered
that this is only the visible tip of an extremely large iceberg.
My wording would have been only a little better than the wording
that was introduced in C99.

There are three aspects that need attention:

The way that translation phase 4 calls phase 7. This isn't
very hard to resolve, but is messy.

A precise model for each type of parsing and syntax analysis
used in C, of which there are a good many.

A description of the internal structure of translation phase 7,
which is currently partly specified in several dozen places in the
standard and partly specified by "but everyone knows".

The trouble is that the latter two are not universally accepted
to be problems in the first place! Until they ARE accepted to be
genuine problems, any proposal will be rejected as "the standard
is clear enough as it is". And I have failed to persuade people,
despite the fact that they have failed to provide wording that
disambiguates a good many of my examples.

Fergus Henderson

Jun 3, 2001, 1:56:00 PM

Yes, and work on that continues. See the recent thread on the gcc mailing
list: <http://gcc.gnu.org/ml/gcc/2001-04/msg00794.html>.

--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.

Fergus Henderson

Jun 3, 2001, 2:01:21 PM
nm...@cus.cam.ac.uk (Nick Maclaren) writes:
>Markus E Leypold <ley...@informatik.uni-tuebingen.de> wrote:
>>nm...@cus.cam.ac.uk (Nick Maclaren) writes:
>>
>>> In article <9f65rt$7...@nfs0.sdrc.com>, scj...@thor.sdrc.com (Larry Jones) writes:
>><snipped>
>>
>>> A related, but pure C, matter is the question of whether a debugging
>>> compiler is ENTITLED to check array bounds. This has been asked
>>> three times on the reflector, to my memory, and got different answers
>>> each time. The presence of such evilly obscure areas is one of the
>>> main reasons that nobody has written (and, in my view, nobody ever
>>> will write) a debugging C implementation fit to use on real code.
>>
>>Isn't there a gcc 2.7.x Variant which actually did bounds checking?
>
>No. There was a variant that would detect a few of the more obvious
>bounds errors, but nothing that was thorough enough to be worth the
>effort of using. A system that picks up only the most obvious errors
>merely lulls the kiddies into a false sense of security, and misses
>any error that a competent programmer doesn't spot at the keyboard.
>
>The same applies to memory allocation checkers, for much the same
>reasons.

Nick is, as usual, being overly pessimistic here.
The class of errors which are obvious to a competent programmer
is not a superset of the class of errors which are obvious to
a system that does run-time bounds checking, even if that run-time
bounds checking doesn't detect every error that could be considered a
violation of the C standard.

Nick Maclaren

Jun 3, 2001, 3:32:03 PM
In article <9fdu1h$73u$1...@mulga.cs.mu.OZ.AU>,

Fergus Henderson <f...@cs.mu.oz.au> wrote:
>nm...@cus.cam.ac.uk (Nick Maclaren) writes:
>>>
>>>Isn't there a gcc 2.7.x Variant which actually did bounds checking?
>>
>>No. There was a variant that would detect a few of the more obvious
>>bounds errors, but nothing that was thorough enough to be worth the
>>effort of using. A system that picks up only the most obvious errors
>>merely lulls the kiddies into a false sense of security, and misses
>>any error that a competent programmer doesn't spot at the keyboard.
>>
>>The same applies to memory allocation checkers, for much the same
>>reasons.
>
>Nick is, as usual, being overly pessimistic here.
>The class of errors which are obvious to a competent programmer
>is not a superset of the class of errors which are obvious to
>a system that does run-time bounds checking, even if that run-time
>bounds checking doesn't detect every error that could be considered a
>violation of the C standard.

Well, perhaps.

Experience with Fortran is that bounds checks that detected only
errors that could be located by looking at each routine in isolation
were a complete waste of time. While they DID pick up the majority
of errors made by the kiddies, they reduced their debugging time by
only 20% or so, as the bugs that couldn't be detected that way were
the ones that took the time to find. Hence the uselessness of IBM
Fortran G1's feature and the usefulness of WATFIV.

With experienced programmers, of course, that 20% dropped to about
2%, so it wasn't even worth seeing if the feature helped.

If I get a moment I shall take another look at that project; when
I last looked, the facilities available were dire even by the low
standards of Fortran G1. Which was not the fault of the authors,
of course.

Zack Weinberg

Jun 3, 2001, 3:44:32 PM
Nick Maclaren <nm...@cus.cam.ac.uk> writes:
>
>Experience with Fortran is that bounds checks that detected only
>errors that could be located by looking at each routine in isolation
>were a complete waste of time. While they DID pick up the majority
>of errors made by the kiddies, they reduced their debugging time by
>only 20% or so, as the bugs that couldn't be detected that way were
>the ones that took the time to find. Hence the uselessness of IBM
>Fortran G1's feature and the usefulness of WATFIV.

The project being discussed doesn't do static compile-time checks. It
replaces all pointers with (base, limit) pairs and validates them at run
time.

It's highly invasive - you have to recompile absolutely everything -
but at least in theory will catch all bounds violations. Note that a
practical definition of "bounds violation" is being used, not one which
hews to the exact phrasing in the standard. I believe it includes array
shape checking.

Of course, being a runtime check, it will never catch *potential* bugs;
it can only intercept bugs and make the program fail deterministically.

zw

Nick Maclaren

Jun 4, 2001, 4:30:54 AM
In article <9fe430$1di$1...@nntp.Stanford.EDU>,

Zack Weinberg <za...@stanford.edu> wrote:
>Nick Maclaren <nm...@cus.cam.ac.uk> writes:
>>
>>Experience with Fortran is that bounds checks that detected only
>>errors that could be located by looking at each routine in isolation
>>were a complete waste of time. While they DID pick up the majority
>>of errors made by the kiddies, they reduced their debugging time by
>>only 20% or so, as the bugs that couldn't be detected that way were
>>the ones that took the time to find. Hence the uselessness of IBM
>>Fortran G1's feature and the usefulness of WATFIV.
>
>The project being discussed doesn't do static compile-time checks. It
>replaces all pointers with (base, limit) pairs and validates them at run
>time.

I assumed that - static bounds checking of a language like C is too
weak to be worth even setting as a student project (except to say
why it can't be made to work).

>It's highly invasive - you have to recompile absolutely everything -
>but at least in theory will catch all bounds violations. Note that a
>practical definition of "bounds violation" is being used, not one which
>hews to the exact phrasing in the standard. I believe it includes array
>shape checking.

Well, of course you expect a complete recompilation. And Fortran G1
used a similar 'practical' definition - WATFIV's stricter and more
formal definition was much more useful.

>Of course, being a runtime check, it will never catch *potential* bugs;
>it can only intercept bugs and make the program fail deterministically.

No, that's not what I mean. C has perhaps a dozen different ways of
converting one pointer into another, with a comparable number of
variant rules, and most are seriously ambiguous. For example, casts
say "trust me - I know what I am doing" and so are very dangerous,
yet casts are essential for an excessive number of constructions.

For an explanation of what I mean by "potential bugs", see my thread
called "Object is as object does". The point is that a crude check
will very often pass a reference as acceptable, and yet an optimising
compiler will generate code which will go wrong. And such 'advanced'
errors account for most of the debugging time spent by all except the
complete novices. It is, after all, why so much portable software
recommends disabling optimisation as a first start in debugging.

Let us assume that the (base, limit) pairs are now handled perfectly
(i.e. no information loss through memcpy etc.) but always refer to
the originally allocated object. They will provide little or no
help in detecting the class of errors that led to this thread, and
even less in detecting the object overlap problems that cause so
much trouble with optimisation.

As I said, a similar phenomenon occurs with memory checkers. The
X Toolkit is a crock, and I once tried to locate one of its pointer
errors. No chance. It wasn't a simple error of going one over the
end of an array, but one where a pointer was being used after the
object had been deallocated or some such nasty. And the pointer
pair approach won't help with that one, either, without a reliable
pointer validity check.

Sorry, but I really have been there and done that before. The fact
that a tool makes life easier for kiddies in their first, naive
fumbles doesn't mean that it is useful. Its benefit may be negated
by discouraging or even preventing the kiddies from learning methods
that will work on real codes.

As I said, I may take another look, but I wasn't impressed with its
approach last time I did. Yes, it may be the best that can be done,
but that isn't the same as it being on the positive side of the
ledger.

Douglas A. Gwyn

Jun 4, 2001, 11:23:37 AM
Nick Maclaren wrote:
> A related, but pure C, matter is the question of whether a debugging
> compiler is ENTITLED to check array bounds.

Of course it is, since that is out of the scope of the standard.
However, array bounds checking is not as simple in C as in many
other languages, because the rules intentionally allow certain
kinds of pointer aliasing (via casts, etc.) so long as the
operation conforms to the addressable-object model and obeys
type constraints (alignment, etc.).

Nick Maclaren

Jun 4, 2001, 12:40:06 PM

In article <3B1BA7F9...@null.net>,

I don't think that you have been following the thread. The question
is whether, and to what extent, going outside array BOUNDS (as
distinct from going outside the storage of the originally allocated
object) is a breach of the standard.

If it is not a breach of the standard, and the program is otherwise
strictly conforming, then a conforming implementation is not supposed
to reject the program.

If it IS a breach of the standard, the question is EXACTLY which
constructions constitute a breach.

Douglas A. Gwyn

Jun 4, 2001, 3:00:02 PM
Nick Maclaren wrote:
> The question is whether, and to what extent, going outside
> array BOUNDS (as distinct from going outside the storage of
> the originally allocated object) is a breach of the standard.

Then the subject line is misleading. For example,
    #include <stdio.h>
    int main(int argc, char *argv[1]) {
        if (argc > 2)
            printf("%s\n", argv[2]);
        return 0;
    }
is strictly conforming, since the "[1]" imposes no constraint
(the actual object is allocated outside main, and argv itself
has decayed to type (char **) before being used, in this case).
Apparently the current issue is exemplified by
    int foo[50];
    int main(void) {
        void *p = foo;
        int (*q)[3] = (int (*)[3])foo; // okay
        (*q)[10]; // okay in C99 (not C89)
        return 0;
    }
Even though there it appears that the bounds are "exceeded",
there is no C99 requirement being violated; there is a valid
object there with a valid value (0). In response to a DR
filed against C89, we were forced to conclude that the
specification strictly required that the result of
pointer+integer point within (or just after) the *immediate*
array derived from the type that had gone into forming the
pointer+integer. One of the net effects of various scattered
wording changes about object access in C99 is that the array
object in question need only have the two relevant elements
exist, not that they have indices limited to the apparent
range of the subscripts of any specific array type. (What
is constrained is the difference of the subscripts and the
existence of the array elements.) I think we should have
included an example, e.g. treating a [2][6] declared array
like a [3][4] array.

As I said previously, it is tricky to check for array bounds
violations under such circumstances, although for some
architectures with fine-grained memory protection it might
be easy enough to catch them at run time (just unmap all non-
allocated data addresses and trap illegal references). In C89
there were more cases that could be detected at compile time.

Clive D. W. Feather

Jun 5, 2001, 7:40:37 AM
In article <3B142891...@cds.duke.edu>, Max TenEyck Woodbury
<mt...@cds.duke.edu> writes

>>>> extern int main (int argc, char **argv);
>>>> extern int main (int argc, char *argv[]);
>>>> extern int main (int argc, char *argv[5]);

>If for some reason,
>sizeof(argv) and sizeof(*argv) is needed, they produce
>different results.
>
>#1 sizeof(argv) = sizeof(char * *)
>#2 sizeof(argv) = undefined
>#3 sizeof(argv) = 5 * sizeof(char *)

No. In all three cases argv has the type pointer to pointer to char, and
so sizeof (argv) is sizeof (char **).

>Is my memory correct that sizeof(char * *) need not be
>(but usually is) == sizeof(char *)?

Yes.

--
Clive D.W. Feather, writing for himself | Home: <cl...@davros.org>
Tel: +44 20 8371 1138 (work) | Web: <http://www.davros.org>
Fax: +44 20 8371 4037 (D-fax) | Work: <cl...@demon.net>
Written on my laptop; please observe the Reply-To address

Clive D. W. Feather

Jun 5, 2001, 8:26:33 AM
In article <9emnp9$fs6$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes
>>>> extern int main (int argc, char *argv[fred]); /* fred not defined */
>>>> extern int main (int argc, char *argv[++++++]);
>>>> extern int main (int argc, char *argv[$]);

>>>Nothing to do with textually sequential order. The last two violate a
>>>syntax rule, and no amount of "adjustment" can solve that.

>>>Your first example violates a syntax rule as well, though people often
>>>misunderstand this. In 6.5.1, the production "identifier" only applies
>>>when a declaration of that identifier is in scope, as noted in #2.

>Where, in the wording of the standard, does it say that 6.5.1 should
>be applied 'before' parameter adjustment is? If a parameter is
>adjusted, its size becomes irrelevant much like the operand of sizeof.
>Now, where in the wording of the standard, does it say that such
>operands are analysed syntactically before being ignored?

You have it backwards: it doesn't say that they aren't.

The principle here is that if code violates a constraint or a syntax
rule, a diagnostic is required. It is not relevant whether the code in
question is ever executed or not. If I write:

static int foo (void) { ++ }

at the end of a translation unit, and there is no call to foo() in that
translation unit, it *STILL* violates the syntax.

[The above statement is to be read after preprocessing where relevant,
so that, for example, #if'd out code is exempt from the full syntax.]

Exactly the same principle applies to your examples. They violate a
constraint.

extern int main (int argc, char *argv[-1]);

violates a constraint as well.

In another thread I've just quoted DR 047. This says that the Standard
does *not* imply anywhere that an invalid parameter is magically
transformed to a valid pointer.


In other words, the issue is not the order in which things are done.
Rather:

* The code must conform to the syntax. AND
* The code must not violate any constraint. AND
* The code must not involve undefined behaviour at runtime.

In your example with "fred", the point is not that it is unclear what
order things are done in; to determine whether "fred" is declared, you
have to look at the code textually before the line in question, but
that's part of the scope rules, nothing more.

Clive D. W. Feather

Jun 5, 2001, 8:07:35 AM
In article <9emfpp$n5i$1...@panix3.panix.com>, Greg Comeau
<com...@panix.com> writes
>>> extern int main (int argc, char *argv[0/0]);
>>
>>This one is actually quite interesting. Is "0/0" an integer constant
>>expression ? If it isn't, then that line is equivalent to:
>>
>> extern int main (int argc, char *argv [*]);
>>
>>(that is, a variable length array).
>
>I may be having a bad day, so why is it equiv to that?

[#5] If the size is an expression that is not an integer
constant expression: if it occurs in a declaration at
function prototype scope, it is treated as if it were
replaced by *; otherwise, each time it is evaluated it shall
have a value greater than zero.

Clive D. W. Feather

Jun 5, 2001, 7:56:50 AM
In article <3B1701DC...@wizard.net>, James Kuyper Jr.
<kuy...@wizard.net> writes

>#include <header.h>
>// header.h defines L, M, and N as integer constant expressions with
>// positive values, that I'm deliberately not specifying, and has no
>// other relevant effect.
>
>void func(int parameter[N])
>{
> for(int i=0; i<L; i++)
> parameter[i]++;
>}
>
>int main(void)
>{
> int argument[M]={0};
>
> func(argument);
> return 0;
>}

My understanding of the Standard is, and always has been, that the value
of N is irrelevant (provided that it is positive). A bounds-checking
implementation can trap when L > M, but not otherwise.

Note that in C99 we added a new feature: if you insert "static" before N
in the definition of func, then this is an assertion that M >= N (and
that the actual argument is not a null pointer), and an implementation
may trap or otherwise fail if M < N.

If you changed the call to "func (argument + K)", then a bounds-checking
implementation can trap when L > M - K. If "static" is added, then it
can trap if M - K < N.

Larry Jones

Jun 5, 2001, 3:07:53 PM
Greg Comeau (com...@panix.com) wrote:
>
> I'm confused. That is to say, if I understand, it seems
> like you are saying that it is the policy of the C committee
> to ignore a defect brought to their attention if it's not
> deemed serious? Please correct me.

The policy of the C committee is to focus our (limited) resources where
they can do the most good. As long as the Standard is written in English
instead of some formal notation (and there are good reasons for that to
continue being the case for the foreseeable future), there are bound to
be things that some people regard as ambiguous or unclear.
Unfortunately, those are more matters of opinion than fact; what is
clearer to one person may well be less clear to another. For example,
when the committee decided to mandate that integer division round toward
zero instead of being implementation defined as it was previously, the
committee spent literally *hours* trying to devise clear, unambiguous
wording, and failed; someone was able to come up with a plausible
misreading of every attempt. That was *not* a productive use of our
time, and we're loath to repeat that exercise.

Serious defects (e.g., things that are just plain wrong, things that are
missing, things that are *clearly* misleading or ambiguous) take
precedence over less serious defects. Defects like this one where the
standard appears to be clear enough that no one has ever actually
misunderstood it and the only question is whether the normative text is
clear enough if you deliberately ignore the informative text (something
no motivated reader is going to do) end up at the very bottom of the
priority list. And there are usually enough things to do that things at
the very bottom of the list almost never reach the top of the list.

> What makes the status quo better, or at least not worse?

Simply the fact that it is the status quo. There is a very high cost to
change a standard, both to the producers (e.g., preparing the change
text, approving it, and publishing it) and to the consumers (e.g.,
getting the changes, reading them, and understanding the changes, their
rationale, and their effect). You don't want to pay that price for
little or no benefit.

-Larry Jones

Summer vacation started! I can't be sick! -- Calvin

Nick Maclaren

Jun 5, 2001, 4:15:36 PM
In article <koK7pVz5$MH7...@romana.davros.org>,

I know that is the principle - that isn't what I am talking about.

Is there any WORDING that states unambiguously that the action of
adjustment is not simply to throw away the unneeded tokens as
soon as it is realised that they are not needed?

Note that the matter of whether code is executed or not isn't the
issue. The term "evaluated" is used (without definition) to imply
that translation phase 7 does something with the code.

Array parameters SPECIFICALLY use another term, which is reasonable
to assume means another concept. Why shouldn't it be a PARSING
action and take effect during the parsing of the declarator?

Nick Maclaren

Jun 5, 2001, 4:26:01 PM
In article <9fjam9$o...@nfs0.sdrc.com>,

Larry Jones <larry...@sdrc.com> wrote:
>Greg Comeau (com...@panix.com) wrote:
>>
>> I'm confused. That is to say, if I understand, it seems
>> like you are saying that it is the policy of the C committee
>> to ignore a defect brought to their attention if it's not
>> deemed serious? Please correct me.
>
>The policy of the C committee is to focus our (limited) resources where
>they can do the most good. As long as the Standard is written in English
>instead of some formal notation (and there are good reasons for that to
>continue being the case for the foreseeable future), there are bound to
>be things that some people regard as ambiguous or unclear.

That is true, though many people dissent over how good the reasons
for continuing the tradition are.

>Unfortunately, those are more matters of opinion than fact; what is
>clearer to one person may well be less clear to another. For example,
>when the committee decided to mandate that integer division round toward
>zero instead of being implementation defined as it was previously, the
>committee spent literally *hours* trying to devise clear, unambiguous
>wording, and failed; someone was able to come up with a plausible
>misreading of every attempt. That was *not* a productive use of our
>time, and we're loath to repeat that exercise.

But that is NOT true - or, at least, the first clause isn't.

The fact that what is clearer to one person may well be less clear
to another means that it is a matter of FACT that there are going
to be some things that are ambiguous and unclear. What right do
you have to say that your opinion of what the WORDS mean is the
correct one?

Note that I am NOT referring to what the INTENT is, on which you
ARE an authority, but about your interpretation of the English
language, where you (like me) are just another user of it.

>Serious defects (e.g., things that are just plain wrong, things that are
>missing, things that are *clearly* misleading or ambiguous) take
>precedence over less serious defects. Defects like this one where the
>standard appears to be clear enough that no one has ever actually
>misunderstood it and the only question is whether the normative text is
>clear enough if you deliberately ignore the informative text (something
>no motivated reader is going to do) end up at the very bottom of the
>priority list. And there are usually enough things to do that things at
>the very bottom of the list almost never reach the top of the list.

That is correct, except this is NOT an example where the standard
is "clear enough". It isn't, and I have met a good many people who
have been thoroughly confused by it. It is a situation where the
standard is disambiguated by existing practice - not the same thing
at all. But, because of the unanimity of existing practice in this
case, it isn't as serious as it might be.

James Kuyper

Jun 5, 2001, 4:54:33 PM
Nick Maclaren wrote:
>
> In article <9fjam9$o...@nfs0.sdrc.com>,
> Larry Jones <larry...@sdrc.com> wrote:
> >Greg Comeau (com...@panix.com) wrote:
...

> The fact that what is clearer to one person may well be less clear
> to another means that it is a matter of FACT that there are going
> to be some things that are ambiguous and unclear. What right do
> you have to say that your opinion of what the WORDS mean is the
> correct one?

If you care about the ISO standard, then the C committee is precisely
the one entity that has been designated by ISO as having that right. I
can understand challenging the committee's decision on a given issue,
but I don't understand your challenge of their right to issue a
decision.

However, in this case, he isn't even describing a decision about the
meaning of the language, but a decision about how the committee should
spend the committee's own very limited resources. Who ELSE should have a
right to make that decision? It's not as if anyone's paying the
committee to do the work; there's very little anyone can do to force
volunteers to concentrate on any issue that they don't want to
concentrate on.

> Note that I am NOT referring to what the INTENT is, on which you
> ARE an authority, but about your interpretation of the English
> language, where you (like me) are just another user of it.

The Standard is not written in English. It's written in a dialect of
English which some call standardese, and ISO is quite definitely the
authority on the meaning of standardese. When an ISO standard defines
the meaning of a given word or phrase, the conventional English meaning
of that word or phrase ceases to be relevant anywhere that standard
applies.

Still, most of the words in the standard are not technical words, but
ordinary English, and ISO isn't a relevant authority on those words, nor
on how they're put together. However, neither is anyone else! There is no
official authority to whom you can appeal concerning the meaning of a
generic English sentence. In other languages there may be: I've heard
that there are organizations officially in charge of protecting Spanish,
French, German, and Japanese from unwanted changes - they're not very
effective, but they do exist; there's no such organization for English.

If a disputed text occurs in a legal context, you can sometimes get a
judicial authority to make a ruling on its meaning. Otherwise, the best
you can do for a decision on the meaning of an ISO standard is to follow
the procedures outlined by ISO. I can understand your frustration with
some of the answers you've received ("the standard is clear enough", as
an answer to a yes/no question, is my favorite example). However, who
ELSE has the right to decide?

...


> That is correct, except this is NOT an example where the standard
> is "clear enough". It isn't, and I have met a good many people who
> have been thoroughly confused by it. It is a situation where the
> standard is disambiguated by existing practice - not the same thing
> at all. But, because of the unanimity of existing practice in this
> case, it isn't as serious as it might be.

Exactly - "it isn't as serious as it might be" - that's all the
justification that's needed for the committee to concentrate its
efforts on some other issue that is more serious. "No one has ever
stayed confused about the intent long enough to implement it
incorrectly" may be a less than ideal criterion for deciding that
something doesn't need immediate fixing. However, can you point to
anything the committee is currently working on that is less urgent? Not
being on the committee, I can't be sure, but I'll be surprised if there
is any such work being done.

Nick Maclaren

Jun 5, 2001, 6:25:58 PM
In article <3B1D4709...@wizard.net>,

James Kuyper <kuy...@wizard.net> wrote:
>Nick Maclaren wrote:
>>
>> In article <9fjam9$o...@nfs0.sdrc.com>,
>> Larry Jones <larry...@sdrc.com> wrote:
>> >Greg Comeau (com...@panix.com) wrote:
>...
>> The fact that what is clearer to one person may well be less clear
>> to another means that it is a matter of FACT that there are going
>> to be some things that are ambiguous and unclear. What right do
>> you have to say that your opinion of what the WORDS mean is the
>> correct one?
>
>If you care about the ISO standard, then the C committee is precisely
>the one entity that has been designated by ISO as having that right. I
>can understand challenging the committee's decision on a given issue,
>but I don't understand your challenge of their right to issue a
>decision.

No, that's not the situation and not what I was doing. There is a
sense in which ISO defines what the WORDS mean (your remark about
standardese which I have snipped), but there is no sense in which
ISO delegates to any subcommittee the power to define the meaning
implied by the English language. What it delegates is the power to
specify an arbitrary INTENT in terms of that language.

And, no, I am NOT challenging their right to issue a decision.
What I am challenging is the right to say that a standard is
unambiguous.

>However, in this case, he isn't even describing a decision about the
>meaning of the language, but a decision about how the committee should
>spend the committee's own very limited resources. Who ELSE should have a
>right to make that decision? It's not as if anyone's paying the
>committee to do the work; there's very little anyone can do to force
>volunteers to concentrate on any issue that they don't want to
>concentrate on.

Again, I have no dissension there.

>> Note that I am NOT referring to what the INTENT is, on which you
>> ARE an authority, but about your interpretation of the English
>> language, where you (like me) are just another user of it.
>
>The Standard is not written in English. It's written in a dialect of
>English which some call standardese, and ISO is quite definitely the
>authority on the meaning of standardese. When an ISO standard defines
>the meaning of a given word or phrase, the conventional English meaning
>of that word or phrase ceases to be relevant anywhere that standard
>applies.

That is correct, but (a) C breaks ISO's guidelines in a good many
places, and (b) I was referring to the sections where it does not use
ISO's technical terms.

>Still, most of the words in the standard are not technical words, but
>ordinary English, and ISO isn't a relevant authority on those words, nor
>on how they're put togethe. However, neither is anyone else! There is no
>official authority to whom you can appeal concerning the meaning of a
>generic English sentence. In other languages there may be: I've heard
>that there are organizations officially in charge of protecting Spanish,
>French, German, and Japanese from unwanted changes - they're not very
>effective, but they do exist; there's no such organization for English.

Correct. That is why I said that it is definitely the case that
there are ambiguities etc., and why I said that my reading of the
wording is as valid as anyone else's. And, as I also said, a good
many other people have read this section in consistent way, but a
way that was not intended. If that doesn't prove ambiguity, I don't
know what would ....

Markus E Leypold

Jun 6, 2001, 5:31:30 AM
f...@cs.mu.oz.au (Fergus Henderson) writes:

> Markus E Leypold <ley...@informatik.uni-tuebingen.de> writes:
>
> >nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> >


> >Isn't there a gcc 2.7.x Variant which actually did bounds checking?
>
> Yes, and work on that continues. See the recent thread on the gcc mailing


Nice to know. -- Regards Markus

Larry Jones

Jun 6, 2001, 2:01:27 PM
Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
> In article <9fjam9$o...@nfs0.sdrc.com>,
> Larry Jones <larry...@sdrc.com> wrote:
> >
> >Unfortunately, those are more matters of opinion than fact; what is
> >clearer to one person may well be less clear to another.
[...]

> But that is NOT true - or, at least, the first clause isn't.
>
> The fact that what is clearer to one person may well be less clear
> to another means that it is a matter of FACT that there are going
> to be some things that are ambiguous and unclear. What right do
> you have to say that your opinion of what the WORDS mean is the
> correct one?

Yes, my point (which you made better than I did) is that, as long as the
standard is written in English, it *will* be ambiguous and unclear, at
least to some readers, no matter what. But whether a particular issue
is ambiguous or unclear *is* a matter of opinion -- if I say something
is perfectly clear and you say it's not, who's to say who is right? In
some sense, we both are; in another sense, if even one person says it's
not, then that is prima facie evidence that it's not. But, given that
it can never be made perfectly clear, when is it clear enough? All I
can do is use my best judgement along with the sense of the committee.

As for this particular issue, I'm sympathetic. Like I said, I already
made changes for C99 and I'm willing to make more in the future, but I
don't have any good ideas for making it clearer and no one else has made
a concrete proposal yet either.

-Larry Jones

Hello, local Navy recruitment office? Yes, this is an emergency... -- Calvin

Larry Jones

unread,
Jun 6, 2001, 2:08:41 PM6/6/01
to
Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>
> C breaks ISO's guidelines in a good many places,

Like what? As far as I know, the only guidelines we violate are having
paragraph numbers (which, I hope everyone agrees, are exceedingly useful
when trying to make specific references), having text in subclauses that
also have dependent subclauses (which is discouraged only because it
creates difficulty in references, which we don't have because of the
paragraph numbers), and not using , for the radix point in the text
(because it would be way too confusing). We certainly haven't broken
any of the guidelines gratuitously unless it was through ignorance.

-Larry Jones

I've never seen a sled catch fire before. -- Hobbes

Joseph S. Myers

unread,
Jun 6, 2001, 5:34:08 PM6/6/01
to
In article <9flrj9$f...@nfs0.sdrc.com>,

Larry Jones <larry...@sdrc.com> wrote:
>Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>>
>> C breaks ISO's guidelines in a good many places,
>
>Like what? As far as I know, the only guidelines we violate are having
>paragraph numbers (which, I hope everyone agrees, are exceedingly useful

ISTR reading that the people developing the Ada standard persuaded ISO
to agree to free distribution of that standard, but not to paragraph
numbers - so the free Ada standard has paragraph numbers but not the
ISO one. Can you get ISO to combine these exceptions, so you can
continue to have paragraph numbers in the ISO C standard, while making
it freely available and redistributable?

--
Joseph S. Myers
js...@cam.ac.uk

Nick Maclaren

unread,
Jun 7, 2001, 7:46:16 AM6/7/01
to

In article <9flr5n$f...@nfs0.sdrc.com>,

scj...@thor.sdrc.com (Larry Jones) writes:
|>
|> As for this particular issue, I'm sympathetic. Like I said, I already
|> made changes for C99 and I'm willing to make more in the future, but I
|> don't have any good ideas for making it clearer and no one else has made
|> a concrete proposal yet either.

Yes, I appreciate it. My difficulty with proposing wording is
that, every time I think that I have reverse engineered C's
description language, someone points out how I have misunderstood
it. After 12 years, I am still discovering new sources of
confusion, and I still don't know what the phase 7 model is :-(

The key here is that there needs to be SOME normative wording
saying that array parameters are analysed first, adjusted second,
and then the type is attached to the identifier. As it currently
stands, there are several plausible interpretations, of which at
least two have precedent (though one not in C).

Exactly as there needs to be SOME normative wording stating what
the ordering is in the sequence point definitions. Only that one
is a real, practical, serious problem.

But, in general, I don't see that fiddling with the English will
help a great deal. The problem about the current situation is
that different members OF THE COMMITTEE have different ideas in
their mind about what is intended. Subtly different, but still
significantly different.

Nick Maclaren

unread,
Jun 7, 2001, 7:51:29 AM6/7/01
to

Mainly confusing normative versus informative text, but there are
a LOT of places where the only definition is in footnotes and
examples.

And there are a lot of places where words are used with specific
technical meanings that are neither defined nor universally agreed
outside C. One example is "evaluate".

But there are also conflicts with other standards, especially
LIA-1.

Larry Jones

unread,
Jun 7, 2001, 4:27:04 PM6/7/01
to
Joseph S. Myers (js...@cam.ac.uk) wrote:
>
> ISTR reading that the people developing the Ada standard persuaded ISO
> to agree to free distribution of that standard, but not to paragraph
> numbers - so the free Ada standard has paragraph numbers but not the
> ISO one. Can you get ISO to combine these exceptions, so you can
> continue to have paragraph numbers in the ISO C standard, while making
> it freely available and redistributable?

No. The original Ada standard was developed using US Department of
Defense funding that required the final document to be freely
distributable so there was no point in not allowing it to be distributed
with ISO's name on it (it's good publicity, if nothing else). There's
no similar leverage for the C standard.

-Larry Jones

That's the problem with nature. Something's always stinging you
or oozing mucus on you. -- Calvin

Larry Jones

unread,
Jun 7, 2001, 4:43:17 PM6/7/01
to
Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>
> Mainly confusing normative versus informative text, but there are
> a LOT of places where the only definition is in footnotes and
> examples.

Yes, this is a case where we disagree on what the ISO Directives mean.
I have no problem with normative text imposing a requirement that is
only defined in informative text whereas you do. Perhaps you should
submit a similar defect report on the ISO Directives. :-) (Actually, I
believe there are a number of places where the ISO Directives violate
their own requirements and recommendations, never mind lacking clarity.)

> And there are a lot of places where words are used with specific
> technical meanings that are neither defined nor universally agreed
> outside C. One example is "evaluate".

I don't think there's anything in the ISO directives that forbids that,
but I'll grant you that it's not a good idea. Choosing which terms to
define is always tricky because the people using them all know what they
mean and expect everyone else to as well, but there are often nuances
that aren't captured in the generic meaning.

> But there are also conflicts with other standards, especially
> LIA-1.

Conflicts in what sense? LIA-1 was watered down to the point where it's
mostly descriptive rather than prescriptive. Conformance with the C
Standard doesn't guarantee conformance with LIA-1, but I don't know of
anything that would prevent an implementation from conforming to both --
that's the whole point of Annex H.

-Larry Jones

Well of course the zipper's going to get stuck if everyone
stands around WATCHING me! -- Calvin

Nick Maclaren

unread,
Jun 7, 2001, 6:09:15 PM6/7/01
to
In article <9fop15$e...@nfs0.sdrc.com>,

Larry Jones <larry...@sdrc.com> wrote:
>Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>>
>> Mainly confusing normative versus informative text, but there are
>> a LOT of places where the only definition is in footnotes and
>> examples.
>
>Yes, this is a case where we disagree on what the ISO Directives mean.
>I have no problem with normative text imposing a requirement that is
>only defined in informative text whereas you do. Perhaps you should
>submit a similar defect report on the ISO Directives. :-) (Actually, I
>believe there are a number of places where the ISO Directives violate
>their own requirements and recommendations, never mind lacking clarity.)

Somehow that fails to surprise me :-(

>> And there are a lot of places where words are used with specific
>> technical meanings that are neither defined nor universally agreed
>> outside C. One example is "evaluate".
>
>I don't think there's anything in the ISO directives that forbids that,
>but I'll grant you that it's not a good idea. Choosing which terms to
>define is always tricky because the people using them all know what they
>mean and expect everyone else to as well, but there are often nuances
>that aren't captured in the generic meaning.

Yes, indeed. Evaluate would be one, except that it is another case
where the consensus really is one. It isn't in computer science,
but all C people know what is meant.

>> But there are also conflicts with other standards, especially
>> LIA-1.
>
>Conflicts in what sense? LIA-1 was watered down to the point where it's
>mostly descriptive rather than proscriptive. Conformance with the C
>Standard doesn't guarantee conformance with LIA-1, but I don't know of
>anything that would prevent an implementation from conforming to both --
>that's the whole point of Annex H.

I have buried my LIA-1, so I can't find chapter and verse, but it is
the places where the C standard forbids the diagnosis of things like
domain errors.

Douglas A. Gwyn

unread,
Jun 8, 2001, 12:47:06 PM6/8/01
to
Nick Maclaren wrote:
> Yes, indeed. Evaluate would be one, except that it is another case
> where the consensus really is one. It isn't in computer science,
> but all C people know what is meant.

It certainly *is* "in computer science"; what do you think LISP's
"eval" does? Expression evaluation is the application of a rule-
based mapping from the expression (however expressed) to an entity
(usually a number) that we call the "value" of the expression.
In the course of the evaluation, use may be made of values of
subexpressions. The simplest interesting case is the value of an
identifier. It is hard to imagine that none of this stuff is
considered to be "in computer science".

Douglas A. Gwyn

unread,
Jun 8, 2001, 12:51:53 PM6/8/01
to
Nick Maclaren wrote:
> Exactly as there needs to be SOME normative wording stating what
> the ordering is in the sequence point definitions. Only that one
> is a real, practical, serious problem.
>
> But, in general, I don't see that fiddling with the English will
> help a great deal. The problem about the current situation is
> that different members OF THE COMMITTEE have different ideas in
> their mind about what is intended. Subtly different, but still
> significantly different.

But what you're missing is that as we work out consensus on more
rigorous specifications, even though sometimes these contradict
our previous personal notions, we still support the adopted form
of the specification as the "official" version. Occasionally
everybody on the committee is surprised to find that what we
specified differs from what each of us believed, at which point
we discuss whether we can live with the existing specification or
need to change it. Each decision has to be made on its individual
merits.

Nick Maclaren

unread,
Jun 8, 2001, 2:04:04 PM6/8/01
to

You are missing my point. Lisp eval evaluates an expression all
right, but the language does not define WHEN it does so. Even
ignoring the lazy evaluation religious war, Lisp is permitted to
evaluate expressions during compilation if it can do so.

C's term evaluation is not like that. An implementation is
FORBIDDEN from evaluating an expression at compile time (or at
least is forbidden from handling an error), unless it is in one
of the restricted contexts where this is required by the standard.

Now, that concept IS known in computer science but there is most
definitely no consensus that the term evaluate implies it.

Nick Maclaren

unread,
Jun 8, 2001, 2:07:25 PM6/8/01
to

No, I am not missing that point. There are two major disadvantages
with that approach:

1) Every single, niggling point has to be dealt with in
isolation. This is fiendishly time-consuming, as you know, but
it also means that a fix can easily introduce other problems (and
sometimes worse ones). Several DRs show this effect.

2) It makes it practically and theoretically impossible for
an outsider reading the standard to work out what it means. The
ONLY way to resolve the large number of ambiguities and similar
problems is to refer each one back to the committee.

Clive D. W. Feather

unread,
Jun 10, 2001, 6:23:17 AM6/10/01
to
In article <9emnjk$fhp$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes

>>Nothing to do with textually sequential order. The last two violate a
>>syntax rule, and no amount of "adjustment" can solve that.
>
>Why? A syntax rule is a phase 7 activity, as is adjustment. Why
>should the application of syntax rules take precedence over
>adjustment. Chapter and verse, please :-)

Again, you've got to show where it says it *can*. My chapter and verse
is 5.1.1.3#1:

[#1] A conforming implementation shall produce at least one
diagnostic message (identified in an implementation-defined
manner) if a preprocessing translation unit or translation
unit contains a violation of any syntax rule or constraint,

Not "any syntax rule that isn't adjusted around".

Note that the one time we *did* want a syntax rule to be ignored because
of semantics, we said so. 6.10#4:

[#4] When in a group that is skipped (6.10.1), the directive
syntax is relaxed to allow any sequence of preprocessing
tokens to occur between the directive name and the following
new-line character.

Again, compare with 6.7.5.3#4, where we explicitly say that a rule
applies at a specific time:

[#4] After adjustment, the parameters in a parameter type
list in a function declarator that is part of a definition
of that function shall not have incomplete type.

[This article is intended to address the general question of ignoring a
rule of the Standard. I have addressed the specific question of the
meaning of adjustment in another thread.]

Clive D. W. Feather

unread,
Jun 10, 2001, 6:18:06 AM6/10/01
to
In article <9fjel8$5nl$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes
>>>>>> extern int main (int argc, char *argv[fred]); /* fred not defined */
>>>>>> extern int main (int argc, char *argv[++++++]);
>>>>>> extern int main (int argc, char *argv[$]);

>Is there any WORDING that states unambiguously that the action of


>adjustment is not simply to throw away the unneeded tokens as
>soon as it is realised that they are not needed?

No. Nor is there any wording that says that any identifier that isn't
declared isn't given a declaration at the end of each block.

You can't say "it doesn't say X doesn't happen". You can only assume an
action if it's explicitly stated.

James Kuyper Jr.

unread,
Jun 10, 2001, 7:45:28 AM6/10/01
to
"Clive D. W. Feather" wrote:
>
> In article <9fjel8$5nl$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
> <nm...@cus.cam.ac.uk> writes
> >>>>>> extern int main (int argc, char *argv[fred]); /* fred not defined */
> >>>>>> extern int main (int argc, char *argv[++++++]);
> >>>>>> extern int main (int argc, char *argv[$]);
>
> >Is there any WORDING that states unambiguously that the action of
> >adjustment is not simply to throw away the unneeded tokens as
> >soon as it is realised that they are not needed?
>
> No. Nor is there any wording that says that any identifier that isn't
> declared isn't given a declaration at the end of each block.
> You can't say "it doesn't say X doesn't happen". You can only assume an
> action if it's explicitly stated.

That's the point - it IS explicitly stated that adjustment happens, but
what adjustment means is never defined.

Nick Maclaren

unread,
Jun 10, 2001, 8:27:29 AM6/10/01
to
In article <3B235DD8...@wizard.net>,

Precisely. If the declaration were adjusted immediately after
parsing and before syntax and type analysis, then no constraint
is violated. If it is adjusted just before the identifier is given
a type, then there is.

Of course, there is nothing in the C standard that even hints whether
or not that is the right way of considering phase 7, and Clive
implies that it isn't, but blowed if I know what the correct way is!
Sure as eggs is little chickens, neither computer science conventions
nor the wording of the standard helps :-(

Nick Maclaren

unread,
Jun 10, 2001, 8:35:01 AM6/10/01
to
In article <VXeHmEtV...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9emnjk$fhp$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>>>Nothing to do with textually sequential order. The last two violate a
>>>syntax rule, and no amount of "adjustment" can solve that.
>>
>>Why? A syntax rule is a phase 7 activity, as is adjustment. Why
>>should the application of syntax rules take precedence over
>>adjustment. Chapter and verse, please :-)
>
>Again, you've got to show where it says it *can*. My chapter and verse
>is 5.1.1.3#1:
>
> [#1] A conforming implementation shall produce at least one
> diagnostic message (identified in an implementation-defined
> manner) if a preprocessing translation unit or translation
> unit contains a violation of any syntax rule or constraint,
>
>Not "any syntax rule that isn't adjusted around".

Why don't you have to abide by the same rule? As I have pointed
out, if the adjustment occurs immediately after parsing, then there
IS no constraint that is violated and therefore no diagnostic need
be produced.

So WHERE in the standard does it say that the programmer CAN assume
that the implementation will perform the adjustment immediately
before the identifier is given a type? If there is no such wording
then, by your own statement, he cannot assume any such thing.

And, if he cannot assume that, the standard is very explicit that
an implementation is free not to do it.

Clive D. W. Feather

unread,
Jun 10, 2001, 9:54:08 AM6/10/01
to
In article <9fadi7$1a7$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes
>There are three aspects that need attention:
>
> The way that translation phase 4 calls phase 7. This isn't
>very hard to resolve, but is messy.

What am I missing here ? The translation unit undergoes each of the
phases in turn; each phase is completed before the next one. [In the
case of #include, the included text goes through 1 to 3 before being
inserted, and then takes part in 4.]

Is there a back-interaction I've overlooked ?

> A precise model for each type of parsing and syntax analysis
>used in C, of which there are a good many.

This was in another thread a while ago, wasn't it ?

I think that you can derive the following from the existing wording of
the Standard, except as noted:

(1) During TP3 the source is divided into preprocessing tokens; that is,
it is parsed to be an example of "pp-tokens-opt" (defined in 6.10#1).
The "greedy algorithm" is explicitly used both to resolve ambiguities
and to override what you've called "BNF semantics" (that is, if a valid
parse exists then it is used); 6.4#4 spells out the exact rules.

(2) During TP4 it is parsed to be an example of a "preprocessing-file"
(6.10#1). BNF semantics are used (this is implicit, but I don't think
anyone ever thought otherwise), noting the effects of 6.10#3 and #4.
"preprocessing-token" is taken to be a terminal of the grammar, and each
terminal must match exactly one of the preprocessing tokens generated in
(1).

[That is, if stage 1 generated the three tokens {#}{i}{f}, they cannot
match #if in the 6.10 grammar.]

(3) At the start of TP7 each preprocessing token is either reclassified
as a token or there is an error.

(4) The sequence of tokens is parsed as an example of "translation-unit"
(6.9#1) using BNF semantics. Ambiguities are resolved via various
constraint and semantic clauses. In this parsing, all terminals in the
grammar, and the five non-terminals on the right hand side of "token"
(6.4#1), are treated as terminals, and each terminal must match exactly
one of the tokens generated in (3).

[That is, if stage 3 generated the four tokens {a}{+}{=}{b}, this is not
a valid compound-assignment.]

Nick Maclaren

unread,
Jun 11, 2001, 4:01:05 AM6/11/01
to
In article <YBLkzZ4A...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9fadi7$1a7$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>>There are three aspects that need attention:
>>
>> The way that translation phase 4 calls phase 7. This isn't
>>very hard to resolve, but is messy.
>
>What am I missing here ? The translation unit undergoes each of the
>phases in turn; each phase is completed before the next one. [In the
>case of #include, the included text goes through 1 to 3 before being
>inserted, and then takes part in 4.]
>
>Is there a back-interaction I've overlooked ?

No, but you may have forgotten! How do you evaluate an expression in
#if without referring forward to phase 7, at least? Yet you need to
do that without leaving phase 4.

>> A precise model for each type of parsing and syntax analysis
>>used in C, of which there are a good many.
>
>This was in another thread a while ago, wasn't it ?
>
>I think that you can derive the following from the existing wording of
>the Standard, except as noted:

Yes, but it is by no means complete!

>(4) The sequence of tokens is parsed as an example of "translation-unit"
>(6.9#1) using BNF semantics. Ambiguities are resolved via various
>constraint and semantic clauses. In this parsing, all terminals in the
>grammar, and the five non-terminals on the right hand side of "token"
>(6.4#1), are treated as terminals, and each terminal must match exactly
>one of the tokens generated in (3).

Yes, but that is missing really quite a lot. For example:

Ambiguity resolution takes second place to analysis in the "BNF"
model, but there are places where the constraint rules OVERRIDE the
BNF (i.e. say "parsing X would be valid, but it isn't acceptable").

Certain scope and linkage constraints use different models; see
6.2.2 #5 and 6.9 #4 for a clear example, but remember also inline
functions :-(

Most languages use more than one model, for good reasons, but most
(a) state this explicitly and (b) have a well-defined relationship
between them. C does neither.

James Kuyper Jr.

unread,
Jun 11, 2001, 9:02:01 AM6/11/01
to
Nick Maclaren wrote:
...

> No, but you may have forgotten! How do you evaluate an expression in
> #if without referring forward to phase 7, at least? Yet you need to
> do that without leaving phase 4.

#if expression evaluation during phase 4 doesn't involve a "call" to
phase 7, though I wouldn't be surprised if there was some shared code
between the two phases in a typical compiler. It follows the rules of
section 6.6, but ignores most of the rest of phase 7 expression syntax
and semantics. Paragraphs 3, 6, and 8 of that section rather severely
restrict the portion of phase 7 semantics that are applicable. The only
integer types are intmax_t and uintmax_t (which is why it's not a good
idea for those to be unusual types, such as bignums). There are no
variables, just identifiers: those recognised as macros with a value
are replaced with that value, and the rest are replaced by 0. The only
identifier not subject to such replacement is defined(), something that
has no phase 7 counterpart.

Incidentally, it would be convenient if there were a phase 7 defined():

// Due to conditional compilation, this may or may not contain
// the declaration "int x_count=0;", but it does contain
// "int count=0;"
#include <myheader.h>

void func()
{
if(defined(x_count))
x_count++;
else
count++;
}

As usual, programmer convenience would be purchased at the cost of
implementor sweat - for this feature to be any use at all, blocks whose
entry is controlled by constant expressions dependent upon the value of
defined() would require special handling. If that defined() evaluates to
0, and the constant expression also evaluates to 0, then any constraints
whose violation depends upon the identifier's definition should be
relaxed. I've no idea how difficult that would be to implement, but I've
some idea how difficult it is to express. I went through a dozen different
wordings before coming up with that one, and it's still not adequately
precise for inclusion in a future version of the standard.

Nick Maclaren

unread,
Jun 11, 2001, 9:39:19 AM6/11/01
to

In article <3B24C149...@wizard.net>,

"James Kuyper Jr." <kuy...@wizard.net> writes:
|> Nick Maclaren wrote:
|> ...
|> > No, but you may have forgotten! How do you evaluate an expression in
|> > #if without referring forward to phase 7, at least? Yet you need to
|> > do that without leaving phase 4.
|>
|> #if expression evaluation during phase 4 doesn't involve a "call" to
|> phase 7, though I wouldn't be surprised if there was some shared code
|> between the two phases in a typical compiler. It follows the rules of
|> section 6.6, but ignores most of the rest of phase 7 expression syntax
|> and semantics. Paragraphs 3, 6, and 8 of that section rather severely
|> restrict the portion of phase 7 semantics that are applicable. ...

I was talking about the language specification and not the
implementation. And this most definitely DOES refer forward to
phase 7 syntax and semantics from phase 4 operations.

The point is that it makes it very difficult to work out which
wording applies when, and exactly how many of the specifications
apply in this case.

Clive D. W. Feather

unread,
Jun 12, 2001, 8:22:33 AM6/12/01
to
In article <9g1ts1$58b$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
<nm...@cus.cam.ac.uk> writes

>>(4) The sequence of tokens is parsed as an example of "translation-unit"
>>(6.9#1) using BNF semantics. Ambiguities are resolved via various
>>constraint and semantic clauses. In this parsing, all terminals in the
>>grammar, and the five non-terminals on the right hand side of "token"
>>(6.4#1), are treated as terminals, and each terminal must match exactly
>>one of the tokens generated in (3).
>
>Yes, but that is missing really quite a lot. For example:
>
> Ambiguity resolution takes second place to analysis in the "BNF"
>model, but there are places where the constraint rules OVERRIDE the
>BNF (i.e. say "parsing X would be valid, but it isn't acceptable").

Um, is there ? In TP7 ?

I can see two situations to discuss:

(1) There is a single parse, but the code as thus-interpreted violates a
constraint. Or there are multiple parses, but all violate constraints.
This isn't, to my mind, "overriding". Rather, the code gets through the
"syntax" gate but falls at the "constraints" fence.

(2) There is more than one legitimate parse, and other text determines
how to select between them. The classic case is the "typedef name versus
parameter name" one. Are there any such cases where the wording doesn't
make it clear that that's what's happening ?

> Certain scope and linkage constraints use different models; see
>6.2.2 #5 and 6.9 #4 for a clear example, but remember also inline
>functions :-(

Only if I have to. But are either of those part of the "BNF model" ?

Nick Maclaren

unread,
Jun 12, 2001, 4:16:05 PM6/12/01
to
In article <wHIOCh$JmgJ...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9g1ts1$58b$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>>>(4) The sequence of tokens is parsed as an example of "translation-unit"
>>>(6.9#1) using BNF semantics. Ambiguities are resolved via various
>>>constraint and semantic clauses. In this parsing, all terminals in the
>>>grammar, and the five non-terminals on the right hand side of "token"
>>>(6.4#1), are treated as terminals, and each terminal must match exactly
>>>one of the tokens generated in (3).
>>
>>Yes, but that is missing really quite a lot. For example:
>>
>> Ambiguity resolution takes second place to analysis in the "BNF"
>>model, but there are places where the constraint rules OVERRIDE the
>>BNF (i.e. say "parsing X would be valid, but it isn't acceptable").
>
>Um, is there ? In TP7 ?

I am pretty sure that someone created an example involving typedef
names and function parameters, but I forget what it was now. It
may have been tidied up, but I don't think that it has.

In K&R, there was definitely the case of an incompatibly declared
identifier and automatic function declaration, but I can't remember
if that reached C90.

I will see if I can think of others. I noted some as possible,
but forget what they were now, and did not double check.

Tokenisation, of course, has several such cases.

Nick Maclaren

unread,
Jun 12, 2001, 4:37:00 PM6/12/01
to
In article <wHIOCh$JmgJ...@romana.davros.org>,

Clive D. W. Feather <cl...@davros.org> wrote:
>In article <9g1ts1$58b$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren
><nm...@cus.cam.ac.uk> writes
>>>(4) The sequence of tokens is parsed as an example of "translation-unit"
>>>(6.9#1) using BNF semantics. Ambiguities are resolved via various
>>>constraint and semantic clauses. In this parsing, all terminals in the
>>>grammar, and the five non-terminals on the right hand side of "token"
>>>(6.4#1), are treated as terminals, and each terminal must match exactly
>>>one of the tokens generated in (3).
>>
>>Yes, but that is missing really quite a lot. For example:
>>
>> Ambiguity resolution takes second place to analysis in the "BNF"
>>model, but there are places where the constraint rules OVERRIDE the
>>BNF (i.e. say "parsing X would be valid, but it isn't acceptable").
>
>Um, is there ? In TP7 ?

This is an updated response, so please ignore my previous one.

I am pretty sure that someone created an example involving typedef
names and function parameters, but I forget what it was now. It
may have been tidied up, but I don't think that it has.

In K&R, there was definitely the case of an incompatibly declared
identifier and automatic function declaration, but I can't remember
if that reached C90.

There are quite a few involving the scope rules, where another
language might well allow a construction that C doesn't. For
example:

typedef int x;

void y (x) int x; {
;
}

Earlier phases, of course, have quite a few examples.
