Language Design

649 views
Skip to first unread message

wsm...@m.cs.uiuc.edu

unread,
Mar 15, 1989, 11:57:00 PM3/15/89
to

Should the language designers be making work for the language-oriented
editor designers or should the language-oriented editor designers
be making work for the language designers?

I believe it should be the latter. The language oriented-editor designers
should provide a technology base that may be taken as given by the language
designer who may then design languages without ugly lexical or syntactic
structures that make it impossible for the compiler to give sensible error
messages.

(For example, in the language I'm learning now, C++, if you forget to
declare a class, it is a *syntax* error. Not misuse of identifier or
undefined class, *syntax* error.)

(Define language-oriented editor to by the hypertext object-oriented
WYSIWYG formating editor or equivalent that you wish *you'd* written.)

Bill Smith
wsm...@cs.uiuc.edu
uiucdcs!wsmith

Gordon V. Cormack

unread,
Mar 17, 1989, 7:32:50 AM3/17/89
to
In article <520...@m.cs.uiuc.edu>, wsm...@m.cs.uiuc.edu writes:
>
>
> (For example, in the language I'm learning now, C++, if you forget to
> declare a class, it is a *syntax* error. Not misuse of identifier or
> undefined class, *syntax* error.)
>

Don't blame C++. Blame C.

C is not context-free parsable [actually I believe it is, but not
with any sensible grammar].
--
Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
gvco...@waterloo.EDU gvco...@uwaterloo.CA gvco...@water.BITNET

Bob Hearn

unread,
Mar 17, 1989, 3:50:24 PM3/17/89
to
>> (For example, in the language I'm learning now, C++, if you forget to
>> declare a class, it is a *syntax* error. Not misuse of identifier or
>> undefined class, *syntax* error.)
>>
>
>Don't blame C++. Blame C.
>
>C is not context-free parsable [actually I believe it is, but not
>with any sensible grammar].
>--
>Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
>gvco...@waterloo.EDU gvco...@uwaterloo.CA gvco...@water.BITNET

Well, the *grammar* defining C programs is certainly context-free! Just
look in the back of K&R. This doesn't mean that any grammatically correct
program is semantically correct. I don't know of any non-trivial
language with a context-free grammar for semantically correct programs. (I
may be wrong here, but certainly this is correct for most languages.)
Actually, I've always thought parts of the grammar (e.g. type declarations)
were hacked to facilitate easy parsing.
Many compilers, however, do consider some semantic errors syntax errors,
because it's easier that way. As for the C++ syntax error described above,
that's compiler-dependent. MPW C++ tells me that the identifier is not
a valid type name.

Bob Hearn
he...@claris.com

Alan J Rosenthal

unread,
Mar 17, 1989, 5:25:30 PM3/17/89
to

In article <90...@claris.com> he...@claris.com (Bob Hearn) writes:
>Well, the *grammar* defining C programs is certainly context-free! Just
>look in the back of K&R.

I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
seeing context-sensitive stuff in the grammar.

Suppose you're looking at a compound-statement and you've just seen the `{'
opening it. Now suppose the next three tokens are `extern', "x", and `;'.
This is produced by a declaration-list, expanding to declaration, which expands
to decl-specifiers init-declarator-list[opt] ;. `extern' is an sc-specifier,
and decl-specifiers -> sc-specifier decl_specifiers[opt]. Now is "x" a
declarator (init-declarator-list -> init-declarator -> declarator ->
identifier) or a typedef-name (decl-specifiers -> type-specifier ->
typedef-name -> identifier)?

It's ambiguous unless you use non-context-free information to tell you whether
or not "x" was previously declared via typedef.

ajr

--
"The goto statement has been the focus of much of this controversy."
-- Aho & Ullman, Principles of Compiler Design, A-W 1977, page 54.

Bob Hearn

unread,
Mar 20, 1989, 7:51:46 PM3/20/89
to
In article <890317222...@explorer.dgp.toronto.edu> fl...@dgp.toronto.edu (Alan J Rosenthal) writes:
>
>In article <90...@claris.com> he...@claris.com (Bob Hearn) writes:
>>Well, the *grammar* defining C programs is certainly context-free! Just
>>look in the back of K&R.
>
>I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
>seeing context-sensitive stuff in the grammar.
>
>Suppose you're looking at a compound-statement and you've just seen the `{'
>opening it. Now suppose the next three tokens are `extern', "x", and `;'.
>This is produced by a declaration-list, expanding to declaration, which expands
>to decl-specifiers init-declarator-list[opt] ;. `extern' is an sc-specifier,
>and decl-specifiers -> sc-specifier decl_specifiers[opt]. Now is "x" a
>declarator (init-declarator-list -> init-declarator -> declarator ->
>identifier) or a typedef-name (decl-specifiers -> type-specifier ->
>typedef-name -> identifier)?
>
>It's ambiguous unless you use non-context-free information to tell you whether
>or not "x" was previously declared via typedef.
>
>ajr
>

Enough of this bullshit! How many times do I have to say it??? The grammar
in K&R is context-free. Do you know what a context-free grammar *is*? You're
confusing the *class* of the grammar (ever heard of the Chomsky hierarchy?)
with what categories of parsers will work on it. You're also confusing
*syntactic correctness* with *semantic correctness*. This is a little more
acceptable, since, as I was saying, most compilers do to make parsing
easier.

Most C compilers use LALR(1) parsers (stands for "look-ahead left to right,
order 1), which are a subcategory of shift-reduce parsers. The 1 means they
can look at the next 1 token on the input stream when deciding whether to
shift or reduce. And you are quite correct in saying that, in this case,
that one token is not enough; you *do* need symbol table information. But
(1) No one says all context-free grammars have to be LALR(1) parsable
without context information, (2) no one says that the grammar in the back of
K&R is necessarily the one that's used in practice (grammars are often
tweaked (without modifying the language generated!) no make them easier to
parse), and (3) just because your compiler gives you a syntax error when
you do something like that, that doesn't necessarily mean your program
is syntactically incorrect. It just means that treating that kind of
semantic error as a syntax error makes the language easier to parse.

Now, before anyone else wants to tell me that K&R's grammar for C is not
context-free, will you *please* go and look up the definition of context-free
grammar??? Hint: anything that can be expressed in BNF is context-free!!

Bob Hearn
he...@claris.com

Bob Hearn

unread,
Mar 20, 1989, 6:48:19 PM3/20/89
to
>----------------------------------------------------------------------------
>To flame someone for bad spelling or grammer is a discouragement to net
>discussion in a timely fashion.
>----------------------------------------------------------------------------

Uh, that's 'grammar'. :-)

Bob Hearn
he...@claris.com

steve

unread,
Mar 20, 1989, 12:58:51 PM3/20/89
to

Don't blame C either. The real cause of this error is a conspiracy
between the compiler writer and YACC. YACC produces this message
for every syntax error unless you specifically design your grammer
to catch a specific error and output a different message. The writer
of your compiler probably used YACC, and then didn't design his
grammer to do this in this case. He is probably not too much to blame
though. To design a languge to catch every possible generic grammerical
error would make the compiler so slow you might as well code in assembly.

enough from this mooncalf - Steven


----------------------------------------------------------------------------
To flame someone for bad spelling or grammer is a discouragement to net
discussion in a timely fashion.
----------------------------------------------------------------------------

These opinions aren't necessarily Motorola's or Remora's - but I'd like to
think we share some common views.
----------------------------------------------------------------------------
Steven R Weintraub cs.utexas.edu!oakhill!devsys!steve
Motorola Inc. Austin, Texas
(512) 440-3023 (office) (512) 453-6953 (home)
----------------------------------------------------------------------------

Dave Jones

unread,
Mar 20, 1989, 10:09:34 PM3/20/89
to
From article <91...@claris.com>, by he...@claris.com (Bob Hearn):
...

>
> Now, before anyone else wants to tell me that K&R's grammar for C is not
> context-free, will you *please* go and look up the definition of context-free
> grammar??? Hint: anything that can be expressed in BNF is context-free!!
>
> Bob Hearn
> he...@claris.com
>

Bob, Bob, Bob.

Chill out, Dude. You're going to blow a brain vain if you're not
careful.

Now then.

The term "context-free", as Chomsky defined it is not very useful
to us computer types, because no one ever, ever writes grammars any
other way. To him, "context-free" meant that a rule could only
have one symbol on the left-hand side. "Context-sensitive" meant that
production rules could have strings of symbols on the left:

a X b <= a x y b

In the above, the a and b form a "context" for the reduction
X <= xy.

I haven't been following the discussion, but I suspect that the
correspondents were not alluding to the unuseful Chomsky definitions,
but instead were using the terms quite literally: "context-free" meaning
"context-free". I think that is a perfectly reasonable thing to
do, especially if one is unfamiliar with the confusing Chomsky
definition. (I never did like the practice of "defining" something
that already had a meaning in normal English.)

The grammar as published in K&R (both additions) requires
that the lexical analyzer must use left-context to determine whether
or not a token is recognized as a type-name identifier or a regular
identifier. In that sense, it is not "context-free".

I seem to remember that I once figured out that it is possible to
modify the grammar so that the scanner and parser do not have to know
about left-context. But I think I remember that such a parser would
require a two token-lookahead, making it impossible to use with most
compiler-compilers.

Norman Graham

unread,
Mar 22, 1989, 2:07:53 AM3/22/89
to

From article <890317222...@explorer.dgp.toronto.edu>, by fl...@dgp.toronto.edu (Alan J Rosenthal):


>
> In article <90...@claris.com> he...@claris.com (Bob Hearn) writes:
>>Well, the *grammar* defining C programs is certainly context-free! Just
>>look in the back of K&R.
>
> I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
> seeing context-sensitive stuff in the grammar.
>

> [Stuff deleted about ambiguous derivation sequence.]


>
> It's ambiguous unless you use non-context-free information to tell you whether
> or not "x" was previously declared via typedef.

I have to agree with Bob on this one, but for a different reason.
The grammar in the back of K&R certainly is context-free... but it
is also ambiguous. To paraphrase Aho & Ullman's "The Theory of
Parsing, Translation, and Compiling, Volume I: Parsing", page 91:

A grammar is Context-free if every production has a single
nonterminal on the left and a sequence of 0 or more terminals
or nonterminals on the right.

It is obvious that the grammars in both editions of K&R fit this
description. Thus they are context-free.

Now on to ambiguity. According to Aho & Ullman (same as above), pagee 143:

We say that a CFG G is ambiguous if there is at least one sentence
w in L(G) for which there is more than one distinct derivation tree
with frontier w. This is equivalent to saying that G is ambiguous
if there is a sentence w in L(G) with two or more distinct leftmost
(or rightmost) derivations.

The classic example of ambiguity is the "dangling else" problem, as
illustrated by the grammars in both editions of K&R.

So we see that K&R's grammars are ambigous context-free grammars. That's
no big deal--the ambiguity doesn't affect the language generated by the
grammar. The problem comes in when compilers base the semantic meaning
of a sentence on the derivation sequence the grammar uses to generate
that sentence. For some reason :-) compiler writers like to choose
deterministically between competing derivation sequences. Thus they
either statically choose the proper derivation sequence (eg. the "else"
always matches the last else-less "if") or they use run-time information
to help them choose the "proper" derivation sequence.

--
Norman Graham Oklahoma State University
Internet: nor...@a.cs.okstate.edu Computing and Information Sciences
UUCP: {cbosgd, ihnp4, 219 Mathematical Sciences Building
rutgers}!okstate!norman Stillwater, OK 74078-0599

Bob Hearn

unread,
Mar 22, 1989, 3:14:56 PM3/22/89
to
In article <29...@goofy.megatest.UUCP> djo...@megatest.UUCP (Dave Jones) writes:
>The term "context-free", as Chomsky defined it is not very useful
>to us computer types, because no one ever, ever writes grammars any
>other way. To him, "context-free" meant that a rule could only
>have one symbol on the left-hand side. "Context-sensitive" meant that
>production rules could have strings of symbols on the left:
>
> a X b <= a x y b
>
>In the above, the a and b form a "context" for the reduction
>X <= xy.

Precisely! But, gee... I've considered myself a computer type for, oh, 12
years or so, and it's useful to me. How you write grammars depends on
what you're going to use the grammars for. I don't think anyone could
write a compiler, for example, without being at least familiar with
terms of formal language theory like "context-free" and "context-sensitive."

>
>I haven't been following the discussion, but I suspect that the
>correspondents were not alluding to the unuseful Chomsky definitions,
>but instead were using the terms quite literally: "context-free" meaning
>"context-free". I think that is a perfectly reasonable thing to
>do, especially if one is unfamiliar with the confusing Chomsky
>definition. (I never did like the practice of "defining" something
>that already had a meaning in normal English.)

If you *had* been following the discussion, you'd have seen that my original
posting was a followup to someone complaining about how screwed-up C's
grammar was, that it wasn't context-free. When someone uses the words
"context-free" and "grammar" in the same sentence, I assume he knows what
he's talking about, and what he said was incorrect... K&R's grammar for C
*is* CF, and not particularly screwed-up. I've received no end of mail
and flames for that simple correction! As for "context-free" already
having a meaning in normal English, I disagree. One could say that
you need context to parse C, but if you use the phrase "context-free" in
the realm of computer science, it's because you've heard it before, and
you should find out what it means before using it.

Bob Hearn
he...@claris.com

P.S. I'll chill when I stop getting flamed.

Bob Hearn

unread,
Mar 22, 1989, 9:16:13 PM3/22/89
to
In article <12...@watdragon.waterloo.edu> gvco...@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>
>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal
>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.
>
>The K&R grammar does not have a finite alphabet.
>--
>Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
>gvco...@waterloo.EDU gvco...@uwaterloo.CA gvco...@water.BITNET

Oh. That's nice to know. From Scott, who is looking over my shoulder:
"Why are even bothering to argue with these idiots?"
Good question.

Bob Hearn
he...@claris.com

Dave Jones

unread,
Mar 22, 1989, 9:53:28 PM3/22/89
to
From article <91...@claris.com>, by he...@claris.com (Bob Hearn):
> In article <29...@goofy.megatest.UUCP> djo...@megatest.UUCP (Dave Jones) writes:

>
> If you *had* been following the discussion, you'd have seen that my
> original posting was a followup to someone complaining about how
> screwed-up C's grammar was, that it wasn't context-free. When someone
> uses the words "context-free" and "grammar" in the same sentence, I

> assume he knows what he's talking about, ...

What a silly assumption. I almost never assume that people who post
to the net know what they are talking about. But what you really
mean, I think, is that you want to use the term "context-free" exactly
the way Chomsky did. Okay, that's your right. But don't be surprised
or offended if someone doesn't know it. Just explain.

I taught two terms of formal language theory during a two year stint
as Visiting Associate Prof at Ohio University. I'm sure I brought the
subject up, but only to try to avoid the kind of discussion that is
going on here. We never talked about (Chomsky) context-sensitive
grammars again. I suspect that for most students, that small piece of
information got lost along the way.

Context-sensitive grammars, (in the Chomsky sense), are just not useful.
I really don't know why people continue to use his definitions.

I just now pulled a new compiler book off the shelf, completely at random.
It does pay lip-service to the Chomsky definitions. But of its 811 pages,
it devotes less than one quarter of page 97 to context-sensitive
grammars! First it gives a very brief definition, then the following:


While context-sensitive grammars and type-0 grammars are more
powerful than context-free grammars, they are also far less
useful. The problem is that efficient parsers for these extended
grammar classes do not exist, and without a parser there is no
way to use a grammar definition to drive a compiler.

[ Two sentences about parsers for context-free grammars omitted... ]

Whenever we mention a grammar (without saying which kind), the
grammar will be assumed to be context-free.


> [...]


> I've received no end of mail and flames for that simple correction!

Take my advice. Quit worrying about the flames you receive. Flamers
are small minded bigots, pedants, and idiots. Just ignore them.
(Or better yet, *apologize* sincerely. God, that irritates them!)

Dave J.


P.S. Spring has sprung. I probably won't be around the net much
in the next few weeks.

Ge Weijers

unread,
Mar 23, 1989, 7:12:59 AM3/23/89
to
From article <91...@claris.com>, by he...@claris.com (Bob Hearn):
>>I'm looking in the back of K&R (first edition), and if I'm not mistaken I'm
>>seeing context-sensitive stuff in the grammar.
> Enough of this bullshit! How many times do I have to say it??? The grammar
> in K&R is context-free. Do you know what a context-free grammar *is*?

Yeah, here is my regular C grammar (a regular expression, really):

c :
| letter c
| digit c
| /* etc. */
;

Okay, it matches a lot more than C, but all legal C programs are acceptable.
So are all programs in all other programming languages. Useful!
The grammar in K&R is in the same category. It accepts all legal C programs,
plus a lot more. Because of current compiler technology we use an artificial
separation between syntax and static semantics. Static semantics can be
expressed in more powerful grammar types (2-level van Wijngaarden grammars etc.)
Your note is correct, but so is the observation that some context-sensitive
features are suggested by using different names for the same terminal, nl.
identifier. K&R use non-terminal names judiciously to suggest more than they
are really telling.
C can not be described exactly by a CFG, so C is not a context-free language.
Usually this type of confusion starts because one party is talking about
a grammar, whilst the other party is really talking about (an exact grammar of)
the language.

Ge' Weijers, KUN Nijmegen, the Netherlands
g...@cs.kun.nl

Ric Holt

unread,
Mar 23, 1989, 9:44:59 AM3/23/89
to
#>>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite
#>>set of nonterminal symbols, T (the alphabet) is a finite set of terminal
#>>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
#>>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.
#>"Why are even bothering to argue with these idiots?"
#>Good question.
#>
#>Bob Hearn
#>he...@claris.com

In my modest opinion, Bob Hearn is doing a fine job of entertaining us
with his emotional outbursts, but a considerable disservice with his
claims. Likely, Cormack is correct and Hearn is wrong. Likely
C's grammar is not Context Free.

I believe Cormack (in his academic way) has proven Hearn wrong. I believe
C's grammar has the appearance of a context free grammar but that it
violates the mathematical def of context free.

This question (is C's grammar context free) is
important and we should get a clear answer to it rather than outbursts.
Could someone please settle this definitely?
Ric Holt

Aubrey McIntosh

unread,
Mar 23, 1989, 11:02:52 AM3/23/89
to
In article <12...@watdragon.waterloo.edu> gvco...@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>
>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal
>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.
>
>The K&R grammar does not have a finite alphabet.
>--
>Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
>gvco...@waterloo.EDU gvco...@uwaterloo.CA gvco...@water.BITNET

I haven't programmed in C, but I do have a degree in Math, and have read
the source of some compilers, as well as portions of Aho and whozits.
So please take this as a real request for information.

As I read your message, and ponder, it seems to me that the implication is
that, in C, the user can define new Terminals at whim. (or alternately it
is infinite to begin with :-) ).

If this is, in fact, the correct implication, would you send a trivial
example?



--
--------------------------------------------------------------
Whatcha call a boomerang that doesn't come back? A stick.
1-(512)-346-5781 (v) using Modula-2. enquiries welcome.
Austin, TX 78759 ...!cs.utexas.edu![oakhill|kvue]!val!aubrey

Bob Hearn

unread,
Mar 23, 1989, 1:48:07 PM3/23/89
to
In article <89Mar23.094...@turing.toronto.edu> ho...@turing.toronto.edu (Ric Holt) writes:
>In my modest opinion, Bob Hearn is doing a fine job of entertaining us
>with his emotional outbursts, but a considerable disservice with his
>claims. Likely, Cormack is correct and Hearn is wrong. Likely
>C's grammar is not Context Free.

OK, OK, I'm sorry. I'm new to the net, and I wasn't expecting to get flamed
*quite* so much for one simple correction... it just became very annoying.
I'll try to restrict my responses to computer science from now on.

>I believe Cormack (in his academic way) has proven Hearn wrong. I believe
>C's grammar has the appearance of a context free grammar but that it
>violates the mathematical def of context free.
>
>This question (is C's grammar context free) is
>important and we should get a clear answer to it rather than outbursts.
>Could someone please settle this definitely?
> Ric Holt

Good idea. Although I think this whole argument is founded on
misunderstandings. It all depends on what you mean by "context-free
grammar." If you follow the technical definition, then there is no
question that C's grammar (as specified in K&R) is context-free. As
for Cormack's claim that C does not have a finite set of terminals, I
can't say that I really understand what he means. Perhaps he is
referring to the fact that there is an infinite number of identifiers?
Well, if we restrict ourselves to K&R, then the terminal in question is
simply (identifier), and it does not decompose further. If we want
to make this grammar more concrete and have it generate the actual
language that a compiler accepts, then we would have identifiers producing
strings over the char. set. In this case, the char. set is the set
of terminals, and it is certainly finite.

BTW, the real argument here is not whether C's grammar satisfies some
mathematical definition. It does, but it is not a particularly relevant
piece of information. The only reason I claimed it *did* was that
someone, a long time ago, made some comment on how screwed up C's grammar
was, and that it wasn't context-free. Well, as I don't think C's grammar
is particularly screwed up relative toother languages' grammars, I
responded. I happen to like C, and I don't like C-bashing. Guess I
learned my lesson, huh?

Bob Hearn
he...@claris.com

Steve Vegdahl

unread,
Mar 23, 1989, 2:24:57 PM3/23/89
to
In article <91...@claris.com>, he...@claris.com (Bob Hearn) writes:
> Enough of this bullshit! How many times do I have to say it??? The grammar
> in K&R is context-free.
...

> Now, before anyone else wants to tell me that K&R's grammar for C is not
> context-free, will you *please* go and look up the definition of context-free
> grammar??? Hint: anything that can be expressed in BNF is context-free!!

I detect that we are picking nits here. I think that we all know what is
meant by the claim that the K&R grammar is not context-free. It has been
noted that the K&R grammar is *ambiguous*; hence, its specification of
how programs should be parsed is *ill-defined*. Additional
"context-sensitive" specifications are added to the K&R grammar in the form
of English prose (in the second edition, anyway; the first edition
seems to ignore the ambiguity problem).

Effectively, the grammar in K&R is "BNF + English prose". Taking this
view, the K&R grammar is not context-free.

Steve Vegdahl
Computer Research Lab
Tektronix Labs
Beaverton, Oregon

Gordon V. Cormack

unread,
Mar 22, 1989, 6:20:18 PM3/22/89
to
In article <91...@claris.com>, he...@claris.com (Bob Hearn) writes:
> >
>
> Enough of this ********! How many times do I have to say it??? The grammar

> in K&R is context-free. Do you know what a context-free grammar *is*?

A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite


set of nonterminal symbols, T (the alphabet) is a finite set of terminal
symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.

The K&R grammar does not have a finite alphabet.

David Gudeman

unread,
Mar 23, 1989, 5:04:21 PM3/23/89
to
In article <91...@claris.com> he...@claris.com (Bob Hearn) writes:
>Good idea. Although I think this whole argument is founded on
>misunderstandings. It all depends on what you mean by "context-free
>grammar." If you follow the technical definition, then there is no
>question that C's grammar (as specified in K&R) is context-free.

I agree, that this is founded on misunderstandings, but I think you've
got the wrong misunderstanding. I believe that C's grammar is _not_,
in fact context free, though the grammar in K&R is. How can this be?
I'm glad you asked. The grammar given in K&R isn't technically
correct. Don't take offense, I'm not making a judgement on this
practice, just pointing it out.

One inaccuracy is in the production

typedef-name : identifier

I don't know if there are others. "typedef-name" occurs in the production

type-specifier: ... | typedef-name

and a mechanical reading of this implies that _any_ identifier can
occur as a type-specifier. This is not the case. In fact, an
identifier can only be a type-specifier if it has previously been
declared with a typedef. I can't prove it, but I believe you need a
context-sensitive grammar to specify this.

So both claims made in this argument are correct:

(1) the grammar of C is not context free
and
(2) the grammar given in K&R is context free

Bob Hearn

unread,
Mar 23, 1989, 7:12:40 PM3/23/89
to
In article <98...@megaron.arizona.edu> gud...@arizona.edu (David Gudeman) writes:
>I agree, that this is founded on misunderstandings, but I think you've
>got the wrong misunderstanding. I believe that C's grammar is _not_,
>in fact context free, though the grammar in K&R is. How can this be?
>I'm glad you asked. The grammar given in K&R isn't technically
>correct. Don't take offense, I'm not making a judgement on this
>practice, just pointing it out.
>
>One inaccuracy is in the production
>
> typedef-name : identifier
>
>I don't know if there are others. "typedef-name" occurs in the production
>
> type-specifier: ... | typedef-name
>
>and a mechanical reading of this implies that _any_ identifier can
>occur as a type-specifier. This is not the case. In fact, an
>identifier can only be a type-specifier if it has previously been
>declared with a typedef. I can't prove it, but I believe you need a
>context-sensitive grammar to specify this.
>

Yes, this is another big misunderstanding in this discussion... I would
say that the *grammar* stipulates that a type-specifier can be an
identifier, but the *semantics* specifies that this results in a valid
program only when the identifier is a valid type name. This is really
the same problem as

{
.
.
x = 1;
.
.
}

which is correct only if x has been previously declared. I think here
one would always call this an undeclared identifier error, a type of
semantic error, rather than a syntax error. And you're quite right that
it takes a context-sensitive grammar to generate the set of VALID C programs
in this sense! My point is that this is true for just about any language
you can name, and there is nothing particularly odd about this aspect
of C.

In any case, this discussion is getting old... I vote we talk about something
new. This is comp.lang.misc, not comp.lang.C.grammar, right? :-)
So, anyone designed any good languages lately?

Bob Hearn
he...@claris.com

David Gudeman

unread,
Mar 24, 1989, 3:24:23 AM3/24/89
to
In article <91...@claris.com> he...@claris.com (Bob Hearn) writes:
]In article <98...@megaron.arizona.edu> gud...@arizona.edu (David Gudeman) writes:
]>
]>One inaccuracy is in the production

]>
]> typedef-name : identifier
]>
]>I don't know if there are others. "typedef-name" occurs in the production
]>
]> type-specifier: ... | typedef-name
]>
]>and a mechanical reading of this implies that _any_ identifier can
]>occur as a type-specifier. This is not the case. In fact, an
]>identifier can only be a type-specifier if it has previously been
]>declared with a typedef.
]
]Yes, this is another big misunderstanding in this discussion... I would

]say that the *grammar* stipulates that a type-specifier can be an
]identifier, but the *semantics* specifies that this results in a valid
]program only when the identifier is a valid type name. This is really
]the same problem as [undeclared identifier]

It sounds as though you are defining the semantics of a language as
any part that cannot be described with a context free grammar. Of
course this makes you trivially correct, but does not affect the
opinions of people who criticize C based on a different definition of
syntax and semantics. Also, you cannot treat the C grammar as a
formal grammar that is only meant to accept strings from a formal
language. The grammar is intended to show syntactic categories with
semantic attachments.

The problem of undeclared identifiers is common to many programming
languages, but the typedef problem is more serious. An identifier is
always parsed into category "identifier" regardless of whether it has
been declared. However, typedefs are ambigous. For example:

typedef int foo;
int i;

void f()
{ foo *i;
...
}

The K&R grammar is ambigous on how to parse this, while any C
programmer would expect "i" to be a local pointer to an int. Once an
identifier has been declared as a type, it should no longer be an
identifier, but rather have a different syntactic class. No, this
does not affect the language accepted by the grammar. But it does
affect the syntactic category of the whole string "foo *i;".

Because of the differences in syntactic categories, most people would
say that

static foo x;

should be a syntax error if foo has not been typedefed. You can argue
that, but like I said before, you are only defining terms to fit your
views.

>In any case, this discussion is getting old... I vote we talk about something
>new. This is comp.lang.misc, not comp.lang.C.grammar, right? :-)

The following should not be taken as a flame, I'm just making an
observation: If you want to stop the discussion, you don't have to
reply. It doesn't work to write an article and then suggest that no
one respond to it after you have gotten in the last word.

Hank Dietz

unread,
Mar 24, 1989, 12:34:29 PM3/24/89
to
[stuff as to CFGs needing a finite set of terminals, and the K&R grammar having
an infinite set of terminals.]

>
>This question (is C's grammar context free) is
>important and we should get a clear answer to it rather than outbursts.
>Could someone please settle this definitely?
> Ric Holt


Clever thought, but wrong. The *grammar* in K&R only talks of terminals
which are taken from a finite set of lexical types, not of the inifinite set
of possible lexemes whose structure is defined within the surrounding text
but not in the grammar.

In as much as it is given, the grammar for C in K&R IS context free; some of
the stuff defined in the text around it might not be (e.g., lexical
analysis), but the whole issue isn't worth this network bandwidth. C is
easily parsed using any of LL(1), LALR(1), etc., with only the usual lexical
trickery (AKA the usual symbol table mechanism).

-ha...@ee.ecn.purdue.edu

Norman Graham

unread,
Mar 24, 1989, 5:58:42 PM3/24/89
to
From article <98...@megaron.arizona.edu>, by gud...@arizona.edu (David Gudeman):
> [stuff deleted about how the grammar specifies that
> any identifier can serve as a type-specifier]

>
> So both claims made in this argument are correct:
>
> (1) the grammar of C is not context free
> and
> (2) the grammar given in K&R is context free

This is an interesting point (ie. one I hadn't thought of before :-).
In theory, a grammar is a generator for strings in a language, not
a recognizer for a language (eg. various forms of automata). So
the grammar in K&R is context-free, but it doesn't generate the C
language--in addition to valid C programs it also generates invalid
C programs. Of course, when this grammar is used as the basis of
a compiler, the semantic analyzer should detect these invalid
programs. This makes sense to me... how about anyone else out there?

Theodore Stevens Norvell

unread,
Mar 24, 1989, 8:15:50 PM3/24/89
to
In article <12...@watdragon.waterloo.edu> gvco...@watdragon.waterloo.edu (Gordon V. Cormack) writes:
>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal
>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.
>
>The K&R grammar does not have a finite alphabet.

I'm sure that Gordon Cormack has something in mind here because I know
he's a smart guy, but I'm darned if I can figure out what it is. In
any case, I'm sure that the K&R grammar only uses a finite subset of its
alphabet, because otherwise it wouldn't fit in my desk.

So Gordon, if your still there after being insulted in such an uncalled for
way, please enlighten us.

As I recall, the grammar in K&R [1978] has a single terminal "identifier"
which includes all identifiers, a nonterminal called "typedef_name"
defined by the production
typedef_name --> identifier
This leads to an ambiguity because phrases like
t * i ;
can be understood to be either declarations or statements. But ambiguous
grammars have been with us a long time. People even use parser generators
that accept ambiguous grammars! In fact there are other ambiguities
in the grammar (e.g. the dangling else). Ambiguities don't change the
set of strings admitted by a grammar they merely admit some strings in
more than one way.

The real question is what do you want to call syntax and what do you
want to call static semantics. In the absence of any definitive definition,
this is a matter of taste and convenience to the implementor. Is
the following (complete) compilation unit syntactically correct?
f() { t * i ; int a; }
The grammar says yes; most compliers say "syntax error". If we
change the grammar to make typedef_name a terminal and remove its
only production, then the grammar agrees with the compilers.
The grammar is of course still context free, but the ambiguity is
gone.

Ric Holt raises the separate question of whether the C language is
context free. The syntax of C certainly is. The problem with
C is that it is not "feedback free" in the following sense. The
parsing of C depends on the lexing of C (this is only natural),
but the lexing depends on the parsing. Of course, with the original
K&R grammar, C _is_ feedback free which is no doubt why Ritchie
wrote the grammar that way.

Summary:
-The K&R [1978] grammar is context free, but describes the wrong language.
-The C language syntax (abstracted from lexing and static semantics)
is also context free.
-The usual implementations and most useful definitions of C language
are not "feedback free".
-We're still in the dark about infinite alphabets.

Theo Norvell

(The term "feedback free" is used in a completely different sense by Holt.
My apologies for stealing it. By the time I realized I was using it
differently, I liked it too much to give it back. In Holt's sense, C is
feedback free. See the Turing Programming Language Design and Definition
for his definition. In fact, see it anyway, if you're interested in
design or definition.)

pa1162

unread,
Mar 26, 1989, 6:09:18 PM3/26/89
to
Bob,
I'm only a student, so don't think I have pretensions to telling you folks
you're business, but the grammar in K&R probably isn't even the grammar
for C... as I think you or someone else mentioned... it gets tweaked
a bit if you want to try to write a compiler from it... so worry about
whether it is context-free is immaterial. Why don't we all open our
K&R (old edition) to page 214 and read the note at the heading of section 18
"Syntax Summary". It says that the syntax summary given isn't
meant as an exact statement of the language. This isn't meant to
be a flame... it's an attempt at a distinction because I think you and
whoever are arguing at cross-purposes. The discussion is either
about context-freeness and grammars or about non-precise syntax summaries
and it's really not clear which one. As I said I am yet a student...
but I can see the structure of an argument when it hits me in the face.
I recommend somone clarify the and separate the matters at hand so that
all this "stuff" can be separated.
-Thomas Kammeyer (A student who likes to stick his nose into discussions)

John Woods

unread,
Mar 26, 1989, 6:12:00 PM3/26/89
to
In article <91...@claris.com>, he...@claris.com (Bob Hearn) writes:
> >C's grammar is not Context Free.
> grammar." If you follow the technical definition, then there is no
> question that C's grammar (as specified in K&R) is context-free.

I think the crux of the argument of the non-context-free crowd is that
the grammar in K&R does NOT actually specify the C language by itself, not
as a grammar. It relies on the trick of telling the lexer about typedefs,
which is a form of back-door context.

Consider: if you wrote a YACC specification of C according to the context-free
grammar, but omitted any kind of action clauses (perhaps you want to turn on
debugging and watch the state machine play, a fine entertainment for a rainy
day), it would NOT be able to correctly parse a C program with typedefs (unless,
of course, you hid a "real" C grammar (with actions and the ever present
back-door gimmick) in the lexer itself).

When the pro-context-free crowd's argument is presented as "So what; it is
close enough, and it works", the argument in convincing. When it is presented
as "Yeah, well, your mother wears army boots!", it isn't.
--
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, jo...@frog.UUCP, ...!mit-eddie!jfw, j...@eddie.mit.edu

Remainder Khomeini!

Chris Torek

unread,
Mar 27, 1989, 11:03:18 AM3/27/89
to
Maybe this will work best a bit at a time.

First, our definition:

In article <12...@watdragon.waterloo.edu> gvco...@watdragon.waterloo.edu
(Gordon V. Cormack) writes:

>A context-free grammar G is a 4-tuple G = <N,T,P,S> where N is a finite
>set of nonterminal symbols, T (the alphabet) is a finite set of terminal
>symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ...
>where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol.

In article <91...@claris.com> he...@claris.com (Bob Hearn) writes:
>... I think this whole argument is founded on misunderstandings.

It is.

>It all depends on what you mean by "context-free grammar."

Most people mean the definition above.

>If you follow the technical definition, then there is no
>question that C's grammar (as specified in K&R) is context-free.

Aha! But the grammar in K&R 1st ed. is incorrect.

In article <98...@megaron.arizona.edu> gud...@arizona.edu (David Gudeman)
writes:

>One inaccuracy is in the production
> typedef-name : identifier
>I don't know if there are others.

(I recall a correction or two elsewhere, but my K&R is in my office and
I am not.)

>So both claims made in this argument are correct:
>(1) the grammar of C is not context free
>and
>(2) the grammar given in K&R is context free

Returning to <91...@claris.com> (Bob Hearn):
>... As for Cormack's claim that C does not have a finite set of terminals,

(this refers to the line `The K&R grammar does not have a finite alphabet.'
in <12...@watdragon.waterloo.edu>)

>I can't say that I really understand what he means. Perhaps he is
>referring to the fact that there is an infinite number of identifiers?

Rather, that there are an unbounded (not infinite, although it comes
out the same) number of type declarators, via the `typedef' mechanism.

>Well, if we restrict ourselves to K&R, then the terminal in question is
>simply (identifier), and it does not decompose further.

(He then suggests splitting it into letters, etc., a simple mechanical
task.)

Unfortunately, the terminal in question is the `identifier' in
`typedef-name : identifier'. This is, as has been noted, inaccurate.
The key is that only those identifiers which (a) have been previously
defined via `typedef' and (b) have not been hidden by new local
declarations are in fact typedef names. This information can only be
found by context (oops, there is that word again), and if the size of
the input string (C code) is unbounded, there is no way to capture this
in a CFG (using our definition above). If we limit the length of the
language to be recognised---to any fixed number of symbols, as long as
it is fixed---we can construct a CFG for this subset of C.

Fortunately, none of this really matters.

As someone else pointed out (in an article I forgot to save), we
generally do not write grammars that accept only strictly correct
inputs. Instead, we take some shortcuts and patch things up via
semantics. This approach works quite well for C, although a number of
compilers do it wrong (including PCC) and patch the grammar by having
the lexer return `type' instead of `identifier' when the identifier has
been defined via typedef. In particular, this prevents

typedef int temperature;
f() { auto double temperature; ... }

from compiling correctly. (PCC does handle

f(temperature) double temp; { ... }

correctly, by turning off typedef name recognition in the lexer during
argument parsing. It could do the same after type keywords in local
variable declarations, and I am not sure why it does not---it may have
done a lookahead already, perhaps. But that is not the only place
PCC has been seen to fall short....)

>someone, a long time ago, made some comment on how screwed up C's grammar
>was, and that it wasn't context-free. Well, as I don't think C's grammar
>is particularly screwed up relative to other languages' grammars, I

>responded. ...

Strictly speaking (as long as you stick with `typedef produces a new
type specifier'), C's grammar is not context free; but it is indeed
not seriously `screwed up'. Perfect generators are not necessary, and
a language `close enough to C' exists that can be parsed simply.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: ch...@mimsy.umd.edu Path: uunet!mimsy!chris

Chris Torek

unread,
Mar 27, 1989, 12:09:54 PM3/27/89
to
Incidentally, now that I have said all that (some might say `too much'
:-) ), the only real difference between a C grammar that accepts
undeclared typedef names and a Pascal grammar that accepts undeclared
variable names is that, for some reason, people like to think of type
declarators as keywords and variables as identifiers. At least, this
is the only explanation I can see that we never argue whether Pascal's
grammar is context free. (If you require the parser to catch undeclared
variables, it is not, just as C's is not if you require the parser to
catch undeclared typedef names.)

Chris Torek

unread,
Mar 27, 1989, 12:54:37 PM3/27/89
to
Oops, in amongst all that verbiage, I forgot to answer the other
obvious question: Why, if there is a reasonable CFG that accepts
all correct C programs (and some incorrect ones as well, that can
be rejected by the semantics instead), does PCC use something else?

Our `impure' CFG includes (among others) the following productions
(rearranged and simplified to make the problem obvious):

statement : '{' optional-declarators statement-list '}'
statement : expression
declarator : typedef-name '*' identifier // i.e., a pointer
typedef-name : identifier
expression : identifier '*' identifier // i.e., multiply

Thus, there are two correct parses for the input string

{ t * x; }

One declares the variable `x' as a pointer to the typedef'd type `t';
the other multiplies the value of `t' by the value of `x'.

It seems our impure C CFG is ambiguous.

Well, we never said otherwise....

As a practical matter, we want the compiler to choose the declarator
parse if and only if `t' is in fact an active typedef name. Fortunately,
there are a number of ways to gimmick a parser to do this.

Andrew K. Wright

unread,
Mar 28, 1989, 10:17:48 AM3/28/89
to
In article <16...@mimsy.UUCP> ch...@mimsy.UUCP (Chris Torek) writes:
>Incidentally, now that I have said all that (some might say `too much'
>:-) ), the only real difference between a C grammar that accepts
>undeclared typedef names and a Pascal grammar that accepts undeclared
>variable names is that, for some reason, people like to think of type
>declarators as keywords and variables as identifiers. At least, this
>is the only explanation I can see that we never argue whether Pascal's
^^^^^^^^^^^^^^^^^^^^^^^^^^

>grammar is context free. (If you require the parser to catch undeclared
>variables, it is not, just as C's is not if you require the parser to
>catch undeclared typedef names.)

The difference I see is that a C grammar that accepts undeclared typedef
names destroys more of the structure of the grammar than a Pascal
grammar that accepts undeclared variable names. Consider the following
C expression:

x = (a)(b);

If "a" is a typedef name, then the right side is a cast. If "a" is
not a typedef name, then the right side is a function call.
I dont believe similar situations exist for Pascal (corrections welcome).
As G.V. Cormack notes, any language can be parsed by some context free
grammar; we are interested in grammars which represent the semantic
structure of the language as closely as possible.

About "C's grammar [not] being screwed up": with a few simple kludges
such as recognizing typedef names in the lex, it is fairly
easy to build a grammar which has sensible structure for C.
Or is it? Some points were raised about where PCC does/doesnt turn off
typedef recognition, and this seems ill-defined.
(Anybody know what the ANSI draft says?)
If having parts of your language ill-defined does not bother you,
consider trying to build a suffix or reverse parser for C. Such
a parser might be used in a syntax directed editor, or for syntax
error recovery. Such parsers cannot preserve the structure of
a C grammar because they do not have available the information required
to tell if an identifier is a typedef or not.

Andrew K. Wright akwr...@watmath.waterloo.edu
CS Dept., University of Waterloo, Ont., Canada.

Chris Torek

unread,
Mar 28, 1989, 1:39:32 PM3/28/89
to
>In article <16...@mimsy.UUCP> I included the throwaway% line:
>>... At least, this is the only explanation I can see that we never
> ^^^^^^^^^^^^^^^^^^^^^^^^^^

>>argue whether Pascal's grammar is context free.
-----
% cf. Webster: `throw.away \'thro--*-.wa-\ n : a handbill or circular
distributed free'; applied to writing (especially comic writing), it
refers to a line with similar worth.
-----

In article <12...@watdragon.waterloo.edu> akwr...@watdragon.waterloo.edu


(Andrew K. Wright) writes:
>The difference I see is that a C grammar that accepts undeclared typedef
>names destroys more of the structure of the grammar than a Pascal
>grammar that accepts undeclared variable names. Consider the following
>C expression:
>
> x = (a)(b);
>
>If "a" is a typedef name, then the right side is a cast. If "a" is
>not a typedef name, then the right side is a function call.
>I dont believe similar situations exist for Pascal (corrections welcome).

I am not sure what you mean by `destroys more of the structure of the
grammar'. It is ambiguous, and in particular it is an ambiguity of an
uncomfortable sort, quite unlike the shift/reduce ambiguity for if/then/else
which everyone resolves by taking the shift. Yacc permits this reduce/
reduce conflict, but resolves it by taking the first grammar rule; we
want the parser to take the *appropriate* grammar rule, which is sometimes
the first and sometimes not. So yacc unaided will not produce a
deterministic parser that does what we want.

BUT all this applies to constructing a deterministic LALR(1) parser,
which is not at all the same thing as having a context free grammar!

Kenneth Walker

unread,
Mar 28, 1989, 10:55:48 PM3/28/89
to
In article <12...@watdragon.waterloo.edu>, akwr...@watdragon.waterloo.edu (Andrew K. Wright) writes:
> About "C's grammar [not] being screwed up": with a few simple kludges
> such as recognizing typedef names in the lex, it is fairly
> easy to build a grammar which has sensible structure for C.
> Or is it? Some points were raised about where PCC does/doesnt turn off
> typedef recognition, and this seems ill-defined.
> (Anybody know what the ANSI draft says?)

(These comments are based on the July 1988 draft.) The standard allows
typedef names to be redeclared within an inner scope:

typedef int x;
{
char *x;
}

The only restriction is that the type must be explicily given. That is
you cannot redeclare it like:

static x;

One solution is to define the identifier in the declarator to
be any identifier:

any_ident: Identifier | Typedef_name;

rather than turn off typedef recognition in the lexical analyzer. This is
also needed for labels. However, this simple change to the grammar
presented in the standard does not result in an LALR1 grammar. I found
the transformations required to get a grammar that yacc would accept
quite difficult to find and the result was not very elegant. So my
answer to the question "Or is it?" is NO!

Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
+1 602 621 2858 kwa...@Arizona.EDU {uunet|allegra|noao}!arizona!kwalker

Clive

unread,
Mar 29, 1989, 7:30:33 AM3/29/89
to
In article <890325011...@yorkville.csri.toronto.edu>, nor...@csri.toronto.edu (Theodore Stevens Norvell) writes:
> The real question is what do you want to call syntax and what do you
> want to call static semantics. In the absence of any definitive definition,
> this is a matter of taste and convenience to the implementor.

For a new approach to this, look at:

"The Revised Report on the Algorithmic Language Algol 68".
- van Wijngaarden et al

This approached this problem by defining the grammar using another (context-free)
grammar (hence the term 'two-level grammar'). This meant that things like not using
a variable before it was defined were specified in the *grammar* - very nifty.
To do this, the second level grammar had a non-terminal called NEST which expanded
to sequences like "int a char b" and so on. Productions of the first level grammar
had NESTs scattered in them (translating to C):

TYPE NEST variable : TYPE NEST IDENTIFIER, where TYPE IDENTIFIER declared in NEST .

(comma means "and"). The production

where TYPE IDENTIFIER declared in NEST

expands to "empty" if the variable was declared, and to a nonterminal with no
production rule if it isn't - thus the grammar can't generate a program with an
undeclared variable. (Algol 68 had user-definable types, like C).

They did all kinds of things with this - type checking of assignments and operator
arguments, and context sensitive automatic type conversion come to mind.

Pity it was so hard to parse :-)
--
Clive D.W. Feather cl...@ixi.uucp
IXI Limited ...!mcvax!ukc!acorn!ixi!clive (untested)
+44 223 462 131

Norman Graham

unread,
Mar 31, 1989, 7:36:09 PM3/31/89
to
From article <1...@ixi.UUCP>, by cl...@ixi.UUCP (Clive):

> This approached this problem by defining the grammar using another (context-free)
> grammar (hence the term 'two-level grammar'). This meant that things like not using
> a variable before it was defined were specified in the *grammar* - very nifty.


Of course these two-level grammars are *substantially* more powerful
than context-free grammars. Two-level grammars are capable of generating
recursively-enumerable sets (ie. those things that can be recognized only
by something with the power of a Turing machine). This class of languages
is much larger than both the Context-Free and Context-Sensitive languages.

However, Two-level grammars can also be used to generate the Context-Free
and Context-Sensitive languages that we use for programming. Two-level
grammars provide a compact and elegant (some would say 'cryptic' :-)
notation for the description of programming language syntax without
the need to escape to a natural language. Personally, I would love
to see more language designers take advantage of this tool.


--
Norman Graham Oklahoma State University
Internet: nor...@a.cs.okstate.edu Computing and Information Sciences

UUCP: {cbosgd, rutgers} 219 Mathematical Sciences Building
!okstate!norman Stillwater, OK 74078-0599

Chris Torek

unread,
Apr 2, 1989, 1:01:01 PM4/2/89
to
In article <11...@frog.UUCP> jo...@frog.UUCP (John Woods) writes:
>I think the crux of the argument of the non-context-free crowd is that
>the grammar in K&R does NOT actually specify the C language by itself, not
>as a grammar. It relies on the trick of telling the lexer about typedefs,
>which is a form of back-door context.

For practical parsing (using, e.g., yacc), yes; for CF-ness, no. The
*real* crux of the argument is that while you can write a decent, simple,
no-feedback-through-lexer CFG that accepts all C programs, you can *not*
write a decent, simple, no-feedback LALR(1) parser to do this, because
the simple CFG is ambiguous.

Those who say `C does not have a CFG' are really saying `C does not
have a trivially yacc-able grammar', which is true (but, as I keep
saying, really means only that we need a better parser generator than
yacc).

(And no, it does not take a nondeterministic automaton to parse C. The
little back-door trick would work wonderfully well if only you could
know, each time a yacc parser examines a token, which rule yacc was
attempting. When it tries `typedef-name: identifier', the lexer should
answer `identifier' if and only if the identifier happens to be a
typedef. Then make sure that yacc tries this rule before trying
`primary: identifier' [this part is easy] and voila! you have a
parser. Of course, yacc actually only calls the parser once for both
rules; you would need a way to change the token on the fly. A better
gimmick---less expensive in compute power, at least---would be to
associate a function (predicate) with each rule, and allow the
predicate to abort the rule. You then attach an `identifier-really-is-
typedef' predicate to the `typedef: identifier' rule, making the parser
pass over this resolution and choose instead the `primary: identifier'
rule. Indeed, the same trick would allow one to reject undeclared
identifiers, making `program foo; begin bar := 1 end.' a syntax error
instead of a semantic one---not that this is a good thing to do!)

Chris Torek

unread,
Apr 4, 1989, 12:32:37 AM4/4/89
to
In article <16...@mimsy.UUCP> I wrote:
>(And no, it does not take a nondeterministic automaton to parse C. ...
[gimmick A deleted; gimmick B:]

>associate a function (predicate) with each rule, and allow the
>predicate to abort the rule.

gvcormack points out (as I conveniently forgot to mention) that this
is actually an affix grammar, and affix grammars are not context free.

I think this is a good point for a summary.

- There is no context free grammar that recognises only valid
C programs (but this is true of many languages).

- There is a trivial context free grammar that recognises all
valid C programs, along with a number of invalid ones, but it
is ambiguous. There is probably no unambiguous CFG for C
(consider arbitrarily many parentheses around an id: is it a
cast or a value?).

- There is a trivial affix grammar (gimmick B above) that
recognises all valid C programs, along with a number of
invalid ones, which is not ambiguous and can be handled with
a modified yacc (you would have to augment yacc's output
tables and parser to retain both reductions and to know what
to call to select which is to occur).

and of course,

- There is always the wrong way. :-)

Michael Condict

unread,
Apr 4, 1989, 12:39:26 AM4/4/89
to

No, I think there is a real difference in "context-freeness" between
Pascal and C. In C compilers that are widely considered to conform to
K&R, the syntax accepted by the parser is actually modified when a typedef
is encountered and, furthermore, there is no better way to handle the
typedefs. To elaborate:

In Pascal, there is only one way to interpret a given statement or declaration,
regardless of any previous declarations.

Consider the following in C, however:

. . . { T (x); . . . }

How should the construct be parsed? Is it the declaration of
a variable x of type T (where T is a previously typedef'd identifier)?
Or is it a call to the function T with argument x?

We can't even distinguish between an executable statement and a declaration
in C without resorting to context. This makes C fundamentally
less context-free than Pascal (or FORTRAN or Ada, for that matter). The
difference can be put in practical terms by noting that it is impossible to
write a processor for C that accepts exactly those programs that would not
be considered a "syntax error", unless you put a symbol table into the
program, to remember the context of typedefs. No such requirement holds for
Pascal. For example, a program that counts executable statements in C
programs must have a symbol table for typedefs, with all the extra complexity
that implies. The corresponding program for counting statements in Pascal
is much simpler. The same would hold for a cross-reference program and any
number of other program analyzers.

To summarize, the only way you can parse C with a context-free parser is if
you don't care whether a declaration is confused with an executable statement.
This utterly destroys the usefullness of such a parse, as far as I'm concerned.
--
Michael Condict {att|allegra}!m10ux!mnc
AT&T Bell Labs (201)582-5911 MH 3B-416
Murray Hill, NJ

Marc W. Mengel

unread,
Apr 4, 1989, 10:53:30 AM4/4/89
to
In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
>In Pascal, there is only one way to interpret a given statement or declaration,
>regardless of any previous declarations.

>Consider the following in C, however:

> . . . { T (x); . . . }

>How should the construct be parsed? Is it the declaration of
>a variable x of type T (where T is a previously typedef'd identifier)?
>Or is it a call to the function T with argument x?

...


>To summarize, the only way you can parse C with a context-free parser is if
>you don't care whether a declaration is confused with an executable statement.
>This utterly destroys the usefullness of such a parse, as far as I'm concerned.

That may destroy the usefulness in your mind, but that does not make it
non-context-free. To make a language recognizer (whose sole purpose
in life is to say "yes" this is a valid program, or "no" it is not)
you only need say yes or no for the whole program, not say what piece
is which. C may very well be not context free, or not easily context
free, but this is based on things like making sure variables are declared,
the right type, etc. (i.e. only allow p->q->w if q is a pointer member
of a structure type p is a pointer to, and the type q points to has a
member w...)

The suggestion you make, of a rule like:

decl_or_fcall : identifier '(' identifier ')' ';' ;

to spot the case you mention works well. In fact it is a useful tool
for doing things like allowing forward-referenced typedefs.

>Michael Condict {att|allegra}!m10ux!mnc

--
Marc Mengel mme...@cuuxb.att.com
attmail!mmengel
...!{lll-crg|att}!cuuxb!mmengel

Rahul Dhesi

unread,
Apr 5, 1989, 12:41:06 PM4/5/89
to
In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
>Consider the following in C, however:
>
> . . . { T (x); . . . }
>
>How should the construct be parsed?

The parsing is always the same, from the point of view of a
context-free grammer. It's the semantics that are different. That you
assign a different meaning to the language construct based on what has
come before has nothing to do with its context-free grammar.

Context-free grammars don't care about the declaration of type names.
If they could, they wouldn't be context-free.

The confusion here arises because real languages like Pascal and C
simply *cannot* be completely defined using context-free grammars. But
they are--because the definition we use is incomplete, and we fill in
the missing parts with a set of additional rules that are not part of
the context-free grammar. The size of this set of rules may or may not
be greater for C than for Pascal.
--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
ARPA: dh...@bsu-cs.bsu.edu

Peter da Silva

unread,
Apr 6, 1989, 11:49:04 AM4/6/89
to
If 'c's grammar is context free, and the context sensitivity is in the
semantics, tell me if the following is a syntax error:

x y z;
--
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, pe...@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, pe...@sugar.hackercorp.com.

Rahul Dhesi

unread,
Apr 6, 1989, 10:07:59 PM4/6/89
to
In article <37...@ficc.uu.net> pe...@ficc.uu.net (Peter da Silva) writes:
>If 'c's grammar is context free, and the context sensitivity is in the
>semantics, tell me if the following is a syntax error...

It depends on which context-free grammar you are using. Here's a
simple one:

program ::= char charlist
charlist ::= .null. | charlist char
char ::= 'a' | 'b' | ... '0' | '1' | ... '#' | '(' | .... blank | tab ... etc.

(.null. means empty string.)

My lexical analyzer always returns a single character or EOF. My
parser is pretty simple. There is only one possible error during
parsing: "premature EOF".

Semantic analysis is a nightmare :-).

Michael Condict

unread,
Apr 7, 1989, 12:20:53 AM4/7/89
to
In article <65...@bsu-cs.UUCP>, dh...@bsu-cs.UUCP (Rahul Dhesi) writes:
> In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
> >Consider the following in C, however:
> >
> > . . . { T (x); . . . }
> >
> >How should the construct be parsed?
>
> The parsing is always the same, from the point of view of a
> context-free grammer. It's the semantics that are different. That you
> assign a different meaning to the language construct based on what has
> come before has nothing to do with its context-free grammar.
> . . .

> Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
> ARPA: dh...@bsu-cs.bsu.edu

Everyone seems to have missed my point on this, because I used a bad example.
Consider instead the one that someone else posted:

{ T * x; . . . }

In this case (which could be the declaration of a var x of type (T *) or
could be the expression "T multiplied by x"), the parsing most certainly should
not be the same for either case, at least not if you intend to do anything
useful with the parse. One of the following two different parse trees
should be produced:

* decl
/ \ / \
T x T *
|
x

What you are ignoring is that very few parsers stand alone by themselves. They
are used in a program that wants to do something with the language. The
fundamental difference between C and other languages is that because of the
above bogosity, there is much less you can do with the language without
including a symbol table for typedefs. I can't believe this is even open
for argument. Try writing programs for C and Pascal that delete all executable
statements, printing only the declarations. The C version will have to have
a symbol table -- the Pascal version will not. Isn't that perfectly clear?

Of course we all know that no useful programming language, including C,
Pascal, FORTRAN, Ada or LISP, is context-free when we consider the exact
set of programs that are accepted by the compiler, but that was not my point.
--

Michael Condict {att|allegra}!m10ux!mnc

Chris Torek

unread,
Apr 9, 1989, 1:07:12 AM4/9/89
to
In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
>What you are ignoring is that very few parsers stand alone by themselves.

We are not ignoring this. The topic was `context free grammars', not
`compilers and parsers'. If you want to argue that one cannot build
a useful compiler using a CFG approximation for C, that is fine (since
we have seen that this is true). Just do not do it by saying that
there is no context free grammar that recognises all valid C programs.

David Gudeman

unread,
Apr 9, 1989, 11:06:42 AM4/9/89
to
In article <16...@mimsy.UUCP> ch...@mimsy.UUCP (Chris Torek) writes:
]In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
]>What you are ignoring is that very few parsers stand alone by themselves.
]
]We are not ignoring this. The topic was `context free grammars', not
]`compilers and parsers'. If you want to argue that one cannot build
]a useful compiler using a CFG approximation for C, that is fine (since
]we have seen that this is true). Just do not do it by saying that
]there is no context free grammar that recognises all valid C programs.

There is no context free grammar that recognises all valid C programs.
Several people have pointed out that your artificial definition of
syntax (anything that can be described with a context free grammar)
begs the issue. By my definition of syntax, and aparently that of
most other posters, C's syntax is not context free.

Kenneth Walker

unread,
Apr 9, 1989, 2:29:16 PM4/9/89
to
In article <10...@megaron.arizona.edu>, gud...@arizona.edu (David Gudeman) writes:
> There is no context free grammar that recognises all valid C programs.
> Several people have pointed out that your artificial definition of
> syntax (anything that can be described with a context free grammar)
> begs the issue. By my definition of syntax, and aparently that of
> most other posters, C's syntax is not context free.

"Context free grammar" and "recognize" have precise technical definitions;
look them up. By those definitions, there are context free grammars which
will recongnize any valid C program. The fact that the grammar may
not produce a parse which matches your definition of the syntax of the
program does not invalidate the truth of that statement. As you point
out, one should not confuse grammar and syntax.

The statement you want is: there is no unambiguous context free grammar
which will generate a parse which matches your syntax for all C programs.
That may be rather long winded, but at least it is accurate.

As far as I know, "context free syntax" does not have a widely recognized
meaning (can anyone come up with a reference to disprove this?). Clearly
you can define it in such a way that it would be correct to say that
C does not have a context free syntax. However, people will confuse the term
with "context free grammar", so I wouldn't suggest using it. It is better
to make a long winded explaination of the problem with C's syntax.

Rahul Dhesi

unread,
Apr 9, 1989, 3:32:33 PM4/9/89
to
In article <10...@megaron.arizona.edu> gud...@arizona.edu (David Gudeman)
writes:

>There is no context free grammar that recognises all valid C programs.

Well...there is no context free grammar that recognizes all valid C
programs and *no others*.

Not surprising, since there is no context-free grammar that recognizes
exactly all valid programs in *any* real programming language in common
use.

It's tempting to wonder if BASIC and FORTRAN might be exceptions, since
they do not require variables to be declared before use.
Unfortunately, if you say "GOTO 10000" in either, there had better be a
label or line number called "10000" somewhere, and no context-free
grammar can guarantee that. TECO (the text editor, whose programs are
indistinguishable from line noise by common mortals) may be another
candidate, but it too wants jump labels to exist.

Try one of the functional programming languages, in their purest form
(no jump labels, no variables). They may come close.
--
Rahul Dhesi <dh...@bsu-cs.bsu.edu>
UUCP: ...!{iuvax,pur-ee}!bsu-cs!dhesi

Gordon V. Cormack

unread,
Apr 9, 1989, 6:19:23 PM4/9/89
to

> ... total drivel not worth repeating

This is a flame.

I am amazed by the amount of noise this simple discussion has generated, and
the number of people who have accused others of not knowing anything at all.

I submit that it is for the most part the accusers, not the accused, who have
large gaps in their understanding of this discussion.
--
Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
gvco...@waterloo.EDU gvco...@uwaterloo.CA gvco...@water.BITNET

Norman Diamond

unread,
Apr 9, 1989, 10:14:27 PM4/9/89
to
In article <9...@m10ux.ATT.COM> m...@m10ux.ATT.COM (Michael Condict) writes:
>Consider instead the one that someone else posted:
>
> { T * x; . . . }
>
>... One of the following two different parse trees

>should be produced:
>
> * decl
> / \ / \
> T x T *
> |
> x

Those are not parse trees. Parse trees would resemble the following
ones. Actually K&R's grammar would produce more complicated trees
than these, but a more complicated grammar could produce these trees.

expr decl
/ | \ / | \
T * x T * x

And it really isn't difficult to unify these cases to allow an
unambiguous grammar, leaving the resolution to the semantic stage.

expr_or_decl
/ | \
T * x

Of course, C is ugly because of the difficulty human readers have
in reading such things, but compilers (C compilers anyway) have no
sense of aesthetics.
Norman Diamond, Sony Computer Science Lab (diamond%csl.s...@relay.cs.net)
The above opinions are my own. | Why are programmers criticized for
If they're also your opinions, | re-inventing the wheel, when car
you're infringing my copyright. | manufacturers are praised for it?

Michael Condict

unread,
Apr 11, 1989, 1:44:50 AM4/11/89
to
In article <16...@mimsy.UUCP>, ch...@mimsy.UUCP (Chris Torek) writes:

I never said "there is no context free grammar that recognises all valid C
programs", because this is a vacuous statement, without further explanation.
What do you mean by "all"? Do you mean "exactly all" or "all plus more"?
What do you mean by valid? Do you mean the programs that have no "syntax
errors"? What's a syntax error? Avoid being circular when you reply. Or
do you mean the ones that pass through compilers that conform to the ANSI
(proposed) standard without any error messages? Or the ones that not only
make it through the compiler, but execute in a valid fashion (e.g., do not
attempt to divide by 0)?

If you mean the language consisting of all and only those C programs that meet
the last criterion, clearly such a language is not only not context-free,
but is not even computable. If you go to the other extreme (as you seem to be
doing) and allow the grammar to accept more than the set of syntactically
valid C programs, then of course such a language is context-free, as is
the language consisting of an arbitrary sequence of tokens from the
vocabulary of C tokens.

But so what? I attempted to inject parsers and language-analysis programs
into the discussion because otherwise the discussion has no point. If you
don't allow me to talk about what sort of use I might want to make of the
parse information, then I can't say whether I think you have a grammar for
a useful superset of the C language. If the grammar is context-free and
accepts all correctly executing, legal C programs, then it must also accept a
superset of this set, so the crucial issue is: what superset? E.g., what
errors are you failing to reject and which distinct constructs are you failing
to distinguish between (such as declarations vs. executable statements, in
"T (x);").

David Gudeman

unread,
Apr 11, 1989, 6:15:24 PM4/11/89
to
In article <10...@megaron.arizona.edu> kwa...@arizona.edu (Kenneth Walker) writes:
>In article <10...@megaron.arizona.edu>, gud...@arizona.edu (David Gudeman) writes:
>> There is no context free grammar that recognises all valid C programs.
>> ... By my definition of syntax, ... C's syntax is not context free.

>
>"Context free grammar" and "recognize" have precise technical definitions;
>look them up. By those definitions, there are context free grammars which
>will recongnize any valid C program.

Ken, did you notice who you were responding to? I think you know that
I could give the technical definitions of "context free grammar" and
"recognize" without looking them up.

>...The statement you want is: there is no unambiguous context free grammar


>which will generate a parse which matches your syntax for all C programs.
>That may be rather long winded, but at least it is accurate.

No, that's not the statement I want. I think the argument revolves
around one's definition of "C's syntax". Let CF-C be the context free
language recognized by the C grammar in K&R. Here is the argument you
seem to be making:

(1) the syntax of C is CF-C
(2) CF-C is context free
(3) therefore, the syntax of C is context free.

Here is the argument _I_ am making:

(1) the syntax of C is a proper subset of CF-C, called CS-C.
(2) there is no context free grammar which recognizes exactly the
language CS-C.
(3) therefore, the syntax of C is not context free.

Both arguments are sound, and the point of disagreement is statement
(1). Of course I can't claim that your definition of C's syntax is
_wrong_, but I claim that it is not useful.

I think the most useful definition of the syntax of a programming
language is:

Definition: the set of strings that do not produce syntax errors
when compiled by a correct compiler for the language.

This is not unambigous since different correct compilers may produce
slightly different sets of syntax errors. But it hard to imagine a C
compiler that does not give a syntax error for the declaration:

newtype x;

where "newtype" has not been declared with a typedef.

Micha Berger

unread,
Apr 11, 1989, 7:22:02 PM4/11/89
to
> Consider the following in C, however:

> . . . { T (x); . . . }

> How should the construct be parsed?

It would have to be an expression, since anything within brackets ain't a
decleration. This isn't Pascal kiddees, all functions are global.

> Consider instead the one that someone else posted:
>
> { T * x; . . . }

This just shows that your grammar is poor. Obviously it has to be the
interpreter's job to distinguish between decleration and expression.

I don't see any qualitative difference between typedef's and variable
decleration. An identifier is an identifier. Are you going to tell me
that Pascal isn't CF because A*B could have diffent meanings, because
multiplying integers creates different code than multiplynig reals?

Another thing. No language is a CFG. A language just isn't a grammar.
In order to prove that C isn't A CFL, you'ld have to prove that there
doesn't exist a CFG that only accepts valid C programs. A grammar
doesn't care what you do with each statement, so who cares how it would
be parsed? You'll just get a very complicated interpreter, since very
little could be said about statement types in advance.
--
Micha Berger
Disclaimer: All opinions expressed here are my own. The spelling, noone's.
email: ...!cmcl2!phri!dasys1!aj-mberg Aspaklaria Publications
vox: (718) 380-7572 73-32 173 St, Hillcrest, NY 11366

Kenneth Walker

unread,
Apr 12, 1989, 9:45:26 PM4/12/89
to
In article <93...@dasys1.UUCP>, aj-m...@dasys1.UUCP (Micha Berger) writes:
>
> I don't see any qualitative difference between typedef's and variable
> decleration. An identifier is an identifier. Are you going to tell me
> that Pascal isn't CF because A*B could have diffent meanings, because
> multiplying integers creates different code than multiplynig reals?

The difference is, at least in part, one of practicality. The parse of
A*B is useful for generating code for either integer or real multiplication.
However, I really want my parser to tell the difference between a
declaration and a statement. If you think you would be happy waiting
until the parser is done to decide, be my guest and try it (you will have
to decide what the grammar needs to look like first). The purpose of
a grammar based parser is to take a big chunk out of the complexity
of program analysis. There is still plenty of work left when it is done,
so the more it can do the better!

Some people have claimed that parsing C is easy. How many of you have
actually tried to produce a _useful_ parser for full ANSI Standard C?
I did it using YACC and the trick of feeding a symbol table back into
the lexical analyser, but it wasn't fun. A corollary of Murphy's Law:
there is a direct relationship between ignorance and optimism. (Which
seems to explain why I usually underestimate how long projects will
take.)

Norman Diamond

unread,
Apr 13, 1989, 1:54:52 AM4/13/89
to
In article <93...@dasys1.UUCP> aj-m...@dasys1.UUCP (Micha Berger) writes:
>> Consider the following in C, however:
>
>> . . . { T (x); . . . }
>
>> How should the construct be parsed?
>
>It would have to be an expression, since anything within brackets ain't a
>decleration. This isn't Pascal kiddees, all functions are global.

Since Micha Berger is so convinced of C's superiority, maybe he should
learn it. Anything within brackets is a declaration until the end of
the declarations, and then statements (expressions) until the closing
bracket. This isn't Pascal kiddee; intuition and aesthetics are not
welcome here.

Micha Berger

unread,
Apr 18, 1989, 10:51:39 AM4/18/89
to
In article <10...@socslgw.csl.sony.JUNET>, dia...@diamond.csl.sony.junet (Norman Diamond) writes:
> In article <93...@dasys1.UUCP> aj-m...@dasys1.UUCP (Micha Berger) writes:
> >> Consider the following in C, however:
> >
> >> . . . { T (x); . . . }
> >
> >> How should the construct be parsed?
> >
> >It would have to be an expression, since anything within brackets ain't a
> >decleration. This isn't Pascal kiddees, all functions are global.
>
> Anything within brackets is a declaration until the end of
> the declarations, and then statements (expressions) until the closing
> bracket. This isn't Pascal kiddee; intuition and aesthetics are not
> welcome here.

Well the sataement cannot be a decleration for x, since x is in
parenthasis. It cannot be a decleration for T (using ANSI's new format),
since it has no type.
My point was about something else. I felt people were trying to force
the statement to read along the same lines as:

main() {
int T(x)
int x;
{
};
.
.
.
}

and that's what I meant about being allowed in Pascal and not in C.

(BTW I've been using C as my primary language since 1978 not because
C is superior as a language, just that it gives me more power to optimize
my code. If I got a Pascal with a good enough optimizer, I'd switch
back. I like C's speed, not the language itself.)


--
Micha Berger
Disclaimer: All opinions expressed here are my own. The spelling, noone's.

email: ...!cmcl2!hombre!dasys1!aj-mberg Aspaklaria Publications

Norman Diamond

unread,
Apr 20, 1989, 8:42:00 AM4/20/89
to
Some unknown poster once asked:

>>>> . . . { T (x); . . . }
>>>> How should the construct be parsed?

In article <93...@dasys1.UUCP> aj-m...@dasys1.UUCP (Micha Berger) writes:

>Well the sataement cannot be a decleration for x, since x is in
>parenthasis. It cannot be a decleration for T (using ANSI's new format),
>since it has no type.

1. It can be a declaration for x. If your compiler rejects this:

typedef int T;
main ()
{
printf ("hello ");
{
T (x);
x = 3;
printf ("world.\n");
}
}

then your compiler is broken.

2. Before ANSI standardized existing practice, it could have been a
declaration for T, as you hint (though not when T was typedef'ed).

> My point was about something else. I felt people were trying to force
>the statement to read along the same lines as:
>
> main() {
> int T(x)
> int x;
> {
> };
> ...
> }

Then maybe you should have read the example before answering.
Your first answer was still out to lunch, and you insulted the
wrong language. Your second answer is still out to lunch too.

Reply all
Reply to author
Forward
0 new messages