Parsing C: TypedefName VS Identifier

Sank

Apr 13, 2005, 7:36:32 AM

The problem may seem a bit "old", but that does not really make it any
easier.

As my utility needs to parse a C program, I tried to use one of the
most well-known grammars, Jim Roskind's YACCable C grammar (c5.y), and
the lexer for it (cpp5.l). It turned out that the grammar depends
on whether a lexeme is an IDENTIFIER or a TYPEDEFname. To deal with
this problem, Jim Roskind recommended maintaining a symbol table for
the current scope and having the lexer consult it.

But the grammar is dependent on distinguishing IDENTIFIER and
TYPEDEFname even in declarations. So here's the problem: how can the
lexer learn at the point of declaration *WHAT* is that identifier-like
string, IDENTIFIER or TYPEDEFname? We didn't store it in any symbol
table yet, we want to parse the declaration!

I have tried to rewrite the grammar to avoid this problem, but it
didn't work and created a bunch of reduce/reduce conflicts. I haven't
yet found any useful recommendations... I would be thankful to anyone
who has heard something about solving this problem!

Best regards,
Igor Baltic
PhML239, St Petersburg

jacob navia

Apr 13, 2005, 7:57:43 AM

Sank wrote:
>
> But the grammar is dependent on distinguishing IDENTIFIER and
> TYPEDEFname even in declarations. So here's the problem: how can the
> lexer learn at the point of declaration *WHAT* is that identifier-like
> string, IDENTIFIER or TYPEDEFname? We didn't store it in any symbol
> table yet, we want to parse the declaration!
>

Make the lexer look up the id in the typedefs table.
If it is a typedef, return TYPEDEFNAME; otherwise return IDENTIFIER...

All typedefs must be declared before they are used.
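
A rough sketch of that lookup, for illustration only. This is not Roskind's actual lexer: the token codes, the flat fixed-size table, and the names typedef_add and classify_name are all made up for the example, and scoping is ignored completely.

```c
#include <string.h>

/* Hypothetical token codes; a real yacc grammar would generate these. */
enum token { IDENTIFIER, TYPEDEFNAME };

/* Assumed toy typedef table: a flat array, no scopes, no hashing. */
#define MAX_TYPEDEFS 64
static const char *typedef_table[MAX_TYPEDEFS];
static int typedef_count = 0;

/* Called by the parser once a typedef declaration has been recognized.
   Stores the pointer as-is, so the string must outlive the table.
   Returns 0 on success, -1 if the table is full. */
static int typedef_add(const char *name)
{
    if (typedef_count >= MAX_TYPEDEFS)
        return -1;
    typedef_table[typedef_count++] = name;
    return 0;
}

/* Called by the lexer for every identifier-like lexeme. */
static enum token classify_name(const char *name)
{
    for (int i = 0; i < typedef_count; i++)
        if (strcmp(typedef_table[i], name) == 0)
            return TYPEDEFNAME;
    return IDENTIFIER;
}
```

The point is that a name is reported as IDENTIFIER until the parser registers it, which is exactly what happens while its own typedef declaration is being parsed.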

Dave Hansen

Apr 13, 2005, 10:16:25 AM

To put it another way, every TYPEDEFname started life as an
IDENTIFIER.

Regards,

-=Dave
--
Change is inevitable, progress is not.

CBFalconer

Apr 13, 2005, 11:02:26 AM

This is just the sort of thing my hashlib module is designed to
handle. You can create a linked list of hashtables in scope order,
with a separate table for reserved words, and possibly separate
lists to handle the various C namespaces. All the tables use the
same hash, comparison, etc. functions. An example of this sort of
multiple tables for identifiers is in id2id-20. It, and the
hashlib package, can be found at:

<http://cbfalconer.home.att.net/download/>
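
A scope-ordered chain of tables might look roughly like the sketch below. It does not use hashlib (its API is not shown in this thread); plain linked lists stand in for the hash tables, and every name in it is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One name bound in a scope. */
struct entry {
    const char *name;
    int is_typedef;          /* 1 = typedef name, 0 = ordinary identifier */
    struct entry *next;
};

/* One scope; `outer` links to the enclosing scope (NULL at file scope). */
struct scope {
    struct entry *entries;
    struct scope *outer;
};

/* Enter a new innermost scope. Returns NULL on allocation failure. */
static struct scope *scope_push(struct scope *outer)
{
    struct scope *s = malloc(sizeof *s);
    if (s) {
        s->entries = NULL;
        s->outer = outer;
    }
    return s;
}

/* Leave scope s, freeing its entries; returns the enclosing scope. */
static struct scope *scope_pop(struct scope *s)
{
    struct scope *outer = s->outer;
    while (s->entries) {
        struct entry *e = s->entries;
        s->entries = e->next;
        free(e);
    }
    free(s);
    return outer;
}

/* Declare a name in scope s. Returns 0 on success, -1 on failure. */
static int scope_declare(struct scope *s, const char *name, int is_typedef)
{
    struct entry *e = malloc(sizeof *e);
    if (!e)
        return -1;
    e->name = name;          /* pointer is stored, not copied */
    e->is_typedef = is_typedef;
    e->next = s->entries;    /* newest declaration is found first */
    s->entries = e;
    return 0;
}

/* Search from the innermost scope outwards.
   Returns 1 for a typedef name, 0 for an ordinary identifier,
   -1 if the name is not declared in any enclosing scope. */
static int scope_lookup(const struct scope *s, const char *name)
{
    for (; s; s = s->outer)
        for (const struct entry *e = s->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e->is_typedef;
    return -1;
}
```

Because the search stops at the first match, an inner declaration hides an outer typedef of the same name until its scope is popped.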

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Ivan A. Kosarev

Apr 14, 2005, 11:21:35 AM

"Sank" <ig...@sb8286.spb.edu> wrote in message
news:1113392192.7...@z14g2000cwz.googlegroups.com...

> But the grammar is dependent on distinguishing IDENTIFIER and
> TYPEDEFname even in declarations. So here's the problem: how
> can the lexer learn at the point of declaration *WHAT* is that
> identifier-like string, IDENTIFIER or TYPEDEFname? We didn't
> store it in any symbol table yet, we want to parse the declaration!
>
> I have tried to rewrite the grammar to avoid this problem, but it
> didn't work and created a bunch of reduce/reduce conflicts. I
> haven't yet found any useful recommendations... I would be thankful
> to anyone who has heard something about solving this problem!

There are two requirements for an identifier to be a typedef name:

1) A visible declaration declares the identifier with the typedef
storage class specifier, and

2) No type specifier has been seen so far in the declaration
specifier list currently being parsed. For example,

typedef int I;

void f()
{
    I I;   /* first "I": typedef name; second "I": the declared object */
}

The second "I" here is an ordinary identifier, not a typedef name. So,
this declares object "I" of type int.
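
The two conditions could be combined in a lexer hook along these lines. This is only a sketch under stated assumptions: struct decl_state, is_visible_typedef and classify are invented names, the symbol-table query is a stub that knows only "I", and the parser is assumed to reset the state at the start of every declaration.

```c
#include <string.h>

/* Hypothetical token codes. */
enum tok { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

/* Hypothetical per-declaration state, reset by the parser at the
   start of each declaration specifier list. */
struct decl_state {
    int type_spec_seen;   /* nonzero once a type specifier was consumed */
};

/* Stub for a real symbol-table query: only "I" is a visible typedef. */
static int is_visible_typedef(const char *name)
{
    return strcmp(name, "I") == 0;
}

/* Condition 1: the name is a visible typedef.
   Condition 2: no type specifier has been seen in this declaration. */
static enum tok classify(struct decl_state *st, const char *name)
{
    if (is_visible_typedef(name) && !st->type_spec_seen) {
        st->type_spec_seen = 1;   /* a typedef name is itself a type specifier */
        return TOK_TYPEDEF_NAME;
    }
    return TOK_IDENTIFIER;
}
```

In a fuller version the parser would also set type_spec_seen when it consumes a keyword such as int or unsigned, so that "unsigned I;" inside f() treats "I" as an ordinary identifier as well.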