Token shifting problem in C parser


eliben

Sep 20, 2008, 5:39:05 AM
to ply-hack
Hello,

I'm writing a C parser using PLY, and recently ran into a problem.
This code:

typedef int my_type;
my_type x;

is correct C code, because my_type is defined as a type before it is
used as one. I handle it by filling a type symbol table in the
parser that gets used by the lexer to differentiate between types and
simple identifiers.

However, although the type declaration rule ends with SEMI (the ';'
token), PLY reads the token 'my_type' from the second line as lookahead
before it reduces the first declaration. Because of this, I have no
chance to pass the update in the type symbol table to the lexer, and the
lexer sees my_type as an identifier and not a type.
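
The timing can be reproduced without PLY. The sketch below is a toy model in plain Python, not PLY's actual parsing loop: the names tokenize, parse_whole_rule and KEYWORDS are hypothetical, and the "parser" only imitates one behaviour, namely that the token after SEMI is requested from the lexer before the action of the rule ending in SEMI runs.

```python
import re

# Hypothetical token kinds for this toy model only.
KEYWORDS = {"typedef": "TYPEDEF", "int": "INT", ";": "SEMI"}


def tokenize(source, typedef_names):
    """Lazily yield (kind, text); a name is classified only when requested."""
    for match in re.finditer(r"[A-Za-z_]\w*|;", source):
        text = match.group()
        if text in KEYWORDS:
            yield (KEYWORDS[text], text)
        else:
            # The lexer consults the table that the parser fills in.
            yield ("TYPEID" if text in typedef_names else "ID", text)


def parse_whole_rule(source):
    """Imitate a rule ending in SEMI whose action runs after the lookahead is read."""
    typedef_names = set()
    tokens = tokenize(source, typedef_names)
    kinds, statement = [], []
    lookahead = next(tokens, None)
    while lookahead is not None:
        current = lookahead
        kinds.append(current[0])
        statement.append(current)
        # The token after `current` is requested before any action runs.
        lookahead = next(tokens, None)
        if current[0] == "SEMI":
            # Stand-in for the action of the rule that ends with SEMI.
            if statement[0][0] == "TYPEDEF":
                typedef_names.add(statement[-2][1])
            statement = []
    return kinds
```

Calling parse_whole_rule("typedef int my_type; my_type x;") returns ['TYPEDEF', 'INT', 'ID', 'SEMI', 'ID', 'ID', 'SEMI']: the second my_type is classified as ID because the table is updated one token too late.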

Any ideas for a fix?

The full code is at: http://code.google.com/p/pycparser/source/browse/trunk/src/c_parser.py
Not sure how I can create a smaller example out of this.

Thanks in advance,
Eli

eliben

Sep 20, 2008, 5:46:20 AM
to ply-hack
Just wanted to add that the same problem exists with the example ansic
grammar supplied with PLY.

Eli

David Beazley

Sep 20, 2008, 9:25:23 AM
to ply-hack, eliben

One way to handle something like this is to modify the grammar slightly. Instead of having a rule like this:

def p_whatever_statement(p):
    "statement : whatever SEMI"
    pass

You split it into two rules:

def p_whatever_statement(p):
    "statement : whatever_part SEMI"
    pass

def p_whatever_part(p):
    "whatever_part : whatever"
    # Do whatever (modify symbol tables, etc.)

This forces the code in the second rule (whatever_part) to run as soon as the parser sees the semicolon as lookahead, before the semicolon is shifted and before the token after it is read. There's more information in section 5.11 of the PLY documentation (Embedded Actions).
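
As a rough illustration of that ordering, here is a toy model in plain Python. It does not use PLY and is not how PLY is implemented; tokenize, parse_split_rule and KEYWORDS are hypothetical names, and the only thing modelled is that the action of the inner rule runs while SEMI is still the lookahead, so the lexer has not yet been asked for the token after it.

```python
import re

# Hypothetical token kinds for this toy model only.
KEYWORDS = {"typedef": "TYPEDEF", "int": "INT", ";": "SEMI"}


def tokenize(source, typedef_names):
    """Lazily yield (kind, text); a name is classified only when requested."""
    for match in re.finditer(r"[A-Za-z_]\w*|;", source):
        text = match.group()
        if text in KEYWORDS:
            yield (KEYWORDS[text], text)
        else:
            yield ("TYPEID" if text in typedef_names else "ID", text)


def parse_split_rule(source):
    """Imitate the split rule: the body's action runs while SEMI is the lookahead."""
    typedef_names = set()
    tokens = tokenize(source, typedef_names)
    kinds, body = [], []
    lookahead = next(tokens, None)
    while lookahead is not None:
        if lookahead[0] == "SEMI":
            # Stand-in for reducing the inner rule: SEMI has been read but
            # not shifted, and nothing after it has been requested yet.
            if body and body[0][0] == "TYPEDEF":
                typedef_names.add(body[-1][1])
            body = []
        else:
            body.append(lookahead)
        kinds.append(lookahead[0])      # shift the lookahead
        lookahead = next(tokens, None)  # only now read the next token
    return kinds
```

Calling parse_split_rule("typedef int my_type; my_type x;") returns ['TYPEDEF', 'INT', 'ID', 'SEMI', 'TYPEID', 'ID', 'SEMI']: the second my_type is now classified as a type name because the table was updated before that token was lexed.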

Cheers,
Dave





eliben

Sep 20, 2008, 11:24:56 AM
to ply-hack

Thanks a lot, David - problem solved!

I used to have this rule:

def p_declaration(self, p):
    """ declaration : declaration_specifiers init_declarator_list_opt SEMI
    """

where I would add the new type to the symbol table.
I split it into:

def p_decl_body(self, p):
    """ decl_body : declaration_specifiers init_declarator_list_opt
    """
    # <<<handle declaration here>>>

def p_declaration(self, p):
    """ declaration : decl_body SEMI
    """
    p[0] = p[1]

And now it works, because decl_body is always reduced before SEMI is
shifted, so the token after SEMI has not been read from the lexer yet.

I just hope there are no hidden gotchas in it. No new conflicts were
created, at least.

Eli