Changing the start symbol

0beron

Sep 30, 2009, 5:32:44 AM
to ply-hack
I have a ply grammar that is designed to parse as much of FORTRAN 90
as I can, in order to write a bunch of code analysis and auto-
generation tools. I have a start symbol called 'top' which contains
everything else. In writing some new tools, I often find myself
wanting to parse a fragment of a file which I know starts with a
particular token, for example a type definition or a single subroutine
call. I'd like to build a parser instance that uses a subset of my
existing grammar, ie end up with a parser that will no longer accept
full FORTRAN files, but will quickly parse a string that is known to
(or suspected to) contain a few FORTRAN constructs of interest.

My first attempt at this is to add a routine to my parser definition:
    def start_symbol(st):
        global parser
        parser = yacc.yacc(start=st, debug=0)

I then call start_symbol('new_start_symbol') before calling
yacc.parse.

This seems to work, but I get a page or ten of warnings from yacc
stating that there are unreachable symbols (which there are because
I'm now using a subset of the grammar intentionally). It also seems to
insist on rebuilding the parser from scratch each time, which isn't
good news for quick fire command line tools where picking up the
cached parser tables is really essential.

Is there any way I can mess with the internals of an existing parser
object to 'prime' it to be in a different starting state? Ie let PLY
load the cached tables, create a parser, and then just tweak the
existing instance?

A.T.Hofkamp

Sep 30, 2009, 8:01:06 AM
to ply-...@googlegroups.com
0beron wrote:
> Is there any way I can mess with the internals of an existing parser
> object to 'prime' it to be in a different starting state? Ie let PLY
> load the cached tables, create a parser, and then just tweak the
> existing instance?

I don't know about parser internals, but maybe you don't need to.

You can make a new start symbol, and introduce new tokens that select a rule
to 'jump' to the right rule, eg

begin: FORTRAN top | EXPRESSION expression ;

now you just need to inject the right token to select a part of the grammar,
ahead of the real tokens.
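
In PLY's docstring notation, the rule above might look roughly like the sketch below. FORTRAN and EXPRESSION are marker tokens that never occur in real input, and 'top' and 'expression' stand in for rules assumed to exist already in the grammar. Because everything stays reachable from 'begin', a single set of cached tables should serve every entry point.

```python
# Sketch only: FORTRAN and EXPRESSION are hypothetical marker tokens;
# 'top' and 'expression' are assumed to be rules already in the grammar.
tokens = ('FORTRAN', 'EXPRESSION')  # plus the real token names

# PLY treats a module-level name 'start' as the start symbol.
start = 'begin'

def p_begin(p):
    '''begin : FORTRAN top
             | EXPRESSION expression'''
    # p[1] is the marker token; p[2] is the result of the selected rule.
    p[0] = p[2]
```

The marker token still has to be delivered to the parser ahead of the real tokens.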


One way would be to write a new token getter function that wraps the normal
lexer. Another way would be to use lexer states, i.e. put the lexer in some
unique initial state in which the only thing that happens is that it emits a
token and jumps to the 'normal' lexing state.

The former is definitely possible, in the past I have made custom lexers and
hooked them up to the ply parser by making a token getter function. I don't
have a standard way of doing that. Note that I always wrap ply parsers and
scanners in a class (which is non-standard for ply), so it may be a lot of work.
About the second alternative, I have never used lexer states in ply, so I
cannot say whether it is really feasible, and/or how much work that is going
to be.

Albert

David Beazley

Sep 30, 2009, 8:22:42 AM
to ply-...@googlegroups.com, David Beazley

On Sep 30, 2009, at 7:01 AM, A.T.Hofkamp wrote:

>
> 0beron wrote:
>> Is there any way I can mess with the internals of an existing parser
>> object to 'prime' it to be in a different starting state? Ie let PLY
>> load the cached tables, create a parser, and then just tweak the
>> existing instance?
>
> I don't know about parser internals, but maybe you don't need to.
>
> You can make a new start symbol, and introduce new tokens that
> select a rule
> to 'jump' to the right rule, eg
>
> begin: FORTRAN top | EXPRESSION expression ;
>
> now you just need to inject the right token to select a part of the
> grammar,
> ahead of the real tokens.
>


Yes, this technique is what I was going to suggest. However, I'm
looking at the PLY implementation and it doesn't seem like there is an
easy way to inject a starting token. You could manually hack it by
creating your own lexer object and passing that into the yacc.parse()
function. However, I really ought to make a patch that simplifies
this.
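
A minimal sketch of such a hand-made lexer object follows. It assumes the parser only needs input() and token() from its lexer, and that tokens only need the attributes type, value, lineno and lexpos. StartTokenLexer, Token and WordLexer are made-up names for illustration; with PLY you would wrap the real lexer and build the marker from ply.lex.LexToken instead of the stand-in Token class.

```python
class Token:
    """Stand-in for ply.lex.LexToken, carrying the same four attributes."""
    def __init__(self, type, value, lineno=0, lexpos=0):
        self.type = type
        self.value = value
        self.lineno = lineno
        self.lexpos = lexpos


class StartTokenLexer:
    """Wrap a lexer and emit one marker token before the real tokens."""
    def __init__(self, lexer, start_type):
        self.lexer = lexer
        self.start_type = start_type
        self._pending = True

    def input(self, data):
        # Forward the text to the wrapped lexer and re-arm the marker.
        self.lexer.input(data)
        self._pending = True

    def token(self):
        if self._pending:
            self._pending = False
            return Token(self.start_type, '')
        return self.lexer.token()


class WordLexer:
    """Toy lexer for the demo only; a real PLY lexer would go here."""
    def input(self, data):
        self._words = iter(data.split())

    def token(self):
        word = next(self._words, None)
        # Like a PLY lexer, return None at end of input.
        return None if word is None else Token('WORD', word)


def token_types(lexer, data):
    """Collect the token types a parser would see for the given text."""
    lexer.input(data)
    types = []
    while True:
        tok = lexer.token()
        if tok is None:
            break
        types.append(tok.type)
    return types


demo_types = token_types(StartTokenLexer(WordLexer(), 'EXPRESSION'), 'a + b')
```

With a real grammar, the wrapper would be handed to the parser along the lines of parser.parse(text, lexer=StartTokenLexer(real_lexer, 'EXPRESSION')), where real_lexer is the existing PLY lexer.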

Cheers,
Dave
