lex.py optimized mode

eliben

Jul 4, 2008, 11:04:53 AM
to ply-hack
Hello,

It appears that when running with optimize=1, PLY's Lex doesn't check
whether the table file is older than the lexer source, and uses the
table anyway. So if I generate a table, change the lexer and run it
again, it actually uses the old lexer from the table.

This isn't documented and can be quite confusing :-) I think adding a
timestamp check would work here without hurting performance much.
The lextab is an amazing feature (it speeds up lexer creation 15X in my
project), and this check would make it more usable.
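A minimal sketch of the timestamp check proposed above, in modern Python. `table_is_stale` is a hypothetical helper, not part of PLY, and it assumes the caller already knows which source files define the lexer.

```python
import os

def table_is_stale(table_path, source_paths):
    """Hypothetical helper (not a PLY API): True if the cached table file is
    missing or older than any of the given source files."""
    if not os.path.exists(table_path):
        return True  # no table yet, so it must be generated
    table_mtime = os.path.getmtime(table_path)
    # Stale if any source file was modified after the table was written.
    return any(os.path.getmtime(src) > table_mtime for src in source_paths)
```

A caller would rebuild the table whenever this returns True. The check costs one `stat` per file, which is why it should be cheap compared with regenerating the tables.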

As an aside about the documentation: I think optimize mode should be
decoupled from Python's -O mode in the documentation, because it's a
great feature in itself that is well worth using for large grammars,
and because Python's -O isn't very relevant these days.

Thanks
Eli

eliben

Aug 1, 2008, 3:41:44 AM
to ply-hack
Hello,
Is there any news on this subject?

Thanks
Eli

dhendriks

Sep 26, 2008, 7:01:29 AM
to ply-hack
What applies to PLY's lex table file also applies to the yacc table
file. I would also like to see some kind of check for outdated table
files.

For the implementation (if timestamps are used): the scan and parse
functions could be defined in multiple files. Therefore, we need the
timestamps of all the files that contain t_* and/or p_* functions.
Other information, like the list of tokens, may be defined in yet
another file. I'm actually unsure whether you can get this to work in
all cases. For instance, assume we define:

tokens = some_imported_file.tokens

Now if 'tokens' changes in some_imported_file, the value of tokens
changes. However, the file where tokens is defined (using the above
line) hasn't changed; only the imported file has. This kind of thing
could be hard to detect.
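To apply a timestamp check, one first needs the set of files that define the rules. One way, sketched here under the assumption that the rules live in a namespace dict such as a module's `__dict__`, is to read each rule function's `__code__.co_filename`. `rule_source_files` is hypothetical. As noted above, a plain value such as `tokens` carries no filename, so this approach cannot tell where it came from.

```python
def rule_source_files(namespace, prefixes=("t_", "p_")):
    """Hypothetical helper: return the files that define the t_*/p_* rule
    functions found in a namespace dict such as module.__dict__."""
    files = set()
    for name, obj in namespace.items():
        # Only functions carry a code object with a filename; string rules
        # like t_PLUS = r'\+' and lists like `tokens` do not.
        if name.startswith(prefixes) and hasattr(obj, "__code__"):
            files.add(obj.__code__.co_filename)
    return files
```

The result could feed a timestamp comparison, but `tokens = some_imported_file.tokens` would still slip through.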

Another way to implement the outdated check is to create some kind of
hash of all the important data. I'm not sure, but I can imagine this
could be expensive.
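A rough sketch of the hash-based alternative, assuming the "important data" is the token list plus the name and docstring of each rule function. `spec_signature` is hypothetical. Because it hashes values rather than files, tokens imported from another module are covered. Its cost is one pass over those strings, which one would expect to be small next to table generation (an expectation, not a measurement).

```python
import hashlib

def spec_signature(tokens, rules):
    """Hypothetical helper: MD5 digest over the token names and each rule's
    name and docstring. `rules` maps rule names to functions."""
    h = hashlib.md5()
    for tok in tokens:
        h.update(tok.encode("utf-8") + b"\0")  # NUL separator avoids ambiguity
    for name in sorted(rules):  # sort so dict order doesn't affect the digest
        doc = rules[name].__doc__ or ""
        h.update(name.encode("utf-8") + b"\0")
        h.update(doc.encode("utf-8") + b"\0")
    return h.hexdigest()
```

The digest would be stored in the table file and compared on load.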

Dennis

David Beazley

Sep 26, 2008, 9:16:16 AM
to ply-hack, dhendriks
Yes, this would be a good feature to add to PLY-2.6. I'll have to think of some new scheme for knowing when to run the update. Right now PLY relies on MD5 signatures, but this
is being deprecated in Python 3.0, so I'll have to come up with an alternative (maybe I'll do some kind of thing with hash keys).

Also, I just noticed that I probably completely ignored the original message about this dated July 4. Sorry about that--I was in the hospital waiting for the arrival of my first child.
Needless to say, it's been a crazy summer :-).

Cheers,
Dave


Simon Cross

Sep 26, 2008, 9:28:18 AM
to ply-...@googlegroups.com
On Fri, Sep 26, 2008 at 3:16 PM, David Beazley <da...@dabeaz.com> wrote:
> Yes, this would be a good feature to add to PLY-2.6. I'll have to think of some new scheme for knowing
> when to run the update. Right now PLY relies on MD5 signatures, but this
> is being deprecated in Python 3.0, so I'll have to come up with an alternative (maybe I'll do some kind of
> thing with hash keys).

md5 signatures aren't being deprecated -- just the md5 module. It is
being replaced by hashlib.md5. In current Python 2.6 importing the md5
module prints a deprecation warning. In current 3.0, the md5 module is
gone.

>>> import hashlib
>>> hashlib.md5(b"foo").digest()
'\xac\xbd\x18\xdbL\xc2\xf8\\\xed\xefeO\xcc\xc4\xa4\xd8'

works without warnings in both current 2.6 and 3.0.

Schiavo
Simon

Simon Cross

Sep 26, 2008, 9:29:40 AM
to ply-...@googlegroups.com
On Fri, Sep 26, 2008 at 3:28 PM, Simon Cross <hodg...@gmail.com> wrote:
>>>> import hashlib
>>>> hashlib.md5(b"foo").digest()
> '\xac\xbd\x18\xdbL\xc2\xf8\\\xed\xefeO\xcc\xc4\xa4\xd8'
>
> works without warnings in both current 2.6 and 3.0.

And in 2.5 without the 'b' before the "foo".

David Beazley

Sep 26, 2008, 9:51:21 AM
to ply-...@googlegroups.com, Simon Cross
Yes, Python 3.0 still has MD5, but it's now all tied up with the bytes type and encodings. I think I might investigate some kind of alternative scheme for doing this.
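In current Python 3, the bytes issue shows up as a `TypeError` when a `str` is passed to `hashlib.md5`; encoding explicitly avoids it. The built-in `hash()` is not a drop-in replacement for a persisted signature: since Python 3.3, string hashes are randomized per process unless `PYTHONHASHSEED` is set. `text_digest` is a hypothetical helper.

```python
import hashlib

def text_digest(text, encoding="utf-8"):
    """Hypothetical helper: MD5 hex digest of a str, encoding it first
    because hashlib accepts only bytes-like objects in Python 3."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

try:
    hashlib.md5("foo")  # a str, not bytes
except TypeError as exc:
    print("TypeError:", exc)

print(text_digest("foo"))  # acbd18db4cc2f85cedef654fccc4a4d8
```

The hex digest matches the raw bytes in Simon's session above.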

Cheers,
Dave

eliben

Sep 26, 2008, 10:58:59 AM
to ply-hack

On Sep 26, 3:16 pm, David Beazley <d...@dabeaz.com> wrote:
> Yes, this would be a good feature to add to PLY-2.6.   I'll have to think of some new scheme for knowing when to run the update.  Right now PLY relies on MD5 signatures, but this
> being deprecated in Python 3.0 so I'll have to come up with an alternative (maybe I'll do some kind of thing with hash keys).  
>

Don't you already implement it for parser tables in PLY 2.5? I
suppose it can be implemented the same way.

> Also, I just noticed that I probably completely ignored the original message about this dated July 4.  Sorry about that--I was in the hospital waiting for the arrival of my first child.  
> Needless to say, it's been a crazy summer :-).
>

Congrats !!

Eli

D.Hendriks (Dennis)

Sep 29, 2008, 3:22:31 AM
to ply-...@googlegroups.com
Hello,

> Don't you already implement it for parser tables in PLY 2.5 ? I
> suppose it can be implemented the same way.

I thought so as well. However, I recently renamed a grammar rule and forgot to remove my parse table file. When running my program I got a KeyError on the old name of the renamed grammar rule. I then figured the outdated check was simply not implemented...

I decided to look at the source code of PLY's yacc and found this:

(lines 1610-1622):

        if isinstance(module,types.ModuleType):
            parsetab = module
        else:
            exec "import %s as parsetab" % module

        if (optimize) or (Signature.digest() == parsetab._lr_signature):
            _lr_action = parsetab._lr_action
            _lr_goto   = parsetab._lr_goto
            _lr_productions = parsetab._lr_productions
            _lr_method = parsetab._lr_method
            return 1
        else:
            return 0

What we see is that the parse table file is imported and then, if 'optimize' is enabled, it doesn't even matter whether the digest matches or not... I think this is a bug in the implementation? I think the 'or' in line 2615 should be an 'and'?
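A sketch of that check rewritten in modern Python. It is a variant of the suggestion, not a literal 'or' to 'and' swap: the signature is compared unconditionally, so a stale table is rejected whether or not `optimize` is set. `read_tables` and its dict return value are hypothetical; only the `_lr_*` attribute names come from the PLY 2.5 excerpt above. `importlib.import_module` stands in for the `exec` statement.

```python
import importlib
import types

def read_tables(module, current_signature):
    """Hypothetical sketch: return the LR tables from `module` (a module object
    or module name) only if its stored signature matches the current one;
    return None when the caller should regenerate the tables."""
    if isinstance(module, types.ModuleType):
        parsetab = module
    else:
        try:
            parsetab = importlib.import_module(module)
        except ImportError:
            return None  # no table file yet
    # Reject stale tables regardless of optimize mode.
    if getattr(parsetab, "_lr_signature", None) != current_signature:
        return None
    return {
        "action": parsetab._lr_action,
        "goto": parsetab._lr_goto,
        "productions": parsetab._lr_productions,
        "method": parsetab._lr_method,
    }
```

A missing table module and a signature mismatch are treated the same way: both mean "regenerate".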

I have some other questions:
 - Why is the import executed, even if we don't use optimized mode?
 - The signature is updated with __tabversion__, method, the start symbol, precedence rules and the doc-strings of grammar rules. I'm missing the token list and the names of the grammar rules. Why is this? (There may be other things missing?)
 - The first time lr_read_tables is called (line 2705) in the yacc function, the signature only contains the __tabversion__, method and start symbol. I wonder if it can ever match the signature stored in the parse table file? (Note that due to the 'or' in line 2615, the signature is not even checked if optimize is enabled.)
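The KeyError above shows why rule names matter. In this sketch, two grammars differ only in the name of one rule function. A digest over docstrings alone cannot tell them apart, while one that also includes the names can. All helpers here are hypothetical and are not PLY's actual `Signature` logic.

```python
import hashlib

def digest(parts):
    """MD5 hex digest over a sequence of strings, NUL-separated."""
    h = hashlib.md5()
    for part in parts:
        h.update(part.encode("utf-8") + b"\0")
    return h.hexdigest()

def docs_only_signature(rules):
    """Signature over docstrings only (function names ignored)."""
    return digest(f.__doc__ for f in rules)

def named_signature(rules):
    """Signature over each rule's function name and docstring."""
    return digest(part for f in rules for part in (f.__name__, f.__doc__))

def p_statement(p):
    "statement : NAME EQUALS expr"

def p_assignment(p):  # the same rule after a rename
    "statement : NAME EQUALS expr"

print(docs_only_signature([p_statement]) == docs_only_signature([p_assignment]))  # True
print(named_signature([p_statement]) == named_signature([p_assignment]))  # False
```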

These are the questions I have after a quick glance at the code...

NOTE: the source code I checked is from PLY 2.5

Dennis