Next Version

2 views
Skip to first unread message

andrew cooke

unread,
Feb 5, 2009, 7:44:55 AM2/5/09
to lepl

A heads-up for changes in the next version.

I expect fairly major restructuring of the module organisation.
Matchers should not change name, but they may move package. However,
things will still be exposed via the top-level "lepl" module.

In other words, to help avoid problems later, import from lepl rather
than lepl.match etc.

That's the change most likely to break any existing code. In
addition, I plan to:
* Make all matchers classes (currently some are functions)
* Make the graph of matchers defined by the grammar explicit (matchers
will implement an interface similar to Node, and you will be able to
traverse the grammar)

Once that is done, I can add automated, global transformations to the
grammar. So adding memoization (ie Packrat parsers) becomes very
simple, as does allowing left recursive rules (not my idea - there's a
paper somewhere I'll dig out showing how to do this with a very simple
trick).

Andrew

andrew cooke

unread,
Feb 9, 2009, 8:01:32 PM2/9/09
to lepl
Just a status update - matchers are now nodes in a graph and I just
ran a successful test for rewriting the tree to "flatten" nested And()
and Or() matchers.

The same code, more or less, can be used to add Memoization as
described by Frost & Hafiz (I already have basic memoization; now I
need to work out how to include their approach to left recursion.

Here's the rewriting demo:

class FlattenTest(TestCase):

def test_flatten(self):
matcher = Literal('a') & Literal('b') & Literal('c')
assert str(matcher) == "And(And(Literal('a'), Literal('b')),
Literal('c'))", str(matcher)
parser = string_parser(matcher)
assert str(parser.matcher) == "And(Literal('a'), Literal('b'),
Literal('c'))", str(parser.matcher)

That shows the new 'parser' function, which takes a "simple" matcher
and adds decorators, rewrites the tree, etc. And also the new str()
implementation which again exploits the tree to do its work.

Andrew

andrew cooke

unread,
Feb 19, 2009, 5:41:48 AM2/19/09
to lepl
I just got left recursion working. It was more complex than I
suspected (the final code is quite simple, but for a long time I had a
bug that I didn't understand). The following test now succeeds (on a
naive recursive descent parser it would exhaust the stack because seq
would repeatedly call itself).

def test_left(self):

basicConfig(level=DEBUG)

seq = Delayed()
letter = Any()
seq += Optional(seq) & letter

p = string_matcher(seq,
Configuration(memoizers=[LMemo], monitors=[TraceResults
(True)]))
results = list(p('ab'))
print(results)
assert len(results) == 2, len(results)
assert results[0][0] == ['a', 'b'], results[0][0]
assert results[1][0] == ['a'], results[1][0]

Note the new Configuration object - necessary as the library becomes
more complex.

I now need to add more tests and document the new features. Apart
from bug fixing I don't expect to add more code (except to improve
support for parsing raw strings/lists - the Stream class is now less
critical as resource management is done during trampolining).

(The trampolining is described in a comment here -
http://knol.google.com/k/davy-wybiral/trampolining-in-python (please
ignore the response - computer science is not a game to avoid the use
of certain abstract data types...))

Work is busy and I have some (non-programming!) vacations coming up,
so I don't expect a release until March (World of Goo on Linux is also
using some of my free time!).

This will probably version 2.0, since there are several API changes
(the basic matchers remain the same). For 2.1 I will look at
performance (ignored in this release).

Andrew
Reply all
Reply to author
Forward
0 new messages