Fortunately, the most serious changes are hidden away in the implementation.
But there are some places (mainly line-aware and offside-rule parsing) where
they show through to the user (e.g. configuration for those is now simpler:
config.lines() and config.blocks()).
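To give a rough idea of the new names, here is a minimal sketch (only
config.lines(), config.blocks(), Line() and BLine() come from the changes
described here; the Token matcher and the grammar are placeholders, and
exact arguments may differ):

    from lepl import *

    word = Token('[a-z]+')       # placeholder token
    sentence = Line(word[1:])    # a single line of words
    parser = sentence[1:]
    parser.config.lines()        # line-aware parsing
    # for offside-rule grammars, use BLine(...) and parser.config.blocks()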
There are two "dangerous" issues (that I have noticed), which can break code
silently:
1 - For strings, regular expressions only match within the current line. If
you want to match past line breaks, use parse_sequence() rather than
parse() or parse_string() (see the sketch after this list).
2 - In offside-rule parsers, never use Line(). Always use BLine(). If you
want to match an empty line (or otherwise disable the offside rule for the
line), use BLine(..., indent=False).
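To make issue 1 concrete, here is a minimal sketch (Regexp as the matcher is
my assumption; the parse()/parse_string()/parse_sequence() distinction is as
described above):

    from lepl import *

    matcher = Regexp(r'a.*b')
    text = 'a line\nthat ends in b'

    # parse() and parse_string() see the string line by line, so the regular
    # expression cannot match across the newline here.
    #result = matcher.parse(text)

    # parse_sequence() treats the input as one flat sequence, so the match
    # can span the line break.
    result = matcher.parse_sequence(text)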
Otherwise, if something has changed, your code should break. That's not
great, I know, which is why I have tried to keep to the old API, but it's
better than something failing silently.
Before any release I want to see if I can remove "GeneratorWrapper" (at least
in most places) and I also need to update the documentation. So don't expect
anything final for a month or so.
Cheers,
Andrew
PS In a little more detail, the implementation changes are:
- Streams are now implemented in a completely new way. Instead of treating a
stream as a sequence, you use a set of functions on an opaque blob. For
example, s_next(stream) returns (char, new_stream) and s_line(stream)
returns the current line. (The old way was "nicer" at first glance, because
it "felt like" you were dealing with strings even when you were not, but the
"magic" required to make that work became too complex and unmaintainable.)
The function (rather than method) approach allows the streams to be
implemented as tuples, which, I hope, reduces object-creation overhead
(state must still change, but for simple streams like strings it can be a
simple integer in the tuple). See the sketch after this list.
- Line-aware parsing is no longer implemented at the alphabet level (it used
to introduce two new "characters" for start and end of line) but at the
token level.
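As a purely illustrative sketch of the functional stream idea (this is not
LEPL's actual code; only the s_next() and s_line() names come from the
description above, and here the "opaque blob" is just a (text, offset)
tuple):

    def s_next(stream):
        """Return (char, new_stream); advancing just bumps an index."""
        text, offset = stream
        return text[offset], (text, offset + 1)

    def s_line(stream):
        """Return the current line, without consuming anything."""
        text, offset = stream
        end = text.find('\n', offset)
        return text[offset:] if end == -1 else text[offset:end]

    stream = ('hello\nworld', 0)
    char, stream = s_next(stream)    # 'h', with the new stream at offset 1
    print(s_line(stream))            # 'ello'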
As for performance: first, for the test at
https://code.google.com/p/lepl/source/browse/src/lepl/_performance/nat_lang.py
(which involves a lot of backtracking), the numbers below are seconds for 100
repeats, with no special configuration options, on my (old) laptop with
Python 3.2.
There's a significant speed-up when parsing simple strings. Time drops from
32 to 20s. The speed-up is much less with tokens: from 26 to 22s. I assume
this is because the old code, when using strings, was constantly making new
copies of substrings (the new code simply changes an index).
Second, for the test at
https://code.google.com/p/lepl/source/browse/src/lepl/_performance/floats.py
the time changes from 17s (lepl 4) to 11s (lepl 5). Again, this is with
strings.
So on my two main performance tests I'm seeing reductions in time of 37%,
15% and 35% (or, equivalently, speed-ups of x1.6, x1.2 and x1.5).
The take-away from that is that the new approach allows simple cases to really
be simpler, while not penalising (in fact, improving slightly) more complex
cases. Which is what you would expect, really.
Andrew