On 08/17/2012 02:52 PM, viy wrote:
> Hi all, jfyi
>
> I've added just one token to my lexer rules and hit the 100-group limit in
> Python's re module:
> http://stackoverflow.com/questions/478458/python-regular-expressions-with-more-than-100-groups
>
>
> PLY has a workaround in its code: when the master re exceeds 100 groups, PLY
> catches the AssertionError from Python, splits the master re into parts, and retries.
>
> All works smoothly, but in my case my unit test suite became 10x slower.
> A single parse is about 1.5x slower.
>
> The solution is obvious - to get rid of the python limitation.
> Does anyone know the best way to do so?
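For context, the PLY workaround being described can be sketched without PLY itself: split the list of token patterns into several master regexes, each staying under the group limit, and try them in order at every position. This is a minimal illustration, not PLY's actual code; the names and the 99-group cutoff are assumptions.

```python
import re

def compile_masters(patterns, max_groups=99):
    """Compile (name, regex) token pairs into one or more master regexes,
    starting a new one whenever the group count would exceed max_groups
    (old CPython's re raised AssertionError past 100 groups)."""
    masters, chunk, groups = [], [], 0
    for name, pat in patterns:
        # one group for the (?P<name>...) wrapper, plus any groups inside pat
        needed = 1 + re.compile(pat).groups
        if chunk and groups + needed > max_groups:
            masters.append(re.compile("|".join(chunk)))
            chunk, groups = [], 0
        chunk.append("(?P<%s>%s)" % (name, pat))
        groups += needed
    if chunk:
        masters.append(re.compile("|".join(chunk)))
    return masters

def tokenize(text, masters):
    """Yield (token_name, lexeme) pairs, trying each partial master re in order."""
    pos = 0
    while pos < len(text):
        for rx in masters:
            m = rx.match(text, pos)
            if m:
                yield m.lastgroup, m.group()
                pos = m.end()
                break
        else:
            raise SyntaxError("illegal character at position %d" % pos)
```

The slowdown the OP sees comes from that inner loop: once the pattern list is split, every position may probe several compiled regexes instead of one.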
Re-implement RE? :D
Much happiness would spread throughout the Python community, I am sure :)
Other solutions include
0. Live with it. Other solutions may cost more time than you are ever going to save.
1. DIY: you can easily define your own scanner in arbitrary Python code; just make sure it
matches the lexer interface. String scanning is relatively easy, it just takes a lot of code.
2. A long time ago (several years at least), someone wrote a Lex framework. I forget the
details, but the mailing list archive or Google can probably help you. IIRC it was a true lex and
took a different approach than using REs.
3. More exotic solutions, like writing a scanner as a C extension (generated with lex/flex), are also possible.
4. Even more exotic: generate a DFA somehow, and implement that in Python.
5. Other Python parser generators may have better solutions (I somewhat doubt it, but it should be
easy enough to scan through them, checking how each scanner works).
Good luck
Albert