Parsing C-style strings as tokens


astromme

May 27, 2011, 3:37:42 PM
to lepl
I'm trying to parse strings using a grammar that is C-like, but I'm
having trouble. Is there documentation on exactly what the lepl regex
engine supports? Ideally I would just pass in
Token(r'"(\\. | [^"\\])*"'), but lepl errors on that with
lepl.regexp.core.RegexpError:
Cannot parse regexp '"(\\\\. | [^"\\\\])"*' using <Unicode> ]

from lepl import *
class Print(List): pass

keyword = Token('[a-z]+')
symbol = Token('[^0-9a-zA-Z \t\r\n]')
comma = symbol(',')

string = ~symbol('"') & (Token(r'\\.') | Token(r'[^"\\]'))[:,...] & ~symbol('"')
regex_string = Token(r'"(\\. | [^"\\])"')
sequential_print = ~keyword('print') & ~symbol('(') & string & ~symbol(')') > Print

print regex_string.parse(r'"Hello world"')[0]
print sequential_print.parse(r'print("")')[0]
print sequential_print.parse(r'print("mystring")')[0]

Thanks,

Andrew

andrew cooke

May 27, 2011, 5:31:02 PM
to le...@googlegroups.com

hi,

yeah, the regexp support is not so great - i am working on improving it.

it's best to simply look at the parser, which is at
http://code.google.com/p/lepl/source/browse/src/lepl/regexp/str.py#150

in particular, it doesn't support named groups - look at the definition of
"open" - which means that you probably want (and mean) '"(?:\\\\.|[^"\\\\])*"'
(note that i've swapped the final * and " because i think you had a mistake
there? i also removed some spaces i didn't understand)

anyway, i don't have lepl handy at the moment, but i suspect the missing (?: )
is the source of your problems.
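
As a rough illustration of the pattern shape suggested above, here is a minimal sketch using Python's standard re module rather than lepl, so it says nothing about what lepl's own engine accepts. The names C_STRING and find_strings are made up for this example.

```python
import re

# Same shape as the suggested pattern: an opening quote, then any number of
# either an escaped character (backslash + any char) or a character that is
# neither a quote nor a backslash, then a closing quote.
# NOTE: this is Python's re engine, not lepl's; escape rules may differ there.
C_STRING = re.compile(r'"(?:\\.|[^"\\])*"')


def find_strings(source):
    """Return every C-style string literal found in source (hypothetical helper)."""
    # The group is non-capturing, so findall returns the whole matches.
    return C_STRING.findall(source)


samples = r'print("") print("mystring") print("say \"hi\"")'
found = find_strings(samples)
for literal in found:
    print(literal)
```

Because the group is non-capturing and the * sits inside the closing quote, the empty string "" and strings containing escaped quotes are both matched as single literals.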

cheers,
andrew

ps are you using lepl 5.0? looking at the source i linked to, you should have
seen a more informative error message that describes exactly this.


andrew cooke

May 27, 2011, 5:33:04 PM
to le...@googlegroups.com

also, the rules governing backslash escapes are described at
http://code.google.com/p/lepl/source/browse/src/lepl/regexp/str.py#48 and are
subtly different to python's (but more consistent!).
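
Separately from lepl's own rules, Python's string literal rules decide which characters reach the regexp engine in the first place. The sketch below only shows that Python-level step (the variable names are illustrative): the raw literal from the original post's style and the doubled-backslash literal used in this thread are the same string, and repr() of that string shows the doubled form, which is presumably why the error message displayed four backslashes.

```python
# A raw literal keeps each backslash as typed.
raw_literal = r'"(?:\\.|[^"\\])*"'

# A plain literal needs every backslash doubled to produce the same characters.
plain_literal = '"(?:\\\\.|[^"\\\\])*"'

# Both spell the same pattern text, with four backslash characters in total.
same_pattern = (raw_literal == plain_literal)
backslash_count = raw_literal.count('\\')

print(same_pattern, backslash_count)
# repr() doubles the backslashes again for display purposes only.
print(repr(raw_literal))
```

Whatever lepl's escape rules then do with those characters is a separate question, covered by the source linked above.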

andrew

Andrew Stromme

May 28, 2011, 10:55:02 AM
to le...@googlegroups.com
Hey, that works great! Thanks so much.

Also, I'm pretty sure that I'm using lepl 5.

Loki[astromme]$ easy_install lepl
Searching for lepl
Best match: LEPL 5.0.0
Processing LEPL-5.0.0-py2.6.egg
LEPL 5.0.0 is already the active version in easy-install.pth

Using /Library/Python/2.6/site-packages/LEPL-5.0.0-py2.6.egg
Processing dependencies for lepl
Finished processing dependencies for lepl
Loki[astromme]$

Andrew

andrew cooke

May 28, 2011, 11:13:08 AM
to le...@googlegroups.com

Great. I'll check/fix the warning - looks like it's being caught and replaced
with something less useful. Thanks,
Andrew