Multi-line tokens with ply.lex

Francesco Bochicchio

Aug 26, 2008, 10:15:00 AM
to ply-hack
Hi all,

I'd like to use PLY to parse a grammar which includes multi-line
comments bounded by the '|' symbol.
The following scanner rule only works for single-line comments:

t_DOCSTRING = r'\|.*\|'

Does anybody know if it is possible to define multi-line tokens in
ply.lex? I also tried removing '\n' from t_ignore
and making it a special token (which I'm not sure is suitable for me,
since it would make the syntax rules
way too complicated), but it still does not work.

BTW, the syntax I'm trying to parse is that of 'Petal
files' (extension PTL). If anybody knows of a module
that already does the job, I would appreciate a pointer to it :-)

Thanks in advance for any suggestion.


Ciao
-------
FB

Bart Whiteley

Aug 26, 2008, 12:09:46 PM
to ply-...@googlegroups.com
On Tue, Aug 26, 2008 at 8:15 AM, Francesco Bochicchio
<boc...@virgilio.it> wrote:
>
> Hi all,
>
> I'd like to use PLY to parse a grammar which includes multi-line
> comments bounded by the '|' symbol.
> The following scanner rule only works for single-line comments:
>
> t_DOCSTRING = r'\|.*\|'
>
> Anybody knows if it is possible to define multi-line tokens in
> ply.lex? I also tried removing '\n' from t_ignore
> and making it a special token (which I'm not sure is suitable for me,
> since it would make the syntax rules
> way too complicate), but still does not work.

This is my rule for skipping C-style multi-line comments:

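def t_MCOMMENT(t):
    r'/\*(.|\n)*?\*/'
    # Advance the lexer's line counter past the newlines inside the comment
    # (in PLY 2.x and later the counter lives on t.lexer, not on the token).
    t.lexer.lineno += t.value.count('\n')
    # Nothing is returned, so the comment is discarded.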

Bruce Frederiksen

Aug 26, 2008, 12:53:38 PM
to ply-...@googlegroups.com
You might try r'\|(.|\n)*?\|' since '.' doesn't match '\n'. Also try
r'\|[^|]*\|'.
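
The difference between these patterns can be checked with the plain re module, outside PLY. This is only a small sketch: the sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample: a comment spanning two lines, some code, a second comment.
text = "|first line\nsecond line| x = 1 |another|"

single_line = re.compile(r'\|.*\|')        # original rule: '.' stops at '\n'
lazy_any = re.compile(r'\|(.|\n)*?\|')     # '.' or newline, non-greedy
negated = re.compile(r'\|[^|]*\|')         # anything but '|', newlines included

# The original rule cannot cross the newline, so it skips the real comment
# and greedily spans from the closing '|' to the end of the second comment.
first_single = single_line.search(text).group(0)

# Both suggested patterns find the two-line comment.
first_lazy = lazy_any.search(text).group(0)
first_negated = negated.search(text).group(0)

# The negated class has no capture group, so findall returns whole matches.
all_negated = negated.findall(text)

print(repr(first_single))
print(repr(first_lazy))
print(all_negated)
```

The negated character class never has to backtrack over alternatives, which may make it the simpler choice when the comment body cannot contain '|'.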

D.Hendriks (Dennis)

Aug 27, 2008, 2:36:56 AM
to ply-...@googlegroups.com
For some more information:

- http://docs.python.org/lib/re-syntax.html -> Complete regular expression syntax from official python documentation. Here you can also see that the dot doesn't match the newline character, except when the DOTALL flag is used.

- http://docs.python.org/lib/module-re.html -> All information about the Python re module (regular expression module), again from the official Python documentation.

- http://www.amk.ca/python/howto/regex/ -> A regular expression howto.
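
To see the DOTALL behaviour mentioned in the first link, here is a minimal sketch using only the re module. The sample string is invented, and the snippet does not show how flags are passed to ply.lex.

```python
import re

text = "|line one\nline two|"  # made-up two-line comment

# Without DOTALL, '.' refuses to match the newline, so nothing is found.
no_flag = re.search(r'\|.*?\|', text)

# With DOTALL, '.' matches any character, including '\n'.
with_flag = re.search(r'\|.*?\|', text, re.DOTALL)

print(no_flag)
print(repr(with_flag.group(0)))
```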

Dennis

Andrew Dalke

Sep 6, 2008, 7:24:43 PM
to ply-hack
Coming into this a bit late ...

On Aug 26 Francesco Bochicchio asked:
> Anybody knows if it is possible to define multi-line tokens in ply.lex?

I see some people pointed out more complex regular expressions for
that.

A different solution I've used is with lexer states. When you see
the start of a comment, switch to an exclusive state. Something like
the following untested snippets:

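states = (
    ("COMMENT", "exclusive"),
)

def t_start_comment(t):
    r"\|"
    t.lexer.push_state("COMMENT")

def t_COMMENT_contents(t):
    r"[^|]+"
    # Use '+' rather than '*': PLY rejects rules that can match the empty string.
    t.lexer.lineno += t.value.count("\n")

def t_COMMENT_end(t):
    r"\|"
    t.lexer.pop_state()

# PLY warns if an exclusive state has no ignore and error rules of its own.
t_COMMENT_ignore = ""

def t_COMMENT_error(t):
    t.lexer.skip(1)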


This approach also lets you do things like report if the comment is
unterminated, or support structured comments (a la Javadoc) embedded
inside.
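
To make the unterminated-comment point concrete without relying on PLY, here is a hand-written stand-in for the two-state idea. It is not PLY's API: the scan function, the state names and the error message are all hypothetical, and in a real PLY lexer the check would have to happen once the input ends while the lexer is still in the COMMENT state.

```python
def scan(text):
    """Split text into ('CODE', s) and ('COMMENT', s) pieces using two states."""
    pieces = []
    state = "INITIAL"
    start = 0  # index where the current piece begins
    for i, ch in enumerate(text):
        if ch != "|":
            continue
        if state == "INITIAL":
            if i > start:
                pieces.append(("CODE", text[start:i]))
            state = "COMMENT"   # like push_state("COMMENT")
        else:
            pieces.append(("COMMENT", text[start:i]))
            state = "INITIAL"   # like pop_state()
        start = i + 1
    if state == "COMMENT":
        # Input ended inside a comment: report the line where it opened.
        line = text.count("\n", 0, start) + 1
        raise ValueError("unterminated comment starting on line %d" % line)
    if start < len(text):
        pieces.append(("CODE", text[start:]))
    return pieces


print(scan("a = 1 |doc\nstring| b"))
```

Because the scanner knows which state it is in when the input runs out, the error can name the line where the comment opened, which a single regular expression cannot easily do.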

Andrew
