BNF for lexer?

Werner LEMBERG

Jun 27, 2005, 5:55:35 AM
to py...@googlegroups.com

For me it would be much easier to use BNF to describe what the lexer
shall do. Is there a lexer2regexp machine or something similar?


Werner

David Vallejo Fernández

Jun 27, 2005, 6:11:18 AM
to py...@googlegroups.com
Hello! As far as I'm aware, BNF constructs are used by PyGgy to
simplify the construction of a grammar, which goes into the .pyg
specification file, but not to construct the lexer.

Tim Newsham

Jun 27, 2005, 2:57:04 PM
to py...@googlegroups.com
> For me it would be much easier to use BNF to describe what the lexer
> shall do. Is there a lexer2regexp machine or something similar?

If you would like, you can write your lexer rules as part of
your grammar. It may be a bit more expensive (computationally)
than doing it with a separate regexp-based lexer, and there are
a few complications you may run into.

As for converting BNF to regular expressions, I'm not aware
of any such tool. Of course, you cannot convert an arbitrary
grammar into a regular expression, since grammars are strictly
more expressive. It should be possible to create a tool which
converts a subset of grammars (the regular ones) to regular
expressions. Doing the conversion manually is usually
straightforward, though.
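
To give a rough idea -- this is plain Python `re', not PyGgy syntax,
and the token names are made up for illustration -- a hand conversion
of a regular (non-recursive) BNF fragment could look like:

    import re

    # Hypothetical BNF fragment (regular, i.e. no recursion):
    #   digit  ::= "0" | "1" | ... | "9"
    #   number ::= digit digit*
    #   ident  ::= letter (letter | digit)*
    # Hand-converted: alternation -> [...], "x x*" -> "x+".
    DIGIT  = r"[0-9]"
    NUMBER = DIGIT + "+"                 # digit digit*
    IDENT  = r"[A-Za-z][A-Za-z0-9]*"     # letter (letter | digit)*

    token_re = re.compile("(?P<NUMBER>%s)|(?P<IDENT>%s)" % (NUMBER, IDENT))
    for m in token_re.finditer("foo 42 bar7"):
        print(m.lastgroup, m.group())    # IDENT foo / NUMBER 42 / IDENT bar7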

> Werner

Tim Newsham
http://www.lava.net/~newsham/

Werner LEMBERG

Jun 27, 2005, 7:04:35 PM
to py...@googlegroups.com, new...@lava.net
> > For me it would be much easier to use BNF to describe what the
> > lexer shall do. Is there a lexer2regexp machine or something
> > similar?
>
> If you would like, you can write your lexer rules as part of your
> grammar. It may be a bit more expensive (computationally) than
> doing it with a separate regexp-based lexer, and there are a few
> complications you may run into.

Hmm, I wonder whether it makes sense to have two parser runs instead
of lexer + parser -- doing this should avoid the complications. In
general, is the parser much slower than the lexer? Are there any
timings?

> It should be possible to create a tool which converts a subset of
> grammars to regular expressions. Doing the conversion manually is
> usually straightforward though.

But if you have recursive rules, the results are loooong regexps with
many repetitive elements -- nothing for the faint-hearted. BTW, it
would be great if there existed a kind of `extended' regexp mode
(similar to Python's) which ignores whitespace so that I can break
the lines at suitable points.
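
Python's own `re' module has this as `re.VERBOSE', for example:

    import re

    # With re.VERBOSE, unescaped whitespace and comments are ignored,
    # so the expression can be broken at suitable points:
    number = re.compile(r"""
        [+-]?           # optional sign
        [0-9]+          # integer part
        (\.[0-9]*)?     # optional fractional part
    """, re.VERBOSE)

    print(number.match("-3.14").group())    # -> -3.14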


Werner

Tim Newsham

Jun 27, 2005, 7:57:29 PM
to Werner LEMBERG, py...@googlegroups.com
On Tue, 28 Jun 2005, Werner LEMBERG wrote:
> Hmm, I wonder whether it makes sense to have two parser runs instead
> of lexer + parser -- doing this should avoid the complications. In
> general, is the parser much slower than the lexer? Are there any
> timings?

The lexer uses a linear-time algorithm. The parser is "usually"
linear-time but can be much worse.
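
I don't have any published numbers; a quick way to get a feel for
the linear-time lexing behaviour (plain Python `re' here, not PyGgy
itself) is something like:

    import re, timeit

    # Double the input size; the lexing time should roughly double:
    token_re = re.compile(r"[0-9]+|[A-Za-z]+|\s+")
    for n in (10000, 20000, 40000):
        text = "foo 42 " * n
        t = timeit.timeit(lambda: list(token_re.finditer(text)), number=5)
        print(n, "%.3fs" % t)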

> But if you have recursive rules the results are loooong regexps with
> many repetitive elements -- nothing for the faint-hearted.

The lexer spec file lets you define named regular expressions
to make it simpler to use many repeated sub-expressions.

Can you provide an example of what you are trying to do that
would be hard to do in the lexer spec file?

> BTW, it
> would be great if there existed a kind of `extended' regexp mode
> (similar to Python) which ignores whitespace so that I can break the
> lines at suitable points.

Define smaller phrases of your regular expression in the
definitions section.
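
The same trick in plain Python, for comparison -- build the full
expression out of named pieces so each sub-expression is written
only once:

    import re

    SIGN     = r"[+-]?"
    INT      = r"[0-9]+"
    FRACTION = r"(\.[0-9]+)?"
    EXPONENT = r"([eE]" + SIGN + INT + ")?"

    NUMBER = re.compile(SIGN + INT + FRACTION + EXPONENT)
    print(NUMBER.match("6.022e23").group())    # -> 6.022e23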