EBNF support in PLY Yacc?

466 views
Skip to first unread message

Davy

unread,
Aug 20, 2008, 10:36:00 AM8/20/08
to ply-hack
Hi,

I am using PLY these days and found BNF hard to write. Is there any
EBNF support in PLY? Or Is there any tool to convert EBNF to BNF for
Yacc?

Best regards,
Davy

David Beazley

unread,
Aug 20, 2008, 11:49:19 AM8/20/08
to ply-...@googlegroups.com
I'm not aware of any tool like this off the top of my head.
However, in terms of support EBNF in PLY, I have a feeling that it
would be easier to support EBNF using some kind of tool or wrapper
that creates a standard BNF for PLY to work with. Modifying the
table generation code in PLY to use an EBNF directly would likely
fall somewhere in the middle of the "hairy" category of projects. Of
course, I could be wrong, but that's just my gut feeling about it.

Cheers,
Dave

Davy

unread,
Aug 20, 2008, 9:22:15 PM8/20/08
to ply-hack
Hi David,

Thank you for the generous help :)
As far as I can see, Lex and Yacc do the same thing on different
abstraction layer.
That is, Lex rule is regular expression of character, while Yacc rule
(if we use EBNF) is regular expression of token. The only difference
is Yacc rule embrace recursion.
I guess Lex parser is only a special case of Yacc parser (Lex output
is list but not recursively structure like tree). So why not merge
them into one tool?

Best regards,
Davy
> > Davy- Hide quoted text -
>
> - Show quoted text -

Davy

unread,
Aug 20, 2008, 9:35:58 PM8/20/08
to ply-hack
Hi David,

I agree with you on the way of support EBNF in PLY (maybe a tool that
convert the Python source file with EBNF to an file with BNF). Because
EBNF is a lot easier to read and write than BNF, I guess there must
exists such a requirement :)

I am new to compiler front end (as a hardware engineer), maybe I can
take it as an exercise :)

Best regards,
Davy

On Aug 20, 11:49 pm, David Beazley <d...@dabeaz.com> wrote:

A.T.Hofkamp

unread,
Aug 21, 2008, 4:18:30 AM8/21/08
to ply-...@googlegroups.com
Davy wrote:
> Hi David,
>
> Thank you for the generous help :)
> As far as I can see, Lex and Yacc do the same thing on different
> abstraction layer.
> That is, Lex rule is regular expression of character, while Yacc rule
> (if we use EBNF) is regular expression of token. The only difference
> is Yacc rule embrace recursion.

True, if you forget about the white space that you silently throw away.

Below I wrote some background information with some terms you can use as
starting point for searching more information.

The lack of recursion in lex makes it possible to generate a DFA
(deterministic finite automaton) from the regular expression(s), which allows
for *very fast* processing of the characters.

The yacc parser on the other hand is slower, and more powerful than regular
expressions.

(Note that in compiler theory, 'regular expressions with recursion' don't
exist, such grammars fall in the Chomsky hierarchy of contex-free languages.
For such languages, you need to have a stack besides the automaton.)

This additional power allows you to recognize nested structures correctly, for
example to find the matching closing brakcet in an expression.

The LALR(1) algorithm used by yacc is a compromise between maximal recognition
power (enough for most languages), and not too many states (although that is
of less importance nowadays).

> I guess Lex parser is only a special case of Yacc parser (Lex output
> is list but not recursively structure like tree). So why not merge
> them into one tool?

Speed and simplicity mostly. The current division of labour means you can use
optimized algorithms for each part while reducing overall complexity.

While writing yacc rules, you don't have to think about the representation of
an integer number (was it '12345' or '0x3039' or '030071' or 'B11000000111001'?).


None the less, parsers that combine both phases do exist. One example is
'sglr' (a C-based solution, I am not aware of any Python-based solutions). It
uses much more powerful (and much more resource-eating) algorithms to
recognize the text.

Their power is useful at places where the lex/yacc algorithms are not enough,
eg with ambigious grammars and/or textual input with several languages
combined (eg Cobol with embedded SQL).

Albert


Davy

unread,
Aug 21, 2008, 9:41:00 PM8/21/08
to ply-hack
Hi Albert,

Nice to read your answer:) Perhaps I shall read some introduction
material on compiler/regular expression parser design, is there any
recommendation?

Best regards,
Davy
Reply all
Reply to author
Forward
0 new messages