parsing with significant whitespace


Hans-Peter Jansen

Apr 21, 2007, 2:30:42 PM
to ply-...@googlegroups.com
Hi *,

I'm trying to wrap my head around lexing and yaccing again. A few attempts
long ago failed, but hopefully Python and ply ease things a bit.

For a start, I have never gotten this far (in such a short time frame), so
all in all it looks promising, but bear with me (a bit at least ;-).

Here's the beast: ftp://urpla.net/sieveparser.py

This time, I'm trying to "understand" sieve scripts (RFC 3028 et al.). Since
the outcome should be a graphical sieve editor that is as minimally invasive
to the scripts as possible, I decided not to ignore whitespace or newlines,
and ply.lex got me to a state I'm pretty happy with.

yply.py helped me considerably with the rule definitions (from cyrus sieve),
but now I'm stuck, being too dumb to get the CRLF rule right :-(.

Could some kind soul have a look and kick me in the right direction? I'm
ready to learn, I promise!

TIA,
Pete

Hans-Peter Jansen

Apr 21, 2007, 4:07:19 PM
to ply-...@googlegroups.com
On Saturday, 21 April 2007 20:30, Hans-Peter Jansen wrote:
>
> yply.py helped me considerably with the rule definitions (from cyrus
> sieve), but now I'm stuck, being too dumb to get the CRLF rule right
> :-(.
>
> Could some kind soul have a look and kick me in the right direction?
> I'm ready to learn, I promise!

Hmm, I should elaborate on what failed:

$ echo 'require ["fileinto", "reject", "regex", "vacation"];' | \
./sieveparser.py
yacc: Warning. Token 'WS' defined, but not used.
yacc: Warning. Token 'COMMENT' defined, but not used.
yacc: Warning. Token 'GT' defined, but not used.
yacc: Warning. Token 'GE' defined, but not used.
yacc: Warning. Token 'EQ' defined, but not used.
yacc: Warning. Token 'NE' defined, but not used.
yacc: Warning. Token 'LE' defined, but not used.
yacc: Warning. Token 'LT' defined, but not used.
yacc: Warning. Token 'ID' defined, but not used.
yacc: Warning. Token 'BCOMMENT' defined, but not used.
./sieveparser.py:474: Warning. Rule 'crlf' defined, but not used.
yacc: Warning. There are 10 unused tokens.
yacc: Warning. There is 1 unused rule.
yacc: Symbol 'crlf' is unreachable.
yacc: Generating LALR parsing table...
syntax error at REQUIRE token in line 1, column 1

The warnings about the unused tokens are expected, but I don't understand
the messages about the crlf rule, nor (much more importantly) the syntax
error at the REQUIRE token.

I uploaded another more complete sieve script to play with here:
ftp://urpla.net/sieve.script
(suffering from the same problem)

Still clueless,
Pete

mber...@gmail.com

Apr 26, 2007, 2:55:33 PM
to ply-hack
Hi Pete,

You don't use the production crlf anywhere, that's why it's
unreachable.

Concerning the REQUIRE syntax error:
The first two tokens the lexer produces here are REQUIRE WS, because
there is a space after require. The WS terminal, alas, isn't used
anywhere in your grammar. Try replacing

def p_require_1(p):
    '''require : REQUIRE stringlist SEMI'''

with
def p_require_1(p):
    '''require : REQUIRE WS stringlist SEMI WS crlf'''

After every terminal symbol on the right-hand sides, put WS so that the
whitespace is consumed. Wherever you expect a newline, put crlf in.

Markus
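
To make the token stream described above visible, here is a minimal sketch
that uses only the standard library's re module, not ply itself. The token
names and patterns (REQUIRE, WS, STRING, LBRACKET, RBRACKET, COMMA, SEMI,
CRLF) are assumptions for illustration; the real definitions live in
sieveparser.py, which is not shown in this thread.

```python
import re

# Hypothetical token definitions, loosely modelled on the names in the
# yacc warnings; they are NOT taken from sieveparser.py.
TOKEN_SPEC = [
    ("REQUIRE", r"require\b"),
    ("STRING", r'"[^"]*"'),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("CRLF", r"\r?\n"),
    ("WS", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (type, value) pairs, keeping whitespace and newlines as tokens."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()


line = 'require ["fileinto", "reject", "regex", "vacation"];\n'
types = [tok_type for tok_type, _ in tokenize(line)]
print(" ".join(types))
```

Under these assumed definitions the stream begins REQUIRE WS LBRACKET, so a
rule of the form REQUIRE stringlist SEMI has no way to consume the second
token. The stream also ends SEMI CRLF with no WS in between for this input,
so a rule that demands WS at that position would presumably need to treat it
as optional.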

Hans-Peter Jansen

Apr 26, 2007, 6:58:49 PM
to ply-...@googlegroups.com
Hi Markus,

Thanks for your answer. That got me started again.

On Thursday, 26 April 2007 20:55, m...@gmail.com wrote:
> Hi Pete,
>
> You don't use the production crlf anywhere, that's why it's
> unreachable.
>
> Concerning the REQUIRE syntax error:
> The first two tokens the lexer produces here are REQUIRE WS, because
> there is a space after require. The WS terminal, alas, isn't used
> anywhere in your grammar. Try replacing
>
> def p_require_1(p):
>     '''require : REQUIRE stringlist SEMI'''
>
> with
> def p_require_1(p):
>     '''require : REQUIRE WS stringlist SEMI WS crlf'''
>
> After every terminal symbol on the right-hand sides, put WS so that the
> whitespace is consumed. Wherever you expect a newline, put crlf in.

Ahh, I see, but that seems to be pretty harmful to the grammar, if it can be
done correctly at all :-(.

Looks like I need to study the garden snake with its significant whitespace
again.

I'm off for a few days, and will come back to this later.

Thanks again and a nice weekend to everybody,
Pete
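
One possible way around cluttering every rule with WS, not proposed by anyone
in this thread, is to filter the token stream between lexer and parser: fold
each whitespace token into the token that follows it, so the grammar never
sees WS while the original text can still be rebuilt exactly. The sketch
below is plain Python with a hypothetical Tok class and a hypothetical
attach_whitespace helper; with ply, the same idea would presumably be wrapped
in an object exposing token() and input() and passed to the parser as its
lexer.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a lexer token."""
    type: str
    value: str
    prefix: str = ""  # whitespace that preceded this token in the source


def attach_whitespace(tokens, skip=("WS",)):
    """Fold skipped tokens into the `prefix` of the next real token."""
    pending = ""
    for tok in tokens:
        if tok.type in skip:
            pending += tok.value
        else:
            tok.prefix = pending
            pending = ""
            yield tok
    if pending:
        # Trailing whitespace gets its own empty end marker so it is not lost.
        yield Tok("ENDMARKER", "", pending)


raw = [
    Tok("REQUIRE", "require"), Tok("WS", " "),
    Tok("LBRACKET", "["), Tok("STRING", '"fileinto"'), Tok("RBRACKET", "]"),
    Tok("SEMI", ";"), Tok("CRLF", "\n"),
]
filtered = list(attach_whitespace(raw))
rebuilt = "".join(tok.prefix + tok.value for tok in filtered)
print([tok.type for tok in filtered])
```

The grammar would then be written against the filtered stream (REQUIRE
LBRACKET STRING ...), and an editor that wants to leave a script untouched
can write each token back out as prefix plus value. Newlines stay as CRLF
tokens here, on the assumption that the grammar still wants to see them.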
