Parsers with Pre-processors

Dave Whipp

Jul 18, 2003, 1:11:40 AM
to perl6-l...@perl.org
I've been re-reading A5 (regexen), and I was trying to work out how to
incorporate a preprocessor into regex, without a separate lexer. I came
to the conclusion that preprocessor commands are part of the whitespace
in the higher layer of the grammar. So we just need to define the <ws>
rule appropriately. Preprocessing also requires maintenance of filename
and linenum as we move between files.

The following code illustrates my thoughts: am I anywhere near close?


grammar expression is preprocessed
{
    rule main { <expression>* <EOF> }

    rule expression
    {
        :w <number> <op> <number> { print eval join $op, @number }
    }
}

grammar preprocessed
{
    rule ws
    {
        <SUPER.ws> # we are extending the default <ws> rule

        # match #include _filename_.
        | ^^ \# <SUPER.ws>* include <SUPER.ws>* <filename>
            < <SUPER.ws> - \n >* \n
            # assume .pos acts like a stack: push the
            # new filehandle onto it: hope the handle
            # will supply filename and line_num
            { .pos.push open "<$filename" or fail }

        # at end of file, pop the .pos stack, if not empty
        | <EOF> { .pos.empty and fail } { .pos.pop }
    }

    # assume Safe module gives us a filename with no dangerous
    # meta-chars
    rule filename { Safe.filename }
}

--
Dave.
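
The idea in the message above (include directives behave like whitespace, and the parser keeps a stack of file positions) can be modelled outside Perl 6. What follows is only a rough Python sketch under stated assumptions, not the proposed grammar: the function name preprocessed_lines, the sources dictionary standing in for open(), and the exact directive syntax are all hypothetical.

```python
import re

# Hypothetical directive syntax: "#include name" alone on one line.
# [^\S\n] means "whitespace other than newline".
INCLUDE = re.compile(r'^[^\S\n]*#[^\S\n]*include[^\S\n]+(\S+)[^\S\n]*$')

def preprocessed_lines(name, sources):
    """Yield (filename, line_number, text) for every non-directive line.

    `sources` maps a filename to its text; it stands in for open().
    """
    # Each stack entry is [filename, lines, index of the next line to read].
    stack = [[name, sources[name].splitlines(), 0]]
    while stack:
        frame = stack[-1]
        filename, lines, i = frame
        if i == len(lines):
            stack.pop()  # end of file: resume the including file
            continue
        frame[2] = i + 1
        m = INCLUDE.match(lines[i])
        if m is None:
            yield filename, i + 1, lines[i]
            continue
        target = m.group(1)
        if target not in sources:
            raise FileNotFoundError(target)
        if any(f[0] == target for f in stack):
            raise ValueError("recursive include of " + target)
        # The directive itself is swallowed, like whitespace.
        stack.append([target, sources[target].splitlines(), 0])

sources = {"main": "1 + 2\n#include inc\n5 * 6\n", "inc": "3 - 4\n"}
for filename, line_number, text in preprocessed_lines("main", sources):
    print(filename, line_number, text)
```

Each yielded line carries the filename and line number of the file it came from, which is the bookkeeping the .pos stack is hoped to provide.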

Luke Palmer

Jul 18, 2003, 2:10:08 AM
to da...@whipp.name, perl6-l...@perl.org
> I've been re-reading A5 (regexen), and I was trying to work out how to
> incorporate a preprocessor into regex, without a separate lexer. I came
> to the conclusion that preprocessor commands are part of the whitespace
> in the higher layer of the grammar. So we just need to define the <ws>
> rule appropriately. Preprocessing also requires maintenance of filename
> and linenum as we move between files.
>
> The following code illustrates my thoughts: am I anywhere near close?
>
>
> grammar expression is preprocessed
> {
> rule main { <expression>* <EOF> }
>
> rule expression
> {
> :w <number> <op> <number> { print eval join $op, @number }

C<join> is probably a method now so that should probably be either:

{ print eval @number.join($op) }

Or the equivalent indirect object syntax. Then again, C<&*join> might
just be a forwarding function of the same nature.

> }
> }
>
> grammar preprocessed
> {
> rule ws
> {
> <SUPER.ws> # we are extending the default <ws> rule

That's <SUPER::ws>, as SUPER isn't a parse object. This shouldn't go
first either, because it matches a null string.

> # match #include _filename_.
> | ^^ \# <SUPER.ws>* include <SUPER.ws>* <filename>

<ws> matches optional repeated whitespace, so that * is not needed.

Also, I think you want this all on the same line, right? Those
<SUPER::ws>'s match newlines as well. See below.

> < <SUPER.ws> - \n >* \n

Hmmm, I don't know whether you can use C<-> like that on rules that
are more than single characters. I'm not sure how to get around that
one without plunging into the innards of <ws>. If you could,
however, it would be:

< <SUPER::ws> - [\n] >

> # assume .pos acts like a stack: push the
> # new filehandle onto it: hope the handle
> # will supply filename and line_num

That'd be nice.

> { .pos.push open "<$filename" or fail }

Method calls need parens, though.

{ .pos.push(open "<$filename" err fail) }

> # at end of file, pop the .pos stack, if not empty
> | <EOF> { .pos.empty and fail } { .pos.pop }
> }

You probably need to rename this rule C<my_ws is private> or
somesuch, and then make the ws rule:

rule ws { <my_ws>+ }

Because you want to allow whitespace before and after your directive
with a single <ws> call.

> # assume Safe module gives us a filename with no dangerous
> # meta-chars
> rule filename { Safe.filename }

Probably

rule filename { <Safe::filename> }

Or even

grammar preprocessed is private filename

(If private inheritance is supported). From a design perspective,
either is valid.

Now about the < <SUPER::ws> - [\n] > thing. Here's a very slow way of
doing it:

<SUPER::ws> <( $0[-1] !~ /\n/ )>

It'd be nice to be able to tell a rule to minimal match:

<SUPER::ws?> \n

But there could be so many different meanings of that for some
particular rule, that it's probably not possible. The *rules (global)
might include ? variants of themselves, however, so things like this
would be easy.

And of course you could just do:

\s*? \n

But that's not very good object oriented design (what if someone else
overrode <ws> above you?).
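
For comparison, here is how "whitespace up to the end of the line" behaves in an existing regex engine. This is a small Python sketch using the standard re module, not Perl 6; the helper name end_of_match is made up for illustration. A greedy \s* runs past the first newline, while the minimal form and an explicit "whitespace except newline" class both stop at it.

```python
import re

GREEDY = re.compile(r'\s*\n')        # backtracks from as much whitespace as possible
LAZY = re.compile(r'\s*?\n')         # minimal match: stops at the first newline
HSPACE = re.compile(r'[^\S\n]*\n')   # whitespace other than newline, then newline

def end_of_match(pattern, text):
    """Return the offset where the match starting at 0 ends, or None."""
    m = pattern.match(text)
    return m.end() if m else None

text = "  \t\n\n  next"
print(end_of_match(GREEDY, text))  # consumes both newlines
print(end_of_match(LAZY, text))    # stops after the first newline
print(end_of_match(HSPACE, text))  # also stops after the first newline
```

Like \s*? \n, both alternatives hard-code what counts as whitespace, so they share the drawback noted above if <ws> has been overridden.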

What about a rule junction?

<all /<SUPER::ws>/, /\N*/>

(I am entirely unsure of my syntax there). Presumably, that would
find a place in the input where both of those match from the point
they started, which would do it. It could be optimized (heh :-) to
make an aggregate rule which checks both sides at each character
match, which would bring the match time out of exponential.

I see use for that construct.
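
To make the junction idea concrete: a brute-force Python sketch (not Perl 6, and not optimized the way the aggregate rule above would be) that looks for the longest stretch of input, from a given start, which every pattern matches in full. The function name match_all is hypothetical.

```python
import re

def match_all(patterns, text, start=0):
    """Return the longest end offset such that every pattern matches
    text[start:end] in full, or None if no such offset exists."""
    compiled = [re.compile(p) for p in patterns]
    # Try the longest candidate first and shrink one character at a time.
    for end in range(len(text), start - 1, -1):
        if all(p.fullmatch(text, start, end) for p in compiled):
            return end
    return None

# Whitespace that is also newline-free: stops just before the "\n".
print(match_all([r'\s*', r'[^\n]*'], "  \t\n  x"))
```

Every pattern is retried at each candidate end point, so this is only suitable for showing the semantics, not for real parsing.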

Luke

> }
>
> --
> Dave.
