Apoc 5 - some issues

Matthijs Van Duin

Mar 17, 2003, 9:35:50 AM

to perl6-l...@perl.org

OK, I've recently spent some intimate time with Apocalypse 5 and it has
left me with a few issues and questions.

If any of this has already been discussed, I'd appreciate some links (I've
searched Google Groups but haven't found anything applicable).

1. Sub-rules and backtracking

> <name(expr)> # call rule, passing Perl args
> { .name(expr) } # same thing.

> <name pat> # call rule, passing regex arg
> { .name(/pat/) } # same thing.

Considering perl can't sanely know how to backtrack into a closure, wouldn't
{ .name(expr) } be equal to <name(expr)>: instead? (note the colon)

It seems to me that for a rule to be able to backtrack, you would need to
pass a closure as arg that represents the rest of the match: the rule
matches, calls the closure, and if the closure returns tries to backtrack
and calls it again, or returns if all possibilities are exhausted.

Or will a rule store all of its state into hypothetical variables? It
seems to me that would make the possibility of backtracking into closures
even more problematic, but maybe I'm just missing something...

Related to this: what is the prototype for rules (in case you want to
manually write or invoke them) ?

2. Rules with custom parsing

> As mentioned in a previous Apocalypse, the \L, \U, and \Q sequences no longer
> use \E to terminate--they now require bracketing characters of some sort.

(much later)
> In addition to normal subrules, we allow some funny looking method names like:
> rule \a { ... }

Can I conclude from this you can use "is parsed" on a rule to be able to
grab the bracketed expression it's followed by?

3. Negated assertions

> any assertion that begins with ! is simply negated.

> \P{prop} <!prop>
> (?!...) <!before ...> # negative lookahead
> [^[:alpha:]] <-alpha>

Considering <prop> means "matches a character with property prop", it
seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
character with property prop", rather than "match a character without
property prop".

Shouldn't it be <-prop> instead? (see also point 5)

4. Character class syntax

> predefined character classes are just considered intrinsic grammar rules

> [[:alpha:][:digit:]] <<alpha><digit>>
> <[_]+<alpha>+<digit>-<Swedish>>

Can I conclude from this that the + to add character classes is optional?
What about <-<foo><bar>>, is that the inversion of <<foo><bar>> ? (I do
hope so) But <-<foo>+<bar>> will be the inversion of <<foo>-<bar>> right?

Also, what exactly is allowed inside a character class? Apparently
character sets like [a-z_] and subrules like <alpha>. What can I put into
a set? Single characters and ranges, obviously; but what about interpolated
variables? I assume I also can't put \w inside [] anymore since it's a
subrule, so [\w.;] would become <\w[.;]> ?

5. Character class semantics

> predefined character classes are just considered intrinsic grammar rules

This means you can place arbitrary rules inside a character class. What
if the rule has a width unequal to 1 or even variable-width? I can think
of a few possibilities:

a. Require subrules inside a character class to have a fixed width of 1
char. (requires a run-time check since the rule might be redefined.. ick)

b. Rules inside a character class are ORed together, an inverted subrule
is interpreted as [ <!before <subrule>> . ]

c. The whole character class is a zero-width assertion followed by the
traversal of a single char.

My personal preference is (c), which also means \N is equivalent to <-\n>
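
Option (c) can be approximated with ordinary lookahead, which may make the proposal easier to picture. The sketch below uses Python's `re` module purely as an illustration; it is not Perl 6 syntax, and `char_class` is a hypothetical helper invented for this example. Each subrule is given as a plain regex fragment, the class becomes a zero-width assertion over the alternatives, and exactly one character is then consumed regardless of how wide the subrules are.

```python
import re

def char_class(*rules, negate=False):
    """Build a regex for option (c): a zero-width assertion followed by
    the traversal of a single character. `rules` are regex fragments
    standing in for subrules (hypothetical helper, not a Perl 6 API)."""
    alternation = "|".join(f"(?:{r})" for r in rules)
    look = "(?!" if negate else "(?="
    # (?s:.) consumes exactly one character, newline included.
    return f"{look}{alternation})(?s:.)"

# A class built from a two-character rule and a one-character rule.
wide = re.compile(char_class("ab", r"\d"))

# The inverted class <-\n>, i.e. what \N would mean under option (c).
not_newline = re.compile(char_class(r"\n", negate=True))

print(wide.match("abc").group())      # only one character is consumed
print(wide.match("ac"))               # the assertion "ab" fails here
print(not_newline.fullmatch("\n"))    # newline is rejected
```

Under this reading a variable-width subrule is harmless: it only decides whether the single character at the current position is accepted.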

6. Null pattern

> That won't work because it'll look for the :wfoo modifier. However, there
> are several ways to get the effect you want:
> /[:w()foo bar]/
> /[:w[]foo bar]/

Tsk tsk Larry, those look like null patterns to me :-)

While I'm on the subject.. why not allow <> as the match-always assertion?
It might conflict with Huffman encoding, but I certainly don't think <>
could ethically mean anything other than this. And <!> would of course be
the match-never assertion.

7. The :: operator

> :: # fail all |'s when backtracking

> If you backtrack across it, it fails all the way out of the current
> list of alternatives.

This suggests that if you do:
[ foo [ bar :: ]? | foo ( \w+ ) ]
that if it backtracks over the :: it will break out of the outermost [],
since the innermost isn't a list of alternatives.

Or does it simply break out of the innermost group, and are the
descriptions chosen a bit poorly?

That's it for now I think.. maybe I'll find more later :)

--
Matthijs van Duin -- May the Forth be with you!

Luke Palmer

Mar 17, 2003, 1:17:21 PM

to matt...@cds.nl, perl6-l...@perl.org

> 1. Sub-rules and backtracking
>
> > <name(expr)> # call rule, passing Perl args
> > { .name(expr) } # same thing.
>
> > <name pat> # call rule, passing regex arg
> > { .name(/pat/) } # same thing.
>
> Considering perl can't sanely know how to backtrack into a closure, wouldn't
> { .name(expr) } be equal to <name(expr)>: instead? (note the colon)

Nope. <name(expr)>: is equivalent to { .name{expr} }: . It does know
how to backtrack into a closure: it skips right by it (or throws an
exception through it... not sure which) and tries again.
Hypotheticals make this function properly.

> It seems to me that for a rule to be able to backtrack, you would need to
> pass a closure as arg that represents the rest of the match: the rule
> matches, calls the closure, and if the closure returns tries to backtrack
> and calls it again, or returns if all possibilities are exhausted.

Sounds like continuation-passing style. Yes, you can backtrack
through code with continuation-passing style. Continuations have yet
to be introduced into the language.

> Related to this: what is the prototype for rules (in case you want to
> manually write or invoke them) ?

rule somerule($0) {}

If it takes arguments, put them on the end of the signature. Invoke
them just like subs.

(Just realized something: you can't do {...} on a rule, because that
means match any character three times.)

> 3. Negated assertions
>
> > any assertion that begins with ! is simply negated.
>
> > \P{prop} <!prop>
> > (?!...) <!before ...> # negative lookahead
> > [^[:alpha:]] <-alpha>
>
> Considering <prop> means "matches a character with property prop", it
> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
> character with property prop", rather than "match a character without
> property prop".

Right. It has to be. There is no way to implement it in a
sufficiently general way otherwise.

> 5. Character class semantics
>
> > predefined character classes are just considered intrinsic grammar rules
>
> This means you can place arbitrary rules inside a character class. What
> if the rule has a width unequal to 1 or even variable-width? I can think
> of a few possibilities:
>
> a. Require subrules inside a character class to have a fixed width of 1
> char. (requires a run-time check since the rule might be redefined.. ick)
>
> b. Rules inside a character class are ORed together, an inverted subrule
> is interpreted as [ <!before <subrule>> . ]
>
> c. The whole character class is a zero-width assertion followed by the
> traversal of a single char.
>
> My personal preference is (c), which also means \N is equivalent to <-\n>

Yikes. Good questions. Recall that Unicode is sort of like
multi-character matching, so it might be possible to allow
<<anyrule><anyother>>. That might be a way to specify the parallel
matching of those two rules. It's entirely likely that I'm wrong.

> 6. Null pattern
>
> > That won't work because it'll look for the :wfoo modifier. However, there
> > are several ways to get the effect you want:
> > /[:w()foo bar]/
> > /[:w[]foo bar]/
>
> Tsk tsk Larry, those look like null patterns to me :-)
>
> While I'm on the subject.. why not allow <> as the match-always assertion?
> It might conflict with huffman encoding, but I certainly don't think <>
> could ethically mean anything other than this. And <!> would ofcourse be
> the match-never assertion.

You could always use <(1)> and <(0)>, which are more SWIMmy :)

> 7. The :: operator
>
> > :: # fail all |'s when backtracking
>
> > If you backtrack across it, it fails all the way out of the current
> > list of alternatives.
>
> This suggests that if you do:
> [ foo [ bar :: ]? | foo ( \w+ ) ]
> that if it backtracks over the :: it will break out of the outermost [],
> since the innermost isn't a list of alternatives.
>
> Or does it simply break out of the innermost group, and are the
> descriptions chosen a bit poorly?

I think that's the one. It would make sense, since a list of
alternatives is either surrounded by brackets or the rule boundaries.

> That's it for now I think.. maybe I'll find more later :)

These were stumpers. Thanks! :)

Luke

Matthijs Van Duin

Mar 17, 2003, 1:49:36 PM

to perl6-l...@perl.org, Luke Palmer

On Mon, Mar 17, 2003 at 11:17:21AM -0700, Luke Palmer wrote:
>> > <name(expr)> # call rule, passing Perl args
>> > { .name(expr) } # same thing.
>>

>> Considering perl can't sanely know how to backtrack into a closure,
>> wouldn't { .name(expr) } be equal to <name(expr)>: instead?
>

>Nope. <name(expr)>: is equivalent to { .name{expr} }: . It does know
>how to backtrack into a closure: it skips right by it (or throws an
>exception through it... not sure which) and tries again.
>Hypotheticals make this function properly.

That sounds very unlikely, and is also contradicted by earlier messages,
like: "closures don't normally get control back when backtracked over"
-- Larry Wall in http://nntp.x.perl.org/group/perl.perl6.language/10781

Hypothetical variables make things work right when backtracking *over* a
closure, but certainly not *into* one.

I'm talking about cases like:

rule foo { a+ }
rule bar { { .foo } ab }

my intuition says this equals { [a+]: ab } and hence never matches
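
The claim that `[a+]: ab` never matches can be checked with any engine that has a committed (atomic) group. The sketch below uses Python's `re` module as a stand-in, not Perl 6; it emulates the committed group with the lookahead-plus-backreference idiom `(?=(a+))\1`, which relies on Python not backtracking into a lookahead once it has succeeded. The variable names are made up for the example.

```python
import re

# Ordinary a+ can give back one 'a' so that the trailing "ab" still matches.
backtracking = re.compile(r"a+ab")

# (?=(a+))\1 grabs the longest run of 'a' and then refuses to give any back,
# which plays the role of [a+]: in the rule above.
committed = re.compile(r"(?=(a+))\1ab")

subject = "aaab"
print(backtracking.search(subject) is not None)
print(committed.search(subject) is not None)
```

After a committed `a+` the next character can never be an `a`, so the following `ab` has nothing to match, whatever the subject string is.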

>Sounds like continuation-passing style. Yes, you can backtrack
>through code with continuation-passing style. Continuations have yet
>to be introduced into the language.

You don't need continuations to do this though, you can do it in plain
perl code too. For example:

rule test { <foo> <bar> }
-->
method test (&cont) { .foo({ .bar(&cont) }) }

foo gets a closure that represents the rest of the match (bar followed by
whatever comes after test) and, if it succeeds, invokes the closure, hence
calling bar. If bar fails, it returns to foo, which can then try a different
match and call the closure again. If all parts match, then the final closure
(passed by the match function to the original rule) is called; it does
something to return the final version of the state object to the original
caller, for example by throwing an exception.

I'm not saying rules will be implemented in such a way, but it's the first
thing that comes to mind.
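
A minimal sketch of this closure-passing scheme, written in Python rather than Perl 6 so that it can actually be run. Everything here is an assumption made for illustration: the helper names (`lit`, `plus`, `seq`, `call_without_cont`, `run`) are hypothetical, a matcher is a function `(text, pos, cont)`, and the final closure simply returns the end position instead of throwing an exception.

```python
def lit(s):
    """Match the literal string s, then hand over to the rest of the match."""
    def match(text, pos, cont):
        if text.startswith(s, pos):
            return cont(pos + len(s))
        return None  # failure: the caller may try something else
    return match

def plus(inner):
    """Greedy one-or-more. Backtracking into it is just cont returning None."""
    def match(text, pos, cont):
        def after_one(p):
            if p > pos:  # guard against looping on a zero-width inner match
                longer = match(text, p, cont)
                if longer is not None:
                    return longer
            return cont(p)  # give one repetition back and try the rest
        return inner(text, pos, after_one)
    return match

def seq(first, second):
    """first gets a closure that runs second and then the outer rest."""
    def match(text, pos, cont):
        return first(text, pos, lambda p: second(text, p, cont))
    return match

def call_without_cont(rule):
    """Plain subroutine-style call: the subrule finishes on its own, so the
    caller cannot backtrack into it (compare { .foo } inside a rule)."""
    def match(text, pos, cont):
        end = rule(text, pos, lambda p: p)
        return None if end is None else cont(end)
    return match

def run(rule, text):
    """Start at position 0; the final closure just reports the end position."""
    return rule(text, 0, lambda p: p)

foo = plus(lit("a"))                                  # rule foo { a+ }
bar_with_cont = seq(foo, lit("ab"))                   # <foo> ab
bar_plain_call = seq(call_without_cont(foo), lit("ab"))  # { .foo } ab

print(run(bar_with_cont, "aaab"))
print(run(bar_plain_call, "aaab"))
```

With the closure passed in, foo can retry with a shorter match when the rest fails; with the plain call it has already returned, so the match as a whole fails.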

> rule somerule($0) {}

I meant of course as a method (since rules are just methods, if I understood
correctly); to do the matching yourself rather than with perl 6 regex.

>> Considering <prop> means "matches a character with property prop", it
>> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
>> character with property prop", rather than "match a character without
>> property prop".
>
>Right. It has to be. There is no way to implement it in a
>sufficiently general way otherwise.

Hence the example of saying \P{prop} becomes <!prop> is wrong; it actually
becomes <-prop>, right?

>> While I'm on the subject.. why not allow <> as the match-always assertion?
>> It might conflict with huffman encoding, but I certainly don't think <>
>> could ethically mean anything other than this. And <!> would ofcourse be
>> the match-never assertion.
>
>You could always use <(1)> and <(0)>, which are more SWIMmy :)

Ick, ugly; I'd rather use <null> and <!null> than those, but <> and <!>
are shorter, and have (to me) fairly obvious meanings. But it was just a
random suggestion; I'm not going to actively try to advocate them if
they're not liked :-)

Luke Palmer

Mar 17, 2003, 2:09:18 PM

to p...@nubz.org, perl6-l...@perl.org

> On Mon, Mar 17, 2003 at 11:17:21AM -0700, Luke Palmer wrote:
> >> > <name(expr)> # call rule, passing Perl args
> >> > { .name(expr) } # same thing.
> >>
> >> Considering perl can't sanely know how to backtrack into a closure,
> >> wouldn't { .name(expr) } be equal to <name(expr)>: instead?
> >
> >Nope. <name(expr)>: is equivalent to { .name{expr} }: . It does know
> >how to backtrack into a closure: it skips right by it (or throws an
> >exception through it... not sure which) and tries again.
> >Hypotheticals make this function properly.
>
> That sounds very unlikely, and is also contradicted by earlier messages,
> like: "closures don't normally get control back when backtracked over"
> -- Larry Wall in http://nntp.x.perl.org/group/perl.perl6.language/10781
>
> Hypothetical variables make things work right when backtracking *over* a
> closure, but certainly not *into* one.
>
> I'm talking about cases like:
>
> rule foo { a+ }
> rule bar { { .foo } ab }
>
> my intuition says this equals { [a+]: ab } and hence never matches

Oh, right, sorry. That was a braino on my part. That is what I
meant, but I didn't realize that { .name{expr} }: was redundant.

> > rule somerule($0) {}
>
> I meant ofcourse as a method (since rules are just methods if I understood
> correctly); to do the matching yourself rather than with perl 6 regex.

Ohh. That has to do with the parse object, which hasn't been covered
yet. The *signature* is probably just:

    grammar FooBar {
        method somerule() {...}
    }

Where the invocant would be a FooBar, which presumably inherits from
something like ParseObject.

> >> Considering <prop> means "matches a character with property prop", it
> >> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
> >> character with property prop", rather than "match a character without
> >> property prop".
> >
> >Right. It has to be. There is no way to implement it in a
> >sufficiently general way otherwise.
>
> Hence the example of saying \P{prop} becomes <!prop> is wrong; it actually
> becomes <-prop>, right?

Probably. Larry might have something different in mind, but what
you've said seems the obvious solution at the moment.

> >> While I'm on the subject.. why not allow <> as the match-always assertion?
> >> It might conflict with huffman encoding, but I certainly don't think <>
> >> could ethically mean anything other than this. And <!> would ofcourse be
> >> the match-never assertion.
> >
> >You could always use <(1)> and <(0)>, which are more SWIMmy :)
>
> Ick, ugly; I'd rather use <null> and <!null> than those, but <> and <!>
> are shorter, and have (to me) fairly obvious meanings. But it was just a
> random suggestion; I'm not going to actively try to advocate them if
> they're not liked :-)

<null> and <!null>, good idea. I didn't even think of that. I think
<> and <!> are violating the whole point of introducing <null>: that
it was unclear what was meant by //, and it's too easy to do. They're
not quite as bad, but it doesn't seem right. <null> and <!null> are
good.

(Forgive me for being a little braindead; it's been a while since
I've read A5)

Luke

Matthijs Van Duin

Mar 17, 2003, 2:09:32 PM

to perl6-l...@perl.org

On Mon, Mar 17, 2003 at 07:49:36PM +0100, Matthijs van Duin wrote:
>(blah blah I wrote on closures and rule-invocation)

>
>I'm not saying rules will be implemented in such a way, but it's the first
>thing that comes to mind.

Before anyone replies, I just realized I should probably just first browse
around in parrot since regex is already implemented ;-)

Luke Palmer

Mar 17, 2003, 2:14:00 PM

to p...@nubz.org, perl6-l...@perl.org

> On Mon, Mar 17, 2003 at 07:49:36PM +0100, Matthijs van Duin wrote:
> >(blah blah I wrote on closures and rule-invocation)
> >
> >I'm not saying rules will be implemented in such a way, but it's the first
> >thing that comes to mind.
>
> Before anyone replies, I just realized I should probably just first browse
> around in parrot since regex is already implemented ;-)

No---you shouldn't do that. Regex (in languages/perl6) is naive and
is due for a rewrite. I've volunteered to do that (after I do the
type system (no, I'm not being ambitious or anything :-P )). I'd
appreciate some help in the design and implementation, so feel free to
jump in!

Luke

Matthijs Van Duin

Mar 17, 2003, 2:33:14 PM

to perl6-l...@perl.org, Luke Palmer

On Mon, Mar 17, 2003 at 12:14:00PM -0700, Luke Palmer wrote:
>> Before anyone replies, I just realized I should probably just first browse
>> around in parrot since regex is already implemented ;-)
>
>No---you shouldn't do that. Regex (in languages/perl6) is a naive and
>is due for a rewrite.

And I just realized the issue of subrules hasn't even been touched yet.

If regular subroutine invocation were used, then everything would be
cleaned up as soon as the rule matches, and therefore there would be no way
to backtrack into the subrule.

The subrule *has* to do a callback into the enclosing rule to match the
remainder. That way backtracking into the subrule is simply a return.
It's a simple and neat solution. (note that this isn't a continuation -
when you invoke a continuation, it actually never returns)

This may start to sound a bit like perl6 implementation rather than
language, but it most certainly *does* have impact on the language;
specifically on the calling conventions of rules.

I have seen the following two ways to invoke a rule (in A5 etc):

Grammar.rule($matchstring) # $matchstring =~ /<Grammar::rule>/
$state.rule() # <rule>: inside another rule

Note how it's <rule>: (with colon) because without any closure passed, it
can never backtrack into the subrule.

So $state.rule() must have an optional closure parameter, and Grammar.rule()
probably also needs an additional optional parameter: the modifiers.

This makes the class method behave rather differently from the object
method.. can this be consolidated?

>I've volunteered to do that (after I do the type system (no, I'm not
>being ambitious or anything :-P )). I'd appreciate some help in the
>design and implementation, so feel free to jump in!

I dunno, I'm already quite busy... but I'd say I'm helping with the design
as we speak :-)