Rule Parameters


Rod Adams

Mar 2, 2005, 12:06:17 AM
to Perl6 Language List

Since the line between rules and subs is already blurring significantly,
I want to blur it a little more. I want to write rules which can take
parameters.

Consider that I am parsing HTML (a very frequent occurrence), and wish
to make a Rule that matches a balanced tag from open to close. I want
to use the same code many different times, but for different tags. So I
really want to say something like:

rule baltag (Rule|Str $<tag>) {
\< $<tag> \s* $<options> := (.*?) \>
$<body> := (.*?)
\</ $<tag> \>
}

I could then do:

$buffer ~~ / <baltag title> /;

later on to match any <title> tag in my buffer.


I'm open to alternative syntaxes; this one was just there to illustrate
my point.
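
A rough analogy outside Perl 6, for readers who want to try the idea: a function that builds a pattern from a tag name. This is only a minimal sketch using Python's standard `re` module. The `baltag` function below is a hypothetical helper, not the Perl 6 rule; it uses `[^>]*` for the attributes instead of `.*?`, and it does not handle nested tags of the same name.

```python
import re

def baltag(tag: str) -> re.Pattern:
    """Build a pattern for <tag ...>body</tag> (hypothetical helper)."""
    t = re.escape(tag)
    return re.compile(
        rf"<{t}\b\s*(?P<options>[^>]*)>"   # opening tag plus attributes
        rf"(?P<body>.*?)"                  # shortest possible body
        rf"</{t}>",                        # matching close tag
        re.DOTALL,
    )

buffer = '<html><title lang="en">Rule Parameters</title></html>'
m = baltag("title").search(buffer)
print(m.group("options"), "|", m.group("body"))
```

The same builder can be reused for any tag name, which is the reuse the parameterized rule is after.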

-- Rod Adams.


Luke Palmer

Mar 2, 2005, 12:17:08 AM
to Rod Adams, Perl6 Language List
Rod Adams writes:
> Since the line between rules and subs is already blurring significantly,
> I want to blur it a little more. I want to write rules which can take
> parameters.

No no no! That's too powerful.

Wow, skimming through both S5 and A5, I see no mention of such a
thing. I know we've had it planned for quite a while.

> Consider that I am parsing HTML (a very frequent occurrence), and wish
> to make a Rule that matches a balanced tag from open to close. I want
> to use the same code many different times, but for different tags. So I
> really want to say something like:
>
> rule baltag (Rule|Str $<tag>) {
> \< $<tag> \s* $<options> := (.*?) \>
> $<body> := (.*?)
> \</ $<tag> \>
> }

Replace $<tag> with $tag and you're all set. We may allow putting
$<tag> directly in the parameter list for inclusion in the parse tree.

Luke

Larry Wall

Mar 2, 2005, 12:32:28 AM
to Perl6 Language List
On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
: Since the line between rules and subs is already blurring significantly,
: I want to blur it a little more. I want to write rules which can take
: parameters.

No problem. That's how the arguments to rules like <before foo> are
already passed. If I recall, we originally specified three basic forms:

<foo bar> # bar is pattern
<foo: bar> # bar is string
<foo(bar)> # bar is Perl expression

though the middle one of those is the weakest, since it's equivalent to

<foo('bar')>

For that matter, the first one is just

<foo(/bar/)>

Anyway, these forms are somewhat negotiable yet. In recognition that
these are all methods underneath, I seem to recall switching the most
generic form to

<.foo()>

at some point, but I could be hallucinating. We could certainly just
get by with

<foo pattern>

and

<.foo(@arguments)>

But I kind of like the sub syntax even if they're really methods underneath.
I dunno...

Of course, one can call them like ordinary methods too, as long as one
supplies an appropriate pattern-matching context invocant. But it's
pretty handy to have that magically supplied for you inside <...>.
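
The three call forms above all reduce to "rule name plus one argument of a known kind". The Python sketch below only illustrates that classification; `desugar` is a hypothetical helper, it covers just the three draft forms listed in this message (not the `<.foo()>` variant), and it is not how any Perl 6 implementation parses subrule calls.

```python
import re

def desugar(call: str):
    """Classify the text found inside <...> (hypothetical helper)."""
    m = re.fullmatch(r"(\w+)\((.*)\)", call)
    if m:                                   # <foo(expr)>: Perl expression
        return m.group(1), "expr", m.group(2)
    m = re.fullmatch(r"(\w+):\s*(.*)", call)
    if m:                                   # <foo: text>: one literal string
        return m.group(1), "string", m.group(2)
    m = re.fullmatch(r"(\w+)\s+(.*)", call)
    if m:                                   # <foo pat>: pattern argument
        return m.group(1), "pattern", m.group(2)
    return call, "none", None               # <foo>: no argument

for form in ("foo bar", "foo: bar", "foo(bar)"):
    print(form, "->", desugar(form))
```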

Larry

Patrick R. Michaud

Mar 2, 2005, 10:46:51 AM
to Perl6 Language List
On Tue, Mar 01, 2005 at 09:32:28PM -0800, Larry Wall wrote:
> On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
> : Since the line between rules and subs is already blurring significantly,
> : I want to blur it a little more. I want to write rules which can take
> : parameters.
>
> No problem. That's how the arguments to rules like <before foo> are
> already passed. If I recall, we originally specified three basic forms:
> <foo bar> # bar is pattern
> <foo: bar> # bar is string
> <foo(bar)> # bar is Perl expression

Yes, this is written in A05, although it's often hard to spot and
easy to overlook. They're in the large table under "Metacharacter reform":

<name(expr)> # call rule, passing Perl args
{ .name(expr) } # same thing.

<$var(expr)> # call rule indirectly by name
{ .$var(expr) } # same thing.

<name pat> # call rule, passing regex arg
{ .name(/pat/) } # same thing.

# maybe...
<name: text> # call rule, passing string
{ .name(q<text>) } # same thing.

The argument form of subrules is not currently mentioned in S05.

I've been designing and implementing PGE consistent with the
above syntaxes.

Pm

Rod Adams

Mar 2, 2005, 1:42:08 PM
to Perl6 Language List
Larry Wall wrote:

>On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
>: Since the line between rules and subs is already blurring significantly,
>: I want to blur it a little more. I want to write rules which can take
>: parameters.
>
>No problem. That's how the arguments to rules like <before foo> are
>already passed.
>

Excellent!

>Of course, one can call them like ordinary methods too, as long as one
>supplies an appropriate pattern-matching context invocant. But it's
>pretty handy to have that magically supplied for you inside <...>.
>
>

Now for the tricky part.

rule baltag (Str $tag, Str $body is rw) {
\< $tag .*? \>
$<body> := (.*?)
\</ $tag \>
}

$buffer ~~ / <baltag title $<body>> .* \> <$body> \< /;

In other words, I want to pass a possibly unbound hypothetical into a
subrule, and let the subrule bind/rebind it.

An alternative would be a syntax for calling the subrule, and then
binding a hypothetical to one of the hypotheticals returned in the
subrule. I'm moderately sure S05 made this possible, but I couldn't put
all the pieces together. I'll hazard the following guess:

rule baltag (Str $tag) {
\< $tag .*? \>
$<body> := (.*?)
\</ $tag \>
}

$buffer ~~ / $<btag> := <baltag title> <($<body> := $btag<body>)> /;

Which seems very clumsy, especially if I wish to call it a lot. Be nicer
if I could push the work onto the subrule.

>Larry
>
>
>
>

Rod Adams

Mar 2, 2005, 2:24:59 PM
to Perl6 Language List
Patrick R. Michaud wrote:

Thanks for pointing that out, Patrick. I'm impressed with how you've
assimilated all the S's & A's. (And yes, I love that the guy in charge
of implementing the language has that ability.)

And now some questions to hammer out some details on passing args to
subrules:

What if you wish to pass two args, the first a string, the second a rule?
Are you then forced to use <name(expr, expr)> syntax?

Does <name: text1 text2> get handled as <name(q<text1 text2>)> or as
<name(q<text1>, q<text2>)>, in which case it's really qw//?

If I declare my rule as:

rule MyRule (Str $text) { ... }

Will the P6RE be smart enough to pass:

m:{<MyRule /thisdir/>}

As a string, not the rule that it looks like?

If I define a named rule, how do I get a reference to it from outside a
rule?
A05 leads me to think that

my $rx := Rule.MyRule;

will work, assuming Rule is the default grammar. I would also suspect
that one could reverse this.

my $rx = rule { ... };
Rule.MyRule := $rx;

Thus defining a new rule dynamically.


-- Rod Adams


Larry Wall

Mar 2, 2005, 3:05:37 PM
to Perl6 Language List
On Wed, Mar 02, 2005 at 01:24:59PM -0600, Rod Adams wrote:
: Thanks for pointing that out, Patrick. I'm impressed with how you've
: assimilated all the S's & A's. (And yes, I love that the guy in charge
: of implementing the language has that ability.)

Yes, Patrick is a jewel. We'll probably wear some new facets on him though.

: And now some questions to hammer out some details on passing args to
: subrules:
:
: What if you wish to pass two args, the first a string, the second a rule?
: Are you then forced to use <name(expr, expr)> syntax?

Yes, or pass a string and parse it yourself.

: Does <name: text1 text2> get handled as <name(q<text1 text2>)> or as
: <name(q<text1>, q<text2>)>, in which case it's really qw//?

The former. It's a single string, which you can parse however you like.
Though I suppose we could extend the colon to a colon modifier:

<name:w text1 text2>

That's getting a little weird though, considering that in most other cases
such modifiers are outside the delimiters. Here's a really weird one:

<name:here END>

I'm more inclined to say that anything beyond a bare string has to use
function notation. I'm still not entirely sure we should even have a
string notation. Its utility/clutter ratio is pretty low.

: If I declare my rule as:
:
: rule MyRule (Str $text) { ... }
:
: Will the P6RE be smart enough to pass:
:
: m:{<MyRule /thisdir/>}
:
: As a string, not the rule that it looks like?

Probably not, given the late-binding issues and the desire to treat
patterns as first-class language. That's why we're differentiating
the call syntax without reference to the called signature. If you
want delayed compilation of the pattern, you can get it by passing
a string. But then any parentheses in it don't count as parens in
the outer rule. With <before ([A-Z]+)> we can treat the lookahead
parens as an ordinary capture because the parens are parsed at the
same time the outer rule is parsed.
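
The difference between a pattern argument parsed with the outer rule and a string compiled later can be imitated in Python. This is only an analogy using the standard `re` module; `before` here is a hypothetical helper taking an uncompiled pattern string, not the Perl 6 `<before>` rule.

```python
import re

# Inline: the lookahead's parentheses are parsed together with the outer
# pattern, so they form an ordinary capture group of the outer match.
inline = re.compile(r"(?=([A-Z]+))\w+")
m = inline.search("see ABCdef")
print(inline.groups, m.group(1))

# Delayed: the sub-pattern arrives as a string and is compiled only when
# the helper runs, so its parentheses belong to a separate inner match.
def before(pattern_text: str, text: str, pos: int) -> bool:
    """Hypothetical lookahead helper taking an uncompiled pattern string."""
    return re.compile(pattern_text).match(text, pos) is not None

outer = re.compile(r"\w+")
hits = [w for w in outer.finditer("see ABCdef")
        if before("([A-Z]+)", w.string, w.start())]
print(outer.groups, [w.group(0) for w in hits])
```

In the inline case the outer pattern reports one capture group; in the delayed case it reports none, because the parentheses were never part of it.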

: If I define a named rule, how do I get a reference to it from outside a
: rule?
: A05 leads me to think that
:
: my $rx := Rule.MyRule;
:
: will work, assuming Rule is the default grammar. I would also suspect
: that one could reverse this.
:
: my $rx = rule { ... };
: Rule.MyRule := $rx;
:
: Thus defining a new rule dynamically.

You have to use &Rule::MyRule to refer to the method by name. Rule.MyRule
would invoke the MyRule method as a class method in the Rule class, since
method calls do not require parens if there are no arguments.

Larry

Larry Wall

Mar 2, 2005, 3:30:49 PM
to Perl6 Language List
On Wed, Mar 02, 2005 at 12:42:08PM -0600, Rod Adams wrote:
: Larry Wall wrote:
:
: >On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
: >: Since the line between rules and subs is already blurring significantly,
: >: I want to blur it a little more. I want to write rules which can take
: >: parameters.
: >
: >No problem. That's how the arguments to rules like <before foo> are
: >already passed.
: >
: Excellent!
:
: >Of course, one can call them like ordinary methods too, as long as one
: >supplies an appropriate pattern-matching context invocant. But it's
: >pretty handy to have that magically supplied for you inside <...>.
: >
: >
:
: Now for the tricky part.
:
: rule baltag (Str $tag, Str $body is rw) {
: \< $tag .*? \>
: $<body> := (.*?)
: \</ $tag \>
: }

Well, that would be written:

rule baltag (Str $tag, Str $body is rw) {
\< $tag .*? \>
$body := (.*?)
\</ $tag \>
}

since $body is a real variable rather than a hash entry. However,
I don't think binding works even in an ordinary sub--rebinding the
variable would break its association with whatever actual parameter.
You'd need to use assignment:

rule baltag (Str $tag, Str $body is rw) {
\< $tag .*? \>
(.*?)
\</ $tag \>
{ let $body = $1 }
}

: $buffer ~~ / <baltag title $<body>> .* \> <$body> \< /;

That'd have to be something like:

$buffer ~~ / <baltag(/title/,$<body>)> .* \> <$body> \< /;

But in theory it should autovivify the $<body> for you based on the "is rw".

: In other words, I want to pass a possibly unbound hypothetical into a
: subrule, and let the subrule bind/rebind it.

Can only assign it. You'd have to pass the actual symbol table holding
$<body> if you want to rebind the name. You could presumably pass
in $/ as an argument and hash it explicitly.
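
The rebinding problem has a loose analogue in Python, sketched below. Python has no `is rw` parameters, so this does not show assignment through an rw parameter; it only contrasts rebinding a parameter name with storing into a container the caller passed in, which is roughly the "pass in $/ and hash it" route. All function names here are hypothetical.

```python
import re

def find_body(tag: str, text: str):
    """Return the text between <tag ...> and </tag>, or None (hypothetical helper)."""
    t = re.escape(tag)
    m = re.search(rf"<{t}\b[^>]*>(.*?)</{t}>", text, re.DOTALL)
    return m.group(1) if m else None

def baltag_rebind(tag: str, text: str, body) -> bool:
    # Rebinding the parameter only changes this function's local name;
    # the caller's variable keeps its old value.
    body = find_body(tag, text)
    return body is not None

def baltag_store(tag: str, text: str, match_state: dict) -> bool:
    # Storing into a container the caller handed over is visible to the
    # caller, loosely like passing $/ and setting its "body" entry.
    found = find_body(tag, text)
    if found is not None:
        match_state["body"] = found
    return found is not None

buffer = "<title>Rule Parameters</title>"
body = "unset"
baltag_rebind("title", buffer, body)
state = {}
baltag_store("title", buffer, state)
print(body, "|", state["body"])
```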

: An alternative would be a syntax for calling the subrule, and then
: binding a hypothetical to one of the hypotheticals returned in the
: subrule. I'm moderately sure S05 made this possible, but I couldn't put
: all the pieces together. I'll hazard the following guess:
:
: rule baltag (Str $tag) {
: \< $tag .*? \>
: $<body> := (.*?)
: \</ $tag \>
: }
:
: $buffer ~~ / $<btag> := <baltag title> <($<body> := $btag<body>)> /;

That's essentially correct, though that should probably be {...}
instead of <(...)>, in case the body has the value "0" which would
cause the <(...)> assertion to fail. There might also be some
circumstances in which you need a "let" in there if there's any
possibility of backtracking over the binding but still having access
to the match object.
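
The "0" pitfall can be modelled in a few lines of Python. This is a simplified sketch: `perl_true` approximates Perl-style truth for scalars only, and `code_assertion` and `closure` are hypothetical stand-ins for `<(...)>` and `{...}` as described above, assuming the binding expression evaluates to the bound value.

```python
def perl_true(value) -> bool:
    """Simplified Perl-style truth: undef, 0, "" and "0" count as false."""
    return value not in (None, 0, "", "0")

def code_assertion(block) -> bool:
    # Like <( ... )>: run the code; the match continues only if the
    # result is true.
    return perl_true(block())

def closure(block) -> bool:
    # Like { ... }: run the code for its side effect; the result is
    # ignored and the match continues.
    block()
    return True

captures = {}

def bind_body(value):
    captures["body"] = value
    return value  # assumed: the binding evaluates to the bound value

print(code_assertion(lambda: bind_body("0")))
print(closure(lambda: bind_body("0")))
```

With a body of "0" the assertion form reports failure even though the binding happened, while the closure form lets the match continue.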

: Which seems very clumsy, especially if I wish to call it a lot. Be nicer
: if I could push the work onto the subrule.

Well, if you never want <baltag> to return the tags, then you can just
say something like:

rule baltag (Str $tag) {
\< $tag .*? \>
$0 := (.*?)
\</ $tag \>
}

$buffer ~~ / $<body> := <baltag title> /;

But you probably want to know what attributes that first .*? matched too.

Larry

Patrick R. Michaud

Mar 2, 2005, 3:27:05 PM
to Perl6 Language List
> : Does <name: text1 text2> get handled as <name(q<text1 text2>)> or as
> : <name(q<text1>, q<text2>)>, in which case it's really qw//?
>
> The former. It's a single string, which you can parse however you like.
> Though I suppose we could extend the colon to a colon modifier:
>
> <name:w text1 text2>
>
> That's getting a little weird though, considering that in most other cases
> such modifiers are outside the delimiters. Here's a really weird one:
>
> <name:here END>
>
> I'm more inclined to say that anything beyond a bare string has to use
> function notation.

I'm inclined to say this from a PGE implementation perspective, at
least for the short-term.

As for the rest, I agree with Larry. :-)

Pm
