Since the line between rules and subs is already blurring significantly, I want to blur it a little more. I want to write rules which can take parameters.
Consider that I am parsing HTML (a very frequent occurrence), and wish to make a Rule that matches a balanced tag from open to close. I want to use the same code many different times, but for different tags. So I really want to say something like:
Rod Adams writes:
> Since the line between rules and subs is already blurring significantly,
> I want to blur it a little more. I want to write rules which can take
> parameters.
No no no! That's too powerful.
Wow, skimming through both S5 and A5 and I see no mention of such a thing. I know we've had it planned for quite a while.
> Consider that I am parsing HTML (a very frequent occurrence), and wish
> to make a Rule that matches a balanced tag from open to close. I want
> to use the same code many different times, but for different tags. So I
> really want to say something like:
On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
: Since the line between rules and subs is already blurring significantly,
: I want to blur it a little more. I want to write rules which can take
: parameters.
No problem. That's how the arguments to rules like <before foo> are already passed. If I recall, we originally specified three basic forms:
    <foo bar>     # bar is pattern
    <foo: bar>    # bar is string
    <foo(bar)>    # bar is Perl expression
though the middle one of those is the weakest, since it's equivalent to
<foo('bar')>
For that matter, the first one is just
<foo(/bar/)>
Anyway, these forms are somewhat negotiable yet. In recognition that these are all methods underneath, I seem to recall switching the most generic form to
<.foo()>
at some point, but I could be hallucinating. We could certainly just get by with
<foo pattern>
and
<.foo(@arguments)>
But I kind of like the sub syntax even if they're really methods underneath. I dunno...
Of course, one can call them like ordinary methods too, as long as one supplies an appropriate pattern-matching context invocant. But it's pretty handy to have that magically supplied for you inside <...>.
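[Editorial sketch, not part of the original message.] The idea of a rule that takes a parameter can be illustrated outside Perl 6 as a function that builds a pattern from its argument. The Python below is only an analogy under stated assumptions: the name baltag is borrowed from later in this thread, the helper and the sample buffer are hypothetical, and the non-greedy match does not handle a tag nested inside another tag of the same name.

```python
import re

def baltag(tag: str) -> re.Pattern:
    """Build a pattern for <tag ...>body</tag>.

    `tag` plays the role of the rule's parameter; this is an
    illustrative helper, not Perl 6 rule semantics.
    """
    name = re.escape(tag)
    return re.compile(
        rf"<{name}\b[^>]*>(?P<body>.*?)</{name}>",
        re.DOTALL,  # let the body span multiple lines
    )

# Hypothetical sample input, for illustration only.
buffer = "<html><title>Parameterized rules</title><h1>Intro</h1></html>"

# The same "rule" is reused with different tag names.
title = baltag("title").search(buffer)
heading = baltag("h1").search(buffer)
print(title.group("body"))
print(heading.group("body"))
```

Here the tag name is fixed when the pattern is built, which loosely corresponds to passing a string argument to a subrule; the named group stands in for the $<body> capture discussed later in the thread.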
On Tue, Mar 01, 2005 at 09:32:28PM -0800, Larry Wall wrote:
> On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
> : Since the line between rules and subs is already blurring significantly,
> : I want to blur it a little more. I want to write rules which can take
> : parameters.
>
> No problem. That's how the arguments to rules like <before foo> are
> already passed. If I recall, we originally specified three basic forms:
>     <foo bar>     # bar is pattern
>     <foo: bar>    # bar is string
>     <foo(bar)>    # bar is Perl expression
Yes, this is written in A05, although it's often hard to spot and easy to overlook. They're in the large table under "Metacharacter reform":
Larry Wall wrote:
>On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
>: Since the line between rules and subs is already blurring significantly,
>: I want to blur it a little more. I want to write rules which can take
>: parameters.
>
>No problem. That's how the arguments to rules like <before foo> are
>already passed.
Excellent!
>Of course, one can call them like ordinary methods too, as long as one
>supplies an appropriate pattern-matching context invocant. But it's
>pretty handy to have that magically supplied for you inside <...>.
In other words, I want to pass a possibly unbound hypothetical into a subrule, and let the subrule bind/rebind it.
Alternatively would be a syntax for calling the subrule, and then binding a hypothetical to one of the hypotheticals returned in the subrule. I'm moderately sure S05 made this possible, but I couldn't put all the pieces together. I'll hazard the following guess:
Patrick R. Michaud wrote:
>On Tue, Mar 01, 2005 at 09:32:28PM -0800, Larry Wall wrote:
>>On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
>>: Since the line between rules and subs is already blurring significantly,
>>: I want to blur it a little more. I want to write rules which can take
>>: parameters.
>>
>>No problem. That's how the arguments to rules like <before foo> are
>>already passed. If I recall, we originally specified three basic forms:
>>    <foo bar>     # bar is pattern
>>    <foo: bar>    # bar is string
>>    <foo(bar)>    # bar is Perl expression
>
>Yes, this is written in A05, although it's often hard to spot and
>easy to overlook. They're in the large table under "Metacharacter reform":
>The argument form of subrules is not currently mentioned in S05.
>I've been designing and implementing PGE consistent with the
>above syntaxes.
Thanks for pointing that out, Patrick. I'm impressed with how you've assimilated all the S's & A's. (And yes, I love that the guy in charge of implementing the language has that ability.)
And now some questions to hammer out some details on passing args to subrules:
What if you wish to pass two args, the first a string, the second a rule? Are you then forced to use <name(expr, expr)> syntax?
Does <name: text1 text2> get handled as <name(q<text1 text2>)> or as <name(q<text1>, q<text2>)>, in which case it's really qw//?
If I declare my rule as:
rule MyRule (Str $text) { ... }
Will the P6RE be smart enough to pass:
m:{<MyRule /thisdir/>}
As a string, not the rule that it looks like?
If I define a named rule, how do I get a reference to it from outside a rule? A05 leads me to think that
my $rx := Rule.MyRule;
will work, assuming Rule is the default grammar. I would also suspect that one could reverse this.
On Wed, Mar 02, 2005 at 01:24:59PM -0600, Rod Adams wrote:
: Thanks for pointing that out, Patrick. I'm impressed with how you've
: assimilated all the S's & A's. (And yes, I love that the guy in charge
: of implementing the language has that ability.)
Yes, Patrick is a jewel. We'll probably wear some new facets on him though.
: And now some questions to hammer out some details on passing args to
: subrules:
:
: What if you wish to pass two args, the first a string, the second a rule?
: Are you then forced to use <name(expr, expr)> syntax?
Yes, or pass a string and parse it yourself.
: Does <name: text1 text2> get handled as <name(q<text1 text2>)> or as
: <name(q<text1>, q<text2>)>, in which case it's really qw//?
The former. It's a single string, which you can parse however you like. Though I suppose we could extend the colon to a colon modifier:
<name:w text1 text2>
That's getting a little weird though, considering that in most other cases such modifiers are outside the delimiters. Here's a really weird one:
<name:here END>
I'm more inclined to say that anything beyond a bare string has to use function notation. I'm still not entirely sure we should even have a string notation. Its utility/clutter ratio is pretty low.
: If I declare my rule as:
:
:     rule MyRule (Str $text) { ... }
:
: Will the P6RE be smart enough to pass:
:
:     m:{<MyRule /thisdir/>}
:
: As a string, not the rule that it looks like?
Probably not, given the late-binding issues and the desire to treat patterns as first-class language. That's why we're differentiating the call syntax without reference to the called signature. If you want delayed compilation of the pattern, you can get it by passing a string. But then any parentheses in it don't count as parens in the outer rule. With <before ([A-Z]+)> we can treat the lookahead parens as an ordinary capture because the parens are parsed at the same time the outer rule is parsed.
: If I define a named rule, how do I get a reference to it from outside a
: rule?
: A05 leads me to think that
:
:     my $rx := Rule.MyRule;
:
: will work, assuming Rule is the default grammar. I would also suspect
: that one could reverse this.
:
:     my $rx = rule { ... };
:     Rule.MyRule := $rx;
:
: Thus defining a new rule dynamically.
You have to use &Rule::MyRule to refer to the method by name. Rule.MyRule would invoke the MyRule method as a class method in the Rule class, since method calls do not require parens if there are no arguments.
On Wed, Mar 02, 2005 at 12:42:08PM -0600, Rod Adams wrote:
: Larry Wall wrote:
:
: >On Tue, Mar 01, 2005 at 11:06:17PM -0600, Rod Adams wrote:
: >: Since the line between rules and subs is already blurring significantly,
: >: I want to blur it a little more. I want to write rules which can take
: >: parameters.
: >
: >No problem. That's how the arguments to rules like <before foo> are
: >already passed.
: >
: Excellent!
:
: >Of course, one can call them like ordinary methods too, as long as one
: >supplies an appropriate pattern-matching context invocant. But it's
: >pretty handy to have that magically supplied for you inside <...>.
: >
: >
:
: Now for the tricky part.
:
:     rule baltag (Str $tag, Str $body is rw) {
:         \< $tag .*? \>
:         $<body> := (.*?)
:         \</ $tag \>
:     }
since $body is a real variable rather than a hash entry. However, I don't think binding works even in an ordinary sub--rebinding the variable would break its association with whatever actual parameter. You'd need to use assignment:
But in theory it should autovivify the $<body> for you based on the "is rw".
: In other words, I want to pass a possibly unbound hypothetical into a
: subrule, and let the subrule bind/rebind it.
Can only assign it. You'd have to pass the actual symbol table holding $<body> if you want to rebind the name. You could presumably pass in $/ as an argument and hash it explicitly.
: Alternatively would be a syntax for calling the subrule, and then
: binding a hypothetical to one of the hypotheticals returned in the
: subrule. I'm moderately sure S05 made this possible, but I couldn't put
: all the pieces together. I'll hazard the following guess:
:
:     rule baltag (Str $tag) {
:         \< $tag .*? \>
:         $<body> := (.*?)
:         \</ $tag \>
:     }
:
:     $buffer ~~ / $<btag> := <baltag title> <($<body> := $btag<body>)> /;
That's essentially correct, though that should probably be {...} instead of <(...)>, in case the body has the value "0" which would cause the <(...)> assertion to fail. There might also be some circumstances in which you need a "let" in there if there's any possibility of backtracking over the binding but still having access to the match object.
: Which seems very clumsy, especially if I wish to call it a lot. Be nicer
: if I could push the work onto the subrule.
Well, if you never want <baltag> to return the tags, then you can just say something like: