C<::> in rules


Patrick R. Michaud

May 12, 2005, 10:33:37 AM
to perl6-l...@perl.org
I have a couple of questions regarding C< :: > in perl 6 rules.
First, a question of verification -- in

$rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;

"travel by plane jet train tgv today" ~~ $rule

I think the match should fail outright, as opposed to matching "train tgv".
In other words, it acts as though one had written

$rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;

and not

$rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Does this sound right?

Next on my list, S05 says "It is illegal to use :: outside of
an alternation", but A05 has

/[:w::foo bar]/

which leads me to believe that :: isn't illegal here even though there's
no alternation. I'd like to strike that sentence from S05.

Also, A05 proposes incorrect alternatives to the above

/[:w[]foo bar]/ # null pattern illegal, use <null>
/[:w()foo bar]/ # null capture illegal, and probably undesirable
/[:w\bfoo bar]/ # not exactly the same as above

I'd like to remove those from A05, or at least put an "Update:"
note there that doesn't lead people astray. One option not
mentioned in A05 that we can add there is

/[:w<?null>foo bar]/

which is admittedly ugly.

So, now then, on to the item that got me here in the first place.
The upshot of all of the above is that

rx :w /foo bar/

is not equivalent to

rx /:w::foo bar/

which may surprise a few people. The :: at the beginning of
the pattern effectively anchors the match to the beginning of
the string or the current position -- i.e., it eliminates the
implicit C< .*? > at the start of the match. To put the :w
inside the rule (e.g., in a variable or subrule), one would
have to write it as

rx /[:w::foo bar]/
rx /:w<null>foo bar/

Now then, I don't have a problem at all with this outcome --
but I wanted to let p6l verify my interpretation of things and
make sure it's okay for me to adjust S05/A05 accordingly.

Pm

Aaron Sherman

May 12, 2005, 12:53:46 PM
to Patrick R. Michaud, Perl6 Language List
My take, based on S05:

On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> I have a couple of questions regarding C< :: > in perl 6 rules.
> First, a question of verification -- in
>
> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
>
> "travel by plane jet train tgv today" ~~ $rule
>
> I think the match should fail outright, as opposed to matching "train tgv".

Correct, that's the meaning of ::

S05: "Backtracking over a double colon causes the surrounding group of
alternations to immediately fail:"

Your surrounding group is the entire rule, and thus you fail at that
point.

> In other words, it acts as though one had written
>
> $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
>
> and not
>
> $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Your two examples fail in the same way because of the fact that the
group IS the whole rule.

> Next on my list, S05 says "It is illegal to use :: outside of
> an alternation", but A05 has
>
> /[:w::foo bar]/

I can't even figure out what that means. :w turns on word mode
(lexically scoped per S05) and "::" is a group-level commit. What are we
committing exactly? Looks like a noop to me, which actually might not be
so bad. However, you're right: this is an error as there are no
alternations.

> which leads me to believe that :: isn't illegal here even though there's
> no alternation. I'd like to strike that sentence from S05.

I don't think it should be removed. You can always use ::: if that's
what you wanted.

> Also, A05 proposes incorrect alternatives to the above
>
> /[:w[]foo bar]/ # null pattern illegal, use <null>

Correct.

> /[:w()foo bar]/ # null capture illegal, and probably undesirable

Correct.

> /[:w\bfoo bar]/ # not exactly the same as above

No, I think that's exactly the same.

> So, now then, on to the item that got me here in the first place.
> The upshot of all of the above is that
>
> rx :w /foo bar/
>
> is not equivalent to
>
> rx /:w::foo bar/

If we feel strongly, it could be special-cased, but your <null> solution
seems fine to me.

--
Aaron Sherman <a...@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


Patrick R. Michaud

May 12, 2005, 1:48:16 PM
to Aaron Sherman, Perl6 Language List
On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
>
> > > /[:w\bfoo bar]/ # not exactly the same as above
> >
> > No, I think that's exactly the same.
>
> What does \b mean again? I assume it's no longer backspace?

For as long as I can remember \b has meant "word boundary" in
regular expressions. :-) :-)

Pm

Patrick R. Michaud

May 12, 2005, 1:44:55 PM
to Aaron Sherman, Perl6 Language List
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> My take, based on S05:
>
> > In other words, it acts as though one had written
> >
> > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> >
> > and not
> >
> > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
>
> Your two examples fail in the same way because of the fact that the
> group IS the whole rule.

False. In the first case the group is the whole rule. In the second
case the group would not include the (implied) '.*?' at the start of
the rule. Perhaps it helps to see the difference if I write it this way:

$rule = rx :w /<null>[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/;

Note that the rule is *unanchored*, thus it tries at the first character,
if it fails then it goes to the second character, if that fails it goes
to the third, etc. Thus, given:

$rule1 = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
$rule2 = rx :w /<null>[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

"travel by plane jet train tgv today" ~~ $rule1; # fails
"travel by plane jet train tgv today" ~~ $rule2; # matches "train tgv"

They're not equivalent.

> > Next on my list, S05 says "It is illegal to use :: outside of
> > an alternation", but A05 has
> >
> > /[:w::foo bar]/
>
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and "::" is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad.

Yes, the point is that it's a no-op, because

/[:wfoo bar:]/

is something entirely different.

> > /[:w\bfoo bar]/ # not exactly the same as above
>
> No, I think that's exactly the same.

Nope. Consider:

$foo = rx /[:w::foo bar]/
$baz = rx /[:w\bfoo bar]/

"myfoo bar" ~~ $foo # matches
"myfoo bar" ~~ $baz # fails, foo is not on a word boundary

Pm
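
The $foo/$baz difference above can be checked in a present-day regex engine. The sketch below uses Python's re module as a stand-in, not Perl 6 rules. It assumes that \s+ is a fair approximation of what :w does between "foo" and "bar", and it drops the leading :: because that is a no-op in this position.

```python
import re

# Stand-ins for the two rules; \s+ approximates :w (an assumption).
foo = re.compile(r"foo\s+bar")      # roughly /[:w::foo bar]/
baz = re.compile(r"\bfoo\s+bar")    # roughly /[:w\bfoo bar]/

text = "myfoo bar"
foo_hit = foo.search(text)   # finds "foo bar" inside "myfoo bar"
baz_hit = baz.search(text)   # None: no word boundary between "y" and "f"

print(foo_hit.group() if foo_hit else None)
print(baz_hit)
```

The only difference between the two patterns is the \b assertion, which is why they cannot be treated as interchangeable.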

Jonathan Scott Duff

May 12, 2005, 1:33:59 PM
to Aaron Sherman, Patrick R. Michaud, Perl6 Language List
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> > Next on my list, S05 says "It is illegal to use :: outside of
> > an alternation", but A05 has
> >
> > /[:w::foo bar]/
>
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and "::" is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad. However, you're right: this is an error as there are no
> alternations.

I think the definition of :: needs to be changed slightly. You even
used a phrase that isn't exactly true according to spec but would be
if :: meant what I think it should mean. That phrase is ":: is a
group-level commit". This isn't how I read S05 (and apparently how
you and others read it as well, hence your comment to Pm that there
are no alternations). S05 says:

Backtracking over a double colon causes the surrounding group of
alternations to immediately fail:

I think it should simply read:

Backtracking over a double colon causes the surrounding group to
immediately fail:

In other words, the phrase "of alternations" is a red herring.

> > which leads me to believe that :: isn't illegal here even though there's
> > no alternation. I'd like to strike that sentence from S05.
>
> I don't think it should be removed. You can always use ::: if that's
> what you wanted.

I too think it should be stricken.

> > /[:w\bfoo bar]/ # not exactly the same as above
>
> No, I think that's exactly the same.

What does \b mean again? I assume it's no longer backspace?

> > So, now then, on to the item that got me here in the first place.
> > The upshot of all of the above is that
> >
> > rx :w /foo bar/
> >
> > is not equivalent to
> >
> > rx /:w::foo bar/
>
> If we feel strongly, it could be special-cased, but your <null> solution
> seems fine to me.

If :: were to fail the surrounding group we can say that a rule
without [] or () is an implicit group for :: purposes.

-Scott
--
Jonathan Scott Duff
du...@pobox.com

Jonathan Scott Duff

May 12, 2005, 1:49:28 PM
to Patrick R. Michaud, Aaron Sherman, Perl6 Language List

Doh! See how the shiny new perl6 confuses? ;-)

Uri Guttman

May 12, 2005, 1:53:25 PM
to Patrick R. Michaud, Aaron Sherman, Perl6 Language List
>>>>> "PRM" == Patrick R Michaud <pmic...@pobox.com> writes:

PRM> On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
>>
>> > > /[:w\bfoo bar]/ # not exactly the same as above
>> >
>> > No, I think that's exactly the same.
>>
>> What does \b mean again? I assume it's no longer backspace?

PRM> For as long as I can remember \b has meant "word boundary" in
PRM> regular expressions. :-) :-)

except in char classes where it gets its backspace meaning back.

:-)
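
Python's re module follows the same convention, so the two meanings of \b are easy to see side by side. Python is used here only as a convenient stand-in for the Perl 5 behaviour being discussed.

```python
import re

# Outside a character class, \b is a zero-width word-boundary assertion.
boundary = re.search(r"\bfoo", "a foo")

# Inside a character class, \b is the backspace character (0x08).
backspace = re.fullmatch(r"[\b]", "\x08")

# A class containing only backspace finds nothing in ordinary text.
no_backspace = re.search(r"[\b]", "a foo")

print(boundary.start(), backspace is not None, no_backspace)
```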

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Aaron Sherman

May 12, 2005, 2:29:24 PM
to Patrick R. Michaud, Perl6 Language List
On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:

> > > In other words, it acts as though one had written
> > >
> > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > >
> > > and not
> > >
> > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
> >
> > Your two examples fail in the same way because of the fact that the
> > group IS the whole rule.
>
> False. In the first case the group is the whole rule. In the second
> case the group would not include the (implied) '.*?' at the start of
> the rule.

That cannot be true. If it were, then:

s/[a]//

and

s/a//

would replace different things, and they MUST NOT. If I've missed some
fundamental way in which rx:p5/(?:...)/ is different from rx/[...]/,
then please let me know. Otherwise, we can simply demonstrate this with
P5:

perl -le '"abcaabbcc" =~ /(?:aa)/;print $&'

and unshockingly, that prints "aa", not "abcaa"

> Note that the rule is *unanchored*, thus it tries at the first character,
> if it fails then it goes to the second character, if that fails it goes
> to the third, etc.

Yes, you're correct, but when you step forward over input in order to
find a start for your unanchored expression, you do NOT consume that
input, grouping or not. To say:

$foo ~~ /unanchored/

is something like

for 0..length($foo)-1 -> $i {
    substr($foo,$i) ~~ /^unanchored/;
}

and always has been. Unless I'm unaware of some subtlety of [], it is
just the same as P5's (?:...), which behaves exactly this way.
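
The stepping described above can be sketched in Python, again only as a stand-in for the Perl behaviour. The helper name find_unanchored is hypothetical, and unlike the loop above it stops at the first position that matches.

```python
import re

def find_unanchored(pattern, text):
    """Try an anchored match at each start offset; return the first hit."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # match() only succeeds if the pattern matches exactly at `start`.
        m = compiled.match(text, start)
        if m:
            return start, m.group()
    return None

# The non-capturing group changes neither what is matched nor where.
print(find_unanchored(r"(?:aa)", "abcaabbcc"))
print(find_unanchored(r"aa", "abcaabbcc"))
```

The characters stepped over before the successful start offset are not part of the match, with or without the group.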

I'll skip the rest of your post for now, except for the last bit, since
I think we need to resolve which universe we're in before we can give
each other street directions ;-)

> > > /[:w\bfoo bar]/ # not exactly the same as above
> >
> > No, I think that's exactly the same.
>
> Nope. Consider:
>
> $foo = rx /[:w::foo bar]/
> $baz = rx /[:w\bfoo bar]/
>
> "myfoo bar" ~~ $foo # matches
> "myfoo bar" ~~ $baz # fails, foo is not on a word boundary

You're correct, sorry about that.

Patrick R. Michaud

May 12, 2005, 3:41:24 PM
to Aaron Sherman, Perl6 Language List
$rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
$rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> > On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> > > Your two examples fail in the same way because of the fact that the
> > > group IS the whole rule.
> >
> > False. In the first case the group is the whole rule. In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
>
> That cannot be true. If it were, then:
> s/[a]//
> and
> s/a//
> would replace different things, and they MUST NOT.

No, /[a]/ is still the same as /a/ here -- I'm not discussing that at
all, nor am I implying any special [] or rule semantics. I'm simply
referring to the fact that the rule is free to step across the
characters in the string, same as you pointed out.

Let me backtrack(!) and try a slightly different example,
first using a group and (::)

$r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;

"abcdef" ~~ $r1 # matches "abcdef"
"xyzghijkl" ~~ $r1 # matches "ghijkl"
"xyzabcghijkl" ~~ $r1 # matches "ghijkl"

Why does the last one match? Because it fails the group but
doesn't fail the rule -- i.e., the rule is still free to advance
its initial pointer to the next character and try again. Contrast
this with:

$r2 = rx /abc ::: def | ghi ::: jkl | mn ::: op/;

"abcdef" ~~ $r2 # matches "abcdef"
"xyzghijkl" ~~ $r2 # matches "ghijkl"
"xyzabcghijkl" ~~ $r2 # fails!

This one fails, because once we match the "abc", we're committed
to completing the match or failing the rule altogether.

Does this work to convince you that the two expressions are indeed
different?
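
For readers who want to trace the two behaviours, here is a toy model in Python. It is a sketch of the semantics as described in this message, not PGE or any real engine. The names try_alternation, search and cut_fails_rule are made up for illustration, and each alternative is reduced to a literal before the cut and a literal after it.

```python
# Each alternative is (text before the cut, text after the cut),
# e.g. "abc :: def" becomes ("abc", "def").
ALTS = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]

def try_alternation(text, pos, alts):
    """Return ("match", end), ("cut", None) or ("fail", None) at pos."""
    for before, after in alts:
        if text.startswith(before, pos):
            mid = pos + len(before)
            if text.startswith(after, mid):
                return "match", mid + len(after)
            # We are past the cut and the rest failed: no other
            # alternative may be tried at this position.
            return "cut", None
    return "fail", None

def search(text, alts, cut_fails_rule):
    """Unanchored search; cut_fails_rule=True models :::, False models ::."""
    for pos in range(len(text) + 1):
        status, end = try_alternation(text, pos, alts)
        if status == "match":
            return text[pos:end]
        if status == "cut" and cut_fails_rule:
            return None          # ::: also stops the stepping
    return None

for s in ("abcdef", "xyzghijkl", "xyzabcghijkl"):
    print(s, search(s, ALTS, False), search(s, ALTS, True))
```

The two modes agree on the first two strings and differ only on "xyzabcghijkl", where the group-level cut lets the search step on to "ghijkl" and the rule-level cut does not.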

Pm

Aaron Sherman

May 12, 2005, 5:15:55 PM
to Patrick R. Michaud, Perl6 Language List
On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
>
> On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> > On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:

> > > False. In the first case the group is the whole rule. In the second
> > > case the group would not include the (implied) '.*?' at the start of
> > > the rule.

This was a very unfortunate choice of explanations, since an implied
".*?" would change the semantics of the match deeply. However, your
later explanation:

> $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;
>
> "abcdef" ~~ $r1 # matches "abcdef"
> "xyzghijkl" ~~ $r1 # matches "ghijkl"
> "xyzabcghijkl" ~~ $r1 # matches "ghijkl"
>
> Why does the last one match? Because it fails the group but
> doesn't fail the rule -- i.e., the rule is still free to advance
> its initial pointer to the next character and try again.

... is very understandable. Now I'm just left with a vague sense that I
never want to see anyone use :: :-)

Patrick R. Michaud

May 12, 2005, 8:10:37 PM
to Aaron Sherman, Perl6 Language List
On Thu, May 12, 2005 at 05:15:55PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> > False. In the first case the group is the whole rule. In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
>
> This was a very unfortunate choice of explanations, since an implied
> ".*?" would change the semantics of the match deeply.

I agree, my wording on this wasn't all that clear--I haven't found
a good phrase for "the stepping that takes place at the beginning
of an unanchored match". And in earlier versions of PGE, the
stepping was actually performed by a '.*?' node at the beginning
of the expression tree that didn't participate in the captured
result.

Anyway, we're in agreement as to what :: and ::: do, so I'll propose
changes to S05/A05 and we can go from there. Thanks! :-)

Pm

Larry Wall

May 12, 2005, 11:56:39 PM
to perl6-l...@perl.org
On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
: Also, A05 proposes incorrect alternatives to the above
:
: /[:w[]foo bar]/ # null pattern illegal, use <null>
: /[:w()foo bar]/ # null capture illegal, and probably undesirable
: /[:w\bfoo bar]/ # not exactly the same as above
:
: I'd like to remove those from A05, or at least put an "Update:"
: note there that doesn't lead people astray. One option not
: mentioned in A05 that we can add there is
:
: /[:w<?null>foo bar]/
:
: which is admittedly ugly.

I would just like to point out that you are misreading those.
The [] and () above are part of pair syntax, not rule syntax.
Likewise your :w<?null> should be taken as :w('?null'). We used to
try to distinguish modifiers like :w that don't take an argument,
but that's a bad plan. All colon pairs parse alike wherever they
occur. That's why we've required space before bracket delimiters
outside, but the same constraint holds inside rules.

Which means, of course, that we should probably try to figure
what :w($x) actually means... :-)

Speaking of which, it seems to me that :p and :c should allow an
argument that says where to start relative to the current position.
In other words, :p means :p(0) and :c means :c(0). I could also see
uses for :p(-1) and :p(+1).

We could also pass positions as opaque objects, which is another
reason not to consider positions as mere numbers.

Larry

Patrick R. Michaud

May 13, 2005, 12:26:54 AM
to perl6-l...@perl.org
On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
> : Also, A05 proposes incorrect alternatives to the above
> :
> : /[:w[]foo bar]/ # null pattern illegal, use <null>
> : /[:w()foo bar]/ # null capture illegal, and probably undesirable
> : /[:w\bfoo bar]/ # not exactly the same as above
> :
>
> I would just like to point out that you are misreading those.

Ouch, you're right! I've been looking at patterns too long, I
guess -- thanks for the correction.

> Speaking of which, it seems to me that :p and :c should allow an
> argument that says where to start relative to the current position.
> In other words, :p means :p(0) and :c means :c(0). I could also see
> uses for :p(-1) and :p(+1).

Sounds good to me.

Pm

Markus Laire

May 13, 2005, 4:43:42 AM
to perl6-l...@perl.org
TSa (Thomas Sandlaß) wrote:

> Larry Wall wrote:
>
>> Speaking of which, it seems to me that :p and :c should allow an
>> argument that says where to start relative to the current position.
>> In other words, :p means :p(0) and :c means :c(0). I could also see
>> uses for :p(-1) and :p(+1).
>
>
> Isn't that slightly inconsistent with :p meaning :p(1) the so-called
> "real winner for passing boolean options" of A12?

Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
and not :p(1)

--
Markus Laire
<Jam. 1:5-6>

Juerd

May 13, 2005, 7:21:49 AM
to Markus Laire, perl6-l...@perl.org
Markus Laire wrote 2005-05-13 11:43 (+0300):

> Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
> and not :p(1)

<aol>
Agreed
</>


Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Aaron Sherman

May 13, 2005, 9:49:46 AM
to Perl6 Language List
On Fri, 2005-05-13 at 00:26, Patrick R. Michaud wrote:
> On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote:
> > On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
> > : Also, A05 proposes incorrect alternatives to the above
> > :
> > : /[:w[]foo bar]/

> > I would just like to point out that you are misreading those.

> I've been looking at patterns too long

You know, this is going to be a problem for a lot of people...

Think of this case:

/:w[foo bar|bar foo]/

I may be in the minority here, but I think we should try to avoid having
[] and () mean different things in different parts of a rule, especially
where one use is VERY common, and the other is obscure at best. I'd even
be ok with only allowing this inside our already highly magical <>:

/<:w>[foo bar|bar foo]/

and

/<:p(false)>/

and

/ <:p5['ponie']> (?{die;}) /

I checked, and while <::...> has a meaning in S05, <:...> does not, so
as long as we never allow a modifier called "::", this would work.

In fact, Larry, I think it's safe to say that <> is actually more
sought-after than that : everyone wants ;-)

Luke Palmer

May 13, 2005, 11:04:22 AM
to Patrick R. Michaud, perl6-l...@perl.org
On 5/12/05, Patrick R. Michaud <pmic...@pobox.com> wrote:
> I have a couple of questions regarding C< :: > in perl 6 rules.
> First, a question of verification -- in
>
> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
>
> "travel by plane jet train tgv today" ~~ $rule
>
> I think the match should fail outright, as opposed to matching "train tgv".
> In other words, it acts as though one had written
>
> $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
>
> and not
>
> $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Those both do the same thing (which is the same as your example).
When you fail over the :: after plane, it skips out of the alternation
looking for something to backtrack before it. Since there is nothing,
the rule fails.

> Does this sound right?
>
> Next on my list, S05 says "It is illegal to use :: outside of
> an alternation", but A05 has
>
> /[:w::foo bar]/
>
> which leads me to believe that :: isn't illegal here even though there's
> no alternation. I'd like to strike that sentence from S05

Yeah, I think using :: to break out of the innermost bracketing group
is helpful even without an alternation present.

> Also, A05 proposes incorrect alternatives to the above
>
> /[:w[]foo bar]/ # null pattern illegal, use <null>
> /[:w()foo bar]/ # null capture illegal, and probably undesirable
> /[:w\bfoo bar]/ # not exactly the same as above
>
> I'd like to remove those from A05, or at least put an "Update:"
> note there that doesn't lead people astray. One option not
> mentioned in A05 that we can add there is
>
> /[:w<?null>foo bar]/
>
> which is admittedly ugly.
>
> So, now then, on to the item that got me here in the first place.
> The upshot of all of the above is that
>
> rx :w /foo bar/
>
> is not equivalent to
>
> rx /:w::foo bar/

Yeah, but it is. So no problem. :-)

> which may surprise a few people. The :: at the beginning of
> the pattern effectively anchors the match to the beginning of
> the string or the current position -- i.e., it eliminates the
> implicit C< .*? > at the start of the match.

Ohhh, ohh. There isn't an implicit .*? at the beginning of the match.
It's more like there's an implicit .*? followed by a rule call to the
match. Think of it as that we're trying to match the pattern at any
position rather than there being an implicit .*?.

Luke

Luke Palmer

May 13, 2005, 11:36:50 AM
to Patrick R. Michaud, Perl6 Language List
On 5/13/05, Patrick R. Michaud <pmic...@pobox.com> wrote:
> To use the phrase from later in your message, there's still
> the "implicit .*? followed by the rule call." Since the rule
> itself hasn't failed (only the group failed), we're still free to
> try to match the pattern at later positions.

I'm basically saying that you should treat your:

$str ~~ /abc :: def | ghi :: jkl | mn :: op/;

As:

$rule = rx/abc :: def | ghi :: jkl | mn :: op/;
$str ~~ /^ .*? <$rule>/;

Which means that you fail the rule, your .*? advances to the next
character and tries the rule again.

Maybe I'm misunderstanding your interpretation (when in doubt, explain
with code).

Luke

Patrick R. Michaud

May 13, 2005, 12:54:47 PM
to Luke Palmer, Perl6 Language List
On Fri, May 13, 2005 at 03:36:50PM +0000, Luke Palmer wrote:
> I'm basically saying that you should treat your:
> $str ~~ /abc :: def | ghi :: jkl | mn :: op/;
> As:
> $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
> $str ~~ /^ .*? <$rule>/;
> Which means that you fail the rule, your .*? advances to the next
> character and tries the rule again.

Taking this explanation literally, this would mean that

$rule = rx/abc :: def | ghi :: jkl | mn :: op/;
$rule = rx/abc ::: def | ghi ::: jkl | mn ::: op/;

both succeed against "xyzabc---ghijkl". But even just considering
the :: instance, this interpretation doesn't match what you said
in your original message that :: would fail the rule without
further advancing:

Pm> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
Pm> "travel by plane jet train tgv today" ~~ $rule

LP> When you fail over the :: after plane, it skips out of the alternation
LP> looking for something to backtrack before it. Since there is nothing,
LP> the rule fails.

> Maybe I'm misunderstanding your interpretation (when in doubt, explain
> with code).

One of us is misunderstanding the other. I'll explain with code,
but first let's clarify the difference. I read your first message as
claiming that

$r1 = rx / abc :: def | ghi :: jkl | mn :: op /;

$r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
$r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

are equivalent. I believe $r2 and $r3 are not equivalent.
For comparison, let's first look at a slightly different example,
and let's avoid subrules, since they don't provide the auto-advance
of unanchored patterns that forms the crux of my question.

First, I'm quite certain that $r2 and $r3 are different. For
illustration, let's use a variation like:

$q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
$q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;

"xyzabc---xyzghijklmno" ~~ $q2 # fails after seeing "zabc"
"xyzabc---xyzghijklmno" ~~ $q3 # matches "zghijkl"

The difference is precisely the difference between ::: and :: --
the former fails the rule entirely, while the latter simply fails
the current group (of alternations) and tries again.
With :::, an unanchored rule should also stop its process of
"advancing to the next character and trying again".
(Otherwise, "abefgh" ~~ rx / [ ab ::: cd | ef ::: gh ] / succeeds.)

So, by analogy

$r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
$r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

"xyzabc---xyzghijklmno" ~~ $r2 # fails after seeing "abc"
"xyzabc---xyzghijklmno" ~~ $r3 # matches "ghijkl"

The :: in $r3 doesn't cause the entire rule to fail, just the
group, so the match is free to backtrack and continue its
"advance to the next character and try again". (What the "::"
in $r3 *does* do is to tell the matching engine to not bother
trying the remaining alternatives once it has seen an "abc" at
this point.)

So, going back to the original

$r1 = rx / abc :: def | ghi :: jkl | mn :: op /;

does it work like $r2 or $r3? My gut feeling is that it should
work like $r2 -- i.e., that once we find an "abc" we'll fail the rule
if there's not a "def" following. This also accords with what
others have written in reply, when they say that all three of my
expressions fail in the same way (even though they do not).

However, *if* we say that :: at the top level fails the rule, that
means that as things currently stand

$z1 = rx :w /foo/;
$z2 = rx /:w::foo/;
$z3 = rx /[:w::foo]/;

can be a little surprising:

"hello foo" ~~ $z1 # matches "foo"
"hello foo" ~~ $z2 # fails immediately upon the 'h' != 'f'
"hello foo" ~~ $z3 # matches "foo"

which was the point of my original post. And as I said there, I don't
have a problem with this, I just wanted to make sure this result didn't
surprise too many others.

I hope this was clear enough -- if not, explain counter examples
in code. :-)

Pm

Larry Wall

May 13, 2005, 3:26:35 PM
to perl6-l...@perl.org
On Fri, May 13, 2005 at 11:43:42AM +0300, Markus Laire wrote:
: Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
: and not :p(1)

I'm still not sure I believe in booleans to that extent. I suppose
we could go as far as to make it :p(0 but true). Actually, it's more
like "undef but true", if you want to be able to distinguish

sub foo (+$p = 0) { # no :p at all
say "true" if $p; # :p with no argument
$p //= 42; # :p with no argument
...
}

Or maybe it's something more like "1 but assumed". In any event, it'd
be nice to be able to distinguish :p from :p(1) somehow. Maybe the
Bool type is good enough for that. The bool type probably isn't unless
we depend on autoboxing to turn it into a Bool consistently.

Larry

Larry Wall

May 13, 2005, 4:07:20 PM
to Perl6 Language List
On Fri, May 13, 2005 at 11:54:47AM -0500, Patrick R. Michaud wrote:
: $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;

: $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
: $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

I would prefer that $r1 work like $r3, not like $r2, for two reasons.
First, it gives a useful distinction of meaning. Second, and more
importantly, outer lexical scopes are often delimited by something
other than what the inner scopes are delimited by. A file scope or an
eval string is no less a block because it's not delimited by curlies.
In the same way, the outer delimiters of an rx/.../ should function
as if you'd said rx/[...]/.

: However, *if* we say that :: at the top level fails the rule, that
: means that as things currently stand
:
: $z1 = rx :w /foo/;
: $z2 = rx /:w::foo/;
: $z3 = rx /[:w::foo]/;
:
: can be a little surprising:
:
: "hello foo" ~~ $z1 # matches "foo"
: "hello foo" ~~ $z2 # fails immediately upon the 'h' != 'f'
: "hello foo" ~~ $z3 # matches "foo"
:
: which was the point of my original post.

And that's the third reason. A :: at the beginning of a "group" should
essentially be a no-op.

By the way, I still think of it as "a group of alternatives" even
if there's only one alternative, and no |. But I can see how that
can be misread to imply at least two alternatives. (We're also
hampered by the linguistic fact that "alternative" can mean either
how many choices you have to make or how many paths are open to you.
In other words, one alternative of the first sort presents you two
alternatives of the second sort. And if there's no alternative, you
only have one alternative. Ain't English wonderful?)

Anyway, :: fails the current lexical scope, not the current rule.
::: fails the current rule in a more dynamically scoped way, which
is why it also fails the engine applying the implicit .*?. And of
course, failure to <commit> is almost completely dynamic in scoping.
It's more like unwinding an exception till you find a handler that
knows it's the outer rule.
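
The exception analogy can be made concrete with a small Python model. This is a toy sketch under stated assumptions, not PGE or any real engine: patterns are lists of literal strings, :w is ignored, and every name (GroupCut, RuleCut, CUT, match_alternative, match_group, search) is hypothetical. The outer rx/.../ delimiters are modelled as an implicit group, as described above.

```python
class GroupCut(Exception):
    """Backtracking over :: (handled by the enclosing group)."""

class RuleCut(Exception):
    """Backtracking over ::: (handled outside the position-stepping loop)."""

CUT = object()  # marks where :: or ::: sits inside an alternative

def match_alternative(text, pos, tokens, cut_exc):
    past_cut = False
    for tok in tokens:
        if tok is CUT:
            past_cut = True
        elif text.startswith(tok, pos):
            pos += len(tok)
        elif past_cut:
            raise cut_exc()      # failure after the cut: unwind
        else:
            return None          # ordinary failure: try next alternative
    return pos

def match_group(text, pos, alternatives, cut_exc):
    try:
        for tokens in alternatives:
            end = match_alternative(text, pos, tokens, cut_exc)
            if end is not None:
                return end
    except GroupCut:
        pass                     # :: fails just this group
    return None

def search(text, alternatives, cut_exc):
    try:
        for pos in range(len(text) + 1):
            # The outer delimiters are treated as an implicit group.
            end = match_group(text, pos, alternatives, cut_exc)
            if end is not None:
                return text[pos:end]
    except RuleCut:
        pass                     # ::: fails the rule and the stepping
    return None

leading_cut = [[CUT, "foo"]]     # like /::foo/ with :w ignored
r = [["abc", CUT, "def"], ["ghi", CUT, "jkl"], ["mn", CUT, "op"]]

print(search("hello foo", leading_cut, GroupCut))
print(search("hello foo", leading_cut, RuleCut))
print(search("xyzabc---xyzghijklmno", r, GroupCut))
print(search("xyzabc---xyzghijklmno", r, RuleCut))
```

In this model a leading :: is harmless because its handler sits inside the stepping loop, while ::: unwinds past the loop and ends the whole match attempt.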

Larry

Damian Conway

May 13, 2005, 7:53:48 PM
to Larry Wall, perl6-l...@perl.org
Larry wrote:

> I'm still not sure I believe in booleans to that extent. I suppose
> we could go as far as to make it :p(0 but true). Actually, it's more
> like "undef but true", if you want to be able to distinguish
>
> sub foo (+$p = 0) { # no :p at all
> say "true" if $p; # :p with no argument
> $p //= 42; # :p with no argument
> ...
> }

Yes, I was thinking along the same lines. C<undef but true> as a default seems
to be more accurate and useful than C<Bool::true>.

Damian

Luke Palmer

May 13, 2005, 9:15:36 PM
to Patrick R. Michaud, Perl6 Language List
On 5/13/05, Patrick R. Michaud <pmic...@pobox.com> wrote:
> First, I'm quite certain that $r2 and $r3 are different. For
> illustration, let's use a variation like:
>
> $q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
> $q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;
>
> "xyzabc---xyzghijklmno" ~~ $q2 # fails after seeing "zabc"
> "xyzabc---xyzghijklmno" ~~ $q3 # matches "zghijkl"

Okay, I know where the misunderstanding is. When we use these kinds
of examples, let's not rely on the implicit matching semantic. I'm
saying that the above code is equivalent to:

# the following is a rule, so ::: backtracks out of it and no further
rule q2 { \w [ abc ::: def | ghi ::: jkl | mn ::: op ] }
rule q3 { \w [ [ abc :: def | ghi :: jkl | mn :: op ] ] }
"xyzabc---xyzghijklmno" ~~ /^ .*? <q2>/; # ::: backtracks into the .*?
"xyzabc---xyzghijklmno" ~~ /^ .*? <q3>/;

The presence of the \w does nothing, because \w doesn't backtrack.
Alternations and quantifiers backtrack when you fail beyond them, \w
just fails. You never enter the same subpattern (meant in the most
general case: .* is a subpattern, for instance) in the same state.
Something had to change behind you in order for a subpattern to be
re-entered.
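
The point about which subpatterns can be re-entered can be sketched with Python generators, where "backtracking into" a subpattern just means asking its generator for another end position. This is only an illustration in Python, not Perl 6; word_char and star are hypothetical names, and star assumes its inner matcher has at most one way to match at each position.

```python
import string

WORD = set(string.ascii_letters + string.digits + "_")

def word_char(s, i):
    # \w has exactly one way to match at a given position, so it yields at
    # most once; asking again finds nothing left to try, and it just fails.
    if i < len(s) and s[i] in WORD:
        yield i + 1

def star(inner):
    # Greedy inner*: records every stopping point on the way forward, then
    # hands them out longest first. Each later request is one backtrack step.
    def m(s, i):
        ends = [i]
        while True:
            nxt = next(inner(s, ends[-1]), None)
            if nxt is None or nxt == ends[-1]:
                break
            ends.append(nxt)
        yield from reversed(ends)
    return m

single = list(word_char("ab!", 0))          # one end position only
several = list(star(word_char)("ab!", 0))   # one per possible length
```

Here single holds one position and several holds three (lengths 2, 1 and 0), which is the sense in which a quantifier leaves something behind to change and \w does not.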

I think the misunderstanding is rather simple. You keep talking like
you prepend a .*? to the rule we're matching. I think that's wrong
(and this is where I'm making a design call, so we can dispute on this
once we're clear that it's this that is being disputed). I think
there is a special rule:

rule matchanywhere($rx) { .*? <$rx> }

Which makes a *subrule call* to the rule we're matching. Therefore
::: just breaks out of that subrule, and backtracks into the .*?
again.
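
Whether ::: stops at the subrule call or also stops the scan comes down to where the handler for it sits relative to the loop over start positions. A minimal Python sketch of the two placements follows (Python, not Perl 6). The names are hypothetical: rule_at is a hand-coded stand-in for abc ::: def | ghi ::: jkl, and the two search functions stand in for the "subrule call" reading and the "same scope as the implicit .*?" reading.

```python
class FailRule(Exception):
    """Raised when matching would have to backtrack over :::."""

def rule_at(s, i):
    # Hand-coded stand-in for: abc ::: def | ghi ::: jkl, anchored at i.
    # Returns the end position, or None if no alternative starts here.
    for head, tail in (("abc", "def"), ("ghi", "jkl")):
        if s.startswith(head, i):
            j = i + len(head)
            if s.startswith(tail, j):
                return j + len(tail)
            raise FailRule          # past the :::, then failed
    return None

def search_as_subrule_call(s):
    # The scan calls the rule afresh at each position; ::: only ends that call.
    for start in range(len(s) + 1):
        try:
            end = rule_at(s, start)
        except FailRule:
            end = None
        if end is not None:
            return s[start:end]
    return None

def search_in_same_scope(s):
    # The scan belongs to the same scope as the rule; ::: ends the scan too.
    try:
        for start in range(len(s) + 1):
            end = rule_at(s, start)
            if end is not None:
                return s[start:end]
    except FailRule:
        pass
    return None

subrule_result = search_as_subrule_call("abc---ghijkl")
same_scope_result = search_in_same_scope("abc---ghijkl")
```

With the handler inside the loop the later "ghijkl" is still found; with the handler outside the loop the first failed ::: ends the whole search. On a string such as "xxabcdef", where the ::: is never backtracked over, both versions agree.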

Because of this, I think there will be a difference between ::: and
<commit> at the top level, but not :: and :::.

Luke

Larry Wall

May 13, 2005, 10:19:50 PM5/13/05
to Perl6 Language List
On Sat, May 14, 2005 at 01:15:36AM +0000, Luke Palmer wrote:
: I think the misunderstanding is rather simple. You keep talking like
: you prepend a .*? to the rule we're matching. I think that's wrong
: (and this is where I'm making a design call, so we can dispute on this
: once we're clear that it's this that is being disputed). I think
: there is a special rule:
:
: rule matchanywhere($rx) { .*? <$rx> }
:
: Which makes a *subrule call* to the rule we're matching. Therefore
: ::: just breaks out of that subrule, and backtracks into the .*?
: again.

I want ::: to break out of *that* dynamic scope (or the equivalent
"matchrighthere" scope), but not ::.

Larry

Luke Palmer

May 14, 2005, 12:26:44 AM5/14/05
to Perl6 Language List

I'm not sure that's such a good idea. When you say:

rule foo() { a* ::: b }

You know precisely where that ::: is going to take you: right out of
the rule. That's the way it works in grammars, and there's no
implicit anything else that you're breaking out of. But you're saying
that when we use a bare // matching a string, that's no longer the
case? In other words, this:

$str ~~ / a* ::: b /

Is different from:

$str ~~ / <foo> /

That seems like a pretty obvious indirection, and a mistake to break
it. There's nothing there except <foo>, how could it act differently?

Luke

Patrick R. Michaud

May 14, 2005, 1:34:29 AM5/14/05
to Luke Palmer, Perl6 Language List
On Sat, May 14, 2005 at 04:26:44AM +0000, Luke Palmer wrote:
> On 5/14/05, Larry Wall <la...@wall.org> wrote:
> > I want ::: to break out of *that* dynamic scope (or the equivalent
> > "matchrighthere" scope), but not ::.
>
> I'm not sure that's such a good idea. When you say:
>
> rule foo() { a* ::: b }
>
> You know precisely where that ::: is going to take you: right out of
> the rule. [...] But you're saying that when we use a bare //
> matching a string, that's no longer the case? In other words, this:
>
> $str ~~ / a* ::: b /
>
> Is different from:
>
> $str ~~ / <foo> /
>
> That seems like a pretty obvious indirection, and a mistake to break
> it. There's nothing there except <foo>, how could it act differently?

Because $str ~~ / <foo> / puts the ::: in a subrule, whereas
$str ~~ / a* ::: b / does not. It's the same sort of difference
that one gets between

{ return if $a; }

and

sub foo() { return if $a; }

{ foo() }

It's clear that the C<return> in the first case affects control flow
in the current sub, while the nested C<return> of foo() in the second
case does not.
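
The same contrast can be written out in Python as a small illustration (Python has no bare blocks, so an if body stands in for the inline block; the function names are hypothetical):

```python
def inline_return(a):
    log = ["start"]
    if a:
        return log        # written inline: leaves inline_return itself
    log.append("end")
    return log

def foo(a):
    if a:
        return            # leaves only foo

def nested_return(a):
    log = ["start"]
    foo(a)                # foo's return does not end nested_return
    log.append("end")
    return log
```

inline_return(True) stops before logging "end", while nested_return(True) carries on after foo returns, just as ::: written directly in a pattern and ::: inside a called <foo> unwind different scopes.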

Pm

Patrick R. Michaud

May 14, 2005, 4:17:12 PM5/14/05
to Perl6 Language List
On Fri, May 13, 2005 at 01:07:20PM -0700, Larry Wall wrote:
> On Fri, May 13, 2005 at 11:54:47AM -0500, Patrick R. Michaud wrote:
> : $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
> : $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
> : $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;
>
> I would prefer that $r1 work like $r3, not like $r2, for two reasons.

Now implemented as such in Parrot r8103. And yes, it now means that

rx :w /foo/;
rx /:w::foo/;
rx /[:w::foo]/;

are all identical, which is very nice.

> By the way, I still think of it as "a group of alternatives" even
> if there's only one alternative, and no |. But I can see how that
> can be misread to imply at least two alternatives. [...]
> And if there's no alternative, you only have one alternative.
> Ain't English wonderful?

...and this last bit means we can strike the "It is illegal
to use C<::> outside of an alternation" from S05, since we're always
inside of an alternation (group of alternatives), even if there's
only one alternative.

That sentence has now been struck.

Many thanks for the clarification and discussion.

Pm
