Re: Nested captures


Carl Franks

May 9, 2005, 7:15:30 AM
to perl6-language
Are you subscribed to perl6-compiler?
Yesterday Patrick Michaud posted "PGE features update (corrections)"
which describes the results you've got:

* Match objects for nested captures are nested into the surrounding
capture object. Thus, given

rulesub = p6rule(":w (let) ( (\w+) \:= (\S+) )")
match = rulesub("let foo := 123")

the outer match object contains two match objects ($/[0] and $/[1]),
and the second of these contains two match objects at
$/[1][0] and $/[1][1].

print match # outputs "let foo := 123"
$P0 = match[0] # first subcapture ($1)
print $P0 # outputs "let"
$P0 = match[1] # second subcapture ($2)
$P1 = $P0[0] # first nested capture ($2[0])
print $P1 # outputs "foo"
$P1 = $P0[1] # second nested capture ($2[1])
print $P1 # outputs "123"

Cheers,
Carl

Autrijus Tang

May 9, 2005, 7:34:29 AM
to ca...@fireartist.com, perl6-language
On Mon, May 09, 2005 at 12:15:30PM +0100, Carl Franks wrote:
> Are you subscribed to perl6-compiler?

Yes, of course I am. :-)

> Yesterday Patrick Michaud posted "PGE features update (corrections)"
> which describes the results you've got:

Ahh. I must've missed it. Thanks for the pointer.

/me eagerly awaits new revelation from Damian...

Cheers,
/Autrijus/

Autrijus Tang

May 9, 2005, 7:05:33 AM
to perl6-l...@perl.org
As Pugs now has Rule support via PGE (either with external parrot or a
faster, linked libparrot), I've been playing with the new capturing
semantics.

Currently, matching "123" against /(.(.(.)))/ produces this:

$0: 123
$1: 123
$1[0]: 23
$1[0][0]: 3

Instead of the Perl 5 behaviour:

$0: 123
$1: 123
$2: 23
$3: 3

Is this correct and intended? I tried consulting A/S/E05, but can't
find exact wording that defines this.

Thanks,
/Autrijus/

Damian Conway

May 9, 2005, 7:22:25 AM
to perl6-language
I will be releasing a full description of the new capturing semantics in the
next day or two. It will be appended to the appropriate Synopsis, but I'll
also post it here. It may be as soon as tomorrow, but I'm away teaching this
week, so my time is restricted.

Damian

Damian Conway

May 9, 2005, 8:51:53 AM
to Autrijus Tang, Patrick R. Michaud, perl6-language
Autrijus wrote:

> /me eagerly awaits new revelation from Damian...

Be careful what you wish for. Here's draft zero. ;-)

Note that there may still be bugs in the examples, or even in the design.
@Larry has thrashed this through pretty carefully, and Patrick has implemented
it for PGE, but it's 10.30 at night after a full day's teaching, so I may have
transcribed the post-thrashing, post-implementation corrections incorrectly. %-)

Damian

-----cut----------cut----------cut----------cut----------cut----------cut-----

=head1 Perl 6 rules capturing semantics

=head2 Match objects

All match attempts--successful or not--against any rule, subrule, or
subpattern (see below) return an object of (or derived from) class
C<Match>. That is:

$match_obj = $str ~~ /pattern/;
say "Matched" if $match_obj;

In any code that is not nested inside a rule, this returned object is
also automagically assigned to the lexical C<$/> variable. That is:

$str ~~ /pattern/;
say "Matched" if $/;

In any code that is nested inside a rule, the C<$/> variable holds the
surrounding rule's nascent C<Match> object (which can be modified via the
internal C<$/>). For example:

$str ~~ / foo # Match 'foo'
{ $/ = new Match: :str<bar> } # But pretend we matched 'bar'
/;

C<Match> objects have methods that provide additional information about
the match. For example:

if m/ def <ident> <codeblock> / {
say "Found sub def between index $/.from() and index { $/.to()-1 }";
}

A C<Match> object can also be treated as a boolean, an integer, a
string, an array, or a hash. See below.


=head2 Match results

A failed match returns a C<Match> object whose boolean value is false, whose
integer value is zero, whose string value is C<"">, and whose array and hash
components are empty. For example:

"bard" ~~ /food/;
say "Poet inedible" unless $/;

A successful match returns a C<Match> object whose boolean value is
true, whose integer value is typically 1 (except under the C<:g> or
C<:x> flags; see L<Capturing from non-singular matches>), whose string
value is the complete substring that was matched by the entire rule,
whose array component contains all subpattern (unnamed) captures, and
whose hash component contains all subrule (named) captures. For example:

if ($/) {
$count += $/;
say "Matched the substring: $/";
say "Parens captured: @{$/}";
say 'Subrules captured:';
for %{$/}.kv -> $subrule_name, $substr {
say "\t$subrule_name: $substr";
}
}
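
The several "views" of a match result described above can be sketched outside Perl 6. The following Python class is purely illustrative: the class name, constructor arguments, and attribute names are assumptions made for this sketch, not part of any Perl 6 or PGE API.

```python
class Match:
    """Toy stand-in for the Match object described in this draft (hypothetical)."""

    def __init__(self, text="", ok=False, positional=None, named=None):
        self.ok = ok
        self.text = text if ok else ""            # a failed match stringifies to ""
        self.positional = list(positional or [])  # subpattern (unnamed) captures
        self.named = dict(named or {})            # subrule (named) captures

    def __bool__(self):
        return self.ok                            # boolean view

    def __int__(self):
        return 1 if self.ok else 0                # integer view (ignoring :g / :x)

    def __str__(self):
        return self.text                          # string view

    def __getitem__(self, key):
        # Integer keys index the array component, string keys the hash component.
        if isinstance(key, int):
            return self.positional[key]
        return self.named[key]


failed = Match()
found = Match("let foo", ok=True,
              positional=[Match("let", ok=True)],
              named={"ident": Match("foo", ok=True)})
```

Here `found[0]` plays the role of C<$/[0]> and `found["ident"]` the role of C<< $/<ident> >>.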


=head2 Subpattern captures

Any part of a rule enclosed in capturing parentheses is called a
I<subpattern>. For example:


#                  subpattern
#     _________________/\____________________
#    |                                       |
#    |       subpattern  subpattern          |
#    |          __/\__    __/\__             |
#    |         |      |  |      |            |
m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;


Each subpattern in a rule produces a C<Match> object if it is
successfully matched. This object is assigned into the array inside the
C<Match> object belonging to the surrounding scope -- either the
C<Match> object of the innermost surrounding subpattern (if the
subpattern is nested) or else the C<Match> object of the rule itself.
These assignments to the array are, of course, undone if the subpattern
is backtracked out of.

For example, if the following pattern matched successfully:

#                   subpat-A
#     _________________/\____________________
#    |                                       |
#    |         subpat-B  subpat-C            |
#    |          __/\__    __/\__             |
#    |         |      |  |      |            |
m:w/ (I am the (walrus), ( khoo )**{2} kachoo) /;

then the C<Match> objects representing the matches made by subpat-B and
subpat-C would be successively assigned into the array inside subpat-A's
C<Match> object. Then subpat-A's C<Match> object would be assigned into the
array inside the C<Match> object for the entire rule (i.e. C<$/>'s array).

The array elements of a C<Match> object are referred to using either the
standard array access notation (e.g. C<$/[0]>, C<$/[1]>, C<$/[2]>, etc.)
or else via the corresponding lexically scoped numeric aliases (i.e.
C<$1>, C<$2>, C<$3>, etc.)

So:

say "$/[1] found between $/[0] and $/[2]";

is the same as:

say "$2 found between $1 and $3";

Note that the standard array access notation uses zero-based indices
(0,1,2...), whereas the corresponding numeric variables are
numbered by ordinal position (1,2,3...)

Since the array elements of the rule's C<Match> object (i.e. C<$/>)
store individual C<Match> objects representing the substrings that were
matched and captured by the first, second, third, etc. I<outermost>
(i.e. unnested) subpatterns, these elements can be treated like fully
fledged match results. For example:

if m/ (\d\d\d\d)-(\d\d)-(\d\d) (BCE?|AD|CE)?/ {
($yr, $mon, $day) = ($1, $2, $3); # Or: ($yr, $mon, $day) = $/[0..2]
$era = $4 if $4; # Tests if 4th parens matched
@datepos = ($1.from() .. $3.to()-1); # $1, $2, etc. are full Match objs
}
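
The same zero-based versus ordinal distinction exists in other regex engines. As a rough analogue only (Python's `re` module, not Perl 6, and the sample date string is an assumption):

```python
import re

date_re = re.compile(r"(\d{4})-(\d\d)-(\d\d) ?(BCE?|AD|CE)?")
m = date_re.match("2005-05-09")

# group(N) is 1-based, like $1..$3; groups() is a 0-based tuple, like $/[0..2].
year, month, day = m.group(1), m.group(2), m.group(3)
same_three = m.groups()[0:3]

# The optional fourth group did not participate, so Python reports None
# (the draft above would give an unsuccessful Match object instead).
era = m.group(4)

# Analogue of ($1.from() .. $3.to()-1): Python's end() is already exclusive.
date_positions = range(m.start(1), m.end(3))
```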


=head2 Nested subpattern captures

Nested subpatterns (i.e. nested capturing parens) are I<not> captured
directly into the array of the rule's C<Match> object. Instead, the
captures made by nested subpatterns appear in the array inside the
C<Match> object belonging to the surrounding subpattern. This is quite
different to Perl 5 semantics:

# Perl 5...
#
#  $1----------------------------- $5---------- $6---------------------
#  |     $2--- $3--------------- | |          | |     $7--- $8------- |
#  |     |   | |         $4--- | | |          | |     |   | |       | |
#  |     |   | |         |   | | | |          | |     |   | |       | |
m/ ( The (\S+) (guy|gal|g(\S+) ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;


In Perl 6, nested parens produce properly nested captures:

# Perl 6...
#
#  $1----------------------------- $2---------- $3---------------------
#  |     $1[0] $1[1]------------ | |          | |     $3[0] $3[1]---- |
#  |     |   | |      $1[1][0] | | |          | |     |   | |       | |
#  |     |   | |         |   | | | |          | |     |   | |       | |
m/ ( The (\S+) (guy|gal|g(\S+) ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;


This means that the internal structure of the arrays in a rule's final
C<Match> object mirrors (and preserves!) both the nesting structure of
subpatterns in the rule, and the dynamic structure of the hierarchical
way in which those subpatterns matched. This "reconstructability" can be
taken even further (see L<The C<:parsetree> flag> below).
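
To make the difference from flat numbering concrete, here is a minimal Python sketch that takes the flat, Perl 5-style groups produced by Python's `re` module and re-nests them according to the parenthesis structure of the pattern. The helper names `group_parents` and `nested_captures` are hypothetical, and the pattern scanner is deliberately naive: it treats every `(?...)` construct as non-capturing (so Python named groups are not handled) and does not understand parentheses inside character classes.

```python
import re

def group_parents(pattern):
    """Map each capturing group number to its enclosing group (0 = whole match)."""
    parents, stack, count, i = {}, [], 0, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":              # skip the escaped character
            i += 2
            continue
        if ch == "(":
            if pattern.startswith("(?", i):
                stack.append(None)  # non-capturing: introduces no new scope
            else:
                count += 1
                enclosing = [g for g in stack if g is not None]
                parents[count] = enclosing[-1] if enclosing else 0
                stack.append(count)
        elif ch == ")":
            stack.pop()
        i += 1
    return parents

def nested_captures(pattern, text):
    """Return a tree of {'text', 'subs'} dicts mirroring the paren nesting."""
    m = re.search(pattern, text)
    if m is None:
        return None
    parents = group_parents(pattern)
    nodes = {0: {"text": m.group(0), "subs": []}}
    for n in sorted(parents):       # a parent always has a lower number
        nodes[n] = {"text": m.group(n), "subs": []}
        nodes[parents[n]]["subs"].append(nodes[n])
    return nodes[0]

# The example from earlier in this thread: "123" against /(.(.(.)))/
tree = nested_captures(r"(.(.(.)))", "123")
```

With this sketch, `tree["subs"][0]` corresponds to C<$1>, `tree["subs"][0]["subs"][0]` to C<$1[0]>, and so on.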

There may also be shortcuts for accessing nested components of a subpattern,
specifically:

# Perl 6...
#
#  $1----------------------------- $2---------- $3---------------------
#  |     $1.1  $1.2------------- | |          | |     $3.1  $3.2----- |
#  |     |   | |        $1.2.1 | | |          | |     |   | |       | |
#  |     |   | |         |   | | | |          | |     |   | |       | |
m/ ( The (\S+) (guy|gal|g(\S+) ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;

but this has not yet been decided.


=head2 Quantified subpattern captures

If a subpattern is directly quantified using any quantifier -- except C<?>,
or C<??> -- it no longer produces a single C<Match> object. Instead, it
produces an array of C<Match> objects, which will have been collected
from the sequence of individual matches made by the repeated subpattern.

Because a quantified subpattern returns an array of C<Match> objects,
the corresponding array element for the quantified capture will store an
array reference, rather than a single C<Match> object. For example:

# $1 $2
if m/ (\w+) \: (\w+ \s+)* / {
say "Key was: $1"; # Unquantified subpat produces single Match
say "Values were: @{$2}"; # Quantified subpat produces array of Matches
}

Note that whether a quantified subpattern returns a single C<Match>
object, or an array of C<Match> objects is determined statically (by the
nature of the quantifier), not dynamically (by the actual number of
repetitions that occur in the match).
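
For contrast, engines with Perl 5-style semantics keep only the last repetition of a quantified group. The Python sketch below (an analogue, not Perl 6; the sample text is an assumption) shows that behaviour and one way to recover the per-repetition list that the draft above provides directly:

```python
import re

text = "colours:red green blue "

# Perl 5-style: a quantified group remembers only its final repetition.
m = re.match(r"(\w+):(\w+\s+)*", text)
key = m.group(1)
last_value_only = m.group(2)

# Emulating the array-of-matches result: capture the whole repeated run,
# then split it into the individual repetitions.
m2 = re.match(r"(\w+):((?:\w+\s+)*)", text)
all_values = re.findall(r"\w+\s+", m2.group(2))
```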

If a subpattern is directly quantified using the C<?> or C<??> quantifier,
it produces a single C<Match> object. That object is "successful" if the
subpattern did match, and "unsuccessful" if it was skipped. That is:

if m/ next (\w+)? if (.*) / {
say "Found a 'next'";
say "(targeted at $1)" if $1;
say "Condition was: $2";
}

Note that if a capture is quantified as optional in this way, a C<Match>
object is I<always> generated and assigned into the array inside the
surrounding scope's C<Match> object. This ensures that the index/ordinal of
subsequent subpatterns can still be determined statically.
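
Python's `re` behaves comparably in one respect: a skipped optional group still occupies its slot, so later group numbers do not shift. A small sketch (Python, with assumed input strings; note that Python reports the skipped group as `None` rather than as an unsuccessful match object):

```python
import re

next_re = re.compile(r"next\s*(\w+)?\s*if\s*(.*)")

plain = next_re.match("next if $done")          # optional target skipped
labelled = next_re.match("next LINE if $done")  # optional target present

# Group 2 is the condition in both cases, whether or not group 1 matched.
plain_result = (plain.group(1), plain.group(2))
labelled_result = (labelled.group(1), labelled.group(2))
```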


=head2 Indirectly quantified subpattern captures

A subpattern may sometimes be nested inside a quantified non-capturing
structure:

#       non-capturing    quantified
#   __________/\_________  __/\__
#  |                     ||      |
#  |  $1         $2      ||      |
#  |  _^_      ___^___   ||      |
#  | |   |    |       |  ||      |
m/ [ (\w+) \: (\w+ \s+)* ]**{2...} /

Non-capturing brackets I<don't> create a separate nested lexical scope,
so the two subpatterns inside them are actually still in the rule's
top-level scope. Hence their top-level designations: C<$1> and C<$2>. Such
subpatterns are called "indirectly quantified" subpatterns. In
Perl 5, any repeated captures of this kind:

# Perl 5 equivalent...
m/ (?: (\w+) \: (\w+ \s+)* ){2,} /x

would overwrite the previous captures to C<$1> and C<$2> each time the
surrounding non-capturing parens iterated. So C<$1> and C<$2> would
contain only the captures from the final repetition.

This does not happen in Perl 6. Any indirectly quantified subpattern is
treated like a directly quantified subpattern. Specifically, an
indirectly quantified subpattern also returns an array of C<Match>
objects, so the corresponding array element for the indirectly
quantified capture will store an array reference, rather than a single
C<Match> object.

if m/ [ (\w+) \: (\w+ \s+)* ]**{2...} / {
say "Keys were: @{$1}";
say "Values were: @{$2}";
}

Remember though that, if the outer quantified structure is a I<capturing>
structure (i.e. a subpattern) then it I<will> introduce a nested
lexical scope. That outer quantified structure will then
return an array of C<Match> objects representing the captures
of the inner parens for I<every> iteration (as described above).

Whereas using non-capturing parentheses for the outer quantifier causes
all of the inner subpatterns to flatten their captures into C<$1> and
C<$2>, using capturing parentheses for the outer quantifier retains the
internal match structure of each repetition. That is:


#              $/[0]
#      __________/\_________
#     |                     |
#     |$/[0][0]  $/[0][1]   |
#     |  _^_      ___^___   |
#     | |   |    |       |  |
if m/ ( (\w+) \: (\w+ \s+)* )**{2...} / {

# Outer subpattern ($/[0]) quantified, so $1 contains an array.
# Let's iterate it...
for @{$1}.kv -> $i, $inner_subpatterns {

# First inner subpattern ($/[0][0]) is unquantified, so it
# produces a single Match...
say "Key $i was: $inner_subpatterns[0]";

# Second inner subpattern ($/[0][1]) is quantified, so it
# produces an array of Matches...
say "Values $i were: @{$inner_subpatterns[1]}";
}
}
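
The two outcomes described above (flattened per-subpattern lists versus one structured record per repetition) can be imitated in Python by iterating over the repetitions explicitly. This is only a sketch of the resulting data shapes: the sample text is an assumption, and the values are whitespace-stripped words rather than match objects.

```python
import re

entry_re = re.compile(r"(\w+):((?:\w+\s+)*)")
text = "fruit:apple pear veg:leek "

keys, values = [], []   # flattened, like $1 and $2 under non-capturing brackets
structured = []         # one record per repetition, like capturing parens

for m in entry_re.finditer(text):
    words = m.group(2).split()
    keys.append(m.group(1))
    values.extend(words)
    structured.append((m.group(1), words))
```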


=head2 Subpattern numbering

As the previous sections explained, the index/ordinal of a given subpattern
can always be statically determined. However, this does not mean that they
have to be monotonically increasing. Indeed, the hierarchical nature of nested
Perl 6 subpatterns already ensures that this is not the case.

But even when there is no nesting of subpatterns it can be much more useful
not to number all top-level subpatterns sequentially, as Perl 5 does:

# Perl 5...
#               $1      $2    $3   $4    $5           $6
$tune_up5 = qr/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
#               $7      $8      $9    $10       $11
              | (every) (green) (BEM) (devours) (faces)
              /x;

Specifically, there are significant advantages to numbering the
subpatterns in each branch of an alternation (i.e. on either side of a
C<|>) independently, restarting the numbering at the beginning of each
branch. And this is precisely what Perl 6 does:

# Perl 6...
#               $1      $2    $3   $4    $5           $6
$tune_up6 = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
#               $1      $2      $3    $4        $5
              | (every) (green) (BEM) (devours) (faces)
              /;

In other words, unlike in Perl 5, in Perl 6 $1 doesn't represent the
capture made by the first subpattern that appears in the rule; it
represents the capture made by the first subpattern of whichever
alternative actually matched.

And that is extremely useful because it means that the array inside C<$/>
will not contain large numbers of leading C<undef> values
corresponding to unmatched subpatterns from failed alternatives:

# Perl 5...
@captures = $EGBDF =~ $tune_up5;

# @captures is assigned: ( (undef)x6, qw(every green BEM devours faces) )

Instead, only the "meaningful" subpattern captures are returned:

# Perl 6...
@captures = $EGBDF ~~ $tune_up6;

# @captures is assigned: <every green BEM devours faces>
# (no leading undefs)

A more common example is likely to be a series of alternative commands:

$cmd ~~ m:w/ (put) (\S+) in (\S+)
| (get) (\S+) from (\S+)
| (save) (\S+) to (\S+)
/ or next;

($cmd, $item, $location) = ($1, $2, $3);
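
Python's `re` numbers groups monotonically across alternatives, as Perl 5 does, so it makes a convenient illustration of the problem. One rough way to approximate per-branch numbering is to discard the groups that did not participate. The helper name `branch_captures` is hypothetical, and the approach is lossy: it would also drop an optional group that was skipped inside the branch that did match.

```python
import re

cmd_re = re.compile(
    r"(put) (\S+) in (\S+)"
    r"|(get) (\S+) from (\S+)"
    r"|(save) (\S+) to (\S+)"
)

def branch_captures(match):
    """Keep only the groups that took part in the match."""
    return [g for g in match.groups() if g is not None]

m = cmd_re.fullmatch("get milk from fridge")
flat = m.groups()                       # padded with None, as in Perl 5
cmd, item, location = branch_captures(m)
```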


Of course, the leading C<undef>s that Perl 5 would produce do convey
(albeit awkwardly) which alternative actually matched. If that
information is important, Perl 6 has several far cleaner ways to
preserve it. For example:

rule alt (Str $n) { {$/ = $n} }

m/ <alt tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <alt BEM> (every) (green) (BEM) (devours) (faces)
/;

if ($/) {
given $<alt> {
when 'tea' { say "I hate solar tea" }
when 'BEM' { say "I love bug-eyed monsters" }
}
}


It's even possible to mimic the monotonic Perl 5 semantics. See
L<Numbered scalar aliasing> below for details.


=head2 Subrule captures

Any call to a named rule within a pattern is known as a I<subrule>.

Any bracketed construct that is aliased (see L<Aliasing>) to a
named variable is also a subrule.

For example, this rule contains three subrules:

#  subrule      subrule     subrule
#   __^__   _______^______   __^__
#  |     | |              | |     |
m/ <ident> $<spaces>:=(\s*) <digit>+ /

Just like subpatterns, each successfully matched subrule within a rule
produces a C<Match> object. But, unlike subpatterns, that C<Match>
object is assigned to an entry of a hash. Specifically, to an entry of
the hash inside the C<Match> object corresponding to the innermost
surrounding rule or subpattern. For example:

# .... $/ .......................................
# :                                             :
# :             ........... $/[0] ............. :
# :             :                             : :
# : $/<ident>   :       $/[0]<ident>          : :
# :   __^__     :           __^__             : :
# :  |     |    :          |     |            : :
m:w/ <ident> \: ( known as <ident> previously )? /


The hash entries of a C<Match> object are referred to using any of the
standard hash access notations (C<$/{'foo'}>, C<< $/<bar> >>, C<$/«baz»>,
etc.), or else via corresponding lexically scoped aliases (C<< $<foo> >>,
C<$«bar»>, C<< $<baz> >>, etc.) So the previous example also implies:

#   $<ident>              $1<ident>
#     __^__                 __^__
#    |     |               |     |
m:w/ <ident> \: ( known as <ident> previously )? /


In other words, the hash elements of a rule's C<Match> object store nested
C<Match> objects, each of which represents a substring matched-and-captured by
a named subrule call (or by a capture that was aliased to a name using the
C<< $<name>:= >> syntax). For example:

if m/ (<YYYY>)-(<MM>)-(<DD>) $<ERA>:=(BCE?|AD|CE)?/ {
($year, $month, $day) = ($<YYYY>, $<MM>, $<DD>);
$era = $<ERA> if $<ERA>;
@indices = ($<YYYY>.from() .. $<DD>.to()-1);
}

Note that it makes no difference whether the subrule is angle-bracketed (like
C<< <YYYY> >>) or aliased (like C<< $<ERA>:= >>). The name's the thing.
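
Named groups in Python's `re` give a loose analogue of the hash component. In this sketch the sub-patterns for each name are written inline (Python has no subrule calls) and the input string is an assumption:

```python
import re

date_re = re.compile(
    r"(?P<YYYY>\d{4})-(?P<MM>\d\d)-(?P<DD>\d\d) ?(?P<ERA>BCE?|AD|CE)?"
)
m = date_re.match("0044-03-15 BCE")

named = m.groupdict()                   # roughly the hash inside $/
year, month, day = named["YYYY"], named["MM"], named["DD"]
era = named["ERA"]

# Analogue of ($<YYYY>.from() .. $<DD>.to()-1)
indices = range(m.start("YYYY"), m.end("DD"))
```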


=head2 Repeated captures of the same subrule

If a subrule appears two (or more) times in the same lexical scope
within a rule (i.e. within the same subpattern and alternation), or if
the subrule is quantified anywhere within the rule (except with C<?>
or C<??>), then its corresponding hash entry no longer stores a
C<Match> object.

Instead, just like a quantified subpattern, a directly quantified,
indirectly quantified, or explicitly repeated subrule results in an
array of C<Match> objects. Successive matches of the subrule (whether
from separate calls, or from a quantified repetition) append their
individual C<Match> objects to this array. For example, with two or more
subrules of the same name, the corresponding hash entry contains a
reference to an array, which in turn contains the individual C<Match>
objects from each subrule match:

if m:w/ mv <file> <file> / {
$from = $<file>[0];
$to = $<file>[1];
}

Likewise, with an indirectly quantified subrule:

if m:w/ mv [ <file> ]**{2} / {
$from = $<file>[0];
$to = $<file>[1];
}

Likewise, with both repetition and quantification:

if m:w/ mv [ <file> ]+ <file> / {
$to = pop @{$<file>};
@from = @{$<file>};
}

Note that it is always possible to determine statically whether a particular
hash entry in C<$/> will be a scalar, or an array reference, simply by
counting the number of occurrences of the subrule in each lexical scope.

However, if a subrule is explicitly renamed (or aliased -- see L<Aliasing>),
then only the "final" name counts when deciding whether it is or isn't
repeated. For example:

rule dir := rule file;

if m:w/ mv <file> <dir> / { # Only one occurrence of <file>, so scalar
$from = $<file>;
$to = $<dir>;
}


Likewise, I<none> of the following constructions cause C<< <file> >> to
produce an array of C<Match> objects, since in none of them are there
two or more C<< <file> >> subrules in the same lexical scope:

if m:w/ (keep) <file> | (toss) <file> / { # Each <file> is in a separate
# alternation, hence not
# repeated in any one scope
$action = $1;
$target = $<file>;
}

if m:w/ <file> \: (<file>|none)? / { # Second <file> nested in subpattern
# which confers different scope
$actual = $/<file>;
$virtual = $/[0]<file> if $/[0]<file>;
}

On the other hand, unaliased square brackets don't confer a separate
scope (because they don't have an associated C<Match> object). So:

if m:w/ <file> \: [<file>|none]? / { # Second <file> in same scope
$actual = $/<file>[0];
$virtual = $/<file>[1] if $/<file>[1];
}


=head2 Aliasing

Aliases can be named or numbered; may be scalar-, array-, or hash-like;
and may be applied to either capturing or non-capturing constructs.
The following sections explain the semantics of each of those dozen
combinations.


=head3 Named scalar aliases applied to non-capturing brackets

If a named scalar alias is applied to a set of non-capturing brackets:

#             ___/non-capturing brackets\__
#            |                             |
#            |                             |
m:w/ $<key>:=[ (<[A-E]>) (\d**{3..6}) (X?) ] /;

then the corresponding entry in the rule's hash is assigned a C<Match> object
whose:

=over

=item *

Boolean value is true,

=item *

Integer value is 1,

=item *

String value is the complete substring matched by the contents of the square
brackets,

=item *

Array and hash are both empty.

=back

This last outcome (the empty hash and array) might be surprising, but
it's a natural consequence of the fact that square brackets do not
create a nested lexical scope, so any subpattern or subrule captures
within the square brackets are in the rule's lexical scope, not in that
of the alias. Consequently, any subpatterns or subrules in the square
brackets still I<do> set the appropriate hash or array entries, but they
set the appropriate hash or array entries of the rule's C<Match> object,
not the C<Match> object of the alias.

That means, if the above example matches successfully:

=over

=item *

C<< $/<key> >> will contain the complete substring matched by the square
brackets (in a C<Match> object, as described above),

=item *

C<< $/[0] >> will contain the A-E letter,

=item *

C<< $/[1] >> will contain the digits,

=item *

C<< $/[2] >> will contain the optional X.

=back


=head3 Named scalar aliasing to subpatterns

On the other hand, if a named scalar alias is applied to a set of
I<capturing> parens:

#             ______/capturing parens\_____
#            |                             |
#            |                             |
m:w/ $<key>:=( (<[A-E]>) (\d**{3..6}) (X?) ) /;

then the capturing parens no longer capture into the array of the rule's
C<Match> object (like unadorned parens would). Instead the aliased parens
capture into the hash of the C<Match> object; specifically into the hash
element whose key is the alias name.

So, in the above example, a successful match sets
C<< $<key> >> (i.e. C<< $/<key> >>), but I<not> C<$1> (i.e. not C<< $/[0] >>).

Another way to think about it is that aliased parens create a kind of
lexically scoped named subrule; that the contents of the brackets are
treated as if they were part of a separate subrule whose name is the
alias. That is, the above example is exactly equivalent to:

rule key { (<[A-E]>) (\d**{3..6}) (X?) }
m:w/ <key> /;

Specifically, after either version matches:

=over

=item *

C<< $/<key>[0] >> will contain the A-E letter (in a C<Match> object, of course),

=item *

C<< $/<key>[1] >> will contain the digits,

=item *

C<< $/<key>[2] >> will contain the optional X.

=back

Note that only aliased parens have this "on-the-fly-subrule" effect.
Aliased square brackets (as explained in L<Named scalar aliases applied
to non-capturing brackets>) only capture the substring the square
brackets matched; any internal captures proceed exactly as they
would if the alias were not there.

This can provide a handy optimization when calling a subrule. If only the
complete substring to be matched is of interest, rather than the full
hierarchical capture information, then a pattern like:

m/ <XML_file> /

(which presumably does a large amount of hierarchical capturing and
returns a very complex set of nested C<Match> objects), could be rewritten:

m/ $<XML_str>:=[«XML_file»] /

instead. Here the C<< <XML_file> >> subrule is called using double angle
brackets, which calls it as a non-capturing subrule. It still matches the same
substring, of course, which is then captured by the C<< $<XML_str>:= >> alias.

Note too that, because a subrule call like C<«XML_file»> is a bracketed
non-capturing construct, it obeys the rules for C<[...]> (as described in
L<Named scalar aliases applied to non-capturing brackets>), so the above
optimization could just be written:

m/ $<XML_str>:=«XML_file» /


=head3 Named scalar aliasing to subrules

An unaliased capturing subrule assigns its C<Match> object to the hash
entry whose key is the name of the subrule:

if m:/ ID\: <ident> / {
say "Identified as $/<ident>";
}

But if a subrule is aliased, it assigns its C<Match> object to the hash entry
whose key is the name of the alias instead. And, more importantly, it
I<doesn't> assign anything to the hash entry whose key is the subrule
name. That is:

if m:/ ID\: $<id>:=<ident> / {
say "Identified as $/<id>"; # and $/<ident> is undefined
}

Hence aliasing a subrule I<changes> the destination of the subrule's C<Match>
object. This is particularly useful for differentiating two or more calls to
the same subrule in the same scope. For example:

if m:w/ mv <file> $<dir>:=<file> / {
$from = $<file>;
$to = $<dir>;
}

In this example, the final match of the C<< <file> >> subrule is not appended
onto an array in C<< $/<file> >>, but is assigned to the hash element
corresponding to the alias name: C<< $/<dir> >>.
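
The same idea, reusing one sub-pattern under two different capture names, can be sketched with Python named groups. The `FILE` pattern and the sample command are assumptions for illustration:

```python
import re

FILE = r"\S+"   # stand-in for the <file> subrule

# Same sub-pattern twice, stored under two different names,
# much like  mv <file> $<dir>:=<file>
mv_re = re.compile(rf"mv\s+(?P<file>{FILE})\s+(?P<dir>{FILE})")

m = mv_re.match("mv notes.txt archive/")
source, destination = m.group("file"), m.group("dir")
```

Python would reject a pattern that used the group name `file` twice, so in that engine the renaming is mandatory rather than a convenience.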


=head3 Numbered scalar aliasing

If a numbered alias is used instead of a named alias:

m/ $2:=(<-[:]>*) \: $1:=<ident> /

the behaviour is exactly the same as for a named alias, except that the
resulting C<Match> object is assigned to the corresponding element of
the appropriate array, rather than to an element of the hash.

For example:

m:w/ $1:=[ (<[A-E]>) (\d**{3..6}) (X?) ] /;
# $/[0] contains a match object storing the complete substring
# matched by the square brackets


m:w/ $2:=( (<[A-E]>) (\d**{3..6}) (X?) ) /;
# $/[1] contains the match object returned by the outer subpattern


if m:/ ID\: $3:=<ident> / {
say "Identified as $3"; # and $/<ident> is undefined
}

The only additional behaviour is that, if any numbered alias is used, the
numbering of subsequent unaliased subpatterns in the same scope automatically
increments from that alias number (much like enum values increment from
the last explicit value). That is:

#  ----$2---- --$3-- ----$7---- --$8--
#  |        | |    | |        | |    |
m/ $2:=(food) (bard) $7:=(bazd) (quxd) /;


This behaviour is particularly useful for reinstituting Perl 5 semantics
for consecutive subpattern numbering in alternations:

$tune_up6 = rx/ (don't) (ray) (me) (for) (solar tea), (d'oh!)
              | $7:=(every) (green) (BEM) (devours) (faces)
#                           $8      $9    $10       $11
              /;

It also provides an easy way in Perl 6 to reinstitute the unnested
numbering semantics of nested Perl 5 subpatterns:

# Perl 5...
#                $1
#   _____________/\______________
#  |    $2          $3       $4  |
#  |  __/\___   ____/\____   /\  |
#  | |       | |          | |  | |
m/ ( (<[A-E]>) (\d**{3..6}) (X?) ) /;


# Perl 6...
#                $1
#   _____________/\______________
#  |  $1[0]       $1[1]    $1[2] |
#  |  __/\___   ____/\____   /\  |
#  | |       | |          | |  | |
m/ ( (<[A-E]>) (\d**{3..6}) (X?) ) /;


# Perl 6 simulating Perl 5...
#                  $1
#   _______________/\________________
#  |        $2          $3       $4  |
#  |      __/\___   ____/\____   /\  |
#  |     |       | |          | |  | |
m/ $1:=[ (<[A-E]>) (\d**{3..6}) (X?) ] /;

The non-capturing brackets don't introduce a scope, so the subpatterns within
them are at rule scope, and hence numbered at the top level. Aliasing the
square brackets to C<$1> means that the next subpattern at the same level
(i.e. the C<< (<[A-E]>) >>) is numbered sequentially (i.e. C<$2>), etc.
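
For reference, the flat numbering being simulated here is what Python's `re` (like Perl 5) produces natively. A quick sketch, with an assumed input string and the pattern translated to Python syntax:

```python
import re

# Nested parens, but flat left-to-right numbering by opening parenthesis.
m = re.match(r"(([A-E])(\d{3,6})(X?))", "B1234X")
whole, letter, digits, suffix = m.groups()   # groups 1, 2, 3, 4
```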


=head3 Scalar aliases applied to quantified constructs

All of the above semantics apply equally to aliases which are applied to
quantified structures. The only difference is that, if the aliased construct
is a subrule or subpattern, that quantified subrule or subpattern will have
returned an array of C<Match> objects (as described in L<Quantified
subpattern captures> and L<Repeated captures of the same subrule>). So
the corresponding array element or hash entry for the alias will contain
an array reference instead of a single C<Match> object. Hence aliasing
and quantification are completely orthogonal.

For example:

if m/ mv $<from>:=<file>+ / {
# <file>+ returns an array of Match objects,
# so $/<from> contains array of Match objects,
# one for each successful call to <file>

# $/<file> does not exist (pre-empted by the alias)
}


if m/ mv $<from>:=(\S+ \s+)+ / {
# Quantified subpattern returns an array of Match objects, so
# $/<from> contains array of Match objects,
# one for each successful match of the subpattern

# $/[0] does not exist (pre-empted by the alias)
}

A set of quantified I<non-capturing> brackets always returns a
single C<Match> object which contains only the complete substring
that was matched by the full set of repetitions of the brackets (as
described in L<Named scalar aliases applied to non-capturing brackets>).

So, if an alias is applied to a set of quantified I<non-capturing>
brackets, the corresponding array element or hash entry for the alias
will be assigned that single C<Match> object. For example:

"coffee fifo fumble" ~~ m/ .*? $<effs>:=[f <-[f]>**{1..2} \s*]+ /;

say $<effs>; # prints "fee fifo fum"
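
The same capture can be reproduced with Python's `re`, where a named group wrapped around a quantified non-capturing group likewise yields one string covering all repetitions. The pattern is a direct translation of the rule above (whitespace in the Perl 6 rule is insignificant without C<:w>):

```python
import re

m = re.match(r".*?(?P<effs>(?:f[^f]{1,2}\s*)+)", "coffee fifo fumble")
effs = m.group("effs")   # one string spanning every repetition
```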


=head3 Array aliasing

An alias can also be specified using an array as the alias instead of a scalar.
For example:

m/ mv @<from>:=[(\S+) \s+]* <dir> /;

Using the C<< @<alias>:= >> notation instead of a C<< $<alias>:= >> has
several effects. The first is that the corresponding hash entry or array
element I<always> receives an array of C<Match> objects, even if the
construct being aliased would normally return a single C<Match>
object. That is:

m/ $<names>:=<ident> /; # $/<names> assigned a single Match object

m/ @<names>:=<ident> /; # $/<names> assigned an array which contains
# a single Match object

This is useful for creating consistent capture semantics across structurally
different alternations (by enforcing array captures in all branches):

m:w/ Mr?s? $<names>:=<ident> W\. $<names>:=<ident>
| Mr?s? @<names>:=<ident>
/;

say "name: @{$<names>}";

If an array alias is applied to a quantified pair of non-capturing
brackets, it captures the substrings matched by each repetition of the
brackets into separate elements of the corresponding array. That is:

m/ mv $<files>:=[ f.. \s* ]* /; # $<files> assigned a single Match
# object containing the
# complete substring matched by
# the full set of repetitions
# of the non-capturing brackets

m/ mv @<files>:=[ f.. \s* ]* /; # $<files> assigned an array, each
# element of which is a C<Match>
# object containing the substring
# matched by Nth repetition of
# the non-capturing bracket match


If an array alias is applied to a quantified pair of capturing parens
(i.e. to a subpattern), then the corresponding hash or array element is
assigned a list constructed by concatenating the array values of each
C<Match> object returned by one repetition of the subpattern. That is,
an array alias on a subpattern flattens and collects all nested
subpattern captures within the aliased subpattern. For example:

if m:w/ $<pairs>:=( (\w+) \: (\N+) )+ / {

# Scalar alias, so $/<pairs> contains an array of Match objects,
# each of which has its own array of two subcaptures...

for @{$<pairs>} -> $pair {
say "Key: $pair[0]";
say "Val: $pair[1]";
}
}


if m:w/ @<pairs>:=( (\w+) \: (\N+) )+ / {
# Array alias, so $/<pairs> contains an array of Match objects,
# each of which is one of the two subcaptures within the
# subpattern, all flattened back into the outer array...

for @{$<pairs>} -> $key, $val {
say "Key: $key";
say "Val: $val";
}
}

Likewise, if an array alias is applied to a quantified subrule, then the
hash or array element corresponding to the alias is assigned a list
containing the array values of each C<Match> object returned by each
repetition of the subrule, all flattened into a single array. That is,
an array alias on a subrule flattens and collects all the subpattern
captures that occurred within the aliased subrule. For example:

rule pair :w { (\w+) \: (\N+) }

if m:w/ $<pairs>:=<pair>+ / {
# Scalar alias, so $/<pairs> contains an array of Match objects,
# each of which is the result of the <pair> subrule call...

for @{$<pairs>} -> $pair {
say "Key: $pair[0]";
say "Val: $pair[1]";
}
}


if m:w/ mv @<pairs>:=<pair>+ / {
# Array alias, so $/<pairs> contains an array of Match objects,
# each of which is one of the captures that occurred within the
# subrule, flattened back into the outer array...

for @{$<pairs>} -> $key, $val {
say "Key: $key";
say "Val: $val";
}
}

In other words, an array alias is useful to flatten into a single array
any nested captures that might occur within a repeated subpattern or subrule.
Whereas a scalar alias is useful to preserve (within a top-level array)
the internal structure of each repetition.

Note that, outside a rule, C<< @<foo> >> is simply a shorthand for
C<< @{$<foo>} >>, so the above C<for> loop could also have been written:

for @<pairs> -> $key, $val {
say "Key: $key";
say "Val: $val";
}
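
As a rough illustration only, the difference between scalar and array
aliases can be modelled outside Perl 6 altogether. The Python sketch below
is not part of the proposal: it assumes each repetition of the subpattern is
represented simply by the list of its nested subcaptures, and the names
C<scalar_alias> and C<array_alias> are hypothetical.

```python
from itertools import chain

def scalar_alias(repetitions):
    """Keep one entry per repetition, preserving its internal structure."""
    return [list(rep) for rep in repetitions]

def array_alias(repetitions):
    """Concatenate the subcaptures of every repetition into one flat list."""
    return list(chain.from_iterable(repetitions))

# Two repetitions of ( (\w+) \: (\N+) ), reduced to plain strings.
reps = [["name", "Damian"], ["lang", "Perl 6"]]
nested = scalar_alias(reps)   # one [key, value] list per repetition
flat = array_alias(reps)      # keys and values interleaved in one list
```

Iterating C<nested> one element at a time corresponds to the C<$pair> loops
above; iterating C<flat> two elements at a time corresponds to the
C<$key, $val> loops.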


It is also possible to use a numbered variable as an array alias.
The semantics are exactly as described above, with the sole difference
being that the resulting array of C<Match> objects is assigned into the
appropriate element of the rule's match array, rather than to a key of
its match hash. For example:

if m/ mv \s+ @1:=((\w+) \s+)+ $2:=(\w+) / {
# | |
# | |
# | \___ Scalar alias, so $2 as normal
# |
# \___ Array alias, so $1 assigned a flattened array
# of just the (\w+) captures from each repetition

@from = @{$1};
$to = $2;
}

Note that, outside a rule, C<@1> is simply a shorthand for C<@{$1}>, so the
first assignment above could also have been written:

@from = @1;


=head3 Hash aliasing

An alias can also be specified using a hash as the alias variable,
instead of scalar or array. For example:

m:w/ mv %<location>:=( (<ident>) \: (\N+) )+ /;

A hash alias causes the corresponding hash or array element in the
current scope's C<Match> object to be assigned a hash (rather than an
array or a single C<Match> object).

A hash alias cannot be applied to a quantified pair of non-capturing brackets.
Attempting to do so is a compile-time detectable error.

If a hash alias is applied to a pair of capturing parens (i.e. to
a subpattern), then the corresponding hash or array element is assigned a
hash. Each entry in that hash is constructed as follows:

=over

=item 1.

If the subpattern was unquantified, take the single C<Match> object it returns
and place it in an array. If the subpattern was quantified, take the array of
C<Match> objects it returns. Then, for each C<Match> object in the array...

=over

=item 1a.

Evaluate that C<Match> object as an array to produce a list.

=item 1b.

Use the first element of the list as the next key.

=item 1c.

Use the remaining element(s) of the list as the corresponding value(s).
If there are no remaining elements, the value is C<undef>.
If there is one remaining element, the value is that element.
If there are two or more remaining elements, the value is a reference to an
array containing those elements.

=back

=back
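
The steps above can be sketched in a few lines of Python. This is only a
model under stated assumptions, not the proposed implementation: each
C<Match> object is reduced to the list it yields in array context, every
such list is assumed to hold at least one capture, C<undef> is rendered as
C<None>, and the function name C<hash_alias> is hypothetical.

```python
def hash_alias(matches):
    """Build the aliased hash from a list of per-repetition capture lists."""
    result = {}
    for captures in matches:           # step 1: each Match object in turn
        items = list(captures)         # step 1a: evaluate it as a list
        key, rest = items[0], items[1:]  # step 1b: first element is the key
        if not rest:                   # step 1c: no remaining elements
            value = None
        elif len(rest) == 1:           # exactly one remaining element
            value = rest[0]
        else:                          # two or more: keep them as an array
            value = rest
        result[key] = value
    return result

pairs = hash_alias([["colour", "red"], ["shape", "round"]])
synonyms = hash_alias([["big", "large", "huge", "vast"]])
flags = hash_alias([["verbose"]])
```

An unquantified subpattern corresponds to passing a one-element list, which
is why it can only ever produce a single key.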

In other words, if a hash alias is applied to a subpattern, the first
pair of capturing parens within the subpattern provides the hash keys,
and the remaining capturing parens (if any) provide the corresponding
values. If the subpattern is unquantified then the resulting hash will
have only a single key; if the subpattern is quantified, the hash may
have multiple keys. For example:

# key val
# _^_ _^_
# | | | |
if m:w/ %<pairs>:=( (\w+) \: (\N+) )+ / {

# Hash alias, so $/<pairs> contains a hash, in which each key is
# provided by the first subcapture and each value is provided by
# the second...

for %{$/<pairs>} -> $pair { # Hash in list context produces pairs
say "Key: $pair.key";
say "Val: $pair.value";
}
}

If there are three or more captures within the aliased subpattern, the
second and subsequent captures are converted to an array:

# key val[0] val[1] val[2]
# _^_ _^_ _^_ _^_
# | | | | | | | |
if m:w/ %<synonyms>:=( (\w+) \: (\S+) (\S+) (\S+) )+ / {

# $/<synonyms> contains a hash, in which each key is provided by
# the first subcapture and each value is an array containing the
# second, third, and fourth subcaptures...

for %{$/<synonyms>} -> $syn {
say "Key: $syn.key";
say "Vals: @{$syn.value}";
}
}

Note that, outside a rule, C<< %<foo> >> is a shortcut for C<< %{$/<foo>} >>,
so the previous C<for> loop could equally well have been written:

for %<synonyms> -> $syn {
say "Key: $syn.key";
say "Vals: @{$syn.value}";
}


If a hash alias is applied to a subrule, then the corresponding hash or
array element is once again assigned a hash. Each entry in that hash is
constructed in exactly the same way as for a hash-aliased subpattern.

That is, the first subpattern capture within the subrule is used as each
key, and the remaining subpattern captures are used as the corresponding
values. For example:

rule one_to_one :w { (\w+) \: (\N+) }

if m:w/ %<pairs>:=<one_to_one>+ / {

# Hash alias, so $/<pairs> contains a hash, in which each key is
# provided by the first subcapture in <one_to_one> and each
# value is provided by the second subcapture within the
# subrule...

for %<pairs> -> $pair {
say "One: $pair.key";
say "One: $pair.value";
}
}

Likewise, if the subrule captures more than two subpatterns:

rule one_to_many :w { (\w+) \: (\S+) (\S+) (\S+) }

if m:w/ %<synonyms>:=<one_to_many>+ / {

# Hash alias, so $/<synonyms> contains a hash, in which each key is
# provided by the first subcapture within <one_to_many>, and
# each value is an array containing the subrule's second, third,
# and fourth subcaptures...

for %<synonyms> -> $pair {
say "One: $pair.key";
say "Many: @{$pair.value}";
}
}


As with array aliases, it is also possible to use a numbered variable as
a hash alias. Once again, the only difference is where the resulting
C<Match> object is stored:

rule one_to_many :w { (\w+) \: (\S+) (\S+) (\S+) }

if m:w/ %1:=<one_to_many>+ / {
# $/[0] contains a hash, in which each key is provided by the
# first subcapture within <one_to_many>, and each value is an
# array containing the subrule's second, third, and fourth
# subcaptures...

for %{$/[0]} -> $pair {
say "One: $pair.key";
say "Many: @{$pair.value}";
}
}

And, of course, outside the rule, C<%1> is a shortcut for C<%{$1}>:

for %1 -> $pair {
say "One: $pair.key";
say "Many: @{$pair.value}";
}


=head3 External aliasing

As a final alternative, instead of using internal aliases like:

m/ mv @<files>:=<ident>+ $<dir>:=<ident> /

the name of an ordinary variable can be used as an "external alias", like so:

m/ mv @files:=<ident>+ $dir:=<ident> /

In this case, the behaviour of each alias is exactly as described in the
previous sections, except that the resulting capture(s) are assigned
directly to the variables of the specified name that exist in the scope
in which the rule is declared. For example:

if m/ mv @files:=[ <ident> ]+ $dir:=<ident> / {
say "From: @files";
say " To: $dir";
}

Note that, because they bind statically to variables in the
I<declaration> scope, not dynamically to variables in the I<calling>
scope, external aliases are generally best used only in ad hoc pattern
matches like the one shown above. It is generally a Very Bad Idea to use
external aliases in a named rule. That's because, if that rule is
subsequently used as a subrule within a pattern match, the external
aliases will assign to variables in the scope where the rule was
I<declared>, not the scope in which it was I<used> as a subrule. For example:

grammar Shell::Commands {
rule mv { mv @files:=[ <ident> ]+ $dir:=<ident> }
}

if m/<Shell::Commands.mv>/ {
say "From: @files"; # Bzzzt! @Shell::Commands::files was set
say " To: $dir"; # Bzzzt! $Shell::Commands::dir was set
}

Internal aliases are a far better choice in such cases, unless you truly
want the subtle cross-scoping effect that is achieved:

grammar Shell::Commands {

my $lastcmd;

rule cmd { $/:=<mv> | $/:=<cp> }

rule mv { $lastcmd:=(mv) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
rule cp { $lastcmd:=(cp) $<files>:=[ <ident> ]+ $<dir>:=<ident> }

sub lastcmd { return $lastcmd }
}

while shift ~~ m/<Shell::Commands.cmd>/ {
say "From: @{$<files>}";
say " To: $<dir>";
}

say "Final command was { Shell::Commands::lastcmd() }";

=head2 The C<:parsetree> flag

Normally, subrule calls capture by name to a hash entry of the scope's
C<Match> object, whilst subpatterns capture positionally to that object's
array element. Usually that's sufficient, since most coders only want to
access captures either sequentially (in which case they use subpatterns)
or symbolically (in which case they use subrules).

But a small number of implementers (predominantly the writers of
compilers, translators, code browsers, refactoring tools, etc.) need to
know both the order in which parts of a rule match I<and> the symbolic
names of those parts.

To support that, Perl 6 rules and matches can be specified with a
special flag: C<:parsetree>. Under this flag the capture behaviour of both
subpatterns and subrules alters from that described in the preceding sections.

Under C<:parsetree> the C<Match> objects generated by successful
subpatterns are still captured into the array of the surrounding scope's
C<Match> object, but now those objects are not actually instances of class
C<Match>. Instead, they are blessed into a class derived from C<Match>:
C<Match::Subpattern>.

if ( m:parsetree/ (Volume\:) (\d+) / ) {
for @{$/}.kv -> $i, $cap {
given $cap {
when Match::Subpattern {
say "Node $i is a subpattern.";
say "It captured: '$cap'";
}
}
say "";
}
}

which might print:

Node 0 is a subpattern.
It captured: 'Volume:'

Node 1 is a subpattern.
It captured: '11'


Under C<:parsetree>, the behaviour of subrules is changed even more
drastically. The C<Match> objects generated by successful subrules are
no longer assigned into the hash of the surrounding scope's C<Match>
object. Instead, they are appended (like subpatterns) onto the array of
the surrounding scope's C<Match> object.

Moreover, the C<:parsetree> flag overrides the exemption of C<< «name» >>
subrule calls, so they act as if they were C<< <name> >> calls instead. They
generate C<Match> objects, and those objects are also appended onto the
surrounding scope's C<Match> array.

This is true even for automagically inserted non-capturing subrules,
such as the C<«ws»> calls inserted by the C<:words> flag.

In addition, each C<Match> object returned by a subrule is now blessed
into a class derived from the C<Match::Subrule> class (which itself is
derived from the C<Match> class). The actual name of the class into which
each subrule's C<Match::Subrule> object is blessed is the same as the name
of the subrule call that generated it.

So, for example:

if ( m:w:parsetree/ <label> <ident>/ ) {
for @{$/}.kv -> $i, $cap {
given $cap {
when Match::Subrule {
say "Node $i is a subrule named '$cap.class()'.";
say "It captured: '$cap'";
}
}
say "";
}
}

might print something like:

Node 0 is a subrule named 'ws'.
It captured: ''

Node 1 is a subrule named 'label'.
It captured: 'From:'

Node 2 is a subrule named 'ws'.
It captured: ' '

Node 3 is a subrule named 'ident'.
It captured: 'postmaster'


Note that, if a rule contains both subpattern and subrule captures, they will
be interleaved in the order in which they appear in the input, and can be
dealt with polymorphically. For example:

if ( m:w:parsetree/ (From\:) <ident>(\@\S+)/ ) {
for @{$/}.kv -> $i, $cap {
given ($cap) {
when Match::Subrule {
say "Node $i is a subrule named '$cap.class()'.";
say "It captured: '$cap'";
}
when Match::Subpattern {
say "Node $i is a subpattern.";
say "It captured: '$cap'";
}
}
say "";
}
}

which might print:

Node 0 is a subrule named 'ws'.
It captured: ''

Node 1 is a subpattern.
It captured: 'From:'

Node 2 is a subrule named 'ws'.
It captured: ' '

Node 3 is a subrule named 'ident'.
It captured: 'postmaster'

Node 4 is a subpattern.
It captured: '@perl.org'


Better still, because each C<Match>-derived object is blessed into a
particular class related to the subpattern or rule that created it, it's
easy to create handlers in those classes and make the processing fully
polymorphic (and far more specific):

method Match::Subpattern::describe ($self: $index) {
say "Node $index is a subpattern that matched: '$self'";
}

method ws::describe ($self: $index) {
say "Node $index is the whitespace: '$self'";
}

method ident::describe ($self: $index) {
say "Node $index is the identifier: '$self'";
}


if ( m:w:parsetree/ (From\:) <ident>(\@\S+)/ ) {
my $i = 0;
.describe($i++) for @{$/};
}


which might then print:

Node 0 is the whitespace: ''
Node 1 is a subpattern that matched: 'From:'
Node 2 is the whitespace: ' '
Node 3 is the identifier: 'postmaster'
Node 4 is a subpattern that matched: '@perl.org'
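
The same dispatch-by-class idea can be mimicked in any object-oriented
language. The Python sketch below is an analogy only: the class names
(C<Subpattern>, C<Subrule>, C<Ws>, C<Ident>) are hypothetical stand-ins for
the classes described above, and the parse tree is written out by hand
rather than produced by a rule match.

```python
class Match(str):
    """Base node: behaves like the substring it matched."""
    def describe(self, index):
        return f"Node {index} matched: '{self}'"

class Subpattern(Match):
    def describe(self, index):
        return f"Node {index} is a subpattern that matched: '{self}'"

class Subrule(Match):
    """Common base for nodes produced by named subrules."""

class Ws(Subrule):
    def describe(self, index):
        return f"Node {index} is the whitespace: '{self}'"

class Ident(Subrule):
    def describe(self, index):
        return f"Node {index} is the identifier: '{self}'"

# Hand-built equivalent of the match array for "From: postmaster@perl.org".
tree = [Ws(""), Subpattern("From:"), Ws(" "),
        Ident("postmaster"), Subpattern("@perl.org")]
report = [node.describe(i) for i, node in enumerate(tree)]
```

Each node picks its own description, so the loop that walks the tree never
has to test node types explicitly.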

One final feature of the C<:parsetree> flag is that it automatically
propagates to every subrule that a C<:parsetree>'d rule calls. And, from
there, recursively into any subrules that those subrules call. Et
cetera. Note that this will almost certainly require a one-time
recompilation of those subrules, unless they had originally been
specified with C<:parsetree> themselves, but that will be entirely
transparent to the user.

This propagation of the C<:parsetree> flag means that the C<Match> objects
returned by subrules will contain arrays with the same linearized,
objectified contents. Effectively, a C<:parsetree>'d rule will return an
array of arrays of arrays etc. corresponding to the hierarchical
structure of the data that the rule matched.

Which opens up the possibility of processing that data both
polymorphically I<and> hierarchically. For example, if we added:

# Factor out the ugly mail address matching...
rule mailaddr { <ident> (\@\S+) }

# And specify how to describe the resulting data structure...
method mailaddr::describe ($self: $index) {
say "Node $index is a mail address, which consists of:";
my $subindex = 0;
temp wrap say { call "\t", @_ }; # Indent when describing the bits...
.describe($index~'.'~$subindex++) for @{$self};
}

then we could update our original pattern match:

if ( m:w:parsetree/ (From\:) <mailaddr>/ ) {
my $i = 0;
.describe($i++) for @{$/};
}

The resulting syntax tree would now describe itself hierarchically:

Node 0 is the whitespace: ''
Node 1 is a subpattern that matched: 'From:'
Node 2 is the whitespace: ' '
Node 3 is a mail address, which consists of:
    Node 3.0 is the identifier: 'postmaster'
    Node 3.1 is a subpattern that matched: '@perl.org'


=head2 Capturing from non-singular matches

=head3 Matching under the C<:x> and C<:g> flags

When an entire rule is successfully matched with repetitions
(specified via the C<:x> and C<:g> flags), it often produces a series
of distinct matches.

However, a successful match under these flags still returns a single
C<Match> object in C<$/>. But the values of this match object are slightly
different from a "one-ping-only" match:

=over

=item *

The boolean value of C<$/> after such matches is true or false, depending on
whether the pattern matched at all.

=item *

The integer value is the number of times the pattern matched.

=item *

The string value is the substring from the start of the first match to
the end of the last match (I<including> any intervening parts of the
string that the rule skipped over in order to find later matches).

=item *

There are no array contents or hash entries.

=back

For example:

if $text ~~ m:words:globally/ (\S+:) <rocks> / {
say "Matched {+$/} different ways";

say 'Full match context is:';
say $/;
}
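
To make the four values listed above concrete, here is a small Python model
of such a top-level object. It is a sketch under assumptions, not the
proposed design: the class name C<GlobalMatch> is hypothetical, each
individual match is reduced to a C<(start, end)> span with an exclusive end,
and the spans are assumed to be given in left-to-right order.

```python
class GlobalMatch:
    """Models the single object returned by a repeated (:g or :x) match."""

    def __init__(self, text, spans):
        self.text = text
        self.spans = list(spans)      # (start, end) pairs, end exclusive

    def __bool__(self):               # true if the pattern matched at all
        return bool(self.spans)

    def __int__(self):                # number of times the pattern matched
        return len(self.spans)

    def __str__(self):                # first match start to last match end
        if not self.spans:
            return ""
        return self.text[self.spans[0][0]:self.spans[-1][1]]

    def matches(self):                # the individual matched substrings
        return [self.text[a:b] for a, b in self.spans]

text = "Perl rocks, Ruby rocks"
result = GlobalMatch(text, [(0, 10), (12, 22)])
nothing = GlobalMatch(text, [])
```

Note that the string value includes the C<", "> that lies between the two
matches, even though neither match covers it.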

The list of individual match objects corresponding to each separate
match is also available via the C<.matches> method. For example:

if $text ~~ m:words:globally/ (\S+:) <rocks> / {
for $/.matches -> $m {
say "Match between $m.from() and { $m.to()-1 }";
say 'Right on, dude!' if $m[0] eq 'Perl';
say "Rocks like $m<rocks>";
}
}


=head3 Matching under the C<:overlap> and C<:exhaustive> flags

Unlike the multiple matches of the C<:x> and C<:g> flags, success under
the C<:overlap> and C<:exhaustive> flags doesn't necessarily produce
a sequence of disjoint matches, but rather a disjunction of
alternative matches.

A successful match under the C<:overlap> or C<:exhaustive> flags still
returns a single C<Match> object in C<$/> (all matches do) and the C<.matches>
method of this object still returns all the distinct C<Match> objects for each
alternative match (in the order the matches were found).

But the values of the top-level C<Match> object returned by an overlapping or
exhaustive match are unusual:

=over

=item *

The boolean value of C<$/> after such matches is true or false, depending on
whether the pattern matched at all.

=item *

The integer value is the number of distinct ways in which the pattern matched.

=item *

The string value is a disjunction of all the distinct matches.

=item *

The array contents are a list of disjunctions of all the corresponding
unnamed captures from all the distinct matches. That is, C<$1> is a
disjunction of the C<$1> value of each of the successful matches that sets a
C<$1>.

=item *

The hash values are disjunctions of all the corresponding
named captures from all the distinct matches. That is, C<< $<foo> >> is
a disjunction of the C<< $<foo> >> value of each of the successful matches
that sets a C<< $<foo> >>.

=back

For example:

if $text ~~ m:words:exhaustive/ (\S+:) <rocks> / {
say "Matched {+$/} different ways";

say 'Right on, dude!' if $1 eq 'Perl'; # Disjunctive match against
# all possible $1's from
# any of the exhaustive matches

say 'Found these variations on "rocks":';
say for $<rocks>.values; # List all possible substrings
# successfully matched by <rocks>
# in any of the exhaustive matches
}

As mentioned above, the individual match objects for each alternative
match are also available (in canonical order) via the C<.matches>
method. For example:

if $text ~~ m:words:exhaustive/ (\S+:) <rocks> / {
for $/.matches -> $m {
say 'Right on, dude!' if $m[0] eq 'Perl'; # Normal match against
# match $m's $1's

say "Rocks like $m<rocks>"; # Substring matched by <rocks>
# in match $m
}
}

=head2 Executive summary of proposed changes

=over

=item *

Angles create subrules, which return a C<Match> object that is
captured into the hash of their surrounding scope's C<Match> object.

=item *

Parens create subpatterns, which return a C<Match> object that is
captured into the array of their surrounding scope's C<Match> object.

=item *

A subpattern is like an inlined subrule (except that it captures
into an array, rather than a hash).

=item *

Subpatterns nest lexically, and the captures they return are likewise
hierarchical.

=item *

The number associated with a subpattern reflects its ordinal position in its
immediately surrounding scope, not its ordinal position in the overall rule.
As a result, these numbers are hierarchical, rather than linear.

=item *

Quantifiers (except C<?> and C<??>) cause a matched subrule or subpattern to
return an array of C<Match> objects, instead of just a single object.

=item *

Two or more calls to the same subrule or subpattern in the same lexical scope
also cause the matched subrules/subpatterns to accumulate their C<Match>
objects in an array.

=item *

Scalar aliases rename or renumber the construct they're applied to, changing
the location in which the construct's C<Match> object is stored, but not its
capturing semantics.

=item *

Array aliases rename or renumber the construct they're applied to, and also
cause its corresponding C<Match> object(s) always to be returned in an array.

=item *

The elements of that array are a flattened list of the C<Match> objects
returned by the subpatterns nested inside the aliased construct.

=item *

Hash aliases rename or renumber the construct they're applied to, and also
cause its corresponding C<Match> object(s) always to be returned in a hash.

=item *

The keys of this hash are C<Match> objects returned by the first
subpattern nested inside the aliased construct. The values are the C<Match>
objects returned by the remaining nested subpatterns.

=item *

The C<:parsetree> flag modifies capture semantics to preserve the parse
sequence, the identity information, and the hierarchical structure of
captures, whilst also supporting object-oriented processing of the
resulting parse tree.

=back

Patrick R. Michaud

May 9, 2005, 11:33:33 AM5/9/05
to Damian Conway, Autrijus Tang, perl6-language
Here's some more commentary to draft zero of the capturing semantics
(thanks, Damian!), based partially on PGE's current implementation.

On Mon, May 09, 2005 at 10:51:53PM +1000, Damian Conway wrote:
> [...]
> =head2 Nested subpattern captures
> [...]


> There may also be shortcuts for accessing nested components of a subpattern,
> specifically:
>
> # Perl 6...
> #
> # $1----------------------------- $2--------- $3--------------------
> # | $1.1 $1.2------------- | | | | $3.1 $3.2---- |
> # | | | | $1.2.1 | | | | | | | | | |
> # | | | | | | | | | | | | | | | |
> m/ ( The (\S+) (guy|gal|g(\S+) ) ) (sees|calls) ( the (\S+) (gal|guy) ) /;
>
> but this has not yet been decided.

After thinking on this a bit, I'm hoping we don't do this -- at least not
initially. I'm not sure there's a lot of advantage of C< $1.1 > over
C< $1[0] >, and one starts to wonder about things like $1.$j.2 and
$1[$j].2 and the like.

> =head2 Quantified subpattern captures
> [...]


> If a subpattern is directly quantified using the C<?> or C<??> quantifier,
> it produces a single C<Match> object. That object is "successful" if the
> subpattern did match, and "unsuccessful" if it was skipped.

I'm not sure that PGE has these exact semantics for C<?> yet -- I'll have
to check.

> =head2 Indirectly quantified subpattern captures

> [...]


> A subpattern may sometimes be nested inside a quantified non-capturing
> structure:
>
> # non-capturing quantified
> # __________/\_________ __/\__
> # | || |
> # | $1 $2 || |
> # | _^_ ___^___ || |
> # | | | | | || |
> m/ [ (\w+) \: (\w+ \s+)* ]**{2...} /
>

> [...] In Perl 5, any repeated captures of this kind:


>
> # Perl 5 equivalent...
> m/ (?: (\w+) \: (\w+ \s+)* ){2,} /x
>
> would overwrite the previous captures to C<$1> and C<$2> each time the
> surrounding non-capturing parens iterated. So C<$1> and C<$2> would
> contain only the captures from the final repetition.
>
> This does not happen in Perl 6. Any indirectly quantified subpattern is
> treated like a directly quantified subpattern. Specifically, an
> indirectly quantified subpattern also returns an array of C<Match>
> objects, so the corresponding array element for the indirectly
> quantified capture will store an array reference, rather than a single
> C<Match> object.

It might be worthwhile to add a note here that one can still get
at the results of the final repetition by using $1[-1] and $2[-1].

> =head2 Subpattern numbering
> [...]


> Of course, the leading C<undef>s that Perl 5 would produce do convey
> (albeit awkwardly) which alternative actually matched. If that
> information is important, Perl 6 has several far cleaner ways to
> preserve it. For example:
>
> rule alt (Str $n) { {$/ = $n} }
>
> m/ <alt tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | <alt BEM> (every) (green) (BEM) (devours) (faces)
> /;

If the C< alt > rule is accepting a string argument, the match
statement probably needs to read

m/ <alt: tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <alt: BEM> (every) (green) (BEM) (devours) (faces)
/;

> =head2 Repeated captures of the same subrule
>

> =head3 Scalar aliases applied to quantified constructs

> [...]


> A set of quantified I<non-capturing> brackets always returns a
> single C<Match> object which contains only the complete substring
> that was matched by the full set of repetitions of the brackets (as
> described in L<Named scalar aliases applied to non-capturing brackets>).

At present, PGE isn't working this way -- aliased quantified non-capturing
brackets return an array of match objects, same as other quantified
structures. This can be changed, but I kind of like the consistency
that results --

"coffee fifo fumble" ~~ m/ .*? $<effs>:=[f <-[f]>**{1..2} \s*]+ /;

PGE currently gives $<effs> an array of matches, same as for the
other capturing constructs. If someone wants to capture the full
set, it's easy enough to do

"coffee fifo fumble" ~~ m/ .*? $<effs>:=[ [f <-[f]>**{1..2} \s*]+ ] /;

and it's pretty clear what was intended.

> =head3 Array aliasing
> =head3 Hash aliasing
> =head3 External aliasing


> =head2 The C<:parsetree> flag

> etc.

At the moment PGE doesn't support these, and probably won't until
they're actually needed in the course of developing the compiler
(or until someone adds them).

> [...]


> Moreover, the C<:parsetree> flag overrides the exemption of C<< «name» >>
> subrule calls, so they act as if they were C<< <name> >> calls instead. They
> generate C<Match> objects, and those objects are also appended onto the
> surrounding scope's C<Match> array.

Do we still have the C<< «name» >> syntax for rules? S05 doesn't
mention it, A05 mentions it as a non-capturing subrule but I think
we've since changed to C<< <?name> >> instead. If we don't have
C<< «name» >> I'll adjust S05/A05 accordingly.

Pm

Paul Seamons

May 9, 2005, 11:47:14 AM5/9/05
to perl6-l...@perl.org
> =item *
>
> Quantifiers (except C<?> and C<??>) cause a matched subrule or subpattern to
> return an array of C<Match> objects, instead of just a single object.

What is the effect of the quantifiers C<**{0,1}> and C<**{0,1}?> ? Will they
behave like ? and ?? and return a single object - or will they cause the
quantified subrule or subpattern to return as an array of C<Match> objects?

Paul

Patrick R. Michaud

May 9, 2005, 12:02:58 PM5/9/05
to Paul Seamons, perl6-l...@perl.org

First, I much prefer an alternate wording to Damian's:

The C<*>, C<+>, and C<**{...}> quantifiers all produce an array
of C<Match> objects instead of just a single object.

To answer your question, C<**{0..1}> always produces an array of
Match objects (think C<**{$m..$n}> where $m and $n may not be
immediately known), while C<?> always produces a single Match object.

Both C<**{0..1}> and C<?> will match "zero or one occurrence" of the
thing being quantified, but a non-matching C<**{0..1}> results in
a zero-length array, while a non-matching C<?> results in an
"unsuccessful" Match object.
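
A rough way to picture the difference, sketched in Python rather than
Perl 6. The C<Match> class and both function names here are made up for
illustration and are not anything PGE provides; C<found> stands for the
list of substrings the quantified item managed to match.

```python
from dataclasses import dataclass

@dataclass
class Match:
    text: str = ""
    success: bool = False

    def __bool__(self):
        return self.success

def quantify_optional(found):
    """Model of '?': always a single Match, possibly unsuccessful."""
    return Match(found[0], True) if found else Match()

def quantify_range_0_1(found):
    """Model of '**{0..1}': always a list, possibly empty."""
    return [Match(text, True) for text in found[:1]]

hit_q, miss_q = quantify_optional(["x"]), quantify_optional([])
hit_r, miss_r = quantify_range_0_1(["x"]), quantify_range_0_1([])
```

Both forms are false when nothing matched, but only the second can be
indexed or looped over without first checking which shape came back.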

Pm

Patrick R. Michaud

May 9, 2005, 12:19:21 PM5/9/05
to Paul Seamons, perl6-l...@perl.org
On Mon, May 09, 2005 at 11:02:58AM -0500, Patrick R. Michaud wrote:
> On Mon, May 09, 2005 at 09:47:14AM -0600, Paul Seamons wrote:
> > > =item *
> > >
> > > Quantifiers (except C<?> and C<??>) cause a matched subrule or subpattern to
> > > return an array of C<Match> objects, instead of just a single object.
> >
> First, I much prefer an alternate wording to Damian's:
>
> The C<*>, C<+>, and C<**{...}> quantifiers all produce an array
> of C<Match> objects instead of just a single object.

Perhaps better is:

The C<*>, C<+>, and C<**{...}> (but not C<?> or C<??>) quantifiers
cause a matched subrule or subpattern to return an array of C<Match>
objects, instead of just a single object.

And since I've noticed that a lot of people who see this document
end up asking about the relationship between C<?> and C<**{0..1}>,
perhaps we should just put an explicit note in there somewhere
about it. For example, at the end of the section we could say
something like:

Note that C<?> and C<**{0..1}> both mean "match zero or one
occurrence", but C<?> always produces a single C<Match> object
(which may be an unsuccessful match) and C<**{0..1}> always
produces an array of C<Match> objects (which will likely be
empty for an unsuccessful match).

Pm

Larry Wall

May 9, 2005, 12:14:02 PM5/9/05
to perl6-language
On Mon, May 09, 2005 at 10:33:33AM -0500, Patrick R. Michaud wrote:
: > =head2 Subpattern numbering

: > [...]
: > Of course, the leading C<undef>s that Perl 5 would produce do convey
: > (albeit awkwardly) which alternative actually matched. If that
: > information is important, Perl 6 has several far cleaner ways to
: > preserve it. For example:
: >
: > rule alt (Str $n) { {$/ = $n} }
: >
: > m/ <alt tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
: > | <alt BEM> (every) (green) (BEM) (devours) (faces)
: > /;
:
: If the C< alt > rule is accepting a string argument, the match
: statement probably needs to read
:
: m/ <alt: tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
: | <alt: BEM> (every) (green) (BEM) (devours) (faces)
: /;

This seems like a rather ugly syntax for what is essentially a label,
or a <null> rule. I wonder if we can come up with something a little
prettier. Something like:

m/ <null:tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <null:BEM> (every) (green) (BEM) (devours) (faces)
/;

m/ <tea:=> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <BEM:=> (every) (green) (BEM) (devours) (faces)
/;

m/ <:tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <:BEM> (every) (green) (BEM) (devours) (faces)
/;

or even plain label syntax:

m/ tea: (don't) (ray) (me) (for) (solar tea), (d'oh!)
| BEM: (every) (green) (BEM) (devours) (faces)
/;

if we recognize that : makes no sense as a backtrack control on a
non-quantified item.

Larry

Patrick R. Michaud

May 9, 2005, 12:36:37 PM5/9/05
to perl6-language
On Mon, May 09, 2005 at 09:14:02AM -0700, Larry Wall wrote:
> : m/ <alt: tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> : | <alt: BEM> (every) (green) (BEM) (devours) (faces)
> : /;
>
> This seems like a rather ugly syntax for what is essentially a label,
> or a <null> rule. I wonder if we can come up with something a little
> prettier.

I wonder if it's deserving of much in the way of special syntax at all,
given that we have a variety of ways to do it (closures come to mind).
In the example above, one could just as easily test $1 for "don't" vs.
"every" to figure out which alternation matched. Indeed, a simple answer
is:

m/ $<tea>:=<null> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| $<bem>:=<null> (every) (green) (BEM) (devours) (faces)
/;

and then

if ($/<tea>) { say "I hate solar tea" }
if ($/<bem>) { say "I love bug-eyed monsters" }

But from your examples:

> m/ <null:tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | <null:BEM> (every) (green) (BEM) (devours) (faces)
> /;

Hmm, capturing to $<null> seems odd.

> m/ <tea:=> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | <BEM:=> (every) (green) (BEM) (devours) (faces)
> /;

Please, not this one -- it looks too much like a subrule call to
tea("=") (from A05).

> m/ <:tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | <:BEM> (every) (green) (BEM) (devours) (faces)
> /;

This one has possibilities. It looks like a generalization of
pair constructors though, so one could also conceivably do things
like <:tea(0)> and <:tea('foo')>. With that one could then write

m/ <:alt('tea')> (don't) (ray) (me) (for) (solar tea), (d'oh!)
| <:alt('BEM')> (every) (green) (BEM) (devours) (faces)
/;

and have

given $<alt> {
when 'tea' { say "I hate solar tea" }
when 'BEM' { say "I love bug-eyed monsters" }
}

> or even plain label syntax:


>
> m/ tea: (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | BEM: (every) (green) (BEM) (devours) (faces)
> /;
>
> if we recognize that : makes no sense as a backtrack control on a
> non-quantified item.

This sounds too "special-case" to me. Also, I think it does make
sense to backtrack control on non-quantified subrules and subpatterns,
so we'd have to say that : has this meaning only after a non-quantified
literal. I feel there are too many other good ways to do it to
add this one.

Pm

Larry Wall

May 9, 2005, 12:03:47 PM5/9/05
to perl6-l...@perl.org
On Mon, May 09, 2005 at 09:47:14AM -0600, Paul Seamons wrote:
: > =item *

: >
: > Quantifiers (except C<?> and C<??>) cause a matched subrule or subpattern to
: > return an array of C<Match> objects, instead of just a single object.
:
: What is the effect of the quantifiers C<**{0,1}> and C<**{0,1}?> ?

That would be **{0..1} and **{0..1}? actually, since we're treating ..
as a real range and {} as a real closure. (Though they're presumably
optimized away for constant ranges.)

: Will they
: behave like ? and ?? and return a single object - or will they cause the
: quantified subrule or subpattern to return as an array of C<Match> objects?

The latter. In the abstract it would be nice to define ? in terms of
**{0..1}, but the simple fact is that we can't afford to let **{$n..$m}
change the form of its return value merely because $n just happens
to be 0 and $m just happens to be 1. And that actually makes the
distinction between ? and **{0..1} more useful, insofar as it lets you
specify which form of return you want.

Larry
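
The distinction above can be modelled roughly in Python. This is only a sketch and not PGE or Pugs code: `optional` and `repeat` are hypothetical helper names, with `optional` standing in for `?` (a single match or nothing) and `repeat` standing in for `**{$n..$m}` (always a list, even when the bounds happen to be 0 and 1).

```python
import re

def optional(sub, text, pos=0):
    """Model of `(sub)?`: one match object or None, never a list."""
    return re.compile(sub).match(text, pos)

def repeat(sub, text, lo, hi, pos=0):
    """Model of `(sub)**{lo..hi}`: always a list of match objects."""
    rx = re.compile(sub)
    found = []
    while len(found) < hi:
        m = rx.match(text, pos)
        if m is None or m.end() == pos:  # no match, or an empty match: stop
            break
        found.append(m)
        pos = m.end()
    if len(found) < lo:
        return None                      # the quantified item failed overall
    return found

single = optional(r"\d", "7x")           # a match object
as_list = repeat(r"\d", "7x", 0, 1)      # a one-element list
empty_list = repeat(r"\d", "x", 0, 1)    # an empty list, still a list
```

The caller picks the return form by picking the quantifier, which is the usefulness Larry points to.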

Uri Guttman

unread,
May 9, 2005, 1:53:47 PM5/9/05
to Patrick R. Michaud, Damian Conway, Autrijus Tang, perl6-language
>>>>> "PRM" == Patrick R Michaud <pmic...@pobox.com> writes:

PRM> After thinking on this a bit, I'm hoping we don't do this -- at least not
PRM> initially. I'm not sure there's a lot of advantage of C< $1.1 > over
PRM> C< $1[0] >, and one starts to wonder about things like $1.$j.2 and
PRM> $1[$j].2 and the like.

i would say that you can use .1 only when all the indexes are literals
like $1.2.1. anything else must be a proper index expression on $1 like
$1[$j][1].

mixing those would scare me more than anything and it isn't much of a
hardship to use the full expression form when you need it. also in perl,
indexing isn't used nearly as often as for loops, so if you did grab
something that was in an array in some match value, you would more likely
loop over it than index into it. so again, the hardship of the index
syntax isn't a big deal as it should be rarely needed.

just my $1.02. :)

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Larry Wall

unread,
May 9, 2005, 2:34:10 PM5/9/05
to perl6-language
On Mon, May 09, 2005 at 10:33:33AM -0500, Patrick R. Michaud wrote:
: After thinking on this a bit, I'm hoping we don't do this -- at least not
: initially. I'm not sure there's a lot of advantage of C< $1.1 > over
: C< $1[0] >, and one starts to wonder about things like $1.$j.2 and
: $1[$j].2 and the like.

Or maybe it should generalize the other direction. We just got through
the great bracket shift to make $x<a> mean $x{'a'} so that we can recognize
constant hash subscripts easily. Maybe $x.1 is just the numeric analog
of that. And $x.$j.2 could just fall out of that, where the indirect
method dispatcher knows to turn a numeric method name into a subscript.

Larry

Larry Wall

unread,
May 9, 2005, 3:14:35 PM5/9/05
to perl6-language
On Mon, May 09, 2005 at 02:08:31PM -0500, Patrick R. Michaud wrote:
: Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?

Ouch.

Larry

Patrick R. Michaud

unread,
May 9, 2005, 3:08:31 PM5/9/05
to perl6-language

Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?

Pm
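
The off-by-one in that question can be made concrete with a small Python model. It assumes the dotted form is 1-based at every step while bracket subscripts are 0-based; `dotted` is a hypothetical helper name, not anything proposed in the thread.

```python
def dotted(root, *path):
    """Follow a 1-based dotted path such as $x.$j.2 through nested lists."""
    node = root
    for n in path:
        if n < 1:
            # Without this check, Python would silently wrap 0 around to
            # the last element, hiding exactly the mistake discussed here.
            raise IndexError("dot indices are 1-based; got %d" % n)
        node = node[n - 1]               # 1-based dot index -> 0-based subscript
    return node

x = [["a", "b"], ["c", "d"], ["e", "f"]]
j = 3
picked = dotted(x, j, 2)                 # same element as x[j - 1][1]
```

Under these assumptions every variable used as a dot index needs a silent `- 1` to become a subscript, which is the source of the confusion.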

Uri Guttman

unread,
May 9, 2005, 4:47:37 PM5/9/05
to perl6-language
>>>>> "LW" == Larry Wall <la...@wall.org> writes:

LW> On Mon, May 09, 2005 at 12:14:35PM -0700, Larry Wall wrote:
LW> : On Mon, May 09, 2005 at 02:08:31PM -0500, Patrick R. Michaud wrote:
LW> : : Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?
LW> :
LW> : Ouch.

LW> Maybe that's a good reason to switch from 1-based to 0-based
LW> $<digit> vars. Not sure what that would do to the current $0 though.
LW> Most of the time $/ can stand in for it, I guess, though s/.../$//
LW> is visually problematic. We could maybe resurrect $&.

or do what i mentioned, not allow mixing of the two styles of match
access. i don't see any real win for mixing them. indexing into matched
arrays will not be so common to deserve conflating the 0 and 1 based
indexing as well as the notations. leave it with $1.1 and $1[0] as being
the two styles. you must use literal integers with the former and it is
1 based. you can use any expressions with the latter and it is 0
based. by allowing $1[$j].1 you save only 1 char over $1[$j][0] and
would cause major confusion IMO.

Larry Wall

unread,
May 9, 2005, 4:30:27 PM5/9/05
to perl6-language
On Mon, May 09, 2005 at 12:14:35PM -0700, Larry Wall wrote:
: On Mon, May 09, 2005 at 02:08:31PM -0500, Patrick R. Michaud wrote:
: : Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?
:
: Ouch.

Maybe that's a good reason to switch from 1-based to 0-based
$<digit> vars. Not sure what that would do to the current $0 though.
Most of the time $/ can stand in for it, I guess, though s/.../$//
is visually problematic. We could maybe resurrect $&.

Larry

Mark A Biggar

unread,
May 9, 2005, 4:43:39 PM5/9/05
to Larry Wall, perl6-language
Can I say $*1, $*2, etc, to get perl5 flattened paren-counting captures? We need something like that to make perl5->perl6 translation easier; otherwise we'd have to parse perl5 RE instead of just slapping on a ":p5". Unless ":p5" also means that you get a single already flattened match object.

--
Mark Biggar
ma...@biggar.org
mark.a...@comcast.net
mbi...@paypal.com


> On Mon, May 09, 2005 at 02:08:31PM -0500, Patrick R. Michaud wrote:
> : Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?
>
> Ouch.
>
> Larry

Patrick R. Michaud

unread,
May 9, 2005, 7:52:17 PM5/9/05
to mark.a...@comcast.net, Larry Wall, perl6-language
On Mon, May 09, 2005 at 08:43:39PM +0000, mark.a...@comcast.net wrote:
> Can I say $*1, $*2, etc, to get perl5 flattened paren-counting captures?
> We need something like that to make perl5->perl6 translation easier;
> otherwise we'd have to parse perl5 RE instead of just slapping on a ":p5".
> Unless ":p5" also means that you get a single already flattened match object.

PGE will have a perl5 RE parser built in to handle the ":p5"
option, and it will return match objects according to perl 5's
capture semantics with no nesting (i.e., counting left parens).

I will probably start a perl 5 RE parser just to get it started, but
after that I'd prefer to turn it over to someone else to maintain.
Note that PGE will already provide the matching engine itself --
we simply need something that converts p5 expression trees into
PGE's expression trees.

In addition, I'm planning to write a glob/wildcard parser, which does
matches based on Unix filename globbing syntax.

Pm
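
For comparison, Python's `re` module numbers groups the Perl 5 way, by counting left parens with no nesting. The sketch below only illustrates that flat numbering; it says nothing about how PGE's ":p5" parser will be written, and the pattern is the `(.(.(.)))` example from earlier in the thread.

```python
import re

# Groups are numbered by the position of their opening paren, as in Perl 5.
m = re.match(r"(.(.(.)))", "123")

# Index 0 is the whole match; 1..3 are the three captures, all at one level.
flat = [m.group(i) for i in range(m.re.groups + 1)]

for i, text in enumerate(flat):
    print("$%d: %s" % (i, text))
```

The list is flat: the innermost capture is simply number 3, not something reached through the capture that encloses it.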

Uri Guttman

unread,
May 10, 2005, 1:55:47 AM5/10/05
to Damian Conway, Autrijus Tang, Patrick R. Michaud, perl6-language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:


DC> grammar Shell::Commands {

DC> my $lastcmd;

DC> rule cmd { $/:=<mv> | $/:=<cp> }

DC> rule mv { $lastcmd:=(mv) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
DC> rule cp { $lastcmd:=(cp) $<files>:=[ <ident> ]+ $<dir>:=<ident> }

DC> sub lastcmd { return $lastcmd }
DC> }

DC> while shift ~~ m/<Shell::Commands.cmd>/ {
DC> say "From: @{$<files>}";
DC> say " To: $<dir>";
DC> }

since files and dirs are internal aliases (their names are in <>),
shouldn't those match accesses be $/<files> and $/<dir>?

Damian Conway

unread,
May 9, 2005, 6:52:22 PM5/9/05
to Larry Wall, perl6-language
Larry Wall wrote:
> On Mon, May 09, 2005 at 12:14:35PM -0700, Larry Wall wrote:
> : On Mon, May 09, 2005 at 02:08:31PM -0500, Patrick R. Michaud wrote:
> : : Hmmm, then would $x.$j.2 then be equivalent to $x[$j-1][1] ?
> :
> : Ouch.
>
> Maybe that's a good reason to switch from 1-based to 0-based
> $<digit> vars. Not sure what that would do to the current $0 though.
> Most of the time $/ can stand in for it, I guess, though s/.../$//
> is visually problematic. We could maybe resurrect $&.

Actually, I'd be just as happy *not* to have $1.1.1 at all. There's no real
win over $1[0][0], and I think it would be better to leave the multi-dot
syntax to be visually unambiguous as a version number.

Damian

Ph. Marek

unread,
May 10, 2005, 2:47:35 AM5/10/05
to Autrijus Tang, perl6-language
On Monday 09 May 2005 19:36, Autrijus Tang wrote:
> On Mon, May 09, 2005 at 10:51:53PM +1000, Damian Conway wrote:
> > Autrijus wrote:
> > >/me eagerly awaits new revelation from Damian...
> >
> > Be careful what you wish for. Here's draft zero. ;-)
>
> ...and here is my status report of the Zero-Day exploit, err,
> implementation, in Pugs. :-)
That's .... great.
I'm just waiting for the next time, when you announce the implementation
before the draft.

I'm really looking forward to meeting you in Vienna next month.


Regards,

Phil

Damian Conway

unread,
May 9, 2005, 6:52:59 PM5/9/05
to Patrick R. Michaud, perl6-language
Patrick R. Michaud wrote:

> On Mon, May 09, 2005 at 09:14:02AM -0700, Larry Wall wrote:
>
>>: m/ <alt: tea> (don't) (ray) (me) (for) (solar tea), (d'oh!)
>>: | <alt: BEM> (every) (green) (BEM) (devours) (faces)
>>: /;
>>
>>This seems like a rather ugly syntax for what is essentially a label,
>>or a <null> rule. I wonder if we can come up with something a little
>>prettier.
>
> I wonder if it's deserving of much in the way of special syntax at all,
> given that we have a variety of ways to do it (closures come to mind).
> In the example above, one could just as easily test $1 for "don't" vs.
> "every" to figure out which alternation matched. Indeed, a simple answer
> is:
>
> m/ $<tea>:=<null> (don't) (ray) (me) (for) (solar tea), (d'oh!)
> | $<bem>:=<null> (every) (green) (BEM) (devours) (faces)
> /;
>
> and then
>
> if ($/<tea>) { say "I hate solar tea" }
> if ($/<bem>) { say "I love bug-eyed monsters" }

Yes, I think this is the right answer. Much better not to multiply entities
without necessity.

Well done, Patrick!

Damian
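
The same trick of tagging each alternative with an empty named capture can be tried in Python's `re`. This is an analogy under stated assumptions, not Perl 6: the empty groups `(?P<tea>)` and `(?P<bem>)` play the role of `$<tea>:=<null>` and `$<bem>:=<null>`, and `which_alternative` is a hypothetical helper name.

```python
import re

rx = re.compile(
    r"(?P<tea>)(don't) (ray) (me) (for) (solar tea), (d'oh!)"
    r"|(?P<bem>)(every) (green) (BEM) (devours) (faces)"
)

def which_alternative(text):
    """Return 'tea' or 'bem' for the branch that matched, else None."""
    m = rx.match(text)
    if m is None:
        return None
    for tag in ("tea", "bem"):
        # An empty group that took part in the match yields "", which is
        # falsy in Python, so compare against None instead of testing truth.
        if m.group(tag) is not None:
            return tag
    return None
```

One difference from the Perl 6 version is worth noting: there a successful null match tests as true, whereas here the empty string does not, hence the explicit `is not None`.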

Damian Conway

unread,
May 10, 2005, 6:58:34 AM5/10/05
to Uri Guttman, Autrijus Tang, Patrick R. Michaud, perl6-language
> DC> rule mv { $lastcmd:=(mv) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
> DC> rule cp { $lastcmd:=(cp) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
>
> DC> sub lastcmd { return $lastcmd }
> DC> }
>
> DC> while shift ~~ m/<Shell::Commands.cmd>/ {
> DC> say "From: @{$<files>}";
> DC> say " To: $<dir>";
> DC> }
>
> since files and dirs are internal aliases (their names are in <>),
> shouldn't those match accesses be $/<files> and $/<dir>?

Sure. Just as $42 is a shorthand for $/[42], so too $<whatever> is a shorthand
for $/<whatever>.

Damian

Aaron Crane

unread,
May 10, 2005, 7:16:55 AM5/10/05
to perl6-l...@perl.org
Damian Conway writes:
> Just as $42 is a shorthand for $/[42], so too $<whatever> is a
> shorthand for $/<whatever>.

Isn't $42 a shorthand for $/[41] ?

I think that having 1-based digit-variables but 0-based array indexes on
$/ is really confusing; mistakes of this sort seem to confirm my view.

--
Aaron Crane

Luke Palmer

unread,
May 10, 2005, 7:36:55 AM5/10/05
to perl6-l...@perl.org

Yeah, I'm pretty sure they should be consolidated somehow. Of course
we could make $# zero-based. But there are other possibilities:

* $/[0] is like $&, the full text of the match
* $/[0] is the name of the matching rule, like in P::RD

The latter isn't applicable everywhere, but I have to say that it was
pretty handy. But I'm not sure I ever used its zeroth position to my
advantage, so it would probably be better off as a method on $/.

The former case is actually quite elegant if we extend it to the new
matching semantics. Consider:

"foobarbaz" ~~ / (foo (bar)) (baz) /

Then:

$/[0] eq "foobarbaz";
$/[1] is a match object
$/[1][0] eq "foobar"
$/[1][1][0] eq "bar"
$/[2][0] eq "baz"

So now we have the stringification behavior on $/: it just returns its
zeroth element. The current meaning of $0 is now consistent and means
exactly the same thing it used to.

But I'm in the middle of a movie, so I haven't really thought through
the rest of it clearly. I think it breaks down somewhat in the
presence of quantifiers.

Luke
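
The layout proposed above, where element 0 of every match is its text and later elements are nested sub-matches, can be sketched in Python. This is a model only: `group_parents` and `match_tree` are hypothetical helpers, the paren scanner is deliberately simplified, and the pattern is written without the whitespace that Perl 6 rules ignore.

```python
import re

def group_parents(pattern):
    """Map each capture group number to its enclosing group (0 = whole match).

    Simplified scanner: skips backslash escapes, but assumes the pattern has
    no character classes containing parens and no (?...) constructs.
    """
    parents, stack, count, i = {}, [0], 0, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2                       # skip the escaped character
            continue
        if ch == "(":
            count += 1
            parents[count] = stack[-1]
            stack.append(count)
        elif ch == ")":
            stack.pop()
        i += 1
    return parents

def match_tree(pattern, text):
    """Build [text, sub-match, sub-match, ...] recursively, or None."""
    m = re.match(pattern, text)
    if m is None:
        return None
    nodes = {0: [m.group(0)]}
    for num, parent in group_parents(pattern).items():
        nodes[num] = [m.group(num)]
        nodes[parent].append(nodes[num])  # parents always have lower numbers
    return nodes[0]

tree = match_tree(r"(foo(bar))(baz)", "foobarbaz")
# tree == ["foobarbaz", ["foobar", ["bar"]], ["baz"]]
```

With this shape, `tree[0]`, `tree[1][0]`, `tree[1][1][0]` and `tree[2][0]` line up with the `$/[0]`, `$/[1][0]`, `$/[1][1][0]` and `$/[2][0]` lines in the example. Quantified captures are not modelled, which is where the scheme is said to break down.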

Uri Guttman

unread,
May 10, 2005, 9:52:50 AM5/10/05
to Damian Conway, Autrijus Tang, Patrick R. Michaud, perl6-language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> rule mv { $lastcmd:=(mv) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
DC> rule cp { $lastcmd:=(cp) $<files>:=[ <ident> ]+ $<dir>:=<ident> }
DC> sub lastcmd { return $lastcmd }
DC> }
DC> while shift ~~ m/<Shell::Commands.cmd>/ {
DC> say "From: @{$<files>}";
DC> say " To: $<dir>";
DC> }
>> since files and dirs are internal aliases (their names are in <>),
>> shouldn't those match accesses be $/<files> and $/<dir>?

DC> Sure. Just as $42 is a shorthand for $/[42], so too $<whatever> is a
DC> shorthand for $/<whatever>.

but then what about the different index bases for $42 and $/[42]? i
don't think that has been resolved (nor has mixing the $1.1 and $1[1]
syntaxes).

Damian Conway

unread,
May 11, 2005, 3:48:59 AM5/11/05
to perl6-language
Uri Guttman wrote:

> DC> Sure. Just as $42 is a shorthand for $/[42], so too $<whatever> is a
> DC> shorthand for $/<whatever>.
>
> but then what about the different index bases for $42 and $/[42]? i
> don't think that has been resolved (nor has mixing the $1.1 and $1[1]
> syntaxes).

Bear in mind that that reply was posted in haste, late at night, after a long
day of teaching. We're lucky it was only off by one! %-)

But it does raise an important point: the discrepancy between $42 and $/[41]
*is* a great opportunity for off-by-one errors. Previously, however, @Larry
have tossed back and forth the possibility of using $0 as the first capture
variable so that the indices of $/[0], $/[1], $/[2] match up with the "names"
of $0, $1, $2, etc.

I think this error--unintentional, I swear!--argues strongly that internal
consistency within Perl 6 is more important than historical consistency with
Perl 5's $1, $2, $3...

But that's only the opinion of one(@Larry), not of $Larry.

Damian

Thomas Sandlaß

unread,
May 11, 2005, 4:13:19 AM5/11/05