Numification of captured match


Autrijus Tang

May 12, 2005, 3:23:20 PM
to perl6-l...@perl.org
This has led to surprising results in Pugs's Net::IRC:

    if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
        my $socket = connect($0, $1);
    }

If $1 is a match object here, and connect() assumes Int on its second
argument, then it will connect to port 1, as the match object numifies
to 1 (indicating a successful match).

I "fixed" this for 6.2.3 by flattening $0, $1, $2 into plain scalars
(for nonquantified matches) and using $/[0] etc. to store match objects,
but I'm not sure this treatment is right.

Is it really intended that we get into habit of writing this?

    if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
        my $socket = connect(~$0, +$1);
    }

It looks... weird. :)

Thanks,
/Autrijus/

Patrick R. Michaud

May 12, 2005, 3:55:36 PM
to Autrijus Tang, perl6-l...@perl.org
On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> Is it really intended that we get into habit of writing this?
>
> if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> my $socket = connect(~$0, +$1);
> }
>
> It looks... weird. :)

And it would have to be

    if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
        my $socket = connect(~$0, ~$1);
    }

because +$1 still evaluates to 1. (The ~ in front of $0 is
probably optional.)

My suggestion is that a match object in numeric context is the
same as evaluating its string value in a numeric context. If
we need a way to find out the number of match repetitions (what
the numeric context was intended to provide), it might be better
done with an explicit C<.matchcount> method or something like that.

Pm
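
The difference between the two behaviours can be modelled outside Perl 6.
The sketch below is Python, not Pugs or PGE code; the Match class, the
numify_as_string switch, and this connect() are hypothetical stand-ins that
only mimic the coercions discussed above (~$1 as str(), +$1 as int()).

```python
import re


class Match:
    """Toy stand-in for a Perl 6 Match object (hypothetical, for illustration)."""

    def __init__(self, text, numify_as_string):
        self.text = text                          # the matched substring
        self.numify_as_string = numify_as_string  # which numification rule to model

    def __str__(self):
        # Models ~$1: the string value is the matched substring.
        return self.text

    def __int__(self):
        # Models +$1 under the two rules being compared.
        if self.numify_as_string:
            return int(self.text)  # suggested rule: numeric value of the string
        return 1                   # draft rule: 1 for a successful match


def connect(host, port):
    """Hypothetical connect() that coerces its arguments to Str and Int."""
    return (str(host), int(port))


def captures(address, numify_as_string):
    """Return the two captures of /^(.+)\\:(\\d+)$/ as toy Match objects."""
    m = re.match(r'^(.+):(\d+)$', address)
    return [Match(group, numify_as_string) for group in m.groups()]


# Draft rule: the port silently becomes 1.
host, port = captures('localhost:80', numify_as_string=False)
draft_result = connect(host, port)

# Suggested rule: the port is the number that was matched.
host, port = captures('localhost:80', numify_as_string=True)
proposed_result = connect(host, port)

print(draft_result, proposed_result)
```

Under the draft rule the call succeeds without any warning, which is what
makes the bug easy to miss.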

Jonathan Scott Duff

May 12, 2005, 10:31:52 PM
to Patrick R. Michaud, Autrijus Tang, perl6-l...@perl.org
On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> > Is it really intended that we get into habit of writing this?
> >
> > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> > my $socket = connect(~$0, +$1);
> > }
> >
> > It looks... weird. :)
>
> And it would have to be
>
> if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> my $socket = connect(~$0, ~$1);
> }
>
> because +$1 still evaluates to 1.

That's some subtle evil.

> My suggestion is that a match object in numeric context is the
> same as evaluating its string value in a numeric context.

While I agree that this would be the right behavior it still feels
special-casey, hackish and wrong.

If, as an optimization, you could tell PGE that you didn't need Match
objects and only cared about the string results of your captures, that
might be better. For instance,

    if 'localhost:80' ~~ m:s/^(.+)\:(\d+)$/ {
        my $socket = connect($0, $1);
    }

:s for :string (assuming that hasn't already been taken)

> If
> we need a way to find out the number of match repetitions (what
> the numeric context was intended to provide), it might be better
> done with an explicit C<.matchcount> method or something like that.

Surely that would just be +@{$1}? Or have I crossed the perl[56]
streams again?

-Scott
--
Jonathan Scott Duff
du...@pobox.com

Rob Kinyon

May 12, 2005, 10:39:55 PM
to du...@pobox.com, Patrick R. Michaud, Autrijus Tang, perl6-l...@perl.org
On 5/12/05, Jonathan Scott Duff <du...@pobox.com> wrote:
> On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> > On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> > > Is it really intended that we get into habit of writing this?
> > >
> > > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> > > my $socket = connect(~$0, +$1);
> > > }
> > >
> > > It looks... weird. :)
> >
> > And it would have to be
> >
> > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> > my $socket = connect(~$0, ~$1);
> > }
> >
> > because +$1 still evaluates to 1.
>
> That's some subtle evil.
>
> > My suggestion is that a match object in numeric context is the
> > same as evaluating its string value in a numeric context.
>
> While I agree that this would be the right behavior it still feels
> special-casey, hackish and wrong.
>
> If, as an optimization, you could tell PGE that you didn't need Match
> objects and only cared about the string results of your captures, that
> might be better. For instance,
>
> if 'localhost:80' ~~ m:s/^(.+)\:(\d+)$/ {
> my $socket = connect($0, $1);
> }
> :s for :string (assuming that hasn't already been taken)

What about the fact that anything matching (\d+) is going to be an Int
and anything matching (.+) is going to be a String, and so forth.
There is sufficient information in the regex for P6 to know that $0
should smart-convert into a String and $1 should smart-convert into a
Int. Can't we just do that?

Rob

Larry Wall

May 12, 2005, 11:10:42 PM
to perl6-l...@perl.org
On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:

I think we already said something like that once some number of
months ago. +$1 simply has to be the numeric value of the match.
It's not as much of a problem as a Perl 5 programmer might think,
since ?$1 is still true even if +$1 is 0. Anyway, while we could have
a method for the .matchcount, +$1[] should work fine too. And maybe
even +@$1, presuming that "a match object can function as an array"
actually means "a match object knows when it's being asked to supply
an array reference".

Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
and %1 shouldn't mean $1{}.

Larry
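
The three separate questions a match can be asked (is it true, what is its
number, how many sub-captures does it hold) can be sketched as follows. This
is a Python model under stated assumptions, not Perl 6: the Match class and
its subcaptures attribute are hypothetical, with bool() standing in for ?$1,
int() for +$1, and len() for +$1[].

```python
class Match:
    """Toy model of a successful match (hypothetical, for illustration only)."""

    def __init__(self, text, subcaptures=()):
        self.text = text                      # the matched substring
        self.subcaptures = list(subcaptures)  # array component of the match

    def __bool__(self):
        # Models ?$1: a successful match is true whatever it matched.
        return True

    def __int__(self):
        # Models +$1: the numeric value of the matched string.
        return int(self.text)

    def __len__(self):
        # Models +$1[]: the number of elements in the array component.
        return len(self.subcaptures)


zero = Match('0')
is_true = bool(zero)   # truth does not depend on the matched text
value = int(zero)      # the number that was matched

repeated = Match('123', subcaptures=['1', '2', '3'])
count = len(repeated)  # how many sub-captures were stored

print(is_true, value, count)
```

Keeping truth separate from the numeric value is what lets a capture that
matched "0" still count as a successful match.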

Patrick R. Michaud

May 13, 2005, 12:46:45 AM
to perl6-l...@perl.org
On Thu, May 12, 2005 at 08:10:42PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> : On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> : > Is it really intended that we get into habit of writing this?
> : >
> : > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> : > my $socket = connect(~$0, +$1);
> : > }
> : >
> : > It looks... weird. :)
> :
> : And it would have to be
> :
> : if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> : my $socket = connect(~$0, ~$1);
> : }
> :
> : because +$1 still evaluates to 1. (The ~ in front of $0 is
> : probably optional.)
> :
> : My suggestion is that a match object in numeric context is the
> : same as evaluating its string value in a numeric context. If
> : we need a way to find out the number of match repetitions (what
> : the numeric context was intended to provide), it might be better
> : done with an explicit C<.matchcount> method or something like that.
>
> I think we already said something like that once some number of
> months ago.

I guess I've been led astray (or downright confused) by the capture
specs then, when it says:

A successful match returns a C<Match> object whose boolean value is
true, whose integer value is typically 1 (except under the C<:g> or
C<:x> flags; see L<Capturing from non-singular matches>), whose string
value is the complete substring that was matched by the entire rule,
whose array component contains all subpattern (unnamed) captures, and
whose hash component contains all subrule (named) captures.

and later

If a named scalar alias is applied to a set of non-capturing
brackets:

    m:w/ $<key>:=[ (<[A-E]>) (\d**{3..6}) (X?) ] /;

then the corresponding entry in the rule's hash is assigned a
C<Match> object whose:

    * Boolean value is true,
    * Integer value is 1,
    * String value is the complete substring matched by the
      contents of the square brackets,
    * Array and hash are both empty.

and under the :g option...

    if $text ~~ m:words:globally/ (\S+:) <rocks> / {
        say "Matched {+$/} different ways";
        say 'Full match context is:';
        say $/;
    }

So, are the Match objects returned from subpattern captures
treated differently in numeric context than the Match objects
coming from named scalar aliases or the match itself... ?

> It's not as much of a problem as a Perl 5 programmer might think,
> since ?$1 is still true even if +$1 is 0. Anyway, while we could have
> a method for the .matchcount, +$1[] should work fine too.

With .matchcount I wasn't concerned about the number of repetitions
stored in $1 -- I was trying to get at the numeric value that $/
would've returned under the :g option. But in re-reading the draft
of the :globally option I see we already have one --
C< $/.matches > in numeric context should supply it for us.

So I'm guessing that we're all in agreement that +$/, +$1, and
+$<subrule> all refer to the numeric value of the string matched,
as opposed to what's currently written about their values in the
draft...? Or am I still missing the picture entirely?

Pm

Damian Conway

May 13, 2005, 12:00:10 AM
to Larry Wall, perl6-l...@perl.org
Larry Wall wrote:

> I think we already said something like that once some number of
> months ago. +$1 simply has to be the numeric value of the match.

Agreed.


> Anyway, while we could have
> a method for the .matchcount, +$1[] should work fine too.

Yep.


> Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
> and %1 shouldn't mean $1{}.

It *does*. According to the recent capture semantics document:

> Note that, outside a rule, C<@1> is simply a shorthand for C<@{$1}>,

and:

> And, of course, outside the rule, C<%1> is a shortcut for C<%{$1}>:


Damian

Larry Wall

May 13, 2005, 12:13:14 AM
to perl6-l...@perl.org
On Fri, May 13, 2005 at 02:00:10PM +1000, Damian Conway wrote:
: >Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
: >and %1 shouldn't mean $1{}.
:
: It *does*. According to the recent capture semantics document:
:
: > Note that, outside a rule, C<@1> is simply a shorthand for C<@{$1}>,
:
: and:
:
: > And, of course, outside the rule, C<%1> is a shortcut for C<%{$1}>:

In that case it's very much less clear to me why it shouldn't mean that. :-)

Larry

Jonathan Scott Duff

May 13, 2005, 12:33:30 AM
to perl6-l...@perl.org
On Thu, May 12, 2005 at 08:10:42PM -0700, Larry Wall wrote:

So the "counting" idiom in S05 becomes one of:

    $match_count += @{m:g/pattern/};
    $match_count += list m:g/pattern/;
    $match_count += m:g/pattern/.matchcount;
    $match_count += (m:g/pattern/)[]; # maybe

???

Damian Conway

May 13, 2005, 1:43:43 AM
to Patrick R. Michaud, perl6-l...@perl.org
Patrick surmised:

> So I'm guessing that we're all in agreement that +$/, +$1, and
> +$<subrule> all refer to the numeric value of the string matched,
> as opposed to what's currently written about their values in the
> draft...?

Yes. The semantics proposed in the draft have proved to be too orthogonal for
practical use. ;-)

Damian
