Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

dynamic arguments (was: The Sort Problem)

32 views
Skip to first unread message

Jonathan Lang

unread,
Feb 12, 2004, 11:14:40 PM2/12/04
to Luke Palmer, Perl 6 Language
Jonathan Lang wrote:
> Luke Palmer wrote:
> > I've been thinking about this problem which comes up in my code a lot:
> >
> > @sorted = sort { $^a.foo('bar').compute <=> $^b.foo('bar').compute}
> > @unsorted;
> >
> > Often the expressions on each side are even longer than that. But one
> > thing remains: both sides are exactly the same, substitute a $^b for
> > a $^a.
>
> The problem, then, isn't specific to sort; it happens any time that you
> find yourself needing to do the same things to multiple arguments. As
> such, the solution ought to be more general than just sort.
>
> You're looking for something to the effect of:
>
> @sorted = sort { parameters($^a <=> $^b).foo('bar').compute }
>
> That is, you need a way to factor C<.foo('bar').compute> out from each
> of the arguments that it applies to. For list arguments, this is
> straightforward: pipe the list arguments through a map before you pipe
> them into the routine. A similar approach could be used with named
> arguments. The problem comes when you deal with positional arguments.

How about including something similar to <==, but which binds the elements
of the list to the various positional parameters? For instance:

@sorted = sort {infix:<=> args map {$_.foo('bar').compute}, $^a, $^b }
@unsorted;

Where

@x = $a, $b, $c;
routine args @x;

is equivelent to

routine $a, $b, $c;

=====
Jonathan "Dataweaver" Lang

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

Luke Palmer

unread,
Feb 12, 2004, 11:43:31 PM2/12/04
to Jonathan Lang, Perl 6 Language
Jonathan Lang writes:
> How about including something similar to <==, but which binds the elements
> of the list to the various positional parameters? For instance:
>
> @sorted = sort {infix:<=> args map {$_.foo('bar').compute}, $^a, $^b }
> @unsorted;
>
> Where
>
> @x = $a, $b, $c;
> routine args @x;
>
> is equivelent to
>
> routine $a, $b, $c;

We already have that. It's spelled:

routine *@x;

Or

routine * <== @x;

Luke

Jonathan Lang

unread,
Feb 12, 2004, 11:57:14 PM2/12/04
to Luke Palmer, Perl 6 Language

Then you've got your solution:

@sorted = sort {infix:<=> * map {$_.foo('bar').compute}, $^a, $^b }
@unsorted;

or

@sorted = sort {($^a, $^b) ==>
map {$_.foo('bar').compute} ==>
infix:<=> * } @unsorted;

Luke Palmer

unread,
Feb 13, 2004, 1:52:59 AM2/13/04
to Jonathan Lang, Perl 6 Language
Jonathan Lang writes:
> > We already have that. It's spelled:
> >
> > routine *@x;
> >
> > Or
> >
> > routine * <== @x;
>
> Then you've got your solution:
>
> @sorted = sort {infix:<=> * map {$_.foo('bar').compute}, $^a, $^b }
> @unsorted;
>
> or
>
> @sorted = sort {($^a, $^b) ==>
> map {$_.foo('bar').compute} ==>
> infix:<=> * } @unsorted;

No, I don't think we do. Uri's given some good points in this thread
about specifying what you want out of the sort, instead of some
arbitrary callback interface.

But it needs some major syntax work so it can feel more like it's a part
of the language instead of a library function. Not, mind, that I think
Perl's syntax needs to be changed at all to accommodate.

I'm thinking that C<sort> might take a list of closures or pairs (with
the value being a closure). I'm still not sure how that separates from
the data to be sorted, but it's probably related to the problem of how
C<for> works in declaration.

The return value of each closure is dispatched into some sort of generic
<=>, hopefully one that blows up if its arguments are of different
types[1]. That way if you have a method that always returns a number,
you can just leave it alone. Otherwise you're expected to prefix with
the approprate contextifier.

For example:

sort { .foo('bar').compute }, desc => { ~.baz }, @data;
# ascending by default string compares

Pairs don't exactly seem fit here. Maybe we're supposed to think of
these these closures as higher-level 'extractors', and therefore attach
properties to them:

sort { .foo('bar').compute }, { ~.baz } but descending, @data;

Or even something on the value that tells <=> to negate whatever it
returns:

sort { .foo('bar').compute }, { ~.baz but swap }, @data;

That's starting to expose implementation again, though.

Moving a different direction now. Instead of being honest and saying
that sort does each of these comparisons until one of them is nonzero,
let's pretend that it sorts with respect to the first comparison and
feeds the lists of equal elements into the second comparison, etc.

Sort might then look like:

sub sort (?&extract, ?&next, *@data)

And our example turns into:

sort { .foo('bar').compute }
{ reverse sort { ~.baz } *@$_ }
*@data;

That hurts, especially when there are more than two keys. I'd really
like to do that with the piping operators instead of nested closures,
but that screws with the simple case.

There might be even more ways of looking at this that would result in
more novel syntaxes. Brainstorming required.

Luke

[1] There's something to be said for a generic comparison, but not one
that blows up. Mostly because certain containers want their elements to
be well ordered. But we could just as easily say that elements that
aren't intercompatible with the short-tempered <=> aren't allowed in
said containers.

Rod Adams

unread,
Feb 13, 2004, 3:14:31 AM2/13/04
to Perl 6 Language
Here's my stab at a sort syntax, pulling syntax over from REs:

@out
<== sort key:ri($_->[2]), key:s($_->[4])
<== @in;

Basicly, you have a list of RE syntax like C<key> values, whilch take
various modifiers to say how to play with that key, and then an expr on
how to generate the key given element $_.

Possible modifiers: (verbose versions welcome)
:r reverse/descending
:n force numeric comparisons
:s force string comparisons (default)
:u unicode (implies :s)
:i case insensitive (implies :s)
:l localized collation order
:x call custom compare sub (a la P5)

This allows:

@out = sort keys %hash; # everything gets defaulted
@out = sort key:x{myfunc($a) cmp myfunc($b)}(), @in; # handy for P5
migration, but not much else
@out = sort key(myfunc($_)), @in; # same as above, but much better.
@out = sort key(%lookup{$_->{remotekey}}), key:ir($_->{priority}), @in;
# complex in P5, easy here.

Advantages:
- Uses syntax idioms used elsewhere in P6.
- Common cases are easy
- Decent huffman coding.

Disadvantages:
- Do we really want things that look like REs that are not REs?
- If we do this, are we setting ourselves up to create other RE-like
creatures for grep, for, etc, to the point
where people will start wanting to roll their own in modules?

Thoughts?
-- Rod

Uri Guttman

unread,
Feb 13, 2004, 3:41:35 AM2/13/04
to Rod Adams, Perl 6 Language
>>>>> "RA" == Rod Adams <r...@rodadams.net> writes:

RA> Here's my stab at a sort syntax, pulling syntax over from REs:

RA> @out
RA> <== sort key:ri($_->[2]), key:s($_->[4])
RA> <== @in;

RA> Basicly, you have a list of RE syntax like C<key> values, whilch take
RA> various modifiers to say how to play with that key, and then an expr
RA> on how to generate the key given element $_.

RA> Possible modifiers: (verbose versions welcome)
RA> :r reverse/descending
RA> :n force numeric comparisons
RA> :s force string comparisons (default)
RA> :u unicode (implies :s)
RA> :i case insensitive (implies :s)
RA> :l localized collation order
RA> :x call custom compare sub (a la P5)

i would support longer modifier names as aliases like rules do.

also numeric comparisons can be float or int. semantically they are the
same but it does affect internal stuff if you do a GRT implementation. i
hate to see this exposure of internals but if we want speed here it
needs to be shown.

RA> This allows:

RA> @out = sort keys %hash; # everything gets defaulted
RA> @out = sort key:x{myfunc($a) cmp myfunc($b)}(), @in; # handy for P5
RA> migration, but not much else
RA> @out = sort key(myfunc($_)), @in; # same as above, but much better.
RA> @out = sort key(%lookup{$_->{remotekey}}), key:ir($_->{priority}),
RA> @in; # complex in P5, easy here.

overall i like it.

RA> Advantages:
RA> - Uses syntax idioms used elsewhere in P6.
RA> - Common cases are easy
RA> - Decent huffman coding.

huffman isn't needed here IMO. sort isn't called so often so longer
modifier names are not a hinderance. rules will be called way more often
so short modifier aliases are useful huffman there. a good syntax that
handles multiple keys and is easy to remember and read is more important
than brevity. and you can spread keys out over multiple lines:

@out = sort
key( %lookup{ .{remotekey} } ),
key:insensitive:descending:locale( 'somewhere' )( .{priority} ),
key:float ( substr( 0, 10 ) ),
key:integer ( /foo(\d+)bar/ ),
key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ),
key:compare( \&my_compare_sub ) ( /(\d+)$/ ),
@in ;

i think the . replaces $_->. in fact IIRC -> isn't used anymore for
deref.

do we need , between keys? sort basically will expect a list (without
commas?) of keys and then a list of data.

note that the code in key needs to be called in a list context so
grabbing (which i did perl5 style there. gotta get perl6::rules from
damian!) will work. maybe rules can deal with that in a better way than
i can think of.

note that i passed an argument to the locale modifier.

note that the custome compare callbacks can be a block or a sub
name/ref. the callback sub would be passed 2 args as usual.

do we need the ==> or <== stuff anywhere?

RA> Disadvantages:
RA> - Do we really want things that look like REs that are not REs?

other than the short names, it doesn't look like rule stuff.

RA> - If we do this, are we setting ourselves up to create other RE-like
RA> creatures for grep, for, etc, to the point
RA> where people will start wanting to roll their own in modules?

well, with the long names that isn't a problem.

RA> Thoughts?

drop the short names. but i like it. let's see if it passes larry's
muster.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Dan Sugalski

unread,
Feb 13, 2004, 9:02:35 AM2/13/04
to Luke Palmer, Jonathan Lang, Perl 6 Language
At 11:52 PM -0700 2/12/04, Luke Palmer wrote:
>But it needs some major syntax work so it can feel more like it's a part
>of the language instead of a library function. Not, mind, that I think
>Perl's syntax needs to be changed at all to accommodate.

Since everyone's well past mad here and deep into drug-induced brain
damage territory...

If you're *really* looking to get fancy, why not just allow the sort
specification to be done with SQL? Comfortable, well-understood,
already has a decade or so of stupid things welded into it (so
everyone can stop trying to think up new stupid things to weld in),
and there are several grammars for it so it'd be no work to speak of
for me.

Heck, you could even unify map, grep, and sort and, if you fed in a
list of pair lists or hashes, return parts of each record^Wlist
element rather than the whole thing.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Angel Faus

unread,
Feb 13, 2004, 9:12:43 AM2/13/04
to Dan Sugalski, Perl 6 Language

Friday 13 February 2004 15:02, Dan Sugalski wrote:
>
> If you're *really* looking to get fancy, why not just allow the
> sort specification to be done with SQL? Comfortable,
> well-understood, already has a decade or so of stupid things welded
> into it [...]
>
> Heck, you could even unify map, grep, and sort [...]
>

That would be postmodern indeed.

-angel

Austin Hastings

unread,
Feb 14, 2004, 3:45:22 PM2/14/04
to Uri Guttman, Rod Adams, Perl 6 Language

--- Uri Guttman <u...@stemsystems.com> wrote:
> @out = sort
> key( %lookup{ .{remotekey} } ),
> key:insensitive:descending:locale( 'somewhere' )( .{priority} ),
> key:float ( substr( 0, 10 ) ),
> key:integer ( /foo(\d+)bar/ ),
> key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ),
> key:compare( \&my_compare_sub ) ( /(\d+)$/ ),
> @in ;
>

...

> note that the custome compare callbacks can be a block or a sub
> name/ref. the callback sub would be passed 2 args as usual.

Off the top of my head, I can't think of a case where the compare sub
would be needed unless the key was not well-ordered. Does anyone have
an example of a case where the key-extraction sub approach doesn't
reduce the problem to a Scalar comparison?

=Austin

Uri Guttman

unread,
Feb 14, 2004, 4:17:18 PM2/14/04
to Austin_...@yahoo.com, Rod Adams, Perl 6 Language
>>>>> "AH" == Austin Hastings <austin_...@yahoo.com> writes:

AH> --- Uri Guttman <u...@stemsystems.com> wrote:
>> @out = sort

>> key:compare( \&my_compare_sub ) ( /(\d+)$/ ),
>> @in ;

>> note that the custome compare callbacks can be a block or a sub


>> name/ref. the callback sub would be passed 2 args as usual.

AH> Off the top of my head, I can't think of a case where the compare sub
AH> would be needed unless the key was not well-ordered. Does anyone have
AH> an example of a case where the key-extraction sub approach doesn't
AH> reduce the problem to a Scalar comparison?

there are rare times when you need a computed comparison, say with a
specialized collation sequence. if this sequence isn't supported by
locale or whatever, you need to do a callback comparison. i can't think
of a real world example but maybe with some odd symbolic character set
or a custom one. in any case, supporting the callback is easy. it has
one drawback which is that it disallows the GRT but the guts could
fallback to the ST then. in both cases, key extraction is the same.

Austin Hastings

unread,
Feb 14, 2004, 4:28:25 PM2/14/04
to Uri Guttman, Perl 6 Language

--- Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "AH" == Austin Hastings <austin_...@yahoo.com> writes:
>
> AH> --- Uri Guttman <u...@stemsystems.com> wrote:
> >> @out = sort
> >> key:compare( \&my_compare_sub ) ( /(\d+)$/ ),
> >> @in ;
>
> >> note that the custome compare callbacks can be a block or a sub
> >> name/ref. the callback sub would be passed 2 args as usual.
>
> AH> Off the top of my head, I can't think of a case where the
> compare sub
> AH> would be needed unless the key was not well-ordered. Does
> anyone have
> AH> an example of a case where the key-extraction sub approach
> doesn't
> AH> reduce the problem to a Scalar comparison?
>
> there are rare times when you need a computed comparison, say with a
> specialized collation sequence. if this sequence isn't supported by
> locale or whatever, you need to do a callback comparison. i can't
> think
> of a real world example but maybe with some odd symbolic character
> set
> or a custom one. in any case, supporting the callback is easy. it has
> one drawback which is that it disallows the GRT but the guts could
> fallback to the ST then. in both cases, key extraction is the same.

Hmm. If this is all, then I think it's worth putting the onus on the
developer to write a better key extractor.

What you're saying, basically, is that the "special comparator" would
be doing dynamic key generation.

IMO, let the developer write a xxx_2_UTF32 key extractor. In all
likelihood, once it's written he may find a way to use that format
throughout the code.

=Austin

Uri Guttman

unread,
Feb 14, 2004, 4:58:17 PM2/14/04
to Austin_...@yahoo.com, Perl 6 Language
>>>>> "AH" == Austin Hastings <austin_...@yahoo.com> writes:

AH> --- Uri Guttman <u...@stemsystems.com> wrote:

>> there are rare times when you need a computed comparison, say with
>> a specialized collation sequence. if this sequence isn't supported
>> by locale or whatever, you need to do a callback comparison. i
>> can't think of a real world example but maybe with some odd
>> symbolic character set or a custom one. in any case, supporting the
>> callback is easy. it has one drawback which is that it disallows
>> the GRT but the guts could fallback to the ST then. in both cases,
>> key extraction is the same.

AH> Hmm. If this is all, then I think it's worth putting the onus on the
AH> developer to write a better key extractor.

possibly.

AH> What you're saying, basically, is that the "special comparator" would
AH> be doing dynamic key generation.

not exactly. key extraction is getting the key out of the record. there
could be some munging applied at that phase as well. comparison is just
a replacement for <=> or cmp. it could be some strange collate sequence
based on specialized data. a contrived example is sorting by length but
only for strings beginning with i-k, otherwise go with alphabetic sort.
you would have to extract a dummy length value for the keys which don't
start with i-k and the name would be the second key in all cases. but
there are more complex cases where it is easier to put the logic in the
comparison code instead of the key extraction part.

the issue is that the comparison could have logic beyond simple
comparisons. so we do need to support the compare code block or sub
callback. as i said, when you use that you could (if we do it this way)
lose major speedups but that is a standard penalty to pay when you need
custom code in the comparison.

Rod Adams

unread,
Feb 14, 2004, 5:30:34 PM2/14/04
to Austin_...@yahoo.com, Uri Guttman, Perl 6 Language
Austin Hastings wrote:

>Off the top of my head, I can't think of a case where the compare sub
>would be needed unless the key was not well-ordered. Does anyone have
>an example of a case where the key-extraction sub approach doesn't
>reduce the problem to a Scalar comparison?
>
>

I can't find the P5 code I used for it right off, but I remember a case
when I was playing around with various football (US) stats. I was
ranking teams, and had something akin to:

@teams = sort {$b->{WinPct} <=> $a->{WinPct} ||
($a->{Conference} eq $b->{Conference}
?($b->{ConfWinPct} <=> $a->{ConfWinPct} ||
$b->{ConfWon} <=> $a->{ConfWon} ||
$a->{ConfLost} <=> $b->{ConfLost})
:($a->{Division} eq $b->{Division}
?($b->{DivWinPct} <=> $a->{DivWinPct} ||
$b->{DivWon} <=> $a->{DivWon} ||
$a->{DivLost} <=> $b->{DivLost})
:0
)
) ||
$b->{Won} <=> $a->{Won} ||
$a->{Lost} <=> $b->{Lost} ||
$b->{GamesPlayed} <=> $a->{GamesPlayed}
} @teams;

Creating a keycode for this situation is not a trivial task at all. So
sorts do occur in the real world for which there is no straightforward
way to generate a sortkey. Albeit considerably rare.

Also, I think there is utility in have a compare sub supported so that:
1) porting P5 code is easier. (a minor design rationale, but it exists)
2) people used to thinking in terms of compare subs (from C, P5, and
points of the programming universe) can still think that way.
3) most importantly to me, so that There's More Than One Way to Do It.

-- Rod Adams

Damian Conway

unread,
Feb 15, 2004, 5:59:45 PM2/15/04
to Larry Wall, Perl 6 Language
Here's a proposed syntax and semantics for C<sort> that tries to preserve the
(excellent) features of Uri's "on the right track" proposal whilst integrating
it into the Perl 6 design without multiplying entities (especially colons!)
unnecessarily.

Suppose C<sort>'s signature is:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator ::= Code(Any, Any) returns Int;

type Criterion ::= KeyExtractor
| Comparator
| Pair(KeyExtractor, Comparator)
;

type Criteria ::= Criterion
| Array of Criterion
;

multi sub *sort(Criteria ?$by = {$^a cmp $^b}, *@data) {...}


In other words, C<sort> takes as its (optional) first argument either a single
sort criterion, or an array of criteria.

Each of those sort criteria may be a block/closure, or a pair of
block/closures. Each block/closure may take either one argument (in which case
it's a key extractor) or two arguments (in which case it's a comparator).

If a key-extractor block returns number, then C<< <=> >> is used to compare
those keys. Otherwise C<cmp> is used. In either case, the returned keys are
cached to optimize subsequent comparisons against the same element.

If a key-extractor/comparator pair is specified, the key-extractor is the key
of the pair and the comparator the value. The extractor is used to retreive
(and cache) keys, which are then passed to the comparator.

Which means that (a slightly extended version of) Uri's proposed:

@out = sort
key( %lookup{ .{remotekey} } ), #1
key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
key:float ( substr( 0, 10 ) ), #3
key:integer ( /foo(\d+)bar/ ), #4
key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ), #5
key:compare( \&my_compare_sub ) ( /(\d+)$/ ), #6
key:compare( { just_guess $^b, $^a } ), #7
@in;

would become:

@out = sort
[ { ~ %lookup{ .{remotekey} } }, #1
{ use locale 'somewhere'; lc .{priority} } is descending, #2
{ + substr( 0, 10 ) }, #3
{ int /foo(\d+)bar/ }, #4
{ + m/(\d+)$/.[1] }, #5
{ /(\d+)$/ } => &my_compare_sub, #6
{ just_guess $^b, $^a }, #7
],
@in;

So to specify a key extractor we provide a block with one argument, as in
examples #1 through #5. Then to specify that the key is compared with C<cmp>
we return a string (using unary C<~> to be explicit about it) as in #1. To
specify that the key is compared with C<< <=> >> we return a number (using
unary C<+> to be explicit about it) as in #3 and #5. If we think that C<sort>
might be able to optimize integer comparisons, we can explicitly return an
integer, as in #4. If we want locale-sensitive sorting, we specify that with
the appropriate C<use locale> statement, as in #2.

To specify a comparator, we provide a block with two arguments, as in #7. That
block is always expected to return an integer.

To specify a key-extractor *and* an associated comparator, we bind them
together in a Pair, as in #6.

The only tricky bits are how to specify a key extractor for keys that are to
be sorted in descending order and/or case-insensitively. The case
insensitivity could be handled by simple C<lc>'ing, as in #2. Descending sort
order could be handled by a trait on the block/closure, as in #2.
Alternatively, case-insensitivity could be specified by trait as well.

Note that simple sorts would be unchanged:

@sorted = sort @unsorted;
@sorted = sort {$^b <=> $^a} @unsorted;

But now one could also very neatly code cases that formerly required an Orcish
or Transformed sort. For example, these:

@sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;

@sorted = map $_[1], sort {$^a[0] <=> $^b[0]}, map [-M,$_], @unsorted;

would both become:

@sorted = sort {-M} @unsorted;


Damian

Joe Gottman

unread,
Feb 15, 2004, 6:43:32 PM2/15/04
to Perl 6 Language

----- Original Message -----
From: "Damian Conway" <dam...@conway.org>
To: "Larry Wall" <la...@wall.org>
Cc: "Perl 6 Language" <perl6-l...@perl.org>
Sent: Sunday, February 15, 2004 5:59 PM
Subject: [perl] Re: The Sort Problem


> Here's a proposed syntax and semantics for C<sort> that tries to preserve
the
> (excellent) features of Uri's "on the right track" proposal whilst
integrating
> it into the Perl 6 design without multiplying entities (especially
colons!)
> unnecessarily.
>
> Suppose C<sort>'s signature is:
>
> type KeyExtractor ::= Code(Any) returns Any;
>
> type Comparator ::= Code(Any, Any) returns Int;
>
> type Criterion ::= KeyExtractor
> | Comparator
> | Pair(KeyExtractor, Comparator)
> ;
>
> type Criteria ::= Criterion
> | Array of Criterion
> ;
>
> multi sub *sort(Criteria ?$by = {$^a cmp $^b}, *@data) {...}
>

If we use this signature, won't the code
sort ('foo', 'bar', 'glarch');

attempt to use the first parameter as a Criteria? Since sort has to be a
multi sub anyhow, why don't we do

multi sub *sort(Criteria $by : *@data) {...}
multi sub *sort( : *@data) { ...} # returns sort {$^a cmp $^b} @data

Joe Gottman


Damian Conway

unread,
Feb 15, 2004, 6:59:57 PM2/15/04
to Perl 6 Language
Joe Gottman asked:

> If we use this signature, won't the code
> sort ('foo', 'bar', 'glarch');
>
> attempt to use the first parameter as a Criteria?

No. Because a string like 'foo' wouldn't match the first parameter's type.


> Since sort has to be a multi sub anyhow, why don't we do
>
> multi sub *sort(Criteria $by : *@data) {...}
> multi sub *sort( : *@data) { ...} # returns sort {$^a cmp $^b} @data

We probably should (from an implementation efficiency standpoint if nothing else).


Damian

Uri Guttman

unread,
Feb 15, 2004, 7:23:05 PM2/15/04
to Damian Conway, Larry Wall, Perl 6 Language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> Suppose C<sort>'s signature is:

DC> type KeyExtractor ::= Code(Any) returns Any;

DC> type Comparator ::= Code(Any, Any) returns Int;

DC> type Criterion ::= KeyExtractor
DC> | Comparator
DC> | Pair(KeyExtractor, Comparator)
DC> ;

DC> type Criteria ::= Criterion
DC> | Array of Criterion
DC> ;

DC> multi sub *sort(Criteria ?$by = {$^a cmp $^b}, *@data) {...}

DC> Each of those sort criteria may be a block/closure, or a pair of
DC> block/closures. Each block/closure may take either one argument (in
DC> which case it's a key extractor) or two arguments (in which case it's
DC> a comparator).

DC> If a key-extractor block returns number, then C<< <=> >> is used to
DC> compare those keys. Otherwise C<cmp> is used. In either case, the
DC> returned keys are cached to optimize subsequent comparisons against
DC> the same element.

i would make cmp the default as it is now. you sort strings much more
often than you sort numbers.

DC> If a key-extractor/comparator pair is specified, the key-extractor is
DC> the key of the pair and the comparator the value. The extractor is
DC> used to retreive (and cache) keys, which are then passed to the
DC> comparator.

DC> Which means that (a slightly extended version of) Uri's proposed:

DC> @out = sort
DC> key( %lookup{ .{remotekey} } ), #1
DC> key:insensitive:descending:locale( 'somewhere' )( .{priority} ), #2
DC> key:float ( substr( 0, 10 ) ), #3
DC> key:integer ( /foo(\d+)bar/ ), #4
DC> key:compare( { ^$a <=> ^$b } )( /(\d+)$/ ), #5
DC> key:compare( \&my_compare_sub ) ( /(\d+)$/ ), #6
DC> key:compare( { just_guess $^b, $^a } ), #7
DC> @in;

DC> would become:

DC> @out = sort
DC> [ { ~ %lookup{ .{remotekey} } }, #1

if string cmp is the default, wouldn't that ~ be redundant?
and i keep forgetting that ~ is now the stringify operator :)

DC> { use locale 'somewhere'; lc .{priority} } is descending, #2

hmmm, this needs more work IMO. requiring the coder to lc the key is
moving the case insensitivity feature back into code. wouldn't 'is
insensitive' be ok?

how does the internal guts get the descending/insensitive flags/traits
passed to it? i know it is an internals problem but i am just pondering
it.

DC> { + substr( 0, 10 ) }, #3
DC> { int /foo(\d+)bar/ }, #4

i would also expect int to be a default over float as it will be used
more often. + is needed there since the regex returns a string. in the
#3 case that would be an int as well. so we need a 'float' cast
thingy. BTW, the only way to get a number as a key is from a structure
where the field was assigned as a number/int. that may not happen a lot
so the int/float cast will probably be needed here to sort correctly


DC> { + m/(\d+)$/.[1] }, #5
DC> { /(\d+)$/ } => &my_compare_sub, #6

missing a close } it seems.

DC> { just_guess $^b, $^a }, #7

is that a reverse order sort? why not skip the args and do this:

{ &just_guess is descending }, #7

DC> ],

so the first arg to sort is either a single compare block or a anon list
of them? i figure we need the [] to separate the criteria from the input
data. but what about this odd case,

sort [...], [...], [...]

now that is stupid code but it could be trying to sort the refs by their
address in string mode. or it could be a sort criteria list followed by
2 refs to input records.

DC> @in;

DC> So to specify a key extractor we provide a block with one argument, as
DC> in examples #1 through #5. Then to specify that the key is compared
DC> with C<cmp> we return a string (using unary C<~> to be explicit about
DC> it) as in #1. To specify that the key is compared with C<< <=> >> we
DC> return a number (using unary C<+> to be explicit about it) as in #3
DC> and #5. If we think that C<sort> might be able to optimize integer
DC> comparisons, we can explicitly return an integer, as in #4. If we want
DC> locale-sensitive sorting, we specify that with the appropriate C<use


locale> statement, as in #2.

DC> To specify a comparator, we provide a block with two arguments, as in
DC> #7. That block is always expected to return an integer.

so #7 is a call to just_guess which is passed the 2 args to compare. it
must return an int like cmp/<=>. as i pointed out above, i don't see why
you even need to show the ^$a and ^$b args? they will be passed into
just_guess that way. let is descending handle the sort ordering.

DC> To specify a key-extractor *and* an associated comparator, we bind
DC> them together in a Pair, as in #6.

that works for me.

DC> The only tricky bits are how to specify a key extractor for keys that
DC> are to be sorted in descending order and/or case-insensitively. The
DC> case insensitivity could be handled by simple C<lc>'ing, as in
DC> #2. Descending sort order could be handled by a trait on the
DC> block/closure, as in #2. Alternatively, case-insensitivity could be
DC> specified by trait as well.

ok, i like a trait better than coding in lc. better semantics and
clearer to read. the is descending trait could just cause the sort
callbacks to be called with reversed arguments and so the callback code
never has to worry about that.

DC> Note that simple sorts would be unchanged:

DC> @sorted = sort @unsorted;
DC> @sorted = sort {$^b <=> $^a} @unsorted;

DC> But now one could also very neatly code cases that formerly required
DC> an Orcish or Transformed sort. For example, these:

DC> @sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;

wow, that is UGLY! but i get it after a few hours of study. :) just the
orcish maneuver but with //. i think you also mean //= there.

DC> @sorted = map $_[1], sort {$^a[0] <=> $^b[0]}, map [-M,$_],
DC> @unsorted;

the familiar ST.

DC> would both become:

DC> @sorted = sort {-M} @unsorted;

that still wants to be cached somehow as -M is expensive. but that is an
internal thing and the ST/GRT both can handle it. so -M there is a
simple key extraction on the files in @unsorted. assuming no internal
caching would this be correct?

@sorted = sort {%M{$_} //= -M} @unsorted;

i assume //= will be optimized and -M won't be called if it is cached.

also where does %M get declared and/or cleared before this? can it be
done in the block:

@sorted = sort {my %M ; %M{$_} //= -M} @unsorted;

another -M problem is that it currently returns a float so that must be
marked/cast as a float.

@sorted = sort {float -M} @unsorted;

that cast causes a <=> compare (since i would rather see cmp be the
default) and also can let the internal GRT properly merge a float key
instead of an int. it should work without it (not sure how) but the
float marker is important to optimization and a proper comparison.
maybe the fact that the compiler knows -M returns a float can be used to
mark it internally and the explicit float isn't needed here. but data
from a user record will need to be marked as float as the compiler can't
tell.

anyhow, i am glad i invoked your name and summoned you into this
thread. :)

Damian Conway

unread,
Feb 15, 2004, 9:02:37 PM2/15/04
to Perl 6 Language
Uri wrote:


> DC> If a key-extractor block returns number, then C<< <=> >> is used to
> DC> compare those keys. Otherwise C<cmp> is used. In either case, the
> DC> returned keys are cached to optimize subsequent comparisons against
> DC> the same element.
>
> i would make cmp the default as it is now.

Err. That's kinda what "Otherwise C<cmp> is used" means. ;-)

> DC> @out = sort
> DC> [ { ~ %lookup{ .{remotekey} } }, #1
>
> if string cmp is the default, wouldn't that ~ be redundant?

How do you know that the values of %lookup are strings?
How would the optimizer know?

> DC> { + substr( 0, 10 ) }, #3
> DC> { int /foo(\d+)bar/ }, #4
>
> i would also expect int to be a default over float as it will be used
> more often. + is needed there since the regex returns a string. in the
> #3 case that would be an int as well. so we need a 'float' cast
> thingy.

Unary C<+> *is* the "float cast thingy"!


> BTW, the only way to get a number as a key is from a structure
> where the field was assigned as a number/int. that may not happen a lot
> so the int/float cast will probably be needed here to sort correctly

That's the point.

If you want to force numeric comparison of keys you explicitly cast each key
to number using unary C<+> or C<int>. If you want to force stringific
comparison you explicitly cast the key to string using unary C<~>.

If you don't explicitly cast either way, C<sort> just DWIMs by looking at the
actual type of the keys returned by the extractor. If any of them isn't a
number, it defaults to C<cmp>.

> DC> { + m/(\d+)$/.[1] }, #5
> DC> { /(\d+)$/ } => &my_compare_sub, #6
>
> missing a close } it seems.

Yup. Thanks.

> DC> { just_guess $^b, $^a }, #7
>
> is that a reverse order sort? why not skip the args and do this:
>
> { &just_guess is descending }, #7

Because I wanted to show a plain old two-parameter block being used as a
*comparator* (not a one-parameter block being used as a key extractor).

> so the first arg to sort is either a single compare block or a anon list
> of them? i figure we need the [] to separate the criteria from the input
> data.

Yep.


> but what about this odd case,
>
> sort [...], [...], [...]
>
> now that is stupid code but it could be trying to sort the refs by their
> address in string mode.

In which case we probably should have written it:

sort <== [...], [...], [...]


> or it could be a sort criteria list followed by
> 2 refs to input records.

Only if the first array ref contains nothing but Criterion objects.


> DC> To specify a comparator, we provide a block with two arguments, as in
> DC> #7. That block is always expected to return an integer.
>
> so #7 is a call to just_guess which is passed the 2 args to compare. it
> must return an int like cmp/<=>.

Yep.


> as i pointed out above, i don't see why
> you even need to show the ^$a and ^$b args?

So the block knows it has two parameters.


> they will be passed into just_guess that way. let is descending handle the sort ordering.

But you *can't* apply C<is descending> to a Code reference.

Nor are we sure that the order *is* descending. Maybe the C<just_guess>
predicate is invariant to argument order and there were other reasons to pass
the args in that order. Or maybe we reversed the order because we know that in
C<just_guess> never returns zero, but defaults to its second argument being
smaller, in which case we needed to reverse the args so that the C<sort>
remained stable.

The point is that I wanted to show a vanilla two-parameter compare block.
(And, boy, am I ever sorry I whimsically reversed the args to indicate
generality ;-)

> DC> @sorted = sort {(%M{$^a}//-M $^a) <=> (%M{$^b}//-M $^b)} @unsorted;
>
> wow, that is UGLY! but i get it after a few hours of study. :) just the
> orcish maneuver but with //. i think you also mean //= there.

Yup. Should indeed be //=

> DC> @sorted = sort {-M} @unsorted;
>
> that still wants to be cached somehow as -M is expensive.

It *is* cached. It's a one-parameter block. So its a key extractor. So it
automagically caches the keys it extracts.


> so -M there is a
> simple key extraction on the files in @unsorted.

Yup.


> assuming no internal caching

Key extractors will always cache.


> @sorted = sort {%M{$_} //= -M} @unsorted;
>
> i assume //= will be optimized and -M won't be called if it is cached.
>
> also where does %M get declared and/or cleared before this?

Exactly the problem. That's why key extractors aways cache.


> can it be
> done in the block:
>
> @sorted = sort {my %M ; %M{$_} //= -M} @unsorted;

If you'd gone insane and particularly wanted to do it that way, you'd need
something like:

@sorted = sort {state %M ; %M{$_} //= -M} @unsorted;

to ensure the cache persisted between calls to the key extractor.


> another -M problem is that it currently returns a float so that must be
> marked/cast as a float.
>
> @sorted = sort {float -M} @unsorted;

No. *Because* -M returns a number, C<sort> automatically knows to use numeric
comparison on those keys.


> maybe the fact that the compiler knows -M returns a float can be used to
> mark it internally and the explicit float isn't needed here.

Exactly.


> but data
> from a user record will need to be marked as float as the compiler can't
> tell.

It *can* tell if the elements are typed. But, yes, most of the time if you
want to ensure numeric comparison you will explicitly prefix with a C<+> to
give the compiler a hint. Otherwise C<sort> will have to fall back on looking
at the keys that are extracted and working out at run-time which type of
comparison to use (kinda like the smartmatch operator does).


> anyhow, i am glad i invoked your name and summoned you into this
> thread. :)

Well, that makes *one* of us.

;-)

Damian

Uri Guttman

unread,
Feb 15, 2004, 11:39:05 PM2/15/04
to Damian Conway, Perl 6 Language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> Uri wrote:

DC> @out = sort
DC> [ { ~ %lookup{ .{remotekey} } }, #1
>> if string cmp is the default, wouldn't that ~ be redundant?

DC> How do you know that the values of %lookup are strings?
DC> How would the optimizer know?

because that would be the default comparison and the extracted key value
would be stringified unless some other marker is used. most sorts are on
strings so this would be a useful huffman and removal of a redundancy.

DC> { + substr( 0, 10 ) }, #3
DC> { int /foo(\d+)bar/ }, #4
>> i would also expect int to be a default over float as it will be used
>> more often. + is needed there since the regex returns a string. in the
>> #3 case that would be an int as well. so we need a 'float' cast
>> thingy.

DC> Unary C<+> *is* the "float cast thingy"!

hmm, so + is float but int is needed otherwise? int is more commonly a
sort key than float. it just feels asymetrical with one having a symbol
and the other a named operator.

DC> If you want to force numeric comparison of keys you explicitly cast
DC> each key to number using unary C<+> or C<int>. If you want to force
DC> stringific comparison you explicitly cast the key to string using
DC> unary C<~>.

or ~ is the default if nothing else is specified. it matches up with cmp
being the default comparison as you agreed above.

DC> If you don't explicitly cast either way, C<sort> just DWIMs by looking
DC> at the actual type of the keys returned by the extractor. If any of
DC> them isn't a number, it defaults to C<cmp>.

that doesn't work for me. the GRT can't scan all keys before it
decides on what comparison operator is needed. the GRT needs to merge
keys as they are extracted and it needs to do the correct conversion to
a byte string then and not later. you can't DWIM this away if you want
the optimization. the ST can get away with it since you are still using
a compare block even if it is generated internally by the sort function.


DC> { just_guess $^b, $^a }, #7
>> is that a reverse order sort? why not skip the args and do this:
>> { &just_guess is descending },
>> #7

DC> Because I wanted to show a plain old two-parameter block being used as
DC> a *comparator* (not a one-parameter block being used as a key
DC> extractor).

that seems like extra redundant code just to mark a callback vs a
extract block. there should be a cleaner way to do that. maybe a null
extract key in the pair?

{ '' => &just_guess }
{ undef => &just_guess } # will => autoquote undef there?

then the records are full keys with no special extraction but there is a
callback comparator. no need to declare any arguments to the callback
sub here since you know it is that and not key extract code.

>> but what about this odd case,
>> sort [...], [...], [...]
>> now that is stupid code but it could be trying to sort the refs by
>> their
>> address in string mode.

DC> In which case we probably should have written it:

DC> sort <== [...], [...], [...]

i did ask in another post whether <== or ==> would fit in here. so that
line forces it all to be data and my silly example has the first anon
list of criteria and the rest as data. works for me so far.

>> or it could be a sort criteria list followed by
>> 2 refs to input records.

DC> Only if the first array ref contains nothing but Criterion objects.

but what defines them as criteria objects? they look just like records
which could be sorted as well. i don't see any syntactical markers that
{} means criteria block vs a hash ref. DWIM guessing isn't good enough
for me here. sort in p5 already had some issues with that IIRC.

>> as i pointed out above, i don't see why
>> you even need to show the ^$a and ^$b args?

DC> So the block knows it has two parameters.

but the callback sub knows it has two params since it has to be written
that way. sort always calls the code block with 2 params.

>> they will be passed into just_guess that way. let is descending
>> handle the sort ordering.

DC> But you *can't* apply C<is descending> to a Code reference.

then how did you apply 'is insensitive'? aside from how it is done
(traits and such), we need a syntax that conveys the semantics of
insensitive and descending into the sort func. it will use that info to
reverse the key order to comparison subs or modify the key merging of
the GRT to effect those flags. what i am saying is i think that you need
to go back to the drawing board to find a clean syntax to mark those
flags. note that neither the code block nor the callback needs to see
them, only the sort guts ever needs to see them. so we are communicating
info to the sort about this key. the code block/callback only ever sees
two arguments to compare and nothing else. maybe this will clarify for
you the intentions of those flags and why they have nothing to do with
the code block but rather describe the behavior of this key.

DC> Nor are we sure that the order *is* descending. Maybe the
DC> C<just_guess> predicate is invariant to argument order and there
DC> were other reasons to pass the args in that order. Or maybe we
DC> reversed the order because we know that in C<just_guess> never
DC> returns zero, but defaults to its second argument being smaller,
DC> in which case we needed to reverse the args so that the C<sort>
DC> remained stable.

i just don't like the reverse args concept at all. it is not
semantically useful to sorting. sorts care about ascending and
descending, not a vs b in args.

DC> The point is that I wanted to show a vanilla two-parameter compare
DC> block. (And, boy, am I ever sorry I whimsically reversed the args
DC> to indicate generality ;-)

but i am glad you did since it brings up this issue. i don't think we
should allow any args there at all as they are not needed. compare
blocks get called with 2 args. some sort of descending marker reverses
the args before the call. also in the GRT, descending causes special data
munging so the keys will sort in descending order so that has to be
passed on to the guts somehow.

DC> @sorted = sort {-M} @unsorted;
>> that still wants to be cached somehow as -M is expensive.

DC> It *is* cached. It's a one-parameter block. So its a key extractor. So
DC> it automagically caches the keys it extracts.

??? who and what caches it? it will get called again and again for the
same record. where is the definition that 1 param code blocks do
caching? do they all do that?

>> assuming no internal caching

DC> Key extractors will always cache.

so this is a key extraction feature about caching. the ST and GRT don't
need caching so that is a waste if those are used. only a orcish sort
needs caching.

>> @sorted = sort {%M{$_} //= -M} @unsorted;
>> i assume //= will be optimized and -M won't be called if it is
>> cached.
>> also where does %M get declared and/or cleared before this?

DC> Exactly the problem. That's why key extractors aways cache.

but they can't always cache. it depends on the implementation and
possibly at runtime (selecting orchish, ST or GRT based on some other
criteria).

>> can it be
>> done in the block:
>> @sorted = sort {my %M ; %M{$_} //= -M} @unsorted;

DC> If you'd gone insane and particularly wanted to do it that way, you'd
DC> need something like:

DC> @sorted = sort {state %M ; %M{$_} //= -M} @unsorted;

DC> to ensure the cache persisted between calls to the key extractor.

and will that get cleared before each sort?

>> another -M problem is that it currently returns a float so that must be
>> marked/cast as a float.
>> @sorted = sort {float -M} @unsorted;

DC> No. *Because* -M returns a number, C<sort> automatically knows to use
DC> numeric comparison on those keys.

>> maybe the fact that the compiler knows -M returns a float can be used to
>> mark it internally and the explicit float isn't needed here.

DC> Exactly.

ok for this case but not for data from a record.

>> but data
>> from a user record will need to be marked as float as the compiler can't
>> tell.

DC> It *can* tell if the elements are typed. But, yes, most of the time if
DC> you want to ensure numeric comparison you will explicitly prefix with
DC> a C<+> to give the compiler a hint. Otherwise C<sort> will have to
DC> fall back on looking at the keys that are extracted and working out at
DC> run-time which type of comparison to use (kinda like the smartmatch
DC> operator does).

only in the ST. the orcish does its compares on the fly without a full
prescan. and the GRT can't prescan as it mungs and merges keys on the
fly.

>> anyhow, i am glad i invoked your name and summoned you into this
>> thread. :)

DC> Well, that makes *one* of us.

DC> ;-)

when are you going to get your life sorted out? :)

Luke Palmer

unread,
Feb 16, 2004, 1:30:58 AM2/16/04
to Uri Guttman, Damian Conway, Perl 6 Language
Uri Guttman writes:
> because that would be the default comparison and the extracted key value
> would be stringified unless some other marker is used. most sorts are on
> strings so this would be a useful huffman and removal of a redundancy.

While I like where most of this is going, I beg to differ on this point.
I end up sorting numbers more often then strings, except when I'm
sorting the keys of a hash. I'm always annoyed in my one liners when I
have to write out:

sort { $a <=> $b } ...

And I'd be thrilled if it were as easy as:

sort { +$_ } ...

Luke

Uri Guttman

unread,
Feb 16, 2004, 1:43:02 AM2/16/04
to Luke Palmer, Damian Conway, Perl 6 Language
>>>>> "LP" == Luke Palmer <fibo...@babylonia.flatirons.org> writes:

LP> Uri Guttman writes:
>> because that would be the default comparison and the extracted key value
>> would be stringified unless some other marker is used. most sorts are on
>> strings so this would be a useful huffman and removal of a redundancy.

LP> While I like where most of this is going, I beg to differ on this point.
LP> I end up sorting numbers more often then strings, except when I'm
LP> sorting the keys of a hash. I'm always annoyed in my one liners when I
LP> have to write out:

LP> sort { $a <=> $b } ...

LP> And I'd be thrilled if it were as easy as:

LP> sort { +$_ } ...

oh, i don't want to see <=> anywhere in sort code. i don't mind + or int
as markers for that. you seem to actually be agreeing with me that
strings are the default key type and + is needed for a numeric sort. but
do you do integer or float sorts? and as i said, i don't like the
asymmetry of int vs +.

Damian Conway

unread,
Feb 16, 2004, 1:16:54 AM2/16/04
to Perl 6 Language
Uri persisted:

> DC> How do you know that the values of %lookup are strings? DC> How
> would the optimizer know?
>
> because that would be the default comparison and the extracted key value
> would be stringified unless some other marker is used. most sorts are on
> strings so this would be a useful huffman and removal of a redundancy.

Leaving aside your assertion about the commonest type of sort (which I
don't see how you can have reliable evidence about), we're not talking
about "markers"; these are *operators* that ensure that the keys extracted
are of a particular static type.


> DC> Unary C<+> *is* the "float cast thingy"!
>
> hmm, so + is float

No. C<+> is number.


> but int is needed otherwise? int is more commonly a sort key than float.
> it just feels asymetrical with one having a symbol and the other a named
> operator.

Sorry, but that's the way the language works. The more general and usual
conversion (to number, not to integer) has the shorter name.


> DC> If you want to force numeric comparison of keys you explicitly

> DC> cast each key to number using unary C<+> or C<int>. If you want to
> DC> force stringific comparison you explicitly cast the key to string
> DC> using unary C<~>.


>
> or ~ is the default if nothing else is specified. it matches up with cmp
> being the default comparison as you agreed above.

Okay, if you want to think of it that way, fine.


> DC> If you don't explicitly cast either way, C<sort> just DWIMs by

> DC> looking at the actual type of the keys returned by the extractor.
> DC> If any of them isn't a number, it defaults to C<cmp>.


>
> that doesn't work for me. the GRT can't scan all keys before it decides
> on what comparison operator is needed. the GRT needs to merge keys as
> they are extracted and it needs to do the correct conversion to a byte
> string then and not later. you can't DWIM this away if you want the
> optimization.

EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the
return type of your block is appropriate. If you don't give the compiler that
hint, you don't get the optimized sorting.


> DC> Because I wanted to show a plain old two-parameter block being
> used as DC> a *comparator* (not a one-parameter block being used as a
> key DC> extractor).
>
> that seems like extra redundant code just to mark a callback vs a
> extract block.

It's just an *example*, for heavens sake.


> there should be a cleaner way to do that.

There is. If it's just a callback that is already declared to take two
parameters, and you want to pass $^a and $^b in that order, then you can just
write:

sort &just_guess, @unsorted;

or, if it's one of many criteria, you write:

sort [ { -M }, { .{owner} }, &just_guess,
] @unsorted;


> maybe a null extract key in the pair?

Unnecessary. See above.

> >> or it could be a sort criteria list followed by 2 refs to input
> >> records.
>
> DC> Only if the first array ref contains nothing but Criterion
> objects.
>
> but what defines them as criteria objects?

Their *type*. Remember this:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator ::= Code(Any, Any) returns Int;

type Criterion ::= KeyExtractor
| Comparator Pair(KeyExtractor, Comparator)
;

type Criteria ::= Criterion
| Array of Criterion
;

multi sub *sort(Criteria $by = {$^a cmp $^b}: *@data) {...}

???

The criteria have to be of a very specific type.


> they look just like records which could be sorted as well.

The point is that an array is only a list C<sort> criteria if every one of its
elements is a one- or two-parameter block, or a Pair of blocks. And the blocks
have to have very specific signatures too.

The *only* time there's going to be any ambiguity is if you wanted to sort a
list of arrays of C<sort> criteria...not a very common occurrence, I suspect. ;-)


> i don't see any syntactical markers that {} means criteria block vs a
> hash ref.

Because there aren't any. Blocks are blocks and hashes are hashes, and in Perl
6 we have a very clear set of syntactic rules that determines which is which.


> DWIM guessing isn't good enough for me here.

This *isn't* DWIM guessing. This is block parsing and static typing.


> sort in p5 already had some issues with that IIRC.

Only because Perl 5 has issues with hashes vs blocks.
Perl 6 fixes those issues.


> but the callback sub knows it has two params since it has to be written
> that way.

But it *isn't* a callback sub in my example!!! It's a call to subroutine
inside a block. And the *block* is the comparator C<sort> sees. The parameters
of the block are $^a and $^b and it's a two parameter block (and therefore a
valid comparator) precisely *because* it has those two parameters.

Look, FORGET about that example. It was obviously a bad example. It's
caused enough pain (to me if no-one else).

Here's another example (all praise be to Rod!):

@teams = sort [
# 1 parameter so it's a key extractor...
{+ $^team->{WinPct} },
# 2 parameters so it's a comparator...
{ ($^a->{Conference} eq $^b->{Conference}
? ($^b->{ConfWinPct} <=> $^a->{ConfWinPct} ||
$^b->{ConfWon} <=> $^a->{ConfWon} ||
$^a->{ConfLost} <=> $^b->{ConfLost})
: ($^a->{Division} eq $^b->{Division}
? ($^b->{DivWinPct} <=> $^a->{DivWinPct} ||
$^b->{DivWon} <=> $^a->{DivWon} ||
$^a->{DivLost} <=> $^b->{DivLost})
: 0
)
},
# 1 parameter each so all three are key extractors...
{+ $^team->{Won} } is descending,
{+ $^team->{Lost} },
{+ $^team->{GamesPlayed} } is descending,
] @teams;


> sort always calls the code block with 2 params.

No, it calls a comparator block with two *arguments*.

It recognizes the block as being a comparator in the first place precisely
*because* it has two parameters.


> DC> But you *can't* apply C<is descending> to a Code reference.
>
> then how did you apply 'is insensitive'?

I applied it to the *block* (i.e. a closure definition).
That's not the same thing as a Code reference.


> what i am saying is i think that you need to go back to the drawing
> board to find a clean syntax to mark those flags.

No, I think we have it...just put the appropriate trait on the extractor
*block* when you define it.


> i just don't like the reverse args concept at all. it is not
> semantically useful to sorting. sorts care about ascending and
> descending, not a vs b in args.

The problem is that sometimes comparator logic is suffiently complex that
a single comparator needs to have a "bit each way", as in Rod's
football team comparator above.


> but i am glad you did since it brings up this issue. i don't think we
> should allow any args there at all as they are not needed.

The *are* needed. They're the very things that tell the compiler that
the block has two parameters and may therefore be used as a comparator.


> some sort of descending marker reverses the args before the call.

As I explained above, that's not always sufficient. If you have to resort to
coding a comparator at all (instead of just using a key extractor) you're
almost certainly usign some criterion that *isn't* amenable to a simple
descent marker. Because it if *was* amenable, you'd have used a simple key
extractor instead.


> also in the GRT, descending causes special data munging so the keys will
> sort in descending order so that has to be passed on to the guts somehow.

Yes. As a trait on the key extractor.


> DC> @sorted = sort {-M} @unsorted;
> >> that still wants to be cached somehow as -M is expensive.
>
> DC> It *is* cached. It's a one-parameter block. So its a key
> extractor. So DC> it automagically caches the keys it extracts.
>
> ??? who and what caches it? it will get called again and again for the
> same record.

No it won't. That's what "caching" *means*.


> where is the definition that 1 param code blocks do caching?

They don't. I'm saying that when C<sort> *calls* a key extractor block
for a particular element, C<sort> will cache the result.

> DC> Key extractors will always cache.
>
> so this is a key extraction feature about caching.

Yep. Perhaps I should have said:

Key extractions will always be cached.


> the ST and GRT don't need caching so that is a waste if those are used.

ST and GRT *are* caching algorithms. Each of them caches a munged key and
thereafter just uses that (i.e. rather than recomputing the key). That's
the whole *point* of them.

My point is that when a sorting algorithm calls a key extractor, it will
cache the results...in whatever way is appropriate to the algorithm's needs.


> but they can't always cache.

Their results *can* always be cached. Their results *must* always be cached.
It makes no sense to call a key extractor block more than once for any
particular element being sorted.


> it depends on the implementation and possibly at runtime (selecting
> orchish, ST or GRT based on some other criteria).

Yes. The form of caching is an implicit part of the algorithm C<sort> chooses.


> DC> If you'd gone insane and particularly wanted to do it that way,
> you'd DC> need something like:
>
> DC> @sorted = sort {state %M ; %M{$_} //= -M} @unsorted;
>
> DC> to ensure the cache persisted between calls to the key extractor.
>
> and will that get cleared before each sort?

No. And that's exactly the problem. That's why you'd be insane to do the
caching yourself. That's why the caching *has* to be internal to C<sort>.


> >> maybe the fact that the compiler knows -M returns a float can be
> >> used to mark it internally and the explicit float isn't needed here.
>
> DC> Exactly.
>
> ok for this case but not for data from a record.

As I said before, it depends if the record is statically typed. If not,
then you either have to explicitly coerce it with C<~> or C<+> or C<int>, or
you let C<sort> work it out polymorphically at run-time (and pay for that by
forgoing some possible optimizations).


> >> but data from a user record will need to be marked as float as the
> >> compiler can't tell.
>
> DC> It *can* tell if the elements are typed. But, yes, most of the

> DC> time if you want to ensure numeric comparison you will explicitly
> DC> prefix with a C<+> to give the compiler a hint. Otherwise C<sort>
> DC> will have to fall back on looking at the keys that are extracted
> DC> rand working out at run-time which type of comparison to use (kinda
> DC> like the smartmatch operator does).


>
> only in the ST. the orcish does its compares on the fly without a full
> prescan. and the GRT can't prescan as it mungs and merges keys on the
> fly.

Then, naturally, if you don't provide the appropriate type information
C<sort> won't be able to select Orcish or GRT. Not that missing the Orcish
matters, since it's merely a jitted ST (which *will* still be available).

Damian

Luke Palmer

unread,
Feb 16, 2004, 3:04:01 AM2/16/04
to Damian Conway, Perl 6 Language
Damian Conway writes:
> type KeyExtractor ::= Code(Any) returns Any;
>
> type Comparator ::= Code(Any, Any) returns Int;
>
> type Criterion ::= KeyExtractor
> | Comparator Pair(KeyExtractor, Comparator)
> ;
>
> type Criteria ::= Criterion
> | Array of Criterion
> ;
>
> multi sub *sort(Criteria $by = {$^a cmp $^b}: *@data) {...}

This is neat stuff. But in order for this to work, we're relying on a
I<very> sophisticated type system. Reminiscent of C++ templates,
actually. And while I love C++ templates for academic challenges, they
don't seem quite what Perl is looking for.

This is pattern matching more than it is type comparison. And Perl's
all about pattern matching. I'm just wondering whether it needs I<two>
pattern archetectures.

If things like this are going to be happening (and thanks, Damian, for
demonstrating exactly how powerful this can be), I'd like types to at
least look more like regexes. Implementation difficulty has not been a
concern of Perl The Language's since the beginning.

Since I'm usually not good at these philosophical posts, I'll show some
examples.

multi sub infix:equals ( (Any) $x, ($1) $y ) { $x eq $y }

multi sub infix:equals ( Any $x, Any $y ) { undef }

Might be a way of nicely comparing apples to apples and rejecting
everything else as unequal. I don't really like the delcaration order
dependency, though. Perhaps:

multi sub infix:equals ({
$x := (Any), $y := ($1) { $x eq $y }
| $x := (Any), $y := (Any) { undef }
});

Which is I<really> starting to look like rules (which is what I wanted).

My summarizing thesis is: if the type system is going to be this rich a
sublanguage, why not make it a formal one? [1]

Luke

[1] Keep in mind that I'm fine with it I<not> becoming a sublanguage,
and relying on user-written code to do this kind of fancy matching. I'm
just saying if we want to make it a rich and powerful pattern matching
system, it ought to look like patterns.

Luke Palmer

unread,
Feb 16, 2004, 3:06:05 AM2/16/04
to Uri Guttman, Damian Conway, Perl 6 Language
Uri Guttman writes:
> >>>>> "LP" == Luke Palmer <fibo...@babylonia.flatirons.org> writes:
>
> LP> Uri Guttman writes:
> >> because that would be the default comparison and the extracted key value
> >> would be stringified unless some other marker is used. most sorts are on
> >> strings so this would be a useful huffman and removal of a redundancy.
>
> LP> While I like where most of this is going, I beg to differ on this point.
> LP> I end up sorting numbers more often then strings, except when I'm
> LP> sorting the keys of a hash. I'm always annoyed in my one liners when I
> LP> have to write out:
>
> LP> sort { $a <=> $b } ...
>
> LP> And I'd be thrilled if it were as easy as:
>
> LP> sort { +$_ } ...
>
> oh, i don't want to see <=> anywhere in sort code. i don't mind + or int
> as markers for that. you seem to actually be agreeing with me that
> strings are the default key type and + is needed for a numeric sort. but
> do you do integer or float sorts? and as i said, i don't like the
> asymmetry of int vs +.

Usually integer. But specifying integer is usually an optimization
guideline, whereas there is a much greater semantic difference between
sorting strings and floats. C<int> deserves to be longer as just an
optimization guideline.

And yes, I agree that string should be default.

Luke

Rod Adams

unread,
Feb 16, 2004, 2:46:24 PM2/16/04
to Perl 6 Language
Damian Conway wrote:

> Uri persisted:


> > but int is needed otherwise? int is more commonly a sort key than
> float.
> > it just feels asymetrical with one having a symbol and the other a
> named
> > operator.
>
> Sorry, but that's the way the language works. The more general and usual
> conversion (to number, not to integer) has the shorter name.


For the record, I do a lot of statistical work. On the sorts where I
care about speed, I'm using floats far more often than ints. Uri's usage
obviously varies from mine. Let's not hack up the language to make sort
more efficient for some arguable benefit.

I would entertain, however, C<float> and C<string> comparators (and
functions in core) that pretty much (if not precisely) alias C<+> and
C<~>, forcing said type.


> > DC> If you don't explicitly cast either way, C<sort> just DWIMs by
> > DC> looking at the actual type of the keys returned by the extractor.
> > DC> If any of them isn't a number, it defaults to C<cmp>.
> >
> > that doesn't work for me. the GRT can't scan all keys before it decides
> > on what comparison operator is needed. the GRT needs to merge keys as
> > they are extracted and it needs to do the correct conversion to a byte
> > string then and not later. you can't DWIM this away if you want the
> > optimization.
>
> EXACTLY!!! So, if you want the GRT to kick in, you have to make sure the
> return type of your block is appropriate. If you don't give the
> compiler that
> hint, you don't get the optimized sorting.


When fed a number that it doesn't trust to be an int or a float,
couldn't the GRT store the number with C<pack 'Nd', $_, $_>? (ok,
there's some prepping for sign and endian, but you get the point) Seems
like a not-terrible way to handle the issue.

> Here's another example (all praise be to Rod!):
>
> @teams = sort [
> # 1 parameter so it's a key extractor...
> {+ $^team->{WinPct} },
> # 2 parameters so it's a comparator...
> { ($^a->{Conference} eq $^b->{Conference}
> ? ($^b->{ConfWinPct} <=> $^a->{ConfWinPct} ||
> $^b->{ConfWon} <=> $^a->{ConfWon} ||
> $^a->{ConfLost} <=> $^b->{ConfLost})
> : ($^a->{Division} eq $^b->{Division}
> ? ($^b->{DivWinPct} <=> $^a->{DivWinPct} ||
> $^b->{DivWon} <=> $^a->{DivWon} ||
> $^a->{DivLost} <=> $^b->{DivLost})
> : 0
> )
> },
> # 1 parameter each so all three are key extractors...
> {+ $^team->{Won} } is descending,
> {+ $^team->{Lost} },
> {+ $^team->{GamesPlayed} } is descending,
> ] @teams;
>

Now that's just spiffy. Except in moving from my P5 version to your P6
version, you have to s/?/??/ and s/:/::/, IIRC.

> > DC> But you *can't* apply C<is descending> to a Code reference.
> > then how did you apply 'is insensitive'?
>
> I applied it to the *block* (i.e. a closure definition).
> That's not the same thing as a Code reference.
>
> > what i am saying is i think that you need to go back to the drawing
> > board to find a clean syntax to mark those flags.
>
> No, I think we have it...just put the appropriate trait on the extractor
> *block* when you define it.


I really don't see the problem with

@out = sort {lc $_}, @in;

It's short, simple, unambiguous, IMO very clean. But if people want to
take a tool that we're creating anyways elsewhere in the language
(traits) to provide another way to do it, that's fine too.

> > i just don't like the reverse args concept at all. it is not
> > semantically useful to sorting. sorts care about ascending and
> > descending, not a vs b in args.
>
> The problem is that sometimes comparator logic is suffiently complex that
> a single comparator needs to have a "bit each way", as in Rod's
> football team comparator above.


What my football example was meant to show is that no matter how much we
abuse poor C<sort> in the name of progress, we need to have a way to
drop it all and go back to how we did it in P5, for those truly
pathological cases where nothing else works. If people don't trust this,
I'll come up with something worse.

As a side note, the reason I couldn't find the original sort in question
is that I later scraped it in favor of a more robust 100 line sub to
figure out who ranked where.


But in more common case, how much it break things to have

type Comparator ::= Code(Any, Any) returns Int;

become

type Comparator ::= Code(Any, Any) returns Int

| '!' Code(Any, Any) returns Int;

where the '!' sets the reverse/descending trait?

So we could get things like:

@out = sort {!~ $_} @in;
@out = sort {!+ -M} @in;
@out = sort {! &complex} @in;


Overall, Damian's proposal looks very good.
Yeah Damian!

-- Rod Adams

PS -- Only pure and utter insanity would lead someone to not make string
the default sort comparator, and I don't think I've heard anyone here
mention they are in favor of doing anything else.

Larry Wall

unread,
Feb 19, 2004, 7:38:40 PM2/19/04
to Perl 6 Language
On Mon, Feb 16, 2004 at 01:04:01AM -0700, Luke Palmer wrote:
: This is pattern matching more than it is type comparison. And Perl's

: all about pattern matching. I'm just wondering whether it needs I<two>
: pattern archetectures.

I suspect it does, at least from the viewpoint of mere mortals. The
regex engine is built around the notion of ordered rules, whereas the
multi dispatch engine is designed to run rules in order of "distance".
I think we'll have done good enough if the regex engine can be used at
runtime against a list of actual argument types. The regex engine is
allowed to have deep, powerful nooks, after all. People are trained
not to expect to understand patterns on first glance. But I a strong
gut feeling that the dispatch engine needs to remain transparent to
casual readers--or there will never be any casual readers of Perl 6.

Of course, someone will doubtless attempt this unification, since a
single "use" can turn Perl into another language. It's just not a
language I would have bothered to learn 25 years ago...

Larry

Damian Conway

unread,
Feb 19, 2004, 8:29:03 PM2/19/04
to Perl 6 Language
The design team discussed "The Sort Problem" during yesterday's
teleconference. Here is Larry's decision: final, definitive, and unalterable
(well...for this week at least ;-)

-----cut---------cut---------cut---------cut---------cut---------cut----

C<sort> in Perl6 is a global multisub:

multi sub *sort(Criterion @by: *@data) {...}
multi sub *sort(Criterion $by: *@data) {...}


multi sub *sort( : *@data) {...}

where:

type KeyExtractor ::= Code(Any) returns Any;

type Comparator ::= Code(Any, Any) returns Int;

type Criterion ::= KeyExtractor
| Comparator
| Pair(KeyExtractor, Comparator)
;

That means that we can call C<sort> without a block (to sort stringifically
ascending with C<cmp>):

# Stringifically ascending...
@sorted = sort @unsorted;

or with a single two-argument block/closure (to sort by whatever the specified
comparator is):

# Numerically ascending...
@sorted = sort {$^a <=> $^b} @unsorted;

# Namewise stringifically descending case-insensitive...
@sorted = sort {lc $^b.name cmp lc $^a.name}
@unsorted;
# or...
@sorted = sort {$^b.name cmp $^a.name} is insensitive
@unsorted;
# or...
@sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
@unsorted;

# Modtimewise numerically ascending...
@sorted = sort {-M $^a <=> -M $^b} @unsorted;

# Fuzz-ifically...
sub fuzzy_cmp($x, $y) returns Int;
@sorted = sort &fuzzy_cmp, @unsorted;


or with a single one-argument block/closure (to sort according whatever the
specified key extractor returns):

# Numerically ascending...
@sorted = sort {+ $^elem} @unsorted;
@sorted = sort {+ $_} @unsorted;

# Namewise stringifically descending case-insensitive...
@sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
@sorted = sort {lc $^elem.name} is descending @unsorted;
@sorted = sort {lc .name} is descending @unsorted;

# Modtimewise numerically ascending...


@sorted = sort {-M} @unsorted;

# Key-ifically...
sub get_key($elem) {...}
@sorted = sort &get_key, @unsorted;

or with a single extractor/comparator pair (to sort according to the extracted
key, using the specified comparator):

# Modtimewise stringifically descending...
@sorted = sort {-M}=>{$^b cmp $^a} @unsorted;

# Namewise fuzz-ifically...
@sorted = sort {.name}=>&fuzzy_cmp @unsorted;

or with an array of comparators and/or key extractors and/or
extractor-comparator pairs (to sort according to a cascading list of criteria):

# Numerically ascending
# or else namewise stringifically descending case-insensitive
# or else modtimewise numerically ascending
# or else namewise fuzz-ifically
# or else fuzz-ifically...
@sorted = sort [ {+ $^elem},
{$^b.name cmp $^a.name} is insensitive,
{-M},
{.name}=>&fuzzy_cmp,
&fuzzy_cmp,
],
@unsorted;

If a key-extractor block returns number, then C<< <=> >> is used to compare
those keys. Otherwise C<cmp> is used. In either case, the keys extracted by
the block are cached within the call to C<sort>, to optimize subsequent
comparisons against the same element. That is, a key-extractor block is only
ever called once for each element being sorted.

If a key-extractor/comparator pair is specified, the key-extractor is the key
of the pair and the comparator the value. The extractor is used to retreive
keys, which are then passed to the comparator.

The C<is descending> and C<is insensitive> traits on a key extractor or a
comparator are detected within the call to C<sort> (or possibly by the
compiler) and used to modify the case-sensitivity and "direction" of any
comparison operators used for the corresponding key or in the corresponding
comparator.


Note that ambiguous cases like:

@sorted = sort {-M}, {-M}, {-M};
@sorted = sort {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
@sorted = sort [...], [...], [...];
# etc.

will be dispatched according to the normal multiple dispatch semantics
(which will mean that they will mean):

@sorted = sort {-M} <== {-M}, {-M};
@sorted = sort {$^a <=> $^b} <== {$^a <=> $^b}, {$^a <=> $^b};
@sorted = sort [...] <== [...], [...];
# etc.

and so one would need to write:

@sorted = sort <== {-M}, {-M}, {-M};
@sorted = sort <== {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
@sorted = sort <== [...], [...], [...];
# etc.

to get C<cmp> comparison on all the arguments.

-----cut---------cut---------cut---------cut---------cut---------cut----


Thanks to everyone who contributed to this discussion (especially Uri). As you
see, the result is sort facility that is simultaneously much more powerful,
much easier-to-use in the simple cases, has the potential to produce more
maintainable code, and which has much better scope for internal optimization.

"Once again the Iron Designer rises to the supreme challenge of the
Mailinglist Stadium and expresses the true spirit of Perl 6!!!"

Zenzai!

Damian

Dave Whipp

unread,
Feb 19, 2004, 9:39:14 PM2/19/04
to perl6-l...@perl.org
"Damian Conway" <dam...@conway.org> wrote in message
news:403562DF...@conway.org...

> type KeyExtractor ::= Code(Any) returns Any;

> # Modtimewise numerically ascending...


> @sorted = sort {-M} @unsorted;


One thing I've been trying to figure out reading this: what is the signature
of prefix:-M ? i.e. how does it tell the outer block that it (the
outer-block) needs a parameter? There seems to be some transitive magic
going on here. Could similar magic be used to have infix:<=> require two
higher-order variables (e.g. could "sort { <=> } @unsorted" be made to
work?)


Dave.


Luke Palmer

unread,
Feb 19, 2004, 9:48:04 PM2/19/04
to Dave Whipp, perl6-l...@perl.org
Dave Whipp writes:
> "Damian Conway" <dam...@conway.org> wrote in message
> news:403562DF...@conway.org...
> > type KeyExtractor ::= Code(Any) returns Any;
>
> > # Modtimewise numerically ascending...
> > @sorted = sort {-M} @unsorted;
>
>
> One thing I've been trying to figure out reading this: what is the signature
> of prefix:-M ?

Presumably something like:

sub prefix:-M (?$file = $CALLER::_) {...}

> i.e. how does it tell the outer block that it (the outer-block) needs
> a parameter?

Because it operates on $_. It tells it the same way:

map { .name } @objects

Does. Of course, this is going to be tough on the compiler, who will
have to take the C<= $CALLER::_> part into account.

> There seems to be some transitive magic going on here. Could similar
> magic be used to have infix:<=> require two higher-order variables
> (e.g. could "sort { <=> } @unsorted" be made to work?)

No. Although you could do such a thing with:

sort &infix:<=>, @unsorted;

Luke

Uri Guttman

unread,
Feb 19, 2004, 9:48:39 PM2/19/04
to Damian Conway, Perl 6 Language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> "Once again the Iron Designer rises to the supreme challenge of
DC> the Mailinglist Stadium and expresses the true spirit of Perl
DC> 6!!!"

and the challenge for next week is slicing squid with noodles!
(or cutting down the mightiest tree in the forest with a herring)

good job all.

Damian Conway

unread,
Feb 19, 2004, 10:00:05 PM2/19/04
to perl6-l...@perl.org
Dave Whipp wondered:

>> @sorted = sort {-M} @unsorted;
>
> One thing I've been trying to figure out reading this: what is the signature
> of prefix:-M ? i.e. how does it tell the outer block that it (the
> outer-block) needs a parameter?

It doesn't. As A6 explained:

http://dev.perl.org/perl6/apocalypse/A06.html#Bare_subs

any block that doesn't have placeholder-specified parameters but which refers
(even implicitly) to $_ will automatically have the signature of ($_).

That's why:

@odd = grep { $_ % 2 } @nums;

will still work in Perl 6.

Since a bare C<-M> implicitly refers to $_, the surrounding block
automagically gets a one-parameter signature and hence is (correctly!)
interpreted as a key extractor.

Don't you just love it when a plan^H^H^H^Hdesign comes together? ;-)


> There seems to be some transitive magic going on here.

There is. Kinda. Just not the type of magic you thought.


> Could similar magic be used to have infix:<=> require two
> higher-order variables (e.g. could "sort { <=> } @unsorted" be made to
> work?)

No. But this will work:

sort &infix:<=> @unsorted

Damian

Uri Guttman

unread,
Feb 19, 2004, 10:16:12 PM2/19/04
to Damian Conway, Perl 6 Language
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> # Stringifically ascending...


DC> @sorted = sort @unsorted;

DC> or with a single two-argument block/closure (to sort by whatever the
DC> specified comparator is):

DC> # Numerically ascending...
DC> @sorted = sort {$^a <=> $^b} @unsorted;

so because that has 2 placeholders, it is will match this signature:

type Comparator ::= Code(Any, Any) returns Int;

i have to remember that placeholders are really implied args to a code
block and not just in the expression

DC> # Namewise stringifically descending case-insensitive...
DC> @sorted = sort {lc $^b.name cmp lc $^a.name}
DC> @unsorted;
DC> # or...
DC> @sorted = sort {$^b.name cmp $^a.name} is insensitive
DC> @unsorted;
DC> # or...
DC> @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
DC> @unsorted;

TIMTOWTDI lives on!

DC> # Modtimewise numerically ascending...
DC> @sorted = sort {-M $^a <=> -M $^b} @unsorted;

DC> # Fuzz-ifically...
DC> sub fuzzy_cmp($x, $y) returns Int;
DC> @sorted = sort &fuzzy_cmp, @unsorted;

ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
the sub must be defined/declared before the sort code is compiled?

DC> or with a single one-argument block/closure (to sort according
DC> whatever the specified key extractor returns):

DC> # Numerically ascending...
DC> @sorted = sort {+ $^elem} @unsorted;
DC> @sorted = sort {+ $_} @unsorted;

is $^elem special? or just a regular place holder? i see $_ will be set
to each record as we discussed.

DC> # Namewise stringifically descending case-insensitive...
DC> @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
DC> @sorted = sort {lc $^elem.name} is descending @unsorted;
DC> @sorted = sort {lc .name} is descending @unsorted;

just getting my p6 chops back. .name is really $_.name so that makes
sense. and $^elem is just a named placeholder for $_ as before?

DC> # Key-ifically...
DC> sub get_key($elem) {...}
DC> @sorted = sort &get_key, @unsorted;

and that is parsed as an extracter code call due to the single arg
sig. again, it appears that it has to be seen before the sort code for
that to work.

DC> or with a single extractor/comparator pair (to sort according to the
DC> extracted key, using the specified comparator):

DC> # Modtimewise stringifically descending...
DC> @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;

so that is a single pair of extractor/comparator. but there is no comma
before @unsorted. is that correct? see below for why i ask that.

DC> # Namewise fuzz-ifically...
DC> @sorted = sort {.name}=>&fuzzy_cmp @unsorted;

i first parsed that as being wrong and the {} should wrap the whole
thing. so that is a pair again of extractor/comparator.

DC> or with an array of comparators and/or key extractors and/or
DC> extractor-comparator pairs (to sort according to a cascading list of
DC> criteria):

DC> # Numerically ascending
DC> # or else namewise stringifically descending case-insensitive
DC> # or else modtimewise numerically ascending
DC> # or else namewise fuzz-ifically
DC> # or else fuzz-ifically...
DC> @sorted = sort [ {+ $^elem},
DC> {$^b.name cmp $^a.name} is insensitive,
DC> {-M},
DC> {.name}=>&fuzzy_cmp,
DC> &fuzzy_cmp,

i see the need for commas in here as it is a list of criteria.

DC> ],

but what about that comma? no other example seems to have one before the
@unsorted stuff.

DC> @unsorted;

DC> If a key-extractor block returns number, then C<< <=> >> is used to
DC> compare those keys. Otherwise C<cmp> is used. In either case, the keys
DC> extracted by the block are cached within the call to C<sort>, to
DC> optimize subsequent comparisons against the same element. That is, a
DC> key-extractor block is only ever called once for each element being
DC> sorted.

where does the int optimizer come in? just as you had it before in the
extractor code? that will need to be accessible to the optimizer if the
GRT is to work correctly.

i like that the key caching is defined here. we can implement it in
several different ways depending on optimization hints and such. we
could support the ST, GRT and orchish and select the best one for each
sort. or we could have one basic sort and load the others as pragmas or
modules.

DC> The C<is descending> and C<is insensitive> traits on a key extractor
DC> or a comparator are detected within the call to C<sort> (or possibly
DC> by the compiler) and used to modify the case-sensitivity and
DC> "direction" of any comparison operators used for the corresponding key
DC> or in the corresponding comparator.

or by reversing the order of the args passed to the comparator code. i
see you want to keep the alpha order of place holders as well as
descending. i just don't see the need for both but i can live with it.

so are those traits are only allowed/meaningful on comparison blocks?
or will an extraction block take them (extracting in insensitive mode
makes as much sense as comparing in that mode, in fact it would be
faster). so that is a little side issue. the trait insensitive is on the
comparator block but is best used by the extractor (which may not be
provided). the internals will need to be able to handle that. i assume
they will be able to see the trait on the comparator block and use it
for extraction. same with descending which is needed for extraction in
the GRT but not for orchish or ST. you have examples which show the
traits on either the extractor or comparator code blocks. that implies
that the guts can get those flags from either and use them as needed.

DC> Note that ambiguous cases like:

DC> @sorted = sort {-M}, {-M}, {-M};

DC> will be dispatched according to the normal multiple dispatch semantics
DC> (which will mean that they will mean):

DC> @sorted = sort {-M} <== {-M}, {-M};

DC> and so one would need to write:

DC> @sorted = sort <== {-M}, {-M}, {-M};

that clears up that one for me.

this is very good overall (notwithstanding my few nits and
questions). it will satisfy all sorts of sort users, even those who are
out of sorts.

thanx,

Luke Palmer

unread,
Feb 19, 2004, 11:03:39 PM2/19/04
to Uri Guttman, Damian Conway, Perl 6 Language
Uri Guttman writes:
> >>>>> "DC" == Damian Conway <dam...@conway.org> writes:
> DC> # Modtimewise numerically ascending...
> DC> @sorted = sort {-M $^a <=> -M $^b} @unsorted;
>
> DC> # Fuzz-ifically...
> DC> sub fuzzy_cmp($x, $y) returns Int;
> DC> @sorted = sort &fuzzy_cmp, @unsorted;
>
> ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
> the sub must be defined/declared before the sort code is compiled?

Nope. C<sort> is declared as a multimethod. This works, too:

$code = sub ($a, $b) { -M $a <=> -M $b };
@sorted = sort $code, @unsorted;

> DC> or with a single one-argument block/closure (to sort according
> DC> whatever the specified key extractor returns):
>
> DC> # Numerically ascending...
> DC> @sorted = sort {+ $^elem} @unsorted;
> DC> @sorted = sort {+ $_} @unsorted;
>
> is $^elem special? or just a regular place holder? i see $_ will be set
> to each record as we discussed.

Those two statements are exactly the same in every way. Well, except
how they're writted. $^elem is indeed a regular placeholder. $_
becomes an implicit parameter when it is referred to, in the absence of
placeholders or another type of signature.

> DC> # Key-ifically...
> DC> sub get_key($elem) {...}
> DC> @sorted = sort &get_key, @unsorted;
>
> and that is parsed as an extracter code call due to the single arg
> sig. again, it appears that it has to be seen before the sort code for
> that to work.

Nope. Runtime dispatch as before.

> DC> or with a single extractor/comparator pair (to sort according to the
> DC> extracted key, using the specified comparator):
>
> DC> # Modtimewise stringifically descending...
> DC> @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;
>
> so that is a single pair of extractor/comparator. but there is no comma
> before @unsorted. is that correct? see below for why i ask that.

Yes. Commas may be ommitted on either side of a block when used as an
argument. I would argue that they only be omitted on the right side, so
that this is unambiguous:

if some_function { ... }
{ ... }

Which might be parsed as either:

if (some_function { ... }) { ... }

Or:

if (some_function()) {...}
{...} # Bare block

> DC> or with an array of comparators and/or key extractors and/or
> DC> extractor-comparator pairs (to sort according to a cascading list of
> DC> criteria):
>
> DC> # Numerically ascending
> DC> # or else namewise stringifically descending case-insensitive
> DC> # or else modtimewise numerically ascending
> DC> # or else namewise fuzz-ifically
> DC> # or else fuzz-ifically...
> DC> @sorted = sort [ {+ $^elem},
> DC> {$^b.name cmp $^a.name} is insensitive,
> DC> {-M},
> DC> {.name}=>&fuzzy_cmp,
> DC> &fuzzy_cmp,
>
> i see the need for commas in here as it is a list of criteria.
>
> DC> ],
>
> but what about that comma? no other example seems to have one before the
> @unsorted stuff.

It's not a closure, so you need a comma.

> DC> @unsorted;
>
> DC> If a key-extractor block returns number, then C<< <=> >> is used to
> DC> compare those keys. Otherwise C<cmp> is used. In either case, the keys
> DC> extracted by the block are cached within the call to C<sort>, to
> DC> optimize subsequent comparisons against the same element. That is, a
> DC> key-extractor block is only ever called once for each element being
> DC> sorted.
>
> where does the int optimizer come in? just as you had it before in the
> extractor code? that will need to be accessible to the optimizer if the
> GRT is to work correctly.

If the block provably returns an int, C<sort> might be able to optimize
for ints. Several ways to provably return an int:

my $extractor = an int sub($arg) { $arg.num }
@sorted = sort $extractor, @unsorted;

Or with a smarter compiler:

@sorted = sort { int .num } @unsorted;

Or C<sort> might even check whether all the return values are ints and
then optimize that way. No guarantees: it's not a language-level issue.

> i like that the key caching is defined here.

Yeah. This is a language-level issue, as the blocks might have
side-effects.

> DC> Note that ambiguous cases like:
>
> DC> @sorted = sort {-M}, {-M}, {-M};
>
> DC> will be dispatched according to the normal multiple dispatch semantics
> DC> (which will mean that they will mean):
>
> DC> @sorted = sort {-M} <== {-M}, {-M};
>
> DC> and so one would need to write:
>
> DC> @sorted = sort <== {-M}, {-M}, {-M};
>
> that clears up that one for me.
>
> this is very good overall (notwithstanding my few nits and
> questions). it will satisfy all sorts of sort users, even those who are
> out of sorts.

Agreed. I'm very fond of it..

Luke

Luke Palmer

unread,
Feb 19, 2004, 11:08:34 PM2/19/04
to Uri Guttman, Damian Conway, Perl 6 Language
Luke Palmer writes:
> Yes. Commas may be ommitted on either side of a block when used as an
> argument. I would argue that they only be omitted on the right side, so
> that this is unambiguous:
>
> if some_function { ... }
> { ... }
>
> Which might be parsed as either:
>
> if (some_function { ... }) { ... }
>
> Or:
>
> if (some_function()) {...}
> {...} # Bare block

Silly me. That doesn't solve anything. I don't know why I thought it
did. I still think that this looks weird:

foo $bar { ... } $baz;

But that's just preference.

Luke

Joe Gottman

unread,
Feb 19, 2004, 11:14:49 PM2/19/04
to Perl 6 Language

----- Original Message -----
From: "Damian Conway" <dam...@conway.org>
To: "Perl 6 Language" <perl6-l...@perl.org>
Sent: Thursday, February 19, 2004 8:29 PM
Subject: [perl] The Sort Problem: a definitive ruling


> C<sort> in Perl6 is a global multisub:
>
> multi sub *sort(Criterion @by: *@data) {...}
> multi sub *sort(Criterion $by: *@data) {...}
> multi sub *sort( : *@data) {...}
>
> where:
>
> type KeyExtractor ::= Code(Any) returns Any;
>
> type Comparator ::= Code(Any, Any) returns Int;
>
> type Criterion ::= KeyExtractor
> | Comparator
> | Pair(KeyExtractor, Comparator)
> ;

snip

> If a key-extractor block returns number, then C<< <=> >> is used to
compare
> those keys. Otherwise C<cmp> is used. In either case, the keys extracted
by
> the block are cached within the call to C<sort>, to optimize subsequent
> comparisons against the same element. That is, a key-extractor block is
only
> ever called once for each element being sorted.
>
>

How do you decide whether a key-extractor block returns number? Do you
look at the signature, or do you simply evaluate the result of the
key-extractor for each element in the unsorted list? For example, what is
the result of the following code?

sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');

There is nothing in the signature of the key-extractor to suggest that
all the keys are numbers, but as it turns out they all are. Will the sort
end up being numerical or alphabetic?


Joe Gottman


Damian Conway

unread,
Feb 19, 2004, 10:47:55 PM2/19/04
to Perl 6 Language
Uri checked:


> DC> @sorted = sort {$^a <=> $^b} @unsorted;
>
> so because that has 2 placeholders, it is will match this signature:
>
> type Comparator ::= Code(Any, Any) returns Int;

Correct.


> i have to remember that placeholders are really implied args to a code
> block and not just in the expression

Indeed.


> DC> sub fuzzy_cmp($x, $y) returns Int;
> DC> @sorted = sort &fuzzy_cmp, @unsorted;
>
> ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
> the sub must be defined/declared before the sort code is compiled?

Yes. And, yes, declaration is sufficient.


> DC> @sorted = sort {+ $^elem} @unsorted;
> DC> @sorted = sort {+ $_} @unsorted;
>
> is $^elem special? or just a regular place holder?

Regular.


> just getting my p6 chops back. .name is really $_.name so that makes
> sense. and $^elem is just a named placeholder for $_ as before?

Yes and yes.


> DC> @sorted = sort &get_key, @unsorted;
>
> and that is parsed as an extracter code call due to the single arg
> sig.

Yes.


> again, it appears that it has to be seen before the sort code for
> that to work.

Correct.


> DC> or with a single extractor/comparator pair (to sort according to the
> DC> extracted key, using the specified comparator):
>
> DC> # Modtimewise stringifically descending...
> DC> @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;
>
> so that is a single pair of extractor/comparator. but there is no comma
> before @unsorted.

Typo. I'm pretty sure it would actually need a comma there.


> DC> @sorted = sort {.name}=>&fuzzy_cmp @unsorted;

Comma required there too. :-(

> DC> ],
>
> but what about that comma?

It's required.


> no other example seems to have one before the @unsorted stuff.

The comma exception only applies to simple blocks. I messed up those two
examples. :-(


> where does the int optimizer come in?

When the key extractor is known to return an Int. Which would occur either
when it's explicitly declared to do that, or when the compiler can intuit a
block's return type from the type of value returned by the block (i.e. if the
block always returns the result of a call to C<int>).


> just as you had it before in the
> extractor code?

Yup.

> so are those traits are only allowed/meaningful on comparison blocks?
> or will an extraction block take them

Both. That's why I wrote:

The C<is descending> and C<is insensitive> traits

on a key extractor or a comparator..."
^^^^^^^^^^^^^

> you have examples which show the
> traits on either the extractor or comparator code blocks. that implies
> that the guts can get those flags from either and use them as needed.

Yep. Inside the body of C<sort> you'd access them as:

$by.trait{descending}
$by.trait{insensitive}

(unless Larry's changed the trait accessor syntax since last I looked).


Damian

Uri Guttman

unread,
Feb 19, 2004, 11:23:35 PM2/19/04
to Damian Conway, perl6-l...@perl.org
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> No. But this will work:

DC> sort &infix:<=> @unsorted

my brane hertz!!

so that declares (creates?) an infix op as a code block? and since <=>
is known to take 2 args it is parsed (or multidispatched) as a
comparator block for sort?

amazing how you and luke both came up with the exact same answer. p6
syntax like that is killing me slowly!

Uri Guttman

unread,
Feb 19, 2004, 11:16:20 PM2/19/04
to Luke Palmer, Damian Conway, Perl 6 Language
>>>>> "LP" == Luke Palmer <fibo...@babylonia.flatirons.org> writes:

LP> Uri Guttman writes:
>> >>>>> "DC" == Damian Conway <dam...@conway.org> writes:
DC> # Modtimewise numerically ascending...
DC> @sorted = sort {-M $^a <=> -M $^b} @unsorted;
>>
DC> # Fuzz-ifically...
DC> sub fuzzy_cmp($x, $y) returns Int;
DC> @sorted = sort &fuzzy_cmp, @unsorted;
>>
>> ok, so that is recognizes as a compare sub due to the 2 arg sig. so does
>> the sub must be defined/declared before the sort code is compiled?

LP> Nope. C<sort> is declared as a multimethod. This works, too:

LP> $code = sub ($a, $b) { -M $a <=> -M $b };
LP> @sorted = sort $code, @unsorted;

that didn't answer my question. your code is still seen before the sort
call. IMO, the signature is needed to be seen beforehand so sort can
decide if the code (in whatever form) is a 1 arg extractor or a 2 arg
comparator.

DC> # Key-ifically...
DC> sub get_key($elem) {...}
DC> @sorted = sort &get_key, @unsorted;
>>
>> and that is parsed as an extracter code call due to the single arg
>> sig. again, it appears that it has to be seen before the sort code for
>> that to work.

LP> Nope. Runtime dispatch as before.

oh, i get it now. braino on my part. it is multimethod dispatch at run
time using the parse tree from damian's post.


>>
DC> # Modtimewise stringifically descending...
DC> @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;
>>
>> so that is a single pair of extractor/comparator. but there is no comma
>> before @unsorted. is that correct? see below for why i ask that.

LP> Yes. Commas may be ommitted on either side of a block when used as an
LP> argument. I would argue that they only be omitted on the right side, so

DC> ],
>>
>> but what about that comma? no other example seems to have one before the
>> @unsorted stuff.

LP> It's not a closure, so you need a comma.

ok, that is more p6 syntax that will need to be relearned often. be nice
when we have a real lang to play with and learn from. :)

DC> @unsorted;
>>
DC> If a key-extractor block returns number, then C<< <=> >> is used to
DC> compare those keys. Otherwise C<cmp> is used. In either case, the keys
DC> extracted by the block are cached within the call to C<sort>, to
DC> optimize subsequent comparisons against the same element. That is, a
DC> key-extractor block is only ever called once for each element being
DC> sorted.
>>
>> where does the int optimizer come in? just as you had it before in the
>> extractor code? that will need to be accessible to the optimizer if the
>> GRT is to work correctly.

LP> If the block provably returns an int, C<sort> might be able to optimize
LP> for ints. Several ways to provably return an int:

LP> my $extractor = an int sub($arg) { $arg.num }
LP> @sorted = sort $extractor, @unsorted;

LP> Or with a smarter compiler:

LP> @sorted = sort { int .num } @unsorted;

LP> Or C<sort> might even check whether all the return values are ints and
LP> then optimize that way. No guarantees: it's not a language-level issue.

oh, i understand that optimization is always optional and implementation
specific. but i just wanted to make sure the proper hints can be
provided to the sort optimizer. i didn't notice any int examples in
damian's opus whereas he covered them earlier in the thread.

>> i like that the key caching is defined here.

LP> Yeah. This is a language-level issue, as the blocks might have
LP> side-effects.

good point there. i usually think of them as an optimization issue. so
we should document that reason for them as well as being faster.

>> this is very good overall (notwithstanding my few nits and
>> questions). it will satisfy all sorts of sort users, even those who
>> are out of sorts.

LP> Agreed. I'm very fond of it..

good to hear. now let's dump some new fun problem on damian's strong
back and larry's nimble fingers. :)

and i should try to resurrect my Sort::GRT module as i have been
thinking about it. it will be a good thing to work from for the p6
implementation.

Uri Guttman

unread,
Feb 19, 2004, 11:35:40 PM2/19/04
to Joe Gottman, Perl 6 Language
>>>>> "JG" == Joe Gottman <jgot...@carolina.rr.com> writes:


JG> How do you decide whether a key-extractor block returns number? Do you
JG> look at the signature, or do you simply evaluate the result of the
JG> key-extractor for each element in the unsorted list? For example, what is
JG> the result of the following code?

JG> sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');

JG> There is nothing in the signature of the key-extractor to suggest that
JG> all the keys are numbers, but as it turns out they all are. Will the sort
JG> end up being numerical or alphabetic?

my take is that either <=> or cmp (my pref) would be the default
comparator. if you want to force one, you need to use prefix ~ or + or
int.

oh, another reason to make cmp the default is BACKWARDS COMPATIBILITY!
we have support now for the old simple @sorted = sort @unsorted syntax
so the cmp should still be the default.

hey, i am remembering p6 syntax now! but give me a week and i will
forget it again :)

Damian Conway

unread,
Feb 19, 2004, 11:53:29 PM2/19/04
to perl6-l...@perl.org
Uri bemoaned:


> DC> sort &infix:<=> @unsorted
>
> my brane hertz!!
>
> so that declares (creates?) an infix op as a code block?

No. C<< &infix:<=> >> is the name of the binary C<< <=> >> operator.


> amazing how you and luke both came up with the exact same answer.

"Great minds..." etc. ;-)


> p6 syntax like that is killing me slowly!

No, it's gradually making your *stronger*. ;-)


Damian

Larry Wall

unread,
Feb 20, 2004, 12:15:46 AM2/20/04
to Perl 6 Language
On Fri, Feb 20, 2004 at 02:47:55PM +1100, Damian Conway wrote:
: Yep. Inside the body of C<sort> you'd access them as:

:
: $by.trait{descending}
: $by.trait{insensitive}
:
: (unless Larry's changed the trait accessor syntax since last I looked).

Well, if traits are just compile-time properties, and properties
are just mixed-in roles (usually enums), then it's more likely that
something like:

$by.Direction == descending
$by.Case == insensitive

would be the incantation. Or maybe if enums auto-booleanize, then
you could say

$by.Direction::descending
$by.Case::insensitive

And then

$by.descending
$by.insensitive

might be allowed as abbreviations when unambiguous. Or maybe we
require matching:

$by ~~ descending
$by ~~ insensitive

But there is no such thing as a "true" property or "false" property.
There's a Boolean role that can have the value true or false. Traits
are mixed in to declared objects at compile time, and can do weird
things to such objects at mixin time.

Likewise there's no such thing as a "descending" property. There's
a Direction property which defaults to "ascending". And a Case property
that defaults to "sensitive".

To do otherwise is to set ourselves up for objects that can be both
true and false simultaneously. Only junctions should be allowed
to do that...

Larry

Uri Guttman

unread,
Feb 20, 2004, 12:16:58 AM2/20/04
to Damian Conway, perl6-l...@perl.org
>>>>> "DC" == Damian Conway <dam...@conway.org> writes:

DC> Uri bemoaned:

cause you agonize me head!

DC> sort &infix:<=> @unsorted
>> my brane hertz!!
>> so that declares (creates?) an infix op as a code block?

DC> No. C<< &infix:<=> >> is the name of the binary C<< <=> >> operator.

so how is that allowed there without a block? is it because it is the
name is in a style of a sub? that makes sense to me but i want to make
sure i get it.

and now back to my advil addiction. :)

Smylers

unread,
Feb 20, 2004, 4:34:12 PM2/20/04
to perl6-l...@perl.org
Luke Palmer writes:

> Uri Guttman writes:
>
> > >>>>> "DC" == Damian Conway <dam...@conway.org> writes:
>
> > DC> @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;
> >

> > but there is no comma before @unsorted. is that correct?
>

> Yes. Commas may be ommitted on either side of a block when used as an
> argument.

That's what I thought too. But Damian gave exactly the opposite answer
to Uri's question, claiming he'd made a typo and a comma would be
required.

So which is it -- is Luke right in saying that Damian was right in the
first place? Or is Damian right in saying that his example was wrong?

Smylers

Luke Palmer

unread,
Feb 20, 2004, 4:39:42 PM2/20/04
to Smylers, perl6-l...@perl.org

I was wrong in saying that he was right. Those aren't simple blocks, as
Damian said, so you need the comma.

Luke

> Smylers
>

Smylers

unread,
Feb 20, 2004, 4:43:23 PM2/20/04
to perl6-l...@perl.org
Joe Gottman writes:

> sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');
>
> There is nothing in the signature of the key-extractor to suggest that
> all the keys are numbers, but as it turns out they all are.

Are they? I'd been presuming that pair keys would always be strings (as
for hashes in Perl 5), and that the C<< => >> operator would
automatically quote a preceding word, stringifying it (as in Perl 5).
So the keys above are strings, albeit ones composed only of digits.

Of course that doesn't actually help with your question, since there are
other data structures of which the same could be asked.

Smylers

Luke Palmer

unread,
Feb 20, 2004, 4:49:31 PM2/20/04
to Smylers, perl6-l...@perl.org

I think you're forgetting what language you're talking about. Those are
numbers. After this statement:

$x = '345';

C<$x> is a number. I should hope it would be treated as one during
multimethod dispatch.

However, I'm not saying this with authority. I'm just extrapolating.
If it's not correct, I'd appreciate that someone who knows correct me.

Luke

Smylers

unread,
Feb 20, 2004, 5:11:09 PM2/20/04
to perl6-l...@perl.org
Luke Palmer writes:

> After this statement:
>
> $x = '345';
>
> C<$x> is a number.

Oh. I'd been assuming that quote marks indicated strings, and that,
while a string containing only digits could obviously be treated as a
number (as in Perl 5), it wouldn't be one without being provoked.

> I should hope it would be treated as one during multimethod dispatch.

What about:

$x = '0345';

Is that a number? And if so which of these is the same as?

$x = 345;
$x = 0345;

What about if the variable contains a line read from user input? As a
programmer I'd expect that to be a string -- and if a user happens to
type only digits then it'd be surprising to find the variable is
considered to be of a different type.

User input comes with a trailing line-break character, which would make
it not a number. But would it suddenly become a number after
C<chomp>ing it? Or if the input stream was auto-chomping?

Smylers

Dan Sugalski

unread,
Feb 20, 2004, 5:12:56 PM2/20/04
to Luke Palmer, Smylers, perl6-l...@perl.org
At 2:49 PM -0700 2/20/04, Luke Palmer wrote:
>After this statement:
>
> $x = '345';
>
>C<$x> is a number.

No, it isn't. It's a string. Or, rather, it's a PerlScalar.

>I should hope it would be treated as one during
>multimethod dispatch.

I should certainly hope *not*. If so, it's a bug. We ought to go add
some tests to the test suite once we expose this bit of the engine.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Luke Palmer

unread,
Feb 20, 2004, 5:26:23 PM2/20/04
to Smylers, perl6-l...@perl.org
Smylers writes:
> Luke Palmer writes:
>
> > After this statement:
> >
> > $x = '345';
> >
> > C<$x> is a number.
>
> Oh. I'd been assuming that quote marks indicated strings, and that,
> while a string containing only digits could obviously be treated as a
> number (as in Perl 5), it wouldn't be one without being provoked.
>
> > I should hope it would be treated as one during multimethod dispatch.
>
> What about:
>
> $x = '0345';
>
> Is that a number? And if so which of these is the same as?
>
> $x = 345;
> $x = 0345;

Well, since those are the same number, I imagine the, um, first?

Don't forget that octal numbers look like 0o345.

> What about if the variable contains a line read from user input? As a
> programmer I'd expect that to be a string -- and if a user happens to
> type only digits then it'd be surprising to find the variable is
> considered to be of a different type.

Yeah, that's a tough question. I'd want it to be a number if it were
only digits, unless I wanted it to be a string. Since numbers and
strings are polymorphic with one another, maybe it's wrong to think that
we can differentiate.

But C<sort> has to know when to use C<< <=> >>.

Maybe you're right. In the presence of multimethod dispatch, it might
be simpler just to tag something with either a num or str marker (I'm
completely neglecting Parrot's implementation for this discussion), and
treat it as its tag. That wouldn't change the behavior of adding two
strings together, or concatenating two numbers, of course.

Luke

Damian Conway

unread,
Feb 20, 2004, 5:48:05 PM2/20/04
to perl6-l...@perl.org
Joe Gottman asked:

> How do you decide whether a key-extractor block returns number? Do you
> look at the signature, or do you simply evaluate the result of the
> key-extractor for each element in the unsorted list? For example, what is
> the result of the following code?
>
> sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');
>
> There is nothing in the signature of the key-extractor to suggest that
> all the keys are numbers, but as it turns out they all are. Will the sort
> end up being numerical or alphabetic?

Whilst I'd very much like it to analyse the keys, detect that they're all
numbers, and use C<< <=> >>, I suspect that if C<sort> can't determine that
the return type of the key extractor is numeric, it will have to default to
C<cmp>. Otherwise a case like this:

sort {$_.key} (1=> 'a', 10 => 'b', '2' =>'c');

becomes a little too subtle.

And, after all, it isn't that much extra work to force the issue:

sort {+$_.key} (1=> 'a', 10 => 'b', 2 =>'c');
vs:
sort {~$_.key} (1=> 'a', 10 => 'b', 2 =>'c');

Damian


Damian Conway

unread,
Feb 20, 2004, 5:49:23 PM2/20/04
to perl6-l...@perl.org
Uri wondered:

> DC> No. C<< &infix:<=> >> is the name of the binary C<< <=> >> operator.
>
> so how is that allowed there without a block?

A Code object in a scalar context yields a Code reference.

Damian

Damian Conway

unread,
Feb 20, 2004, 5:52:47 PM2/20/04
to perl6-l...@perl.org
Smylers wrote:

>> sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');
>>
>>There is nothing in the signature of the key-extractor to suggest that
>>all the keys are numbers, but as it turns out they all are.
>
> Are they? I'd been presuming that pair keys would always be strings

Nope.

> and that the C<< => >> operator would
> automatically quote a preceding word, stringifying it (as in Perl 5).

Yes. But numbers aren't "words". C<< => >> will continue to autostringify
*identifiers*, as in Perl 6, and those keys aren't identifiers.

Of course, if you used the pairs to populate a hash, the *hash* will convert
the non-stringific pair keys to stringific hash keys (unless the hash is
defined to take non-string keys).

Damian

Damian Conway

unread,
Feb 20, 2004, 5:58:20 PM2/20/04
to perl6-l...@perl.org
Luke wrote:

> I think you're forgetting what language you're talking about. Those are
> numbers. After this statement:
>
> $x = '345';
>
> C<$x> is a number.

I don't think so. C<$x> is, of course, a variable. And what it contains after
that statement will depend on whether the variable is explicitly typed or not.
If C<$x> is explicitly typed, the rvalue will have been converted to that type
(if possible). If C<$x> is not explicitly typed (i.e. implicitly typed to
C<Any>), then it will contain a string, since that's what the rvalue
inherently is.

> I should hope it would be treated as one during
> multimethod dispatch.

I would expect that the multiple dispatch mechanism would allow the C<Str> in
C<$x> to be coerced to a C<Num>, if that's the parameter type it was seeking
to match. And I would expect that the "distance" of that coercion would be 1.


> However, I'm not saying this with authority. I'm just extrapolating.
> If it's not correct, I'd appreciate that someone who knows correct me.

Ditto. ;-)

Damian

Damian Conway

unread,
Feb 20, 2004, 6:01:18 PM2/20/04
to perl6-l...@perl.org
Smylers wrote:

> Oh. I'd been assuming that quote marks indicated strings, and that,
> while a string containing only digits could obviously be treated as a
> number (as in Perl 5), it wouldn't be one without being provoked.

Correct.


> What about:
>
> $x = '0345';
>
> Is that a number?

Nope. A string (unless C<$X> is otherwised typed).


> What about if the variable contains a line read from user input? As a
> programmer I'd expect that to be a string

You'd be right (unless, of course, the variable's storage type forced a
coercion during the assignment).


Damian

Gordon Henriksen

unread,
Feb 21, 2004, 5:36:25 PM2/21/04
to Damian Conway, perl6-l...@perl.org
On Friday, February 20, 2004, at 05:48 , Damian Conway wrote:

> Joe Gottman asked:
>
>> How do you decide whether a key-extractor block returns number? Do
>> you look at the signature, or do you simply evaluate the result of
>> the key-extractor for each element in the unsorted list? For example,
>> what is the result of the following code?
>> sort {$_.key} (1=> 'a', 10 => 'b', 2 =>'c');
>> There is nothing in the signature of the key-extractor to suggest
>> that all the keys are numbers, but as it turns out they all are. Will
>> the sort end up being numerical or alphabetic?
>
> Whilst I'd very much like it to analyse the keys, detect that they're
> all numbers, and use C<< <=> >>

Eek! Please don't even TRY to do that. It'd be creepy if the same call
to sort could swap at runtime between numeric and string comparisons
based upon its input. I would hope that the determination be made at
compile time.

Consider the poor schmuck sorting new objects in preparation for a merge
sort, only to find that his new array isn't sorted the same as his old
array was, even though they came back from the exact same call to
sort... Blech.


But if sort's arguments were specifically typed, i.e.:

my @array of Int;
@array = sort @array;

Does this meet the "key extractor returns number" qualification?

Gordon Henriksen
mali...@mac.com

TSa Thomas Sandla?

unread,
Feb 23, 2004, 4:04:46 PM2/23/04
to
fibo...@babylonia.flatirons.org (Luke Palmer) wrote in message news:<20040216080...@babylonia.flatirons.org>...

> Damian Conway writes:
> > type KeyExtractor ::= Code(Any) returns Any;
> >
> > type Comparator ::= Code(Any, Any) returns Int;
> >
> > type Criterion ::= KeyExtractor
> > | Comparator Pair(KeyExtractor, Comparator)
> > ;
> >
> > type Criteria ::= Criterion
> > | Array of Criterion
> > ;
> >
> > multi sub *sort(Criteria $by = {$^a cmp $^b}: *@data) {...}
>
> This is neat stuff. But in order for this to work, we're relying on a
> I<very> sophisticated type system. Reminiscent of C++ templates,
> actually. And while I love C++ templates for academic challenges, they
> don't seem quite what Perl is looking for.
>
> This is pattern matching more than it is type comparison. And Perl's
> all about pattern matching. I'm just wondering whether it needs I<two>
> pattern archetectures.

Regarding the fact that Perl6 will have a LL(x) top-down parser
it seems quite obvious to build the type system around pattern matching.
For prior art see e.g.

http://www.cduce.org/papers/gentle.ps.gz
"A Gentle Introduction to Semantic Subtyping"

Actually I hope with this post to get some more practical explanations
of this from the list in terms of Perl6. And I want to know if the
subtyping in Perl6 will be semantic or syntactical. How will the type
system be related to classes?

TSa.

0 new messages