Ques: Use of "[ ]" in map

142 views
Skip to first unread message

Steven Lee

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

I am new to Perl and would like to pose the following question
to the Perl gurus at large.

I have an array with the following data (fields if you like) for
each element - "host uid login_name full_name" (full name may be
none, one or more words). I wish to sort this array first by uid
then by login_name. After looking at examples from various sources
(books, Net, etc), I have come up with the following Perl code
which does the job I want.

@output = map { $_->[0] }
sort { $a->[1] == $b->[1] ? $a->[2] cmp $b->[2] :
$a->[1] <=> $b->[1] }
map { [ $_,(split)[1,2] ] } @output;

Being a newbie, naturally I had to take one bite at a time and
what has puzzled me is this.

When I use
@output = map { [ $_,(split)[1,2] ] } @output;
I get (when I print "@array\n")
ARRAY(0x400182f0)
:
:
:

But if I use (note: minus brackets)
@output = map { $_,(split)[1,2] } @output;
I get the expected result

Yet I need the brackets when I do the full sorting (refer to above)
otherwise I get blank lines in my output. I have looked up the same
sources but have not found any reference to the use of "[ ]". Can
someone pls explain this to me.

As I have said before, this is one of my first attempts at writing
Perl (though I have many years at writing shell scripts) so I have
kept it simple. Any pointers on how to do things better such as
using associative arrays (which I will try in my next cut) will also
be appreciated.

Thanks

Eric Arnold

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

I wonder if this chunk of code will haunt us forever. It's cool, but I
think it probably shouldn't be posted anywhere without the complete
explication. It's the sort of thing that if you're a programmer (even
a Perl programmer) it causes a long double take, and if you're not, it
causes you to turn to TCL or Python. I had a long talk with my uncle
whose department is converting to Python, telling him that Perl can be
written as clearly as C, if the programmer is willing. Now, I find that
this JAPH thing has become a wide spread example of Perl coding :-(

It's a showcase of "evil" Perl:
- lots of noise:
{ $_->[0] }
- implicit arguments and effects:
(split)[1,2]
- magic local/global variables:
$a->[1]
- 3 level nested looping (via the maps/sort)

>@output = map { $_->[0] }
> sort { $a->[1] == $b->[1] ? $a->[2] cmp $b->[2] :
> $a->[1] <=> $b->[1] }
> map { [ $_,(split)[1,2] ] } @output;

Anyway, the answer to your question is that [ ] returns a reference to
an anonymous array. The manual page, "perlref", discusses all the
stuff about references. It's also in the second edition Camel Perl
book, which is highly recommended if you plan to go much further, or
regardless.

The "map{ [ ] }" yields an array of anonymous arrays, passed up to
"sort". Because the conditional code block in "sort" dereferences the
elements:

> sort { $a->[1] == $b->[1] ? $a->[2] cmp $b->[2] :

^^^^^
all becomes well because "...->[1]" resolves to the first element
returned by "(split)[1,2]" :

> map { [ $_,(split)[1,2] ] } @output;
^^^

-Eric


In article <58341k$mf3@cdn_news.telecom.com.au>

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to Eric Arnold

Eric Arnold wrote:
>
> I wonder if this chunk of code will haunt us forever. It's cool, but I
> think it probably shouldn't be posted anywhere without the complete
> explication.

http://www.5sigma.com/perl/recipes.html

Or, if that isn't enough, Tom C's FMTYEWTK about sorting.

It's the sort of thing that if you're a programmer (even
> a Perl programmer) it causes a long double take, and if you're not, it
> causes you to turn to TCL or Python. I had a long talk with my uncle
> whose department is converting to Python, telling him that Perl can be
> written as clearly as C, if the programmer is willing. Now, I find that
> this JAPH thing has become a wide spread example of Perl coding :-(

Whine, whine, whine. The "Schwartzian Transform" is both efficient
and elegant. It isn't even that hard to understand once you've seen
the right narrative description of what's going on. ;-)

One of these days I'll get around to finishing the module containing
the "Hallacious Transform Wrapper," which allows you to implement
Schwartzian Transforms by just writing the appropriate transform
function.

-joseph

--
76% of all CGI questions posted in comp.lang.perl.misc are answered by:
"CGI.pm. LWP. http://www.perl.com/CPAN/modules/01modules.index.html."
....
Joseph N. Hall http://www.5sigma.com/joseph/ jos...@5sigma.com
Proprietor, 5 Sigma Productions P.O. Box 6250 Chandler AZ 85246
Perl instruction (http://www.5sigma.com/perl/), C++/C/Perl software,
web stuff, original music >>>Perl questions? mailto:pe...@5sigma.com

Tom Christiansen

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

[courtesy cc of this posting sent to cited author via email]

In comp.lang.perl.misc, jos...@5sigma.com writes:
:> It's the sort of thing that if you're a programmer (even


:> a Perl programmer) it causes a long double take, and if you're not, it
:> causes you to turn to TCL or Python. I had a long talk with my uncle
:> whose department is converting to Python, telling him that Perl can be
:> written as clearly as C, if the programmer is willing. Now, I find that
:> this JAPH thing has become a wide spread example of Perl coding :-(
:
:Whine, whine, whine. The "Schwartzian Transform" is both efficient
:and elegant. It isn't even that hard to understand once you've seen
:the right narrative description of what's going on. ;-)

No. Eric is right. I'm still pissed at Randal for having posted it,
especially the way he did. It's a great example of what's wrong with
Perl, and to some extent, of what's wrong with this newsgroup. It gives
Perl a bad name. If you want lisp, fine, go play with it, but it just
puts people off. Why do you think I spent hours of my life trying to undo
the damage he caused with that posting? I can't track down all of them.

If you only have time for a short, superclever, gee-ain't-I-smart
gee-aren't-you-a-fool posting, please don't post at all.

--tom
--
Tom Christiansen Perl Consultant, Gamer, Hiker tch...@mox.perl.com


There is a need to keep from being locked into Open Systems. --IBM sales rep

Mike Stok

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

In article <58341k$mf3@cdn_news.telecom.com.au>,

Steven Lee <c82...@vuh263.telecom.com.au> wrote:
>I am new to Perl and would like to pose the following question
>to the Perl gurus at large.
>
>I have an array with the following data (fields if you like) for
>each element - "host uid login_name full_name" (full name may be
>none, one or more words). I wish to sort this array first by uid
>then by login_name. After looking at examples from various sources
>(books, Net, etc), I have come up with the following Perl code
>which does the job I want.

Fine looking schwartzian code, just be aware that this can be a
maintenance nightmare if you're not careful

>@output = map { $_->[0] }
> sort { $a->[1] == $b->[1] ? $a->[2] cmp $b->[2] :
> $a->[1] <=> $b->[1] }
> map { [ $_,(split)[1,2] ] } @output;


>Being a newbie, naturally I had to take one bite at a time and
>what has puzzled me is this.
>
>When I use
> @output = map { [ $_,(split)[1,2] ] } @output;
>I get (when I print "@array\n")
> ARRAY(0x400182f0)

[...]

>But if I use (note: minus brackets)
> @output = map { $_,(split)[1,2] } @output;
>I get the expected result

That's because [] used like this is an anonymous array constructor, it
copies the things inside the []s to a private list and then returns a
reference to that list. That's why the code abive uses $a->[$index]
because $a and $b contain references (similar to C pointers) to lists and
the -> is a dereferencing thing. So your

@output = map { [ $_,(split)[1,2] ] } @output;

is populating @output with array references which point at anonymous
lists, but

@output = map { $_,(split)[1,2] } @output;

will create a new array with more elements e.g.

("host uid1 name1 ...", "host uid2 name2 ...")

turns into ('host uid1 name1 ...', 'uid1', 'name1', 'host uid2 name2 ...',
'uid2', 'name2')

The sort code above never actually compares $a to $b, but supplies a
chunk of code to sort which uses $a and $b as pointers to lists
containing the interesting bits of data you have just split out.

In the debugger:

DB<1> @array = (1 .. 3);

DB<2> $arrayRef = [ 1 .. 3 ]

DB<3> X array
@array = (
0 1
1 2
2 3
)
DB<4> X arrayRef
$arrayRef = ARRAY(0x14019cc48)
0 1
1 2
2 3
DB<5> print $array[0]
1
DB<6> print $arrayRef->[0]
1
DB<7> print $$arrayRef[0]
1
DB<8> print $arrayRef
ARRAY(0x14019cc48)

Or, at the risk of being redundant

DB<1> @input = ("host uid1 name1 ...", "host uid2 name2 ...")

DB<2> @output = map { $_,(split)[1,2] } @input

DB<3> X output
@output = (
0 'host uid1 name1 ...'
1 'uid1'
2 'name1'
3 'host uid2 name2 ...'
4 'uid2'
5 'name2'
DB<4> @output = map { [$_,(split)[1,2]] } @input

DB<5> X output
@output = (
0 ARRAY(0x1401f92f8)
0 'host uid1 name1 ...'
1 'uid1'
2 'name1'
1 ARRAY(0x14020f058)
0 'host uid2 name2 ...'
1 'uid2'
2 'name2'
)

The debugger can be invoked thus

perl -de 1

and used quite profitably as a perl shell.

Check out the perlref and perldsc man pages for a fuller description.

Hope this helps,

Mike
--
mi...@stok.co.uk | The "`Stok' disclaimers" apply.
http://www.stok.co.uk/~mike/ | PGP fingerprint FE 56 4D 7D 42 1A 4A 9C
http://www.token.net/~mike/ | 65 F3 3F 1D 27 22 B7 41
st...@psa.pencom.com | Pencom Systems Administration (work)

Bennett Todd

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

On 4 Dec 1996 06:00:20 GMT, Steven Lee <c82...@vuh263.telecom.com.au> wrote:
>I am new to Perl [...]

Yowow! You're sure roaring in with afterburners engaged!

>I have an array with the following data (fields if you like) for
>each element - "host uid login_name full_name" (full name may be
>none, one or more words). I wish to sort this array first by uid

>then by login_name. [...]


>
>@output = map { $_->[0] }
> sort { $a->[1] == $b->[1] ? $a->[2] cmp $b->[2] :
> $a->[1] <=> $b->[1] }
> map { [ $_,(split)[1,2] ] } @output;

Almost perfect. The sort comparison function can be simplified: you want it
to return <0, 0, or >0 as the first element should come before, don't care,
or after the second, respectively. So rather than using a ?: ternary operator
to decide which field to sort on, just sort on the first one, unless that
comparison returns zero (equal), in which case sort on the second one:

... sort { $a->[1] <=> $b->[1] || $a->[2] cmp $b->[2] } ...

>Being a newbie, naturally I had to take one bite at a time and
>what has puzzled me is this.
>
>When I use
> @output = map { [ $_,(split)[1,2] ] } @output;

@output gets an array (map) of references to anonymous arrays (the outer [])
each containing the full record, as well as its second and third fields:

@output = (
[ @output[0], (split(@output[0]))[1], (split(@output[0]))[2] ],
[ @output[1], (split(@output[1]))[1], (split(@output[1]))[2] ],
...
);

That's what the map expands to.

>I get (when I print "@array\n")
> ARRAY(0x400182f0)

> :
> :
> :

Yup, you're asking to print (the contents of) an array full of references.
That's how Perl is printing them.

>But if I use (note: minus brackets)
> @output = map { $_,(split)[1,2] } @output;

This time @output is getting an array of all those triples, flattened out ---
three fields in the result for each field in the input:

@output = (
@output[0], (split(@output[0]))[1], (split(@output[0]))[2],
@output[1], (split(@output[1]))[1], (split(@output[1]))[2],
...
);

I.e. @output gets 3 times as many records.

>I get the expected result

Yup.

>Yet I need the brackets when I do the full sorting (refer to above)
>otherwise I get blank lines in my output.

Those brackets are constructing references to anonymous arrays --- and those
references are what you are sorting. Think of a reference as a very strongly
typed pointer, a pointer that knows a lot about what it's pointing at.

In fact, from perldata(1),

[...] references are strongly-typed uncastable pointers with built-in
reference-counting and destructor invocation.

A detailed description of these things can be found in perlref(1). And a
really, really exhausting (or was that exhaustive?:-) description of this
precise idiom, what it's doing, and how, can be found in
<URL:http://perl.com/perl/everything_to_know/sort.html>, "Far More Than
Everything You Ever Wanted To Know About Sorting".

-Bennett

Mike Stok

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

In article <32A530...@5sigma.com>,

Joseph N. Hall <jos...@5sigma.com> wrote:
>Eric Arnold wrote:
>>
>> I wonder if this chunk of code will haunt us forever. It's cool, but I
>> think it probably shouldn't be posted anywhere without the complete
>> explication.
>
>http://www.5sigma.com/perl/recipes.html
>
>Or, if that isn't enough, Tom C's FMTYEWTK about sorting.

It depends on the intended audience and purpose of the code. I'd
generally side with Eric as some of my code which was intended as a
"proof of concept" has made it into production and the people who have to
debug it suffer!

If you're showcasing reasons for using languages other than perl then
this is fine code if left uncommented, a lucid comment or commented out
equivalent code might well help the maintainer stuck without web access
at 3am with an angry boss and a listing... It's a matter of taste as to
which idioms are common enough to be considered not worthy of comment,
and if you feel happy leaving code like that uncommented then please just
leave your home phone number in the code somewhere!

Greg Bacon

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to Tom Christiansen

[Posted and mailed]

In article <583v76$aor$1...@csnews.cs.colorado.edu>,
Tom Christiansen <tch...@mox.perl.com> writes:
:
: No. Eric is right. I'm still pissed at Randal for having posted it,


: especially the way he did.

What happened to TIMTOWTDI?

: It's a great example of what's wrong with


: Perl, and to some extent, of what's wrong with this newsgroup.

Perl, maybe; newsgroup no. For every obvious beginner who is greeted
with a gift from the JAPH f*ckin' mastah (sorry Pulp Fiction fans :-),
that beginner will probably get at least three other replies from
other readers either directly answering the original post, explaining
the JAPH, or both. What's wrong with this? The Father tells us
TIMTOWTDI, and if a Perl hacker posts JAPHs, perhaps it is because
he hears a different drummer.

: It gives Perl a bad name.

Maybe, but programming preference is often arbitrary, religious even.
Rich Stevens, a great programmer himself, called Perl a "write-only
language". So what? He has the right to his opinion, but that won't
cause me to emit one less byte of Perl code. IMHO, the variety of
solutions to the same problem shows what's blessedly *right* about Perl;
just some solutions are more readable than others.

: If you want lisp, fine, go play with it, but it just
: puts people off.

True if the Lisp solution is the only one submitted, but that is rarely
the case. What about TIMTOWTDI? Surely the Father knew that in
creating his hybrid beauty, people would try to program in BASIC with
Perl or Lisp with Perl et al.

: Why do you think I spent hours of my life trying to undo


: the damage he caused with that posting? I can't track down all of them.

It's unfair IMHO to say Randal caused damage with that post. That's not
to say that everything you've done for Perl has gone unappreciated. I
don't think anyone save Ray Helie would accuse you of doing any damage
(just look at everyone who came to your defense in that thread).
However, how can one cause damage with a solution when TIMTOWTDI? We
discussed this a few weeks ago, and everyone (Randal included) agreed
that the place for a JAPH is not in "real" code, and the JAPH's main
value lies in that it causes the onlooker to *gasp* think a minute and
maybe even learn something about the language. What's so bad about
that? Besides, if someone runs when code gets hairy, is she destined
to be a Perl hacker, or will she be another source of noise posts that
she could have answered herself with a little research?

If this makes me a pariah in the Perl Community for taking issue with
Tom, then so be it. If Perl's motto is TIMTOWTDI, then oughtn't we
stick by it? (For the record: just because it's *possible* to do
something one way doesn't necessarily make it a good way. Also note
that I never once accused Tom of mailbombing or spamming. :-)

Greg
--
Greg Bacon <gba...@cs.uah.edu>
Unix / Perl Consultant
Perl Institute Partner - http://www.perl.org/

Dave Thomas

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

On 4 Dec 1996 15:38:11 GMT, Greg Bacon <gba...@cs.uah.edu> wrote:
> <stuff about the amazing map sort...>

I think I agree with both of you. I agree with Tom that obscure code coesn't
help beginners, and may just put some of them off. I agree with Greg that
that doesn't mean that clever code doesn't have its place.

I feel that when posting solutions for people, or when illustrating a point
with code, you have to be aware of your audience. Keep things simple, and
try to engender a decent style. Perl is being used to engineer large
systems; we need to adopt more of an engineering approach.

I'm the first one to enjoy trying to decipher some obscure 2 liner, and I've
learned a lot doing it. But outside of some competition, I hope never to
write any production code that needs an hour per source line to understand!

Dave


--

_________________________________________________________________________
| Dave Thomas - Da...@Thomases.com - Unix and systems consultancy - Dallas |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bennett Todd

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

On 4 Dec 1996 16:07:49 GMT, Dave Thomas <da...@fast.thomases.com> wrote:
>I agree with Tom that obscure code coesn't help beginners, and may just put
>some of them off.

I am afraid I disagree with you and Tom on that one. The code Randal posted
was moderately helpful to me as a beginner, and the response it provoked out
of Tom was _fantastically_ helpful.

>[...] But outside of some competition, I hope never to write any production


>code that needs an hour per source line to understand!

An hour per source line for _Who_ to understand? That's the really essential
issue. If it can't take an hour per line for someone to understand who knows
absolutely nothing about perl or programming, then you aren't going to get
very much work done. Yes, you can riddle your code so densely with explanatory
comments that it amounts to an introduction to the principles of programming,
but I don't think that's good engineering practice for production code.

For anybody who has gone through Tom's ``Far More Than
Everything You've Ever Wanted to Know About Sorting'' at
<URL:http://perl.com/perl/everything_to_know/sort.html>, these sorts of things
are fairly easy to read --- quicker to understand, and to let you convince
yourself it's _correct_, than the bulkier open-coded reimplementations.

That lovely little idiom is a great teaching example for bringing an
understanding of how some of Perl's builtins allow for clean and efficient
functional programming on lists, and this in turn opens up new ways of solving
problems. For instance, I recently posted an example I thought was
particularly clean and useful:

perl -le '$,="\n";print grep { -f } map { "$_/Config.pm" } @INC;'

which it just occurred to me could also be spelt

perl -le 'for (grep { -f } map { "$_/Config.pm" } @INC) { print };'

What fun! And I find this sort of thing useful, too --- having powerful
list-processing primatives available, and efficiently composable, makes the
language more expressive; it lets you avoid getting bogged down in open-coding
basic list operations.

-Bennett

Eric Arnold

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

In article <32A530...@5sigma.com>
jos...@5sigma.com writes:

>Eric Arnold wrote:
>>
>> I wonder if this chunk of code will haunt us forever. It's cool, but I
>> think it probably shouldn't be posted anywhere without the complete
>> explication.
>
>http://www.5sigma.com/perl/recipes.html

Hmm. Too little.

>Or, if that isn't enough, Tom C's FMTYEWTK about sorting.

Which is:
http://www.perl.com/perl/everything_to_know/sort.html

Hmm. Too much, or at least the explanation can't follow the code around.

> It's the sort of thing that if you're a programmer (even
>> a Perl programmer) it causes a long double take, and if you're not, it
>> causes you to turn to TCL or Python. I had a long talk with my uncle
>> whose department is converting to Python, telling him that Perl can be
>> written as clearly as C, if the programmer is willing. Now, I find that
>> this JAPH thing has become a wide spread example of Perl coding :-(
>
>Whine, whine, whine. The "Schwartzian Transform" is both efficient

I actually find the efficiency surprising. I had no idea that hash
lookups were so expensive (and it ain't linear neither). Is a 10000
element hash such a burden, or am I doing something wrong?

array len=1000
Benchmark: timing 10 iterations of hash, map...
hash: 2 secs ( 1.99 usr 0.01 sys = 2.00 cpu)
map: 1 secs ( 1.19 usr 0.01 sys = 1.20 cpu)

array len=5000
Benchmark: timing 1 iterations of hash, map...
hash: 7 secs ( 6.81 usr 0.01 sys = 6.82 cpu)
map: 1 secs ( 0.70 usr 0.07 sys = 0.77 cpu)

array len=10000
Benchmark: timing 1 iterations of hash, map...
hash: 36 secs (35.25 usr 0.06 sys = 35.31 cpu)
map: 2 secs ( 1.52 usr 0.09 sys = 1.61 cpu)

use Benchmark;
print "array len=" , $len = 10000 , "\n";
for ( $i = $len ; $i >= 0 ; $i--)
{
push( @array, "xx$i $i");
}
timethese(1, {
"map" => q{
@results = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[-1]] } @array;
},
"hash" => q{
for ( @array )
{
$h{$_} = (split)[-1];
}
@results = sort { $h{$a} <=> $h{$b} } @array;
}
});


>and elegant. It isn't even that hard to understand once you've seen
>the right narrative description of what's going on. ;-)

I'm not saying that I don't understand it, and I do admire its
elegance, but do you realize how hard it is to make converts to Perl of
people who aren't Lisp (or some other syntactically snarly language)
programmers, if they see something like this before (or even after)
they see a clearly written piece of Perl code? They don't care that it
takes a page of some other language to do 3 lines of Perl code; they
just don't want to come back in 6 months and see line noise on their
screen. Perl's domain has in large part been people of medium
technical expertise who just want to get a job done and have little
patience for syntactic puzzles, however enriching. I think Perl is a
better language than all the others I've seen, but I'm always
waging an uphill battle against clever Perl programs when trying to
proselytize.

I am saying that, at a minimum, to explain this:

@results = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[-1]] } @array;

it should be posted as something like this:

@results = # Start reading at the bottom.
#
map { # Take the new array returned by below "sort",
$_->[0] # dereference each element as anon. array ref.
# and return only the 0'th element so the
# final resulting array will have the same
# form as @array, but the new sorted order.
}
sort { # Take the new array returned by below "map",
# sort it by comparing pairs of elements in
# the special $a and $b variables, which


$a->[1] <=> $b->[1]

# must be dereferenced as references to anon.
# arrays before doing any comparison
}
map { # Take each element of @array and ...
#
[ # return a anon. array reference containing
# two (in this case, more if required) elements
#
$_, # ->[0] : the original @array element
#
(split)[-1] # ->[1] : the last value returned by split,
# which is operating on default arg ($_)
# and default pattern (whitespace). The ()
# around split makes a new array, which
# ()[-1] takes the last element of.
]
} @array; # We want to sort on the last field in each
# record of @array. Rather than splitting
# each time "sort" does a comparison, we are
# going to create a structure which caches
# and carries the last field into "sort".
# Flow:@array-->map-->sort-->map-->@results

The idea of putting a wrapper around it is a nice idea, but from:

http://www.5sigma.com/perl/recipes.html

the examples are still abstruse:

sub mksort {
my $xform = shift;
sub {
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, &$xform($_ . "")] }
@_;
}
}

$sort_nospace = mksort sub { $_[0] =~ tr/ //d; };
print join("\n", &$sort_nospace('a', 'a aa', 'aa')), "\n";
$sort_by_mod_time = mksort sub { sprintf "%015.6f", -M shift }; # or change cmp to <=>
print join(" ", &$sort_by_mod_time(@files));


If you do something like this instead (you also need a parametized
comparison):

sub ssort {
my $cmp_sub = shift;
my $xform_sub = shift;
return map { $_->[0] }
sort { &$cmp_sub }
map { [ $_, &$xform_sub ] } @_;
}

you have the option of doing the shorter version:

@results = ssort sub{ $a->[1] <=> $b->[1] },
sub{ (split)[-1] },
@array;

or you can be more verbose:

sub last_elem{
my $str = shift @_;
my(@fields) = split(/\s+/,$str);
return $fields[-1];
}

sub by_numval{
$a->[1] <=> $b->[1];
}

@results = ssort \&by_numval, \&last_elem, @array;

The down side is that they are all 2-3 times slower than the raw
version, and while that's only a problem with large sorts, you'd only
use this map/sort trick if you're looking for max performance. So, I'd
say the thing to do is make sure the comment-impregnated version gets
disseminated, since we can't ignore the trick's usefulness.

-Eric

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

Eric Arnold wrote:
> I'm not saying that I don't understand it, and I do admire its
> elegance, but do you realize how hard it is to make converts to Perl of
> people who aren't Lisp (or some other syntactically snarly language)
> programmers, if they see something like this before (or even after)
> they see a clearly written piece of Perl code?

The thing is, it IS clearly written. It is clearly written in the
same sense that declarations in C are clearly written. It is an
idiom. It is an elegant idiom. Perl is an idiomatic language in
which you can be elegant.

I'd be lying if I said I've never written a line of Lisp in my life,
but 1) it was for emacs, and 2) it was 10 years ago. I couldn't
write a Lisp program that printed "hello world" now if my life
depended on it, at least not without some documentation. I conclude
that Lisp has nothing to do with the perceived difficulty of this
construct and I don't know why people keep bringing it up. Maybe
it's because people want to understand Perl in terms of some other
language. I think Perl has progressed beyond that.

Finally, I think it's plain wrong for Tom and others to recoil from
the Schwartzian Transform and similar idioms on the basis that
"it's too scary for novices." Look, there are plenty of things out
there in the world made up of small, funny shaped pieces that go
together just so. Engines. Golf swings. Human beings. Just because
it's not blindingly obvious at first glance doesn't mean it's not
worth learning, or that no one will bother. Some people WANT to
climb the big hills. And for those who don't, there is plenty
of perl "baby talk" that is well documented.

--

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

Eric Arnold wrote:
> If you do something like this instead (you also need a parametized
> comparison):

I'm still looking at other ways of doing this, including a
parameterized comparison. But you don't "need" a parameterized
comparison for most cases. If you do use a parameterized comparison
you sure as heck shouldn't do it the way you suggest, since that's a
huge performance hit that you can't escape. The operator has to
be eval-ed as a string into the sort subroutine (which now has
to start as a string template rather than as a lexical).

The down side is that they are all 2-3 times slower than the raw
> version, and while that's only a problem with large sorts, you'd only
> use this map/sort trick if you're looking for max performance.

2-3 times slower ... not! That depends on how expensive the
key transformation is. If it's really expensive, say, stat(),
then any speed difference will be negligible. It also depends
on the details of the implementation.

What's meant by "max performance" is a matter of opinion. It's
easy to encounter cases where the performance hit of doing it the slow
way is obvious.

-joseph

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

Eric Arnold wrote:
> I actually find the efficiency surprising. I had no idea that hash
> lookups were so expensive (and it ain't linear neither). Is a 10000
> element hash such a burden, or am I doing something wrong?

In my experience ALL subscripting operations are significantly
slower than other alternatives. Maybe a better optimizer will partly
fix this some day, but the rule of thumb is to avoid using subscripts
whenever possible, and especially avoid using the same subscript
expression more than once (as in using $a[i] over and over again
inside a loop).

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to Gurusamy Sarathy

Gurusamy Sarathy wrote:
> If you have a post-5.003 perl, you can pre-extend the hash
> with that one line. Now, "hash" beats "map":
>
> array len=10000

> Benchmark: timing 10 iterations of hash, map...
> hash: 22 secs (22.85 usr 0.07 sys = 22.92 cpu)
> map: 31 secs (30.45 usr 0.06 sys = 30.51 cpu)

Geez. Now someone needs to fix "map." ;-)

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to Eric Arnold

Eric Arnold wrote:
> >http://www.5sigma.com/perl/recipes.html
>
> Hmm. Too little.
>
> >Or, if that isn't enough, Tom C's FMTYEWTK about sorting.

OK,

http://www.5sigma.com/perl/schwtr.html

-joseph

--
76% of all CGI questions posted in comp.lang.perl.misc are answered by:
"CGI.pm. LWP. http://www.perl.com/CPAN/modules/01modules.index.html."
....

Joseph N. Hall

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

Eric Arnold wrote:
> As to whether it's clear, in terms of Perl or other languages, it has
> things about it that I discourage in any project in any language that
> has to be maintainable:

I could argue each point, but it's obvious that you are in the "don't
dare use $_!" camp and I'm in the "why not learn the language, then use
it" camp, and we're just not going to find common ground.

What I do want to take exception to is this:

> Weird special cases another feature that fails to impresses any development
> manager, so if Perl can be shown to be fully useful and functional without
> them, it can better compete with other languages. [...]
> That means making
> the official presentation of it as accessible as possible to the full
> audience it is intended for.

comp.lang.perl.misc is NOT the "official presentation" of the language,
and
it never will be. No pointy-haired manager ever gave a rat's bunghole
about whether a language was supported by a USENET newsgroup, much less
how it was treated there.

(And of course the reason that Perl IS accepted by pointy-haired
managers
is that people come to them and say, "Hey, I can do this in a day with
Perl,
or four weeks with Java. Which would you prefer?")

The "official presentation" of the Perl language is books and classes.
We
don't subject "newbies" to rough treatment in either. If you comes to
comp.lang.perl.misc you takes your chances.

-joseph

--

Andrew M. Langmead

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

"Joseph N. Hall" <jos...@5sigma.com> writes:
>Whine, whine, whine. The "Schwartzian Transform" is both efficient
>and elegant. It isn't even that hard to understand once you've seen
>the right narrative description of what's going on. ;-)

But according to the bencharking that Tom showed in "Re: Read
directory in timestamp order?" (MessageID
<4l36f0$g...@csnews.cs.colorado.edu>), the "Schwartzian Transform"
isn't any more efficient than the "sort the indicies" shown in the
first edition camel. (According to Deja News, this article is also the
earliest appearance of the term "Schwartzian Transform" on Usenet)

So you have code that is harder to read, is no more efficient, but
probably more than twice as short and "kind of neat" to people who can
find source code interesting.

I know for the code that I develop, there is no issue of which one to
use.

--
Andrew Langmead

Gurusamy Sarathy

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

[ mailed and posted ]

In article <ERIC.96De...@m-e-ir1.sun.com>,


Eric Arnold <eric....@sun.com> wrote:
>>Whine, whine, whine. The "Schwartzian Transform" is both efficient
>
>I actually find the efficiency surprising. I had no idea that hash
>lookups were so expensive (and it ain't linear neither). Is a 10000
>element hash such a burden, or am I doing something wrong?

That's a surprising result, but new technology can beat old
any day. See below.

>array len=10000
>Benchmark: timing 1 iterations of hash, map...
> hash: 36 secs (35.25 usr 0.06 sys = 35.31 cpu)
> map: 2 secs ( 1.52 usr 0.09 sys = 1.61 cpu)
>
>use Benchmark;
>print "array len=" , $len = 10000 , "\n";
>for ( $i = $len ; $i >= 0 ; $i--)
>{
> push( @array, "xx$i $i");
>}

keys %k = 10000;

>timethese(1, {
> "map" => q{
> @results = map { $_->[0] }
> sort { $a->[1] <=> $b->[1] }
> map { [$_, (split)[-1]] } @array;
> },
> "hash" => q{
> for ( @array )
> {
> $h{$_} = (split)[-1];
> }
> @results = sort { $h{$a} <=> $h{$b} } @array;
> }
>});

If you have a post-5.003 perl, you can pre-extend the hash


with that one line. Now, "hash" beats "map":

array len=10000


Benchmark: timing 10 iterations of hash, map...

hash: 22 secs (22.85 usr 0.07 sys = 22.92 cpu)
map: 31 secs (30.45 usr 0.06 sys = 30.51 cpu)

TIMTOWTDI.

- Sarathy.
gs...@umich.edu


Eric Arnold

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

In article <32A625...@5sigma.com>
jos...@5sigma.com writes:

>Eric Arnold wrote:
>> If you do something like this instead (you also need a parametized
>> comparison):
>

>I'm still looking at other ways of doing this, including a
>parameterized comparison. But you don't "need" a parameterized
>comparison for most cases. If you do use a parameterized comparison
>you sure as heck shouldn't do it the way you suggest, since that's a
>huge performance hit that you can't escape. The operator has to
>be eval-ed as a string into the sort subroutine (which now has
>to start as a string template rather than as a lexical).

There was no "eval" or strings. The new cost is the dereference of the
sub reference, and the overhead of a sub call (plus any local variable
copies, etc). The performance hit wasn't "huge", but was significant.

sub ssort {
my $cmp_sub = shift;
my $xform_sub = shift;
return map { $_->[0] }
sort { &$cmp_sub }
map { [ $_, &$xform_sub ] } @_;
}

@results = ssort sub{ $a->[1] <=> $b->[1] },


sub{ (split)[-1] },
@array;

> The down side is that they are all 2-3 times slower than the raw


>> version, and while that's only a problem with large sorts, you'd only
>> use this map/sort trick if you're looking for max performance.
>

>2-3 times slower ... not! That depends on how expensive the
>key transformation is. If it's really expensive, say, stat(),
>then any speed difference will be negligible. It also depends
>on the details of the implementation.

Well, I think I did an apples-to-apples benchmark, and that's what I
found. You could do something where a whole new subroutine is "eval"ed
into existence before you use it, but then you'd have to deliver the
comparisons and transforms as strings, and you'd lose the flexibility
of giving the user a subroutine to work with, but it is faster. As
for whether the key transform will dominate the times, that can be
true in any case; you can always find something to bottleneck the
process.

Benchmark: timing 3 iterations of eval, raw, subref...
eval: 5 secs ( 4.37 usr 0.14 sys = 4.51 cpu)
raw: 4 secs ( 4.30 usr 0.00 sys = 4.30 cpu)
subref: 9 secs ( 7.88 usr 0.02 sys = 7.90 cpu)

use Benchmark;
$len = 5000;
print "array len=$len\n";


for ( $i = $len ; $i >= 0 ; $i--)
{
push( @array, "xx$i $i");
}

sub ssort1 {


my $cmp_sub = shift;
my $xform_sub = shift;

return eval qq{ map { \$_->[0] }
sort { $cmp_sub }
map { [ \$_, $xform_sub ] } \@_;
};
}

# if you want to try a more expensive key transform:
sub split1{
for ( $i = 0 ; $i < 50 ; $i++ ){} # spin a bit
split;
}

sub ssort2 {


my $cmp_sub = shift;
my $xform_sub = shift;
return map { $_->[0] }
sort { &$cmp_sub }
map { [ $_, &$xform_sub ] } @_;
}

timethese(3, {
"eval" => q{

@results = ssort1 q{ $a->[1] <=> $b->[1] },
q{ (split)[-1] },
@array;
},
"subref" => q{

@results = ssort2 sub{ $a->[1] <=> $b->[1] },


sub{ (split)[-1] },
@array;

},

"raw" => q{


@results = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[-1]] } @array;
},

});

Eric Arnold

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

We aren't algorithmic luddites. The point is that if some chunk of
code becomes a widely cited example, then it should help the cause of
the language rather than hurt it. It isn't just a case of newbies
having trouble with it. There are many folks who might use Perl (to
their edification!) if it didn't have an (undeserved) reputation as a
line noise language. For production purposes, people just don't have
the time to riddle out clever code, whether its C or Perl or whatever.
I'm only picking on this sorting routine because it's gained
visibility. People are free to obfuscate whatever they like in
private.

As to whether it's clear, in terms of Perl or other languages, it has
things about it that I discourage in any project in any language that
has to be maintainable:

- lots of noise:
{ $_->[0] }
This has very little immediate semantic value. Programmers have
evolved beyond using single character register names for a reason.

- implicit arguments and effects:
(split)[1,2]

"split" is slyly using a global variable ($_), and performing a
special version of its operation because it was given no
arguments. It is returning a list apparently nowhere; nothing
explicitly receives the value, i.e. "@fields=split"). Programming
has evolved away from the use of global variables and hidden side
effects for a reason.

- magic local/global variables:
$a->[1]

More of the same.

- 3 level nested looping (via the maps/sort)

The deeper the nesting of a construct, the more difficult it is to
grasp. Reverse polish calculators were really cool, but they caught
on only because they were the only game in town for a while, and try
to find one now.

I understood the thing fairly easily because I have diddled with Perl for
years, and have a wide grasp of the language and it's weird special cases.


Weird special cases another feature that fails to impresses any development
manager, so if Perl can be shown to be fully useful and functional without
them, it can better compete with other languages.

I want to see Perl succeed, because I don't like the alternatives, and
I am fascinated by the unique direction Perl takes. That means making


the official presentation of it as accessible as possible to the full
audience it is intended for.

-Eric


In article <32A623...@5sigma.com>


jos...@5sigma.com writes:
>Eric Arnold wrote:
>> I'm not saying that I don't understand it, and I do admire its
>> elegance, but do you realize how hard it is to make converts to Perl of
>> people who aren't Lisp (or some other syntactically snarly language)
>> programmers, if they see something like this before (or even after)
>> they see a clearly written piece of Perl code?
>

>The thing is, it IS clearly written. It is clearly written in the
>same sense that declarations in C are clearly written. It is an
>idiom. It is an elegant idiom. Perl is an idiomatic language in
>which you can be elegant.
>
>I'd be lying if I said I've never written a line of Lisp in my life,
>but 1) it was for emacs, and 2) it was 10 years ago. I couldn't
>write a Lisp program that printed "hello world" now if my life
>depended on it, at least not without some documentation. I conclude
>that Lisp has nothing to do with the perceived difficulty of this
>construct and I don't know why people keep bringing it up. Maybe
>it's because people want to understand Perl in terms of some other
>language. I think Perl has progressed beyond that.
>
>Finally, I think it's plain wrong for Tom and others to recoil from
>the Schwartzian Transform and similar idioms on the basis that
>"it's too scary for novices." Look, there are plenty of things out
>there in the world made up of small, funny shaped pieces that go
>together just so. Engines. Golf swings. Human beings. Just because
>it's not blindingly obvious at first glance doesn't mean it's not
>worth learning, or that no one will bother. Some people WANT to
>climb the big hills. And for those who don't, there is plenty
>of perl "baby talk" that is well documented.
>

Brian L. Matthews

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

In article <ERIC.96De...@m-e-ir1.sun.com>,
Eric Arnold <eric....@sun.com> wrote:
|do you realize how hard it is to make converts to Perl of
|people who aren't Lisp (or some other syntactically snarly language)
|programmers, if they see something like this before (or even after)
|they see a clearly written piece of Perl code?

If someone rejects a language based upon a single 3 line example,
they get what they deserve.

Brian
--
Brian L. Matthews Illustration Works, Inc.
For top quality, stock commercial illustration, visit:
http://www.halcyon.com/artstock

Bennett Todd

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

On Thu, 5 Dec 1996 00:23:08 GMT, Andrew M. Langmead <a...@world.std.com> wrote:
>So you have code that is harder to read, is no more efficient, but
>probably more than twice as short and "kind of neat" to people who can
>find source code interesting.
>
>I know for the code that I develop, there is no issue of which one to
>use.

Me too. Since I find this sort algorithm particularly clear to read, I use it
by preference.

I can eyeball a use of this function, and pretty quickly figure out what it's
doing and why --- and when I'm done, I'm convinced that it will do what I
expect. If you don't like the idiomatic use of map and an intermediate data
structure, then by all means code around the problem, building up helper
arrays carrying the sort key information, then sorting by reference to them.
But that doesn't make code "easier to read". "Easy to read" isn't an absolute;
it is relative to the particular reader in question. This reader is
comfortable with that idiom.

-Bennett

Greg Bacon

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to Eric Arnold

[Posted and mailed]

In article <ERIC.96De...@m-e-ir1.sun.com>,
eric....@sun.com (Eric Arnold) writes:
:
: We aren't algorithmic luddites. The point is that if some chunk of


: code becomes a widely cited example, then it should help the cause of
: the language rather than hurt it.

See my previous post in this thread. IMHO, Randal's JAPH sort didn't
hurt the language at all.

: It isn't just a case of newbies


: having trouble with it. There are many folks who might use Perl (to
: their edification!) if it didn't have an (undeserved) reputation as a
: line noise language.

More idiots might use Perl if their IQ was above that of the average
fern, but I don't think we ought to dumb down Perl for their sake.

: For production purposes, people just don't have


: the time to riddle out clever code, whether its C or Perl or whatever.
: I'm only picking on this sorting routine because it's gained
: visibility. People are free to obfuscate whatever they like in
: private.

True, and this was discussed *extensively* in another thread a couple
of weeks ago (between Randal, Dave Hammen, a few others and me).

: As to whether it's clear, in terms of Perl or other languages, it has


: things about it that I discourage in any project in any language that
: has to be maintainable:

How many projects assign novice programmers to maintanence?

: - lots of noise:


: { $_->[0] }
: This has very little immediate semantic value. Programmers have
: evolved beyond using single character register names for a reason.

Well, then let's start advocating Hungarian notation for Perl. Ugh.
Perl is like a natural language in that many ambiguities can be resolved
by looking at the *context*. $_ is a perfect example of this. In
examining the context, the above expression has immediate semantic value.

: - implicit arguments and effects:


: (split)[1,2]
: "split" is slyly using a global variable ($_), and performing a
: special version of its operation because it was given no
: arguments. It is returning a list apparently nowhere; nothing
: explicitly receives the value, i.e. "@fields=split"). Programming
: has evolved away from the use of global variables and hidden side
: effects for a reason.

How is it hidden? That feature is well documented and widely known.
If someone isn't sure what's going on, a mere glance at the fine manual
will clear the misunderstanding.

: - magic local/global variables:


: $a->[1]
: More of the same.

Again, the programmer only need RTFM or have 1% of a clue about Perl.

: - 3 level nested looping (via the maps/sort)


: The deeper the nesting of a construct, the more difficult it is to
: grasp. Reverse polish calculators were really cool, but they caught
: on only because they were the only game in town for a while, and try
: to find one now.

Can you say HP? Most business, engineering, math, and science majors
I know own a 48G series calculator. I like Joseph's remark about C
declarations. They're often difficult to grasp too, but that hasn't
stopped anyone from using them. C declarations' result is one of
tolerance: cdecl.

: I understood the thing fairly easily because I have diddled with Perl for


: years, and have a wide grasp of the language and it's weird special cases.
: Weird special cases another feature that fails to impresses any development
: manager, so if Perl can be shown to be fully useful and functional without
: them, it can better compete with other languages.

They're not weird when one develops an understanding of Perl. Do you
hire programmers who don't understand the language they're going to be
using? TIMTOWTDI in that one can write longhand Perl or Perl. Given
the innate laziness of programmers, most will opt for the elegant
(read 'least typing') solution.

: I want to see Perl succeed, because I don't like the alternatives, and


: I am fascinated by the unique direction Perl takes. That means making
: the official presentation of it as accessible as possible to the full
: audience it is intended for.

If you want the "official presentation", go read the Camel or Llama, or
get Tom or Randal to come give a training seminar. Like Joseph said,
Usenet is rarely the official presentation of anything.

Greg Bacon

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

In article <32A66C...@5sigma.com>,

"Joseph N. Hall" <jos...@5sigma.com> writes:
: OK,
:
: http://www.5sigma.com/perl/schwtr.html

Maybe someone should RFD comp.lang.perl.japh.authoring and
comp.lang.perl.japh.advocacy. :-)

Tom Christiansen

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to gba...@cs.uah.edu

[courtesy cc of this posting sent to cited author via email]

In comp.lang.perl.misc,
gba...@CS.UAH.Edu writes:
:: As to whether it's clear, in terms of Perl or other languages, it has


:: things about it that I discourage in any project in any language that
:: has to be maintainable:
:
:How many projects assign novice programmers to maintanence?

Nearly all of them, sadly enough.

--tom

Eric D. Friedman

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

In article <586s6v$c...@info.uah.edu>, Greg Bacon <gba...@CS.UAH.Edu> wrote:
>In article <ERIC.96De...@m-e-ir1.sun.com>,
> eric....@sun.com (Eric Arnold) writes:
>: - lots of noise:
>: { $_->[0] }
>: This has very little immediate semantic value. Programmers have
>: evolved beyond using single character register names for a reason.

This objection (at least) is easily addressed with the English module.
I wonder: has anyone written a little script that goes through a perl
script and 'anglicizes' it? Might be just what's needed to placate
those 'pointy haired managers' we keep hearing about, and it might
also be handy for novices (or others) who are having difficulty
keeping track of the mnemonic associations the Camel proposes for
$/ and the like.

--
Eric D. Friedman -- frie...@uci.edu -- http://www.oac.uci.edu/indiv/friedman/
"A book should be a ball of light in one's hands." -- Ezra Pound

Joseph N. Hall

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to Tom Christiansen

Tom Christiansen wrote:
> In comp.lang.perl.misc,

> gba...@CS.UAH.Edu writes:
> :How many projects assign novice programmers to maintanence?
>
> Nearly all of them, sadly enough.

I have to agree with Tom here. It's a good way to break in a new
body. Well, at least sometimes.

-joseph

Joseph N. Hall

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

Greg Bacon wrote:
>
> In article <32A66C...@5sigma.com>,
> "Joseph N. Hall" <jos...@5sigma.com> writes:
> : OK,
> :
> : http://www.5sigma.com/perl/schwtr.html
>
> Maybe someone should RFD comp.lang.perl.japh.authoring and
> comp.lang.perl.japh.advocacy. :-)

How about comp.lang.perl.japh.authoring and comp.lang.perl.japh.d!

Joseph N. Hall

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

Ilya Zakharevich wrote:
> As many people discovered, English'ization makes Perl more
> obscure. You will not find many modules which use English nowadays (as
> opposed to the initial frenzy).

Some of it I don't mind (especially when I'm teaching). Will the
speed problems be resolved in the future?

Joseph N. Hall

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to Andrew M. Langmead

Andrew M. Langmead wrote:
>
> "Joseph N. Hall" <jos...@5sigma.com> writes:
> >Whine, whine, whine. The "Schwartzian Transform" is both efficient
> >and elegant.
> [...] the "Schwartzian Transform"

> isn't any more efficient than the "sort the indicies" shown in the
> first edition camel.
> [...]

> So you have code that is harder to read, is no more efficient, but
> probably more than twice as short and "kind of neat" to people who can
> find source code interesting.
>
> I know for the code that I develop, there is no issue of which one to
> use.

I don't know why it is that I seem to be generating half the postings
on this thread, but here I go again.

I use the Schwartzian Transform approach for a variety of reasons:

1) It is efficient--if not the very most efficient, it is in the
ballpark with all the other most efficient approaches

2) It is the most syntactically compact approach (I guess that's
a fancy way of saying "shortest")

3) It uses no temporary variables.

4) It seems very "perlish" to me.

5) It's cool the first time you see it.

6) It's simple when you figure it out.

What I think it boils down to is, if you like using map, you'll
like using the Schwartzian Transform. Otherwise go ahead and
write your foreach loops.

-joseph

Chris Fedde

unread,
Dec 5, 1996, 3:00:00 AM12/5/96