newbie question about sorting a string

Hugo Andrade Cartaxeiro

unread,

Apr 6, 1995, 3:00:00 AM4/6/95

to

Hi Perl'ers
Here goes a newbie question about sorting a array (string):

I have a (big) string like that:

print $str;
eir 11 9 2 6 3 1 1 81% 63% 13
oos 10 6 4 3 3 0 4 60% 70% 25
hrh 10 6 4 5 1 2 2 60% 70% 15
spp 10 6 4 3 3 1 3 60% 60% 14

and I like to sort it with the last field as the order key. I know
perl has some features to do it, but I can't make 'em work properly.

Do you have a suggestion ?

Reply by mail (if any).

tks in adv,
--
Hugo Andrade Cartaxeiro : INESC, Lisboa, Portugal. email: h...@inesc.pt
Tel: +351-1 310 02 49 Fax: +351-1 52 58 43

Who is General Failure and why is he reading my hard disk ?!

Randal L. Schwartz

unread,

Apr 8, 1995, 3:00:00 AM4/8/95

to

>>>>> "Hugo" == Hugo Andrade Cartaxeiro <h...@nexus.inesc.pt> writes:

Hugo> print $str;
Hugo> eir 11 9 2 6 3 1 1 81% 63% 13
Hugo> oos 10 6 4 3 3 0 4 60% 70% 25
Hugo> hrh 10 6 4 5 1 2 2 60% 70% 15
Hugo> spp 10 6 4 3 3 1 3 60% 60% 14

Hugo> and I like to sort it with the last field as the order key. I know
Hugo> perl has some features to do it, but I can't make 'em work properly.

Well, speaking Perl with a Lisp.... :-)

require 5; # if you don't have perl 5 by now, why NOT? :-)
$str =
join "\n",
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[-1]] }
split /\n/,
$str;

Hugo> Reply by mail (if any).

(oops :-)

print "Just another Perl hacker," # but not what the media calls "hacker!" :-)
# legal fund: $2880.03 collected, $38379.50 spent; email fu...@stonehenge.com for details
--
Name: Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
Keywords: Perl training, UNIX[tm] consulting, video production, skiing, flying
Email: <mer...@stonehenge.com> Snail: (Call) PGP-Key: (finger mer...@ora.com)
Web: <A HREF="http://www.teleport.com/~merlyn/">My Home Page!</A>

Tom Christiansen

unread,

Apr 10, 1995, 3:00:00 AM4/10/95

to

In comp.lang.perl, mer...@stonehenge.com obfuscates:

:>>>>> "Hugo" == Hugo Andrade Cartaxeiro <h...@nexus.inesc.pt> writes:
:
:Hugo> print $str;
:Hugo> eir 11 9 2 6 3 1 1 81% 63% 13
:Hugo> oos 10 6 4 3 3 0 4 60% 70% 25
:Hugo> hrh 10 6 4 5 1 2 2 60% 70% 15
:Hugo> spp 10 6 4 3 3 1 3 60% 60% 14
:
:Hugo> and I like to sort it with the last field as the order key. I know
:
:Hugo> perl has some features to do it, but I can't make 'em work properly.
:
:Well, speaking Perl with a Lisp.... :-)

We all recall that Perl est un esprit LISP dans un cors C, n'est-ce
pas? :-) ("cors C" sounds like "corset" in French.)

CODE 0.0:

:require 5; # if you don't have perl 5 by now, why NOT? :-)

:$str =
: join "\n",
: map { $_->[0] }
: sort { $a->[1] <=> $b->[1] }
: map { [$_, (split)[-1]] }
: split /\n/,
: $str;
:
:Hugo> Reply by mail (if any).
:
:(oops :-)

Oh for cryin' out loud, Randal! You expect a NEW PERL PROGRAMMER to make
heads or tails of THAT? :-) You're postings JAPHs for solutions, which
isn't going to help a lot. You'll probably manage to scare these poor
people away from the language forever? :-)

BTW, you have a bug.

Couldn't you at least *indent* it properly? :-) This might help
folks make more sense of it. I'm going to use the

map EXPR,LIST

rather and then

map BLOCK LIST

of map here so that it looks more like a conventional subroutine
call (although I can't do that with sort):

CODE 0.1:

$str = join("\n",
map($_->[0],
sort( {$a->[1] <=> $b->[1] }
map( [$_, (split(' ', $_))[-1] ],
split(/\n/, $str)
)
)
)
);

Hm... ok, not much better. Let's do this slowly and work our way through
it. First of all, let's look at your string again.

eir 11 9 2 6 3 1 1 81% 63% 13

oos 10 6 4 3 3 0 4 60% 70% 25

hrh 10 6 4 5 1 2 2 60% 70% 15

spp 10 6 4 3 3 1 3 60% 60% 14

It looks like it's got a bunch of lines in it, so the first thing we're
going to do is break it up into a list of lines.

@lines = split (/\n/, $str);

You need to do this because each the perl built-in sort() function expects
a list and returns a list. It works like this in its most simple
incarnation.

@newlist = sort @oldlist;

Well, that's all well and good, but it's going to do the default sort,
meaning it's sort it by ASCII. Imagine a function like this:

CODE 1.1:

sub by_last_num_field {
@alist = split(/\s+/, $a);
@blist = split(/\s+/, $b);
if ( $alist[$#alist] > $blist[$#blist] ) {
return 1;
} elsif ( $blist[$#blist] > $alist[$#alist] ) {
return -1
} else {
return 0;
}
}

You could then say

@newlist = sort by_last_num_field @oldlist;

And it would work out for you.

It turns out that the $a and $b are each of the two values to be compared
in the original list. They're not going to passed in the way normal
arguments are, which is $_[0], $_[1], etc.

That's work out ok, but it's pretty complicated looking. It turns
out that for just a matter, perl has a couple of operators to help
you: "<=>" for numbers and "cmp" for strings.

While normally we give split a regular expression as its first argument,
there's one case where you might want to give it a string: if the first
argument is just a single space, then it will automatically discard
leading null fields, which would otherwise be preserved. (Trailing null
fields are discarded in any case (unless you give split() a third
argument).)

So you could just do this:

CODE 1.2:

sub by_last_num_field {
@alist = split(' ',$a);
@blist = split(' ',$b);
return $alist[$#alist] <=> $blist[$#blist];
}

Now you could also have written that function "in-lined", like so.
I'm going to leave out the word "return" this time:

CODE 1.3:

@newlist = sort {
@alist = split(' ', $a);
@blist = split(' ', $b);
$alist[$#alist] <=> $blist[$#blist];
} @oldlist;

Ok so far? We can make use of perl5's ability to do negative
subscripting to get the last element:

CODE 1.4:

@newlines = sort {
@alist = split(' ', $a);
@blist = split(' ', $b);
$alist[-1] <=> $blist[-1];
} @oldlines;

And in fact, we can do this without the temporary lists:

CODE 1.5:

@newlines = sort {
(split(' ', $a))[-1] <=> (split(' ', $b))[-1];
} @oldlines;

But no matter how you write it, that's a wee tad expensive for large data
sets. Since Perl's sort is a hook into C's qsort (lookup qsort(3) in the
on-line Unix manual, if you're daring), that makes it N-logN sort. You're
going to be doing 2 splits for each of those. With your data set of four,
this isn't two bad: you'll average only 11 splits. With ten lines
though, you're already up to 46. With a mere one hundred lines, 921
splits, and with a thousand lines, we're talking 13,816 splits on the
average! That's way too many.

So let's arrange to do just N splits. We'll have to pre-process them:

CODE 2.1:

@last_field = ():
foreach $line (@oldlines) {
$last_elt = (split(' ', $line))[-1];
push (@last_field, $last_elt);
}

Now for a given line $n in @oldlines, its last field can be
found in the corresponding element of the @last_field array,
that is, $last_field[$n] .

You could also just use the return value from split directly, without
using a temporary, like this:

CODE 2.2:

@last_field = ():
foreach $line (@oldlines) {
push (@last_field, (split(' ', $line))[-1]);
}

You'll find that many perl programs are used to letting the $_
variable be the default to quite a few different operations. This is
unsettling to the newcomer, but it's really not that bad. Amongst other
things, $_ is the default iterator variable on a foreach loop, the default
string to split on in a split() function, as well as the default variable
to search or replace in a match or substitution. (You'll find that it's
also the implicit iterator variables in the map() and grep() built-ins,
which are themselves something like short-cut ways of writing foreach()
loops.)

CODE 2.3:

@last_field = ():
foreach (@oldlines) {
push(@last_field, (split(' '))[-1]);
}

In perl5, we can write that without so many parentheses if we'd like.
This time I'm also going to spell "foreach" as "for". It actually
doesn't matter to perl, but it might to the reader.

CODE 2.4:

@last_field = ():
for (@oldlines) {
push @last_field, (split ' ')[-1];
}

This entire process of reducing things to smaller and smaller amounts of
code to do the equivalent thing is an interesting exercise, but perilous.
While between 1.1 and 1.5 and between 2.1 and 2.4, we actually did improve
the efficiency of the operation ever so slightly, we've also managed to
make it harder to debug and harder to maintain. It's nice to have
intermediary values sometimes in order to print them out part way through
the calculation, whether in the debugger or with normal print statements.
It's also harder for someone to come by later and figure out what you're
doing. If your aim is job security, then this might work out in your
favor, but in most other cases, it probably won't.

Ok, how do we couple the stuff we did in code set 2 with what we did in
code set 1? Remembering that any given element in @last_field corresponds
to the last field (the thing we want to sort on) in the @oldlines list.
So we're going to do something a bit strange and sort not real data but
rather just the indices. That is, we'll rearrange indices (like
0,1,2,3,4) to correspond to the properly sorted order (like 3,2,1,4,0).
This is probably the hardest concept to grasp in this whole tutorial.

CODE 3.1:

@orig_indices = 0 .. $#oldlines;
@sorted_indices = sort {
$last_field[$a] <=> $last_field[$b]
} @orig_indices;
@newlist = ();
foreach $idx (@sorted_indices) {
push(@newlines, $oldlines[$idx]);
}

Now when we're done, we've got everything into its proper order.
Occasionally, advanced perl programmers make use of what we
call array slices. If you say

CODE: 3.2.1:

@old_idx = (1,2,3);
@new_idx = (3,1,2);
@x[@new_idx] = @y[@old_idx];

It's the same as though you'd said:

CODE 3.2.2:

@x[3,1,2] = @y[1,2,3];

Which is really nothing more than simply:

CODE 3.2.3:

$x[3] = $y[1];
$x[1] = $y[2];
$x[2] = $y[3];

Except that it's shorter to write. Notice how when you're
doing slice operations you have to subscript your list with
an @ not a $? That's because the $ or @ sign actually indicates
what it is that you're getting back, not what you're looking at.
In this case, you're assigning a (sub-)list to a (sub-)list, so
you need the @ signs.

This means we can re-write 3.1 as

CODE 3.3:

@orig_indices = 0 .. $#oldlines;
@sorted_indices = sort {
$last_field[$a] <=> $last_field[$b]
} @orig_indices;
@newlines = @oldlines[@sorted_indices];

Now, since sort will return the indices, we can avoid some
temporaries and write that as:

CODE 3.4:

@newlines = @oldlines[ sort {
$last_field[$a] <=> $last_field[$b]
} 0 .. $#oldlines
];

If we'd used shorter variable names, we could have then
written it all on one line:

CODE 3.5:

@new = @old[ sort { $f[$a] <=> $f[$b] } 0 .. $#old ];

Now what? Well, we have to get back the user's orginal data.

CODE 4.1:

$str = join ("\n", @newlines);

But you'll find that you're shy a final newline (that's Randal's bug).

You could solve this by passing another null argument to join(),

CASE 4.2:

$str = join ("\n", @newlines, '');

or by concatenating a newline to the result:

CASE 4.3:

$str = join ("\n", @newlines) . "\n";

Combining all these, we have final solution that looks like this:

CODE 5.1:

@old = split (/\n/, $str);
@f = ():
for (@old) {
push @f, (split ' ')[-1];
}

@new = @old[ sort { $f[$a] <=> $f[$b] } 0 .. $#old ];
$str = join ("\n", @new) . "\n";

We can merge a few of these and have something shorter.

CODE 5.2:

@old = @f = ():
for (split (/\n/, $str)) {
push @old, $_;
push @f, (split ' ')[-1];
}
$str = join ("\n", @old[sort{ $f[$a] <=> $f[$b] } 0..$#old ]) . "\n";

It's true tha this is verging on the inscrutable, but we're trying to get
closer to Randal's obfuscated perl entry he presented way at the beginning
of this.

Nonetheless, it's still quite a bit different from Randal's. One thing
he's done is make use of the map() function. This is a perl5 feature
that uses its first expression argument as code to execute repeatedly
against each of its following list arguments, setting $_ to each element,
and returning the result of the expression to create a new output list.

CODE 5.2.1:

@single = 1..10;
@treble = map $_ * 3, @single;

You may also write this expression as a block, and omit the comma,
which is what Randal did:

CODE 5.2.2:

@single = 1..10;
@treble = map { $_ * 3 } @single;

Or even omit the temporary:

CODE 5.2.3:

@treble = map { $_ * 3 } 1..10;

You can use the map() function to shorten up 5.2 a bit by pulling
out the for loop:

CODE 5.3:

@old = split (/\n/, $str);
@f = map { (split ' ')[-1] } @old;
$str = join ("\n", @old[sort{ $f[$a] <=> $f[$b] } 0..$#old ]) . "\n";

You could even make use of the result of the assignment statement
as the argument to map and shrink it sill more:

CODE 5.4:

@f = map { (split ' ')[-1] } @old = split (/\n/, $str);
$str = join ("\n", @old[sort{ $f[$a] <=> $f[$b] } 0..$#old ]) . "\n";

Now, Randal didn't want to use any temporary variables at all.
Let's look at his jumbled code again:

Code 0.0:

$str =
join "\n",
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[-1]] }
split /\n/,
$str;

He's been kinda nasty in that there are no parens or indentation.
I offered this version which adds them:

CODE 0.1:

$str = join("\n",
map($_->[0],
sort( {$a->[1] <=> $b->[1] }
map( [$_, (split(' ', $_))[-1] ],
split(/\n/, $str)
)
)
)
);

But you could retain Randal's block form as well:

CODE 0.2:

$str = join("\n",
map( { $_->[0] }
sort( {$a->[1] <=> $b->[1] }
map( { [$_, (split(' ', $_))[-1] ] }
split(/\n/, $str)
)
)
)
);

While in some ways code 0.2 resembles code 5.4, in many others it
doesn't. What are all those crazy arrows and square brackets doing
there? Well, that's a really interesting question.

Let's take this thing inside out. Here's the innermost portion
of it:

CODE 6.1:

map { [$_, (split(' ', $_))[-1] ] } split(/\n/, $str)

What's going on is that he's using the return from split to be the
arguments to map() over. Ok, but what's that map() code block doing???

CODE 6.2:

[ $_, (split(' ', $_))[-1] ]

What 6.2 does happens to do is to create a reference to a list of two
elements. It's as though you said this

CODE 6.2.1:

my @x = ( $_, (split(' ', $_))[-1] );
return \@x;

However, the list reference he's creating by using the [ .... ] notation
isn't a named list at all. If you wrote the inner loop as a real
honest-to-goodness loop, it might look like this: (remember that 6.3 is
just the long form of 6.1)

CODE 6.3:

@listrefs = ();
for (split(/\n/, $str)) {
$newref = [ $_, (split(' ', $_))[-1] ];
push (@listrefs, $newref);
}

And then you'd have all your references in @listrefs. This means that for
a given line within $str (call it line number $n), $listrefs[$n]->[0] is
going to be the whole line itself (without its newline), while
$listrefs[$n]->[1] is going to be its last field. We could actually write
those without the pointer arrows, because adjacent brackets may omit them,
as in $listrefs[$n][0], but we're going to need them for what later, so
you should get used to seeing them.

Well, what's the next level do? It's a sort. Consider this:

CODE 7.1

@newrefs = sort { $a->[1] <=> $b->[1] } @listrefs;

Since $a and $b will be each element of @listrefs, they will each be
references to lists (kinda like pointers to lists). Remember that each of
these refers to a list of two elements, whose zeroth element $a->[0] is
the entire data, but whose first element $a->[1] is the sort key.
So we'll compare the sort keys.

Put we don't want the sort keys. We want the original data.
You could write a loop to do it:

CODE 7.2:

@data = ():
foreach $ref (@newrefs) {
push(@data, $ref->[0]);
}

Or you could be like Randal and avoid temporaries by using
a map() instead of a loop:

CODE 7.3:

@data = map { $_->[0] } @newrefs;

Combining up various stages, we get this:

CODE 7.4:

@lines = split( /\n/, $str);
@listrefs = map { [$_, (split(' ', $_))[-1] ] } @lines;
@newrefs = sort { $a->[1] <=> $b->[1] } @listrefs;
@data = map { $_->[0] } @newrefs;
$str = join("\n", @data) . "\n";

It may not look like it, but this is very, very close to what Randal wrote
back in 0.0; he just didn't keep around his temporary results in names
variables. It's quite likely that he even developed the 0.0 code starting
from something that looked more like 7.4. If you don't, you'll have an
exceedingly difficult time debugging it.

But you know something? You don't have to be all so super-clever to
do this. There's a much more straightforward solution:

CODE 8.1:

$tmpfile = "/tmp/sortage.$$";
open (TMP, ">$tmpfile");
print TMP $str;
close(TMP);
$str = `sort +10n $tmpfile`;
unlink $tmpfile;

See? We just write it to a file, then let the systme sort program do the
work for you. Isn't that much easier to understand? If you can easily
express your sort in something that the system sort program will
understand, it usually makes more sense to use it. It'll run faster
and be easier to write as well.

Of course, we've made committed some minor errors here by omitting
error checking. This would be better:

CODE 8.2:

$tmpfile = "/tmp/sortage.$$";
open (TMP, ">$tmpfile") || die "can't open $tmpfile for writing: $!";
(print TMP $str) || die "can't write to $tmpfile: $!";
close(TMP) || die "can't write to $tmpfile: $!";
$str = `sort +10n $tmpfile`;
if ($?) { die "sort exited with a $?" }
unlink($tmpfile) || die "can't unlink $tmpfile: $!";

That line to print() looks funny, eh? And what about the parens on
the unlink? You probably thought that you could have written both
of those as

CODE 8.2.1:

print TMP $str || die "can't write to $tmpfile: $!";
unlink $tmpfile || die "can't unlink $tmpfile: $!";

But in fact, you couldn't. The || binds more tightly than you want.
It would have been as though you'd written:

CODE 8.2.2:

print TMP ($str || die "can't write to $tmpfile: $!");
unlink ($tmpfile || die "can't unlink $tmpfile: $!");

Which certainly isn't good. That's why in perl5 we have "or", which is
a very low-precedence operator. We probably would write it like this:

CODE 8.3:

$tmpfile = "/tmp/sortage.$$";
open TMP, ">$tmpfile" or die "can't open $tmpfile for writing: $!";
print TMP $str or die "can't write to $tmpfile: $!";
close TMP or die "can't close $tmpfile: $!";
$str = `sort +10n $tmpfile`;
if ($?) { die "sort exited with a $?" }
unlink $tmpfile or die "can't unlink $tmpfile: $!";

Another trick sometimes used is to use to the filename itself as
an indirect file handle.

CODE 8.4:

$tmpfile = "/tmp/sortage.$$";
open $tmpfile, ">$tmpfile" or die "can't open $tmpfile for writing: $!";
print $tmpfile $str or die "can't write to $tmpfile: $!";
close $tmpfile or die "can't close $tmpfile: $!";
$str = `sort +10n $tmpfile`;
if ($?) { die "sort exited with a $?" }
unlink $tmpfile or die "can't unlink $tmpfile: $!";

Another trick sometimes used is to use to the filename itself as an
indirect file handle. This has a greater advantage when you are reading
from the file, and issuing diagnostics using warn() or die(), which
include the line (or chunk) position and filehandle anme of the last file
read from. This way you'd also get the

CODE 8.5:

$tmpfile = "/tmp/sortage.$$";
open $tmpfile, ">$tmpfile" or die "can't open $tmpfile for writing: $!";
print $tmpfile $str or die "can't write to $tmpfile: $!";
close $tmpfile or die "can't close $tmpfile: $!";
# do something else to the file here, then...
open $tmpfile, "<$tmpfile" or die "can't open $tmpfile for reading: $!";
while (<$tmpfile>) {
if ( $something_or_other ) {
warn "oops message";
}
}

Now the warn will include the actual filename rather than just
the filehandle, like TMP.

Now, you probably shouldn't bother go to too much trouble in figuring
out a good filename. There is a POSIX function which will do this
for you.

CODE 8.6:

use POSIX;
$tmpfile = POSIX::tmpnam();
open TMP, ">$tmpfile" or die "can't open $tmpfile for writing: $!";
print TMP $str or die "can't write to $tmpfile: $!";
close TMP or die "can't close $tmpfile: $!";
$str = `sort +10n $tmpfile`;
if ($?) { die "sort exited with a $?" }
unlink $tmpfile or die "can't unlink $tmpfile: $!";

Just one final thing. This seems like just the application for a
bi-directional open. You might be tempted to write this using the open2()
function.

CODE 9.1:

use IPC::Open2;
$kidpid = open2('INPUT', 'OUTPUT', 'sort +10n');
print OUTPUT $str or die "can't write to $tmpfile: $!";
close OUTPUT or die "can't close output pipe: $!";
while (<INPUT>) {
print "got line $.: $_";
}
close INPUT or die "can't close input pipe: $!";

or maybe to read the whole thing back again, undef the RS:

CODE 9.2:

use FileHandle;
use IPC::Open2;
$kidpid = open2('INPUT', 'OUTPUT', 'sort +10n');
print OUTPUT $str or die "can't write to $tmpfile: $!";
close OUTPUT or die "can't close output pipe: $!";
INPUT->input_record_separator(undef);
$str = <INPUT>;
close INPUT or die "can't close input pipe: $!";

Now, it turns out that in this case it actually works,
at least on my system. I wrote all my input do the pipe
first, which gladly read it all in up until end of file, and
then spit it out. That's how sort work. But cat isn't so
friendly, and works far more like most programs.

CODE 9.3:

use FileHandle;
use IPC::Open2;
$kidpid = open2('INPUT', 'OUTPUT', 'cat');
print OUTPUT $str or die "can't write to $tmpfile: $!";
close OUTPUT or die "can't close output pipe: $!";
INPUT->input_record_separator(undef);
$str = <INPUT>;
close INPUT or die "can't close input pipe: $!";

If you do this and $str is big enough, you'll block forever.
And even doing a piece at a time:

CODE 9.4:

use IPC::Open2;
$kidpid = open2('INPUT', 'OUTPUT', 'cat');
while (1) {
print OUTPUT $str;
$newstr = <INPUT>;
print "just read <<<$newstr>>>\n";
}

This approach won't work because in general, you can't control your child
processes notions of buffering, which are entirely useless for simulating
interactive work as you're attempting here. The only programs this works
on are ones that are designed for it, like dc. There aren't many system
programs like it.

I strongly advise against using open2 for almost anything, even though I'm
its author. UNIX buffering will just drive you up the wall. You'll end
up quite disappointed. It's better to use temp files as we did back in
the code examples in section 9 if you really want to deal with an external
process in which you wish to control both its input and its output.

There's one final lesson I'd like to impart. It's not a bad idea to order
your coding priorities like so:

1. correctness
2. maintainability
3. efficiency

Notice at no point did "4. cleverness" enter the picture. Of course, if
you're writing Obfuscated C contest entries, or tricksy little JAPHs, then
it makes perfect sense to invert these priorities, like so:

1. cleverness
2. efficiency
3. maintainability
4. correctness

But that's only good for tricks, not solid code, which is why code like
that in 5.1, 8.3, or even 7.4 will in the long run make easier both your
life and the lives of those around you.

While a LISP or Scheme programmer may be entirely conformatable with
arrange their code as Randal did in 0.0, I haven't personal found that
most of your average off-the-street UNIX, DOS, VMS, or IBM programmers
take very quickly to that style of functional programming -- or even the
above average ones, for that matter. It's nice in its own pristine way
not to use any temporary variables and all, but it's still not to most
people's liking.

And no, I'm not ragging on Randal, merely teasing a bit. He's just trying
to be clever, and that's what he does. I'm just submitting a sample
chapter for his perusal for inclusion the mythical Alpaca Book :-)

--tom
--
Tom Christiansen Perl Consultant, Gamer, Hiker tch...@mox.perl.com

sv_magic(sv, Nullsv, 'B', Nullch, 0); /* deep magic */
--Larry Wall, from util.c in the v5.0 perl distribution

Robert Davies

unread,

Apr 10, 1995, 3:00:00 AM4/10/95

to

Thanks Tom,

and I really was afraid to ask...

Ken Hiatt

unread,

Apr 11, 1995, 3:00:00 AM4/11/95

to

h...@nexus.inesc.pt (Hugo Andrade Cartaxeiro) wrote:

>Hi Perl'ers
>Here goes a newbie question about sorting a array (string):

>I have a (big) string like that:

>print $str;

>eir 11 9 2 6 3 1 1 81% 63% 13
>oos 10 6 4 3 3 0 4 60% 70% 25
>hrh 10 6 4 5 1 2 2 60% 70% 15
>spp 10 6 4 3 3 1 3 60% 60% 14

>and I like to sort it with the last field as the order key. I know

>perl has some features to do it, but I can't make 'em work properly.

>Do you have a suggestion ?

@mylist = split( /\n/, $str ); # split strings into array

foreach $i (0 .. $#mylist) # loop through strings
{
$_ = $mylist[$i]; # I'm sure there's a better way
/[^%]*%[^%]*%\s*(\d+)/; # pull out the key
@myassoc{$1} = $mylist[$i]; # add string to the assoc. array
# keyed on the field we want
}

foreach $key (sort keys(%myassoc) ) # sort 'em
{
print @myassoc{$key}, "\n"; # and print 'em
# or if you want it back in a string:
# $newstr .= @myassoc{$key} . "\n";
}

>Reply by mail (if any).

If someone replies by mail:
(1) then nobody else that may have a similar problem will get
the benefit of seeing the answer.
(2) then people like me, who just might feel benevolent, would
end up wasting their time all answering your problem.

>tks in adv,
>--
>Hugo Andrade Cartaxeiro : INESC, Lisboa, Portugal. email: h...@inesc.pt
> Tel: +351-1 310 02 49 Fax: +351-1 52 58 43

>Who is General Failure and why is he reading my hard disk ?!

Hugo, this is a good example of the type of question that
discourages the old-timers from even reading groups like
comp.lang.perl. The first time I ever saw any Perl was this
weekend when I downloaded the NT version. My answer to your
question is not pretty, and I'm sure there is a cleaner way
to do it, but there is *nothing* that I used that is not found
in the first 49 pages of _Programming_perl_. It took me about
4 minutes to type in,

Take some time to play with it, and you will probably arrive at
a solution. Once you have some solution, *then* you might con-
sider posting and requesting assistance in making it cleaner.
This is part of why I posted my solution, I wanted to see how
I *could* do it...and hopefully get one of the oldtimers to
spare me a few minutes and critique it, suggesting better
paths to the Perfect Perl.

#undef SOAPBOX

Perl is cool...'nuff said.

...jester

Ken Hiatt
Legent Corporation

PS: Anybody have ftp space to begin a collection of NT
specific scripts?

Bob Gibson

unread,

Apr 11, 1995, 3:00:00 AM4/11/95

to

Tom:
Well written. Thanks.
Bob
--
Illiterate? Write for information -- PO Box 101; Washington, DC 20011

Bob Gibson -- gib...@netcom.com

Tom Christiansen

unread,

Apr 12, 1995, 3:00:00 AM4/12/95

to

In comp.lang.perl,
khi...@houston2.legent.com (Ken Hiatt) writes:
:
:@mylist = split( /\n/, $str ); # split strings into array

:
:foreach $i (0 .. $#mylist) # loop through strings
:{
: $_ = $mylist[$i]; # I'm sure there's a better way
: /[^%]*%[^%]*%\s*(\d+)/; # pull out the key
: @myassoc{$1} = $mylist[$i]; # add string to the assoc. array
: # keyed on the field we want
:}
:
:foreach $key (sort keys(%myassoc) ) # sort 'em
:{
: print @myassoc{$key}, "\n"; # and print 'em
:# or if you want it back in a string:
:# $newstr .= @myassoc{$key} . "\n";

:}

Careful; perl5 -w says:

Scalar value @myassoc{$key} better written as $myassoc{$key}

--tom
--
Tom Christiansen Perl Consultant, Gamer, Hiker tch...@mox.perl.com

"Oh, I'm totally outrageous--I just never seem to inspire any outrage. :-)"
--Larry Wall

Steve E. Lumos

unread,

Apr 13, 1995, 3:00:00 AM4/13/95

to

Tom Christiansen (tch...@mox.perl.com) wrote:

[. . .]

I'm not the original poster, but I want to say many thanks to Tom for
the awesome description. I don't know about you, but I'm definitely
saving this.

TTYL - Steve

Paul Lussier

unread,

Apr 13, 1995, 3:00:00 AM4/13/95

to

On Tue, 11 Apr 1995, Ken Hiatt wrote:
> h...@nexus.inesc.pt (Hugo Andrade Cartaxeiro) wrote:
> >Hi Perl'ers
>

> >print $str;
> >eir 11 9 2 6 3 1 1 81% 63% 13
> >oos 10 6 4 3 3 0 4 60% 70% 25
> >hrh 10 6 4 5 1 2 2 60% 70% 15
> >spp 10 6 4 3 3 1 3 60% 60% 14
>

> >Do you have a suggestion ?
>

> @mylist = split( /\n/, $str ); # split strings into array
>
> foreach $i (0 .. $#mylist) # loop through strings
> {
> $_ = $mylist[$i]; # I'm sure there's a better way
> /[^%]*%[^%]*%\s*(\d+)/; # pull out the key
> @myassoc{$1} = $mylist[$i]; # add string to the assoc. array
> # keyed on the field we want
> }
> foreach $key (sort keys(%myassoc) ) # sort 'em
> {
> print @myassoc{$key}, "\n"; # and print 'em
> # or if you want it back in a string:
> # $newstr .= @myassoc{$key} . "\n";
> }

The only thing I would do differently, and maybe this is not the right
answer even though it does works, in the first foreach loop:

change: /[^%]*%[^%]*%\s*(\d+)/;

to: /.*%\s*(d+)/;

Since pattern matching is already greedy, may as well take advantage of
it! Though this method assumes that the column to be sorted will always
come after a column with a % and any amount of whitespace between them
(including none)!

Hope this helps!
Paul

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
- Paul Lussier = It is a good day =The next best thing to doing -
= Raytheon MSD SWL - to put slinkies -something smart is not doing =
- p...@swl.msd.ray.com = on escalators = something stupid. -
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
= Quote of the week: =
- Double your drive space - delete Windows! -
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Tom Christiansen

unread,

Apr 15, 1995, 3:00:00 AM4/15/95

to

In comp.lang.perl,
slu...@unlv.edu (Steve E. Lumos) writes:

You can get at it via a World Wide Web client in

http://mox.perl.com/perl/all_about/sort.html

--tom
--
Tom Christiansen Perl Consultant, Gamer, Hiker tch...@mox.perl.com

You want it in one line? Does it have to fit in 80 columns? :-)
--Larry Wall in <73...@jpl-devvax.JPL.NASA.GOV>