sorting strings numerically

0 views
Skip to first unread message

Steffen Mueller

unread,
Aug 10, 2002, 10:11:44 AM8/10/02
to
"Richard J. Rauenzahn" <nos...@hairball.cup.hp.com> schrieb...
> Michael Carman <mjca...@mchsi.com> writes:
> >On 8/9/02 3:54 PM, Michael Carman wrote:
> >> On 8/9/02 2:39 PM, Michele Dondi wrote:
> >>>
> >>> I would like to sort strings alphabetically for what regards non digit
> >>> chars and numerically for any block of digits. To make things clear,
> >>> the following strings are sorted in the sense I want:
> >>
> >> [snip very crude approach]
> >>
> >> No sir, I don't really like it, but I can't think of a way to at least
> >> cache the decomposition.
> >
> >Okay, I thought of one.
>
> Here's another non recursive. It can be made into a schwartzian
> transform pretty easily. I make zero length substrings the lowest
> number and lowest letter and 'prepend' a zero length string if the
> element is /^[0-9]/. I'm not sure about its speed since I make copies
> for each sort iteration.

[...]

Here's how it stacks up to the other solutions:

Benchmark: running michael, richard, steffen, each for at least 5 CPU
seconds...

michael: 7 wallclock secs ( 5.49 usr + 0.00 sys = 5.49 CPU)
@ 524.05/s (n=2876)
richard: 8 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU)
@ 447.75/s (n=2408)
steffen: 6 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU)
@ 594.76/s (n=3085)

Code:
#!/usr/bin/perl
use strict;
use Benchmark;

sub alphachunk {
my ($i, $x, $y) = @_;
if ($#$x > $i && $#$y > $i) {
return $x->[$i] cmp $y->[$i] || numchunk($i+1, $x, $y);
}
else {
return $x->[$i] cmp $y->[$i];
}
}

sub numchunk {
my ($i, $x, $y) = @_;
if ($#$x > $i && $#$y > $i) {
return $x->[$i] <=> $y->[$i] || alphachunk($i+1, $x, $y);
}
else {
return $x->[$i] <=> $y->[$i];
}
}

my @test1 = qw(
abc101def06 bbc999a abc101def5 abc90def07
abc
abcd
abcd0
10
abc101ab
abc101
zzz
zzzz
109zzz
zzz103
abc102

abc101def06
bbc999a
abc101def5
abc90def07
);

timethese( -5, {
steffen => sub {
my @data = @test1;
@data = map { $_->[0] }
sort {
my $i = 1;
my $l = (@$a - @$b > 0) ? @$a : @$b;
while ( $i < $l ) {
my $result;
if ( $i % 2 ) {
$result = $a->[$i] <=> $b->[$i];
} else {
$result = $a->[$i] cmp $b->[$i];
}
return $result if $result;
$i++;
}
return 1 if @$a;
return -1 if @$b;
return 0;
}
map {
[ $_, split /(\D+)/, '0' . $_ ]
} @data;
},
michael => sub {
my @data = @test1;
@data =
map {$_->[0]}
sort {numchunk(0, $a->[1], $b->[1])}
map {[$_, [("0$_" =~ /(\d+|\D+)/g)]]}
@data;
},
richard => sub {
my @data = @test1;
foreach(@data) {
s/(\d+)/,$1,/g;
$_ = [ split /,/, $_ ];
}

my @sorted = sort {
my ($aa, $bb);
my @a = @$a;
my @b = @$b;
while(@a && @b) {
$aa = shift @a;
$bb = shift @b;
if($aa =~ /^[0-9]/) {
return $aa <=> $bb if($aa != $bb);
} else {
return $aa cmp $bb if($aa ne $bb);
}
}
return +1 if(@a);
return -1 if(@b);
return 0;
} @data;

foreach(@sorted) {
$_ = join "", @$_;
}
},
} );


Michele Dondi

unread,
Aug 10, 2002, 10:25:15 AM8/10/02
to
On Fri, 09 Aug 2002 16:09:42 -0400, pizza <pi...@parseerror.com>
wrote:

>Michele Dondi wrote:

>ok, it ain't pretty and i'm pretty sure it ain't fast, but it does sort your
>example array

[...]

As for all the others who kindly answered my post, I'll have to study
the proposed solutions for a while, since I'm really a newbie and I
must first understand what they really do. Many thanks in any case...


Michele Dondi
--
Liberta' va cercando, ch'e' si' cara,
Come sa chi per lei vita rifiuta.
[Dante Alighieri, Purg. I, 71-72]

I am my own country - United States Confederate of Me!
[Pennywise, "My own country"]

Michele Dondi

unread,
Aug 10, 2002, 10:25:16 AM8/10/02
to
On Fri, 09 Aug 2002 16:26:15 -0500, Michael Carman
<mjca...@mchsi.com> wrote:

>On 8/9/02 3:54 PM, Michael Carman wrote:

>Okay, I thought of one.

[code snipped]

>And (trimmed) benchmarks, comparing it to my last post:

Well, I just tried it with a realistic test, which is the set of
strings I really needed it for, and it seems to me that it runs pretty
fast, taking into account that the test set is rather "big" IMHO
(13313 strings, for a total amount of about 515Kb).

I will try the same test with the other proposed solutions soon...

Ilmari Karonen

unread,
Aug 11, 2002, 9:01:41 AM8/11/02
to
In article <94n5luopvo64p9g2j...@4ax.com>, Michele Dondi wrote:
>I would like to sort strings alphabetically for what regards non digit
>chars and numerically for any block of digits. To make things clear,
>the following strings are sorted in the sense I want:
>
>1) Is there a ready made solution/package for a suitable comparison
>function to feed to "sort"?

Yes there is. It's called Sort::Naturally.

http://search.cpan.org/author/SBURKE/Sort-Naturally-1.01/

--
Ilmari Karonen - http://www.sci.fi/~iltzu/
"TIMTOWTDI often means there is more than one really bad way to do it."
-- after Tim Cuffel in comp.lang.perl.misc

Reply all
Reply to author
Forward
0 new messages