sorting strings numerically

0 views

Steffen Mueller

Aug 10, 2002, 10:11:44 AM8/10/02
to
"Richard J. Rauenzahn" <nos...@hairball.cup.hp.com> schrieb...
> Michael Carman <mjca...@mchsi.com> writes:
> >On 8/9/02 3:54 PM, Michael Carman wrote:
> >> On 8/9/02 2:39 PM, Michele Dondi wrote:
> >>>
> >>> I would like to sort strings alphabetically for what regards non digit
> >>> chars and numerically for any block of digits. To make things clear,
> >>> the following strings are sorted in the sense I want:
> >>
> >> [snip very crude approach]
> >>
> >> No sir, I don't really like it, but I can't think of a way to at least
> >> cache the decomposition.
> >
> >Okay, I thought of one.
>
> Here's another non recursive. It can be made into a schwartzian
> transform pretty easily. I make zero length substrings the lowest
> number and lowest letter and 'prepend' a zero length string if the
> element is /^[0-9]/. I'm not sure about its speed since I make copies
> for each sort iteration.

[...]

Here's how it stacks up to the other solutions:

Benchmark: running michael, richard, steffen, each for at least 5 CPU
seconds...

michael: 7 wallclock secs ( 5.49 usr + 0.00 sys = 5.49 CPU)
@ 524.05/s (n=2876)
richard: 8 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU)
@ 447.75/s (n=2408)
steffen: 6 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU)
@ 594.76/s (n=3085)

Code:
#!/usr/bin/perl
use strict;
use Benchmark;

sub alphachunk {
my (\$i, \$x, \$y) = @_;
if (\$#\$x > \$i && \$#\$y > \$i) {
return \$x->[\$i] cmp \$y->[\$i] || numchunk(\$i+1, \$x, \$y);
}
else {
return \$x->[\$i] cmp \$y->[\$i];
}
}

sub numchunk {
my (\$i, \$x, \$y) = @_;
if (\$#\$x > \$i && \$#\$y > \$i) {
return \$x->[\$i] <=> \$y->[\$i] || alphachunk(\$i+1, \$x, \$y);
}
else {
return \$x->[\$i] <=> \$y->[\$i];
}
}

my @test1 = qw(
abc101def06 bbc999a abc101def5 abc90def07
abc
abcd
abcd0
10
abc101ab
abc101
zzz
zzzz
109zzz
zzz103
abc102

abc101def06
bbc999a
abc101def5
abc90def07
);

timethese( -5, {
steffen => sub {
my @data = @test1;
@data = map { \$_->[0] }
sort {
my \$i = 1;
my \$l = (@\$a - @\$b > 0) ? @\$a : @\$b;
while ( \$i < \$l ) {
my \$result;
if ( \$i % 2 ) {
\$result = \$a->[\$i] <=> \$b->[\$i];
} else {
\$result = \$a->[\$i] cmp \$b->[\$i];
}
return \$result if \$result;
\$i++;
}
return 1 if @\$a;
return -1 if @\$b;
return 0;
}
map {
[ \$_, split /(\D+)/, '0' . \$_ ]
} @data;
},
michael => sub {
my @data = @test1;
@data =
map {\$_->[0]}
sort {numchunk(0, \$a->[1], \$b->[1])}
map {[\$_, [("0\$_" =~ /(\d+|\D+)/g)]]}
@data;
},
richard => sub {
my @data = @test1;
foreach(@data) {
s/(\d+)/,\$1,/g;
\$_ = [ split /,/, \$_ ];
}

my @sorted = sort {
my (\$aa, \$bb);
my @a = @\$a;
my @b = @\$b;
while(@a && @b) {
\$aa = shift @a;
\$bb = shift @b;
if(\$aa =~ /^[0-9]/) {
return \$aa <=> \$bb if(\$aa != \$bb);
} else {
return \$aa cmp \$bb if(\$aa ne \$bb);
}
}
return +1 if(@a);
return -1 if(@b);
return 0;
} @data;

foreach(@sorted) {
\$_ = join "", @\$_;
}
},
} );

Michele Dondi

Aug 10, 2002, 10:25:15 AM8/10/02
to
On Fri, 09 Aug 2002 16:09:42 -0400, pizza <pi...@parseerror.com>
wrote:

>Michele Dondi wrote:

>ok, it ain't pretty and i'm pretty sure it ain't fast, but it does sort your
>example array

[...]

As for all the others who kindly answered my post, I'll have to study
the proposed solutions for a while, since I'm really a newbie and I
must first understand what they really do. Many thanks in any case...

Michele Dondi
--
Liberta' va cercando, ch'e' si' cara,
Come sa chi per lei vita rifiuta.
[Dante Alighieri, Purg. I, 71-72]

I am my own country - United States Confederate of Me!
[Pennywise, "My own country"]

Michele Dondi

Aug 10, 2002, 10:25:16 AM8/10/02
to
On Fri, 09 Aug 2002 16:26:15 -0500, Michael Carman
<mjca...@mchsi.com> wrote:

>On 8/9/02 3:54 PM, Michael Carman wrote:

>Okay, I thought of one.

[code snipped]

>And (trimmed) benchmarks, comparing it to my last post:

Well, I just tried it with a realistic test, which is the set of
strings I really needed it for, and it seems to me that it runs pretty
fast, taking into account that the test set is rather "big" IMHO
(13313 strings, for a total amount of about 515Kb).

I will try the same test with the other proposed solutions soon...

Ilmari Karonen

Aug 11, 2002, 9:01:41 AM8/11/02
to
In article <94n5luopvo64p9g2j...@4ax.com>, Michele Dondi wrote:
>I would like to sort strings alphabetically for what regards non digit
>chars and numerically for any block of digits. To make things clear,
>the following strings are sorted in the sense I want:
>