How should I go about doing this in Perl? Obviously I don't want to
read the whole thing into memory, and I need to stay away from dodgy
functions that turn hex into ASCII.
If anybody's interested, it is 8 million digits of pi, in hex form (2
digits per byte). I already converted from ASCII into that packed
format, but now I need to search it.
There's a slow PHP version if you care. http://ericjiang.co.nr/pi.php
What do you mean "in hex form?" Do you just mean that each 4-bit nybble
represents an unsigned integer from 0 to 15?
Look at the functions sysopen and sysread; they will do the trick.
With sysread you can specify how many bytes to read.
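For example, something like this (a minimal sketch; "pi.bcd" is a
made-up filename for your packed file):

use strict;
use warnings;
use Fcntl;                        # for O_RDONLY

sysopen(my $fh, "pi.bcd", O_RDONLY) or die "pi.bcd: $!";
my $chunk;
while (my $n = sysread $fh, $chunk, 65536) {
    # process up to $n bytes of packed digits in $chunk here.
    # Note: a match can straddle two chunks, so you'd want to carry
    # the last length($needle)-1 bytes over into the next iteration.
}
close $fh;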
>
> If anybody's interested, it is 8 million digits of pi, in hex form (2
> digits per byte). I already converted from ASCII into that packed
> format, but now I need to search it.
This sounds cool. I bought a book on pi from B&N, and it had, I think,
10,000 digits of pi.
>
> There's a slow PHP version if you care. http://ericjiang.co.nr/pi.php
--
Registered Linux User: 275424
Today's Fortune: The only constant is change.
perl -MMath::Pari=:prec=10000,Pi -wle "print Pi"
AFAIK, the algorithm takes O(n^(3/2)). Getting a million digits may
take several minutes on contemporary machines. 8e6 should be feasible
too; it may take about an hour (but if the cache size is small, it
can be much slower). In both cases you may need to increase the size
of the PARI stack; see the docs.
Hope this helps,
Ilya
I'm not too sure of the best way to go about searching it. With
sysopen, would I have to incrementally read more bytes? That would
mean a lot of looping - how would that affect the run speed?
How do I get it to give me hex instead of ASCII? I don't want to
continually convert ASCII to hex.
If it's 2 digits per byte, it's not in hex form. Hexadecimal is a
(typically ASCII) text representation of base-16 numbers, using the
characters '0'..'9', 'a'..'f'. It typically takes 8 bits per
hexadecimal digit.
The format you're describing sounds like plain binary. Or is it
binary-coded decimal?
If it's binary-coded decimal, it's going to look something like this:
0011 0001 0100 0001 0101 1001 ...
(3) (1) (4) (1) (5) (9) ...
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
One more thing: Why don't you want to read the whole thing into
memory? On most modern systems, 4 megabytes isn't an unreasonable
amount.
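Slurping it is only a couple of lines (again taking "pi.bcd" as a
stand-in name for the file):

open my $fh, '<', 'pi.bcd' or die "pi.bcd: $!";
binmode $fh;
my $haystack = do { local $/; <$fh> };   # read all 4 MB at once
close $fh;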
>One more thing: Why don't you want to read the whole thing into
>memory? On most modern systems, 4 megabytes isn't an unreasonable
>amount.
Yes, that is exactly what I would do.
And for search purposes, since you search on decimal digits, it would
be great to put them all in a hash (since memory shouldn't be a
problem) like this:
{1} => 1
{2} => 4
{3} => 1
{4} => 5
{5} => 9
...
Then sort the hash on the values and you would easily be able to
compute stats like: how many 4s are there? how many times are
identical consecutive digits encountered? etc.
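For the counting part, a rough sketch (assuming the digits have been
expanded into one long ASCII string first):

use strict;
use warnings;

my $digits = "14159265358979";   # stand-in for the 8 million digits

my %count;
$count{$_}++ for split //, $digits;
printf "%d occurs %d time(s)\n", $_, $count{$_} || 0 for 0 .. 9;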
Sébastien
Sébastien Nadeau
Computer Technician
Bibliothèque de l'Université Laval
(418) 656-2131 ext. 6815
Notice of Confidentiality
<http://www.rec.ulaval.ca/lce/securite/confidentialite.htm>
Even if 4 megabytes of data is reasonable, once you start doing things
like creating hashes with 8 million elements, or sorting
8-million-element arrays, you might start having resource problems.
Binary-coded decimal actually. I was referring to it like that because
the digits 1 and 2 would show as 12 in hex.
so.. (3.) 14159265 would be
14 15 92 65
when viewed in a hex editor.
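In Perl terms, the packing is just the "H*" template:

my $packed = pack   "H*", "14159265";   # gives "\x14\x15\x92\x65"
my $ascii  = unpack "H*", $packed;      # "14159265" again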
So what's the best function to search for strings of numbers in that,
aside from turning it into an ASCII string in memory? What kind of
functions does Perl have that would be suited to that? I searched
Perl's manual, but didn't find anything.
Or would it be better to just expand the whole thing into a
human-readable ASCII string in memory? It seems like that would take
all the advantage out of it.
If you mean you want to search for a string of digits like 14159265
and there may be an odd number of digits, or you want to find them
whether they start at an odd or even position, you have enormously
complicated things by packing your data that way. You could come
up with a regex that would work, but it's much less trouble just
to have your file not packed.
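With the file kept as plain ASCII digits, the whole search is just

my $pos = index($digits, "14159265");   # -1 if absent

where $digits holds the unpacked digit string.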
>Even if 4 megabytes of data is reasonable, once you start doing things
>like creating hashes with 8 million elements, or sorting
>8-million-element arrays, you might start having resource problems.
I agree with that.
In theory, sorting 8 000 000 digits with a quicksort does not take
much longer than sorting an array of 100 000. Of course this is no
longer true if you need to swap memory pages while sorting 8 000 000.
But even printing Hello World will be a pain if memory is filled. I
should have said "if memory is not an issue on your system" instead
of "memory shouldn't be a problem".
It doesn't. Hardly anyone ever does what you're doing, so there's no
built-in function for it.
> Or would it be better to just expand the whole thing into a
> human-readable ASCII string in memory?
That's what I would think.
> It seems like that would take all the advantage out of it.
Now you're like the guy who stores all his groceries by gluing them to
the ceiling. It keeps the stuff out of the way, and the ceiling was
formerly just wasted space. But then the guy wants to know how he can
cook the food without unsticking it from the ceiling. "You mean I
have to take it down again to eat it?" he asks. "It seems like that
would take all the advantage out of it."
Yeah, I guess so. The house doesn't have a special stove for cooking
food that's attached to the ceiling, but that's not necessarily a
defect in the design of the house.
Well, anyway, I am not getting to the point of this article, which is
to present some code that searches BCD data without unpacking it. I
did it as an exercise, not because I wanted to recommend this as a
worthwhile thing to do.
The idea is this. Suppose you are looking for $needle = "123456" in
some BCD-coded target string $haystack. The first thing you can do is
to use
index($haystack, bcd_pack($needle))
where bcd_pack turns "123456" into "\x12\x34\x56".
This searches for the needle directly, and index() uses a fast
string-search algorithm. If you find "\x12\x34\x56", you win. But if
you don't find it, you're not finished yet, because the haystack might
contain (for example) "\x91\x23\x45\x69" which should also match. So
you need to do a trick. You trim off the "1" and the "6" from
"123456" and search for "\x23\x45" instead; if you find any matches,
you make sure they are preceded by something of the form X1 and
followed by something of the form 6X. If so, you win.
There are some other details you have to worry about, but I think at
this point looking at the code would be easier than following an
explanation, so here it is:
================================================================
#!/usr/bin/perl
use strict 'vars';
# Utility function
#
# $haystack is the string to be searched, in BCD form
# $needle is the thing we're looking for, also in BCD form
# we only get a successful match if $needle appears,
# immediately preceded by $head and followed by $tail
#
# Either or both of head and tail may be undefined
#
# Returns leftmost position at which there is a match,
# or -1 if there is none
sub _search_bcd_util {
    my ($haystack, $needle, $head, $tail) = @_;
    my $nlen = length($needle);
    my $cur_pos = defined($head) ? 1 : 0;   # leave room for the head byte
    my $pos;
    while ($cur_pos <= length($haystack)) {
        $pos = index($haystack, $needle, $cur_pos);
        return -1 if $pos == -1;
        next if defined($head)
             && (substr($haystack, $pos-1, 1) & "\x0f") ne $head;
        next if defined($tail)
             && (substr($haystack, $pos+$nlen, 1) & "\xf0") ne $tail;
        return $pos;
    } continue {
        $cur_pos = $pos+1;
    }
    return -1;
}

sub search_bcd {
    my ($haystack, $needle) = @_;
    my $hlen = length $haystack;
    my $nlen = length $needle;
    my $odd  = $nlen % 2;

    # $bcd1: the needle packed starting in a high nybble (even alignment)
    # $bcd2: the needle with its end digits trimmed off (odd alignment)
    my $bcd1 = pack "H*", substr($needle, 0, $nlen - $odd);
    my $bcd2 = pack "H*", substr($needle, 1, $nlen - 2 + $odd);
    my $tail = pack "H2", substr($needle, -1, 1) . "0";
    my $head = pack "H2", "0" . substr($needle, 0, 1);

    my $found1 = _search_bcd_util($haystack, $bcd1, undef, $odd ? $tail : undef);
    my $found2 = _search_bcd_util($haystack, $bcd2, $head, $odd ? undef : $tail);

    return -1 if $found1 < 0 && $found2 < 0;
    return $found1 == -1     ? 2*$found2 - 1 :
           $found2 == -1     ? 2*$found1     :
           $found1 < $found2 ? 2*$found1     :
                               2*$found2 - 1 ;
}

if ($ARGV[0] eq "TEST") {
    require Test::More;
    Test::More->import("no_plan");

    my $pi = "141592653589793238";
    my $haystack = pack "H*", $pi;

    is(search_bcd($haystack, "653"),  6);
    is(search_bcd($haystack, "535"),  7);
    is(search_bcd($haystack, "6535"), 6);
    is(search_bcd($haystack, "5358"), 7);

    for my $i (0 .. length($pi)-2) {
        for my $j (1 .. length($pi)-$i-1) {
            my $needle = substr($pi, $i, $j);
            is(search_bcd($haystack, $needle), index($pi, $needle),
               "Looking for <$needle> at position $i");
        }
    }

    for my $failure (qw(0 12 43 142 999 373 142857)) {
        is(search_bcd($haystack, $failure), -1,
           "search for absent '$failure'");
    }
}
================================================================
This seems to work OK, although I did not make an effort to test it
carefully, so I am not entirely sure. I have not tried to optimize it
for speed. I don't know if this would be worth doing for your
application, either with or without speed optimizations. I suspect
not, but who knows?
"Hope this helps."
> If you mean you want to search for a string of digits like 14159265
> and there may be an odd number of digits, or you want to find them
> whether they start at an odd or even position, you have enormously
> complicated things by packing your data that way.
"Enormously"??? I doubt it. Fixed strings are easy:
/\x14\x15\x92\x65/ || /[\x01...\xF1]\x41\x59\x26[\x50-\x5F]/
(of course, this leading [\x01...\xF1] should be completely spelled out).
What *is* enormously complicated is looking for regular expressions in
the expanded string - without expansion...
Hope this helps,
Ilya
I didn't mean to imply that it was objectively difficult, just that it
increases the complexity of a solution by a factor of 3 or more.
Just a routine to compute the regex to use for a given series of digits
is going to be as complicated as the entire code needed to search the
file for the digits in ASCII form.
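For illustration, here is roughly what such a routine looks like (a
sketch only, following Ilya's fixed-string pattern above; it handles
even-length digit strings only, since odd lengths need a half-byte at
one end of the even alignment as well):

sub bcd_regex {
    my ($digits) = @_;
    die "even, non-empty digit string expected"
        if !length($digits) || length($digits) % 2;

    # Alignment 1: the needle starts in a high nybble.
    my $even = quotemeta pack "H*", $digits;

    # Alignment 2: the needle starts in a low nybble, so its first
    # digit is the low nybble of some byte and its last digit is the
    # high nybble of another.
    my ($first, $middle, $last) = $digits =~ /^(\d)(\d*)(\d)$/;
    my $head = join '', map { sprintf '\x%X%X', $_, $first } 0 .. 15;
    my $tail = sprintf '[\x%X0-\x%XF]', $last, $last;
    my $odd  = "[$head]" . quotemeta(pack "H*", $middle) . $tail;

    return qr/$even|$odd/;
}

(And the byte position of a match still has to be translated back into
a digit offset, depending on which alternative matched.)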
Yeah. If anything, it will get worse. The basic problem is that the
smallest piece of data that is easy to work with is a single byte
(which is pretty much the definition of a byte) and the data format
he's chosen stores a digit in a smaller space than that.
The only way I can imagine to solve the problem in a lower-level
language is to do pretty much the same thing that I did in Perl, but
translated into the lower-level language. And maybe you'd have to
implement index() from scratch too.
You see this kind of thing a lot in beginning programmers. They still
have a very hazy idea of what the computer is doing and why it does
what it does; the conceptual connection between what they put in and
what comes out is still unclear to them. The output of a computer
program is an extremely complicated function of the input, and if you
don't have a good understanding of the very complicated internal
mechanism, it appears to be random.
This is why beginning programmers try putting random code into their
programs in hopes that it will work, and why it is so hard to get them
to try to read and understand an error message.
Here I think the same problem is manifesting in a different way.
Programs work by magic, so Mazin's approach is to look for the
appropriate magic incantation in Perl that will magically solve his
problem. If Perl doesn't have the right incantation, then he'll go
try a different language that might have it.
: Yeah. If anything, it will get worse. The basic problem is that the
: smallest piece of data that is easy to work with is a single byte
: (which is pretty much the definition of a byte) and the data format
: he's chosen stores a digit in a smaller space than that.
And the silly thing is that he's not really saving much space. It's not
as if he's got some great 10:1 compression ratio happening, or gigabytes
of data to squeeze into memory.
Premature optimization!
--
This programmer available for rent.
Sorry, but this argument is, in general, BS-ish. Most of the time it
is not the compression ratio that matters, but whether the working set
fits in one of the "caches" (especially if you consider the memory as
a cache of the swap space ;-).
So if it is on a Sparc with 8M second-level cache, his choice of
keeping the data compressed *may* be beneficial in some situations.
Hope this helps,
Ilya
Mark
Um, what theory would that be?
Quicksort is O(n * log n). Sorting 8,000,000 items should require
about 110 times as many operations as sorting 100,000 items. If the
time to sort 8,000,000 items is very short, there's not much real
difference -- but I just tried an experiment, and the program crashed
before it finished sorting.
Well, maybe. There are languages that implement packed arrays.
In Pascal, for example, I think the syntax would be
packed array [lo..hi] of 0..15
Ada has a similar feature.
This should be a good match for the way he's storing the data, and
he can use ordinary array indexing to search the array.
Of course, the compiler is going to have to transform all the indexing
operations into something that extracts data from individual bytes.
Depending on how clever the compiler is, this may or may not result in
faster code than what you'd get if you manually coded it in Perl or C.
It will almost certainly be slower than an unpacked array (one element
per byte) -- but cache size issues make it unclear what the best
tradeoff is going to be.
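For what it's worth, Perl can get at the nybbles directly with the
built-in vec(), no special compiler needed. A small sketch (the one
subtlety is that vec() numbers 4-bit elements from the low nybble of
each byte upward, while this format puts the earlier digit in the
high nybble, so the index gets XORed with 1):

use strict;
use warnings;

my $packed = pack "H*", "14159265";     # sample of the packed format

sub digit_at {
    my ($bcd, $i) = @_;
    return vec($bcd, $i ^ 1, 4);        # swap nybble order within bytes
}

print digit_at($packed, $_) for 0 .. 7; # prints 14159265
print "\n";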
Well, my watch still does not have 8M of built-in memory...
> PI doesn't repeat - that makes it harder to do the searching since any
> string will occur only once.
Doubtlessly, you wrote this before your morning coffee, right?
Hope this helps,
Ilya
> Well, maybe. There are languages that implement packed arrays.
> In Pascal, for example, I think the syntax would be
> packed array [lo..hi] of 0..15
> Ada has a similar feature.
And don't forget the Z80. I recall how I once calculated a Mersenne
prime whose exponent I knew, by repeatedly doubling the number, using
BCD to store it :-) The Z80 has some BCD operations, though I can't
recall them all.
--
John MexIT: http://johnbokma.com/mexit/
personal page: http://johnbokma.com/
Experienced programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
>Sebastien Nadeau <sebastie...@bibl.ulaval.ca> writes:
> > At 20:37 2005-09-26, you wrote:
> >
> >>Even if 4 megabytes of data is reasonable, once you start doing things
> >>like creating hashes with 8 million elements, or sorting
> >>8-million-element arrays, you might start having resource problems.
> >
> > I agree with that.
> >
> > In theory, sorting 8 000 000 digits with a quicksort does not take
> > much longer than sorting an array of 100 000.
>
>Um, what theory would that be?
>
>Quicksort is O(n * log n). Sorting 8,000,000 items should require
>about 110 times as many operations as sorting 100,000 items. If the
>time to sort 8,000,000 items is very short, there's not much real
>difference -- but I just tried an experiment, and the program crashed
>before it finished sorting.
Exactly the theory you explained. There's no order-of-magnitude
difference between sorting 100,000 and 8,000,000 items, only a
multiplicative constant. Quicksort is O(n^2) in the worst case, but
will generally run in O(n * log n) when well implemented. I agree with
you, however: I talked before testing. So I tried this script:
use strict;

my @digits;
my $time1 = time;
foreach my $i (1 .. 1000000) {
    push @digits, int(rand 10);
}
my @sorted = sort @digits;
my $time2 = time;
my $total = $time2 - $time1;
print "$total seconds\n";
exit;
And the first time, it sorted 1,000,000 digits in 3 seconds.
It took 5 seconds to sort 2,000,000 and 8 seconds to sort 3,000,000.
That's exactly what I was expecting.
However, sorting 4,000,000 took 20 seconds (that's what it printed)
and then Perl crashed (or I was not patient enough to wait the whole
day, so I killed the process). Indeed, I didn't even try with 8,000,000.
Since quicksort is an in-place sort, it shouldn't be a memory problem,
but on every occasion the perl process's memory usage jumped to the
limit of available memory on my machine (I'm running with 512 megs
of physical memory).
Can someone explain that?
Martin wrote me about this also, but his message gave me an idea of
how this person's problem could be solved. What do you think of this:
1. Build $index_1[0-9][0->#]
2. Start loop looking for strings.
2a. Use $index_1 to index into the BCD string.
3. Build reduced search index ($index_2[0->#])
using a max of two, three, or four BCD characters.
4. Use reduced $index_2 as starting point to look for string.
5. End of loop
Sort of like a divide and conquer algorithm but not quite.
Think that might work?
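If I follow, steps 1 and 2a would come out something like this in Perl
(a hypothetical sketch: index every digit position by digit value,
then verify candidates nybble by nybble with vec()):

use strict;
use warnings;

my $bcd = pack "H*", "14159265358979";  # sample packed digits

# Step 1: map each digit value (0-9) to the positions where it occurs.
my %index;
for my $i (0 .. 2 * length($bcd) - 1) {
    push @{ $index{ vec($bcd, $i ^ 1, 4) } }, $i;
}

# Steps 2-4: candidates come from the index of the first digit; the
# remaining digits are verified in place.
sub find_digits {
    my @want = @_;
    CANDIDATE:
    for my $start (@{ $index{ $want[0] } || [] }) {
        for my $k (1 .. $#want) {
            next CANDIDATE
                if vec($bcd, ($start + $k) ^ 1, 4) != $want[$k];
        }
        return $start;                  # digit offset of first match
    }
    return -1;
}

print find_digits(5, 9, 2, 6), "\n";    # prints 3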
Since big O is defined as the asymptotic worst case performance of an
algorithm, how is it "worst case O() and average O()?" (This is really
a question, I have no doubt in my mind that I've missed something, or
forgotten something from my school days)
> Since quicksort is an in-place sort, it shouldn't be a memory problem,
> but on every occasion the perl process's memory usage jumped to the
> limit of available memory on my machine (I'm running with 512 megs
> of physical memory).
>
> Can someone explain that?
Quicksort is a recursive algorithm, which would mean that every call to
it is going to consume just a bit more stack. Do that a few million
times and you're chipping away at your RAM.
--T Beck
But quicksort isn't going to recurse a few million times except on
pathological data.
But in any case, perl no longer defaults to using quicksort.
See: http://search.perl.org/perldoc/sort
The call consumes memory, but the memory is released when the call returns.
In nearly all cases, the maximum amount of stack space used when
sorting N items is only O(log(N)).
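The standard way to *guarantee* that bound is to recurse only into the
smaller partition and iterate on the larger one. A sketch (not what
perl's sort actually does internally, just the textbook trick):

use strict;
use warnings;

# Lomuto partition: returns the pivot's final index.
sub partition {
    my ($ary, $lo, $hi) = @_;
    my $pivot = $ary->[$hi];
    my $i = $lo;
    for my $j ($lo .. $hi - 1) {
        if ($ary->[$j] < $pivot) {
            @$ary[$i, $j] = @$ary[$j, $i];
            $i++;
        }
    }
    @$ary[$i, $hi] = @$ary[$hi, $i];
    return $i;
}

# Each recursive call handles the smaller half, so the recursion depth
# (and hence the stack) is at most about log2(N) frames.
sub qsort_logstack {
    my ($ary, $lo, $hi) = @_;
    while ($lo < $hi) {
        my $p = partition($ary, $lo, $hi);
        if ($p - $lo < $hi - $p) {
            qsort_logstack($ary, $lo, $p - 1);
            $lo = $p + 1;               # iterate on the larger side
        } else {
            qsort_logstack($ary, $p + 1, $hi);
            $hi = $p - 1;
        }
    }
}

my @a = map { int rand 10 } 1 .. 20;
qsort_logstack(\@a, 0, $#a);
print "@a\n";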
> On Fri, Sep 30, 2005 at 10:15:17AM -0700, T Beck wrote:
>> Sebastien Nadeau wrote:
>> > Since quicksort is an in-place sort, it shouldn't be a memory problem,
>> > but on every occasion the perl process's memory usage jumped to the
>> > limit of available memory on my machine (I'm running with 512 megs
>> > of physical memory).
>> >
>> > Can someone explain that?
>>
>> Quicksort is a recursive algorithm, which would mean that every call to
>> it is going to consume just a bit more stack. Do that a few million
>> times and you're chipping away at your RAM.
>
> But quicksort isn't going to recurse a few million times except on
> pathological data.
And even then, IIRC, one call is tail recursion (so it can be optimized
away), and for a small number of items one shouldn't use quicksort in
the first place :-).
Basically, Perl allocates memory but does not free up memory as quickly
as it gobbles it up. That is to say that the garbage collection routine
waits a while to see if the memory is going to be used again. So if the
Quicksort is going at a speed of, say five, the garbage collector is
only going at a speed of between one and three. So memory is growing
simply because Perl's garbage collector hasn't realized that some of the
memory can be released. That is also why you see it go up, then down,
then farther up, then down a bit, and up again. The garbage collector
is trying to keep up but the priority is on doing the quicksort. It is
assumed that you have enough memory to handle doing the sort in the
first place.
You can write your own Quicksort and alleviate the memory problems since
you can make the original array global versus local. Then all you are
doing is moving pointers into the array around rather than the array
itself all the time. It is a bit slower than the built-in Quicksort,
but not by that much.
An excellent (although out of print) source of information on sorting
algorithms is the old Nibble magazine (from the Apple ][ era). It
contained no fewer than six different sorting algorithms written out
completely in flowcharts and Applesoft. These, in turn, were gotten
from Donald Knuth's books on mathematics et al. (Tomes actually.)
For alphabetic sorting, I have found the Postal Sort to be the quickest
and easiest to use. If you want more info just e-mail me.
Mark
> Since big O is defined as the asymptotic worst case performance of an
> algorithm, how is it "worst case O() and average O()?"
That's not the definition. In general, a quantity Q is said to be
O(f(x)) for some function f(x), if there is some constant C so that,
for sufficiently large x,
Q < C * f(x)
If the running time of an algorithm is O(f(x)), then
running time < C * f(x)
for some constant C and sufficiently large x. Clearly, this is a
statement about the worst case of the algorithm.
But you can also say that the average running time of some algorithm
is O(f(x)), which means that there is a constant C, etc., so that
average running time < C * f(x)
So it is correct to say that the running time of quicksort is not
O(x log x), but the average running time of quicksort *is* O(x log x).
Hope this helps.
> Yes - kind-of. I've had this happen many times and someone on the Perl
> list explained it to me years ago. So I will try to remember what they
> said.....
You remember (or were told) wrongly.
> Basically, Perl allocates memory but does not free up memory as quickly
> as it gobbles it up. That is to say that the garbage collection routine
> waits a while to see if the memory is going to be used again. So if the
> Quicksort is going at a speed of, say five, the garbage collector is
> only going at a speed of between one and three. So memory is growing
> simply because Perl's garbage collector hasn't realized that some of the
> memory can be released. That is also why you see it go up, then down,
> then farther up, then down a bit, and up again. The garbage collector
> is trying to keep up but the priority is on doing the quicksort. It is
> assumed that you have enough memory to handle doing the sort in the
> first place.
I am inclined to say that this is gibberish. Perl's garbage collector
has no speed at which it is going. It is a simple reference-counting
mechanism. Once the reference-count of a variable drops to zero its
memory is not exactly freed but rather marked as reusable.
What you describe requires an asynchronous mode of operation, using a
dedicated garbage-collecting-thread. There is no such thing in perl.
> You can write your own Quicksort and alleviate the memory problems since
> you can make the original array global versus local. Then all you are
> doing is moving pointers into the array around rather than the array
> itself all the time. It is a bit slower than the built-in Quicksort,
> but not by that much.
Have you any benchmarks to back that claim? I have one here, and
depending on the size of the array to be sorted, my handrolled version
is between 9 and 13 times slower than the built-in:
#!/usr/bin/perl -w
use Benchmark qw/cmpthese/;
sub qsort {
    my ($cmp, $ary, $left, $right) = @_;
    return if $left >= $right;
    my $piv = ($left + $right) / 2;     # middle element as pivot
    @$ary[$left, $piv] = @$ary[$piv, $left];
    my $last = $left;
    *main::b = \$ary->[$left];          # alias $b to the pivot for &$cmp
    for ($left + 1 .. $right) {
        *main::a = \$ary->[$_];         # alias $a to the current element
        ++$last and @$ary[$_, $last] = @$ary[$last, $_] if &$cmp < 0;
    }
    @$ary[$left, $last] = @$ary[$last, $left];
    qsort($cmp, $ary, $left,     $last - 1);
    qsort($cmp, $ary, $last + 1, $right);
}

cmpthese(1, { # for huge lists, 1 iteration is already painful enough
    psort => sub {
        my @ary = reverse 1 .. 2_000_000;
        qsort(sub { return $a <=> $b }, \@ary, 0, $#ary);
    },
    csort => sub {
        my @ary = reverse 1 .. 2_000_000;
        @ary = sort { $a <=> $b } @ary;
    },
});
Granted, this is not the most efficient qsort, however choosing the
middle element as pivot element is the best choice for an already
pre-sorted list so that should make up a bit for the lack of
optimization. Also note that the requirement of doing a real in-place
sort makes this routine quite ugly with its explicit passing-around of
the sub-array-bounds.
What can be said is that psort uses only around a half to a third of
the memory that csort uses for a list with 2 million elements. The
latter is heavily swapping on my machine, and yet it's still much
faster.
Therefore I conclude a handrolled qsort is only a useful option when
perl's sort bails out with out-of-memory.
Tassilo
--
use bigint;
$n=71423350343770280161397026330337371139054411854220053437565440;
$m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
> Quicksort is a recursive algorithm
There are non-recursive Quicksort implementations too.
http://en.wikipedia.org/wiki/Quicksort
For some kinds of data: Radix, Bucket and Postman's sort are interesting
alternatives.
--
Anyway, Ruud
"Ordinary is a tiger."
> I am inclined to say that this is gibberish. Perl's garbage collector
> has no speed at which it is going. It is a simple reference-counting
> mechanism. Once the reference-count of a variable drops to zero its
> memory is not exactly freed but rather marked as reusable.
>
> What you describe requires an asynchronous mode of operation, using a
> dedicated garbage-collecting-thread. There is no such thing in perl.
No. I did not mean to convey that the garbage collector is a separate
thread or process (although it may be) - only that it has to wait and
verify that the memory is available for recouping. Therefore, it runs
slower (or maybe I should say at a slower rate) than the quicksort
itself is using up memory. Further, from what I understand about Perl -
which can be wrong also - Perl expects you to have the needed memory so
it can complete whatever function you are calling/using. Thus, the
quicksort itself has a higher priority than recouping memory. Even if
Perl isn't multithreaded (which I thought they had put in back in the
90's so Perl could be used with Apache more readily and handle multiple
threads at one time), the quicksort routine would then be called and,
since there aren't multiple threads, the quicksort routine would not
release the memory until it was through using it. Which again would
mean that the quicksort routine had higher priority than the garbage
collector as the GC could not run until after QS had finished.
If, however, Perl does use multithreading, then it would stand to
reason, IMHO, that the GC would be put on a separate thread so it could
monitor and try to grab memory back without having to wait for any
process Perl is running otherwise to complete. Applesoft, from the old
Apple][+ days had a GC which would interrupt programs running on the
Apple][+ to do garbage collection. This was slow and tedious because
your program would just stop functioning until the GC was through. With
the advent of ProDOS, Apple figured out a way to make the garbage
collector steal time from a running program to do the GCing. With
MS-DOS, QBasic used a similar scheme to recover their memory. Perl's
first implementation used the same type of scheme from QBasic but when
multithreading came along I thought they had changed over to using
threads for these types of things. From what I have read (and
understand) Java uses this type of an approach so the program doesn't
have to slow down and so does PHP.
>>You can write your own Quicksort and alleviate the memory problems since
>>you can make the original array global versus local. Then all you are
>>doing is moving pointers into the array around rather than the array
>>itself all the time. It is a bit slower than the built-in Quicksort,
>>but not by that much.
>
>
> Have you any benchmarks to back that claim? I have one here, and
> depending on the size of the array to be sorted, my handrolled version
> is between 9 and 13 times slower than the built-in:
The term "a bit more" is not a dependent clause of "It will run as
fast as" or even "It will run nearly as fast as". It is just slower.
Thus "a bit more" is a relative term, and not to be taken as a precise
measurement of time, size, length, breadth, width, or any other
measurement term. Between 9 and 13 times slower is "a bit more slower".
Just as between 1 and 2 times slower is "a bit more slower", and 1,000
to 1,000,000 times is "a bit more slower" - the last usually used to
denote sarcasm, because it is really much slower rather than just a
little slower. ;-)
> Granted, this is not the most efficient qsort, however choosing the
> middle element as pivot element is the best choice for an already
> pre-sorted list so that should make up a bit for the lack of
> optimization. Also note that the requirement of doing a real in-place
> sort makes this routine quite ugly with its explicit passing-around of
> the sub-array-bounds.
And that is a bad thing because....?
>
> What can be said is that psort uses only around a half or third memory
> of what csort uses for a list with 2 million elements. The latter is
> heavily swapping on my machine, and yet it's still much faster.
>
> Therefore I conclude a handrolled qsort is only a useful option when
> perl's sort bails out with out-of-memory.
Wasn't that what we were talking about? That he had run a QS and had
run out of memory? So the suggestion was you could write your own QS
and it would run slower but by an unknown amount - just a bit slower. :-)
> [...] my handrolled version
> is between 9 and 13 times slower than the built-in:
> [...]
> Therefore I conclude a handrolled qsort is only a useful option when
> perl's sort bails out with out-of-memory.
Yes. It seems to be the case here, however. 9 to 13 times is not so
dramatic, since it's not a plunge of several orders of magnitude.
Especially if the built-in version crashes, I'm able to wait a little
longer if the sort is not part of a time-critical routine.
Anyway, just before leaving friday I tried:
use sort "_quicksort";
Since Perl's default sort is now a merge sort. And indeed, it sorted
a list of 4,000,000 digits in 18 seconds without crashing.
Sébastien