I have a huge array of string. Lets just say I have 1 million strings.
I want to sort the alphabetically (obviously), but the quickest
sorting algorithm I’ve stumbled across lately is QuickSort, (the
one in the thread example that comes with Delphi). But it’s not
that “quick” when it has to deal with over 10-20 thousand
elements.
Therefore I want to have a procedure written in asm so I really can
gain some speed. But here is the catch………..I suck at
asm! I have little, well actually almost NO EXPERIENCE with asm.
That’s why I’m asking for help!
...(I need that damn array sorted FAST)
I have a huge array of string. Lets just say I have 1 million strings.
I want to sort the alphabetically (obviously), but the quickest
sorting algorithm I've stumbled across lately is QuickSort, (the one
in the thread example that comes with Delphi). But it's not that
"quick" when it has to deal with over 10-20 thousand elements.
Therefore I want to have a procedure written in asm so I really can
gain some speed. But here is the catch....I suck at asm! I have
little, well actually almost NO EXPERIENCE with asm. That's why I'm
have you considered using third party libraries ?
Is it possible to handle the sorting when elements are added to the list,
and not afterwards on the whole array ?
if the list is also built by your application, it may be possible to process
strings one by one when they have to be inserted. Time is generally less
critical at this stage. At the end you get a sorted list (a list and an
index).
if you get the list from an external file, this will not help.
Denis
"König" <el...@mailme.dk> wrote in message
news:a2937919.02060...@posting.google.com...
>I have a huge array of string. Lets just say I have 1 million strings.
>I want to sort the alphabetically (obviously), but the quickest
>sorting algorithm I’ve stumbled across lately is QuickSort, (the
>one in the thread example that comes with Delphi). But it’s not
>that “quick” when it has to deal with over 10-20 thousand
>elements.
>...(I need that damn array sorted FAST)
First, how fast does it need to be? In my test, Quicksort sorts
20,000 random 20-character strings in less than 0.05 seconds (Delphi
6, O+,R-,Q-, 1.3 GHz Athlon). It sorts 1,000,000 of them in 5.6
seconds.
Secondly, I doubt an asm version will be much faster.
In terms of it being the fastest there is, lets see how many other people in
this newsgroup know this, the complexity of a quicksort function is
something in the region of O(n log n ) this is exactly the same as merge
sort (the different being the position at which the sorting is done,
quicksort on the way down, mergesort on the way up)
I came accross an algorithm for sorting numbers (given enough storage, which
for a million is more than plentyful) that can sort descrete numbers in
O( n ) time complexity and hence is significantly faster than quick-sort,
merge-sort and other O(n log n) based methods. But as you have already
gathered the fact that it was designed to sort integers it a major
limitation.
I did adapt this algorithm successfully to sort strings using a hash code
and it worked well. That was some time ago, and it was used to sort fixed
length fields (unique offline identifiers). Variable length items are
another matter. I am sure it is possible but I have not spent enough time
thinking about it to work it out.
Anyway, enough of my rabbiting. Assembler should not be needed in this case
20,000 records is nothing.
Ethar
"fat G" <fa...@NO-SPAMemail.si> wrote in message
news:tScM8.346$w5.1...@news.siol.net...
"König" <el...@mailme.dk> wrote in message
news:a2937919.02060...@posting.google.com...
"Ethar Alali" <som...@dsl.pipex.com> wrote in message
news:3d0176a2$0$238$cc9e...@news.dial.pipex.com...
To follow up on the discussion, I'll give you a little more
information: (sorry, should have done that to begin with)
It's NOT possible to sort when elements are added to the list. I have
a complete list and then I want to sort it.
I mentioned that it was an array of string, but really it's an array
of "ElementInfo"
...ElementInfo??? What??!
Well....
type
ElementInfo = record
s1,s2,s3: string;
end;
I want to sort the list alphabetically according to s1.
I've tried the QuickSort that is in the threads example, but it sorts
an array [0..25000] of ElementInfo in about 5-6 seconds. Why is that?
This is how QuickSort looks:
procedure QuickSort(var A: array of ElementInfo; iLo, iHi: Integer);
var
Lo, Hi, Mid: Integer;
T: ElementInfo;
begin
Lo := iLo;
Hi := iHi;
Mid := A[(Lo + Hi) div 2].s1;
repeat
while A[Lo].s1 < Mid do Inc(Lo);
while A[Hi].s1 > Mid do Dec(Hi);
if Lo <= Hi then
begin
T := A[Lo];
A[Lo] := A[Hi];
A[Hi] := T;
Inc(Lo);
Dec(Hi);
end;
until Lo > Hi;
if Hi > iLo then QuickSort(A, iLo, Hi);
if Lo < iHi then QuickSort(A, Lo, iHi);
end;
P.S.
Sorry for last time posting the same message twice! I don't know what
happened!!!
So for purposes of sorting, it's still an array of string.
>I've tried the QuickSort that is in the threads example, but it sorts
>an array [0..25000] of ElementInfo in about 5-6 seconds. Why is that?
Probably because when switching records, it's switching both
the whole records. Try sorting a list of pointers to every record,
so it switches only the pointers.
Groetjes,
Maarten Wiltink
>Probably because when switching records, it's switching both
>the whole records. Try sorting a list of pointers to every record,
>so it switches only the pointers.
Even when switching records, my quicksort is nearly 100 times faster.
>type
> ElementInfo = record
> s1,s2,s3: string;
>end;
How long are the strings? When I run it with random 20 char strings
it takes a small fraction of a second.
>procedure QuickSort(var A: array of ElementInfo; iLo, iHi: Integer);
>var
> Lo, Hi, Mid: Integer;
> T: ElementInfo;
Mid should be a string.
>I've tried the QuickSort that is in the threads example, but it sorts
>an array [0..25000] of ElementInfo in about 5-6 seconds. Why is that?
Are yuo sure that this is the time to sort only, not reading from disk
or anything else? What speed CPU are you using? How long are the
strings?
Is your array by any chance already partly sorted? QuickSort is very
quick if the list to be sorted is initially in a random order, but it
has one serious drawback: if the list to be sorted is already almost
sorted into order, then QuickSort becomes very slow. In fact it is
slowest of all if you use it on a list which is in fact already
completely in order. This is very odd behaviour: I know of no other
sorting algorithm that behaves like this, and some are quicker on an
almost sorted list than on a random one, which seems much more in
keeping with common-sense: there is less work to do!
If this is what is happening in your case the best solution is to use
a different sorting algorithm. ShellSort works at a speed of the same
order of magnitude as QuickSort on a random list, and does not slow
down on a partly sorted list. Perhaps a web search for 'ShellSort'
will prvide you with something usable. Failing that, you could first
use a random number generator to 'unsort' your array, and then sort it
with QuickSort. This is a weird idea, but it actually can sometimes
result in a speed increase.
Michael Davis.
Quite incorrectly, ShellSort is nowhere near the speed of QuickSort. The
shell sort has a time complexity set of O( n sqrt( n ) ) which means that
for 20,000 records, we are looking at something in the order of 2828427
comparisons (and hence multiply with the time for one comparison) whilst
quicksort sorts in the 285754 comparisons, an order of magnitude of almost
10 times faster in worst case. This of course is true only for large sorts,
for small sorts it is unlikely to show up as a significant performance
difference in application programs or for small amounts in the practical
scheme of things. If you need to sort a large array you should not consider
ShellSort etc.
There is good reason to use (or learn to use, which unfortunately most
programmers dont know) asymptotic notation and Algorithmic methods (no I
have not mis-spelled that, I do not mean algorithms, I mean the analysis of
the algorithm for performance predictability. If you thought I was stupid,
who's laughing now!). Mathematicians can teach software developers a lot and
they should listen!! After all we should be pushing for engineering
principles here.
... and breathe, excuse me while I go count to ten.
}// End function rant_and_rave
Ethar
P.S. Something I forgot to ask, why are you trying to write your own
quicksort when there exists one in the VCL?
"Michael Davis" <miki...@yahoo.co.uk> wrote in message
news:62f60a1.02061...@posting.google.com...
>Quite incorrectly, ShellSort is nowhere near the speed of QuickSort. The
>shell sort has a time complexity set of O( n sqrt( n ) ) which means that
>for 20,000 records, we are looking at something in the order of 2828427
>comparisons (and hence multiply with the time for one comparison) whilst
>quicksort sorts in the 285754 comparisons, an order of magnitude of almost
>10 times faster in worst case.
In practice it is a lot closer. On my tests with sorting 20,000
integers and with 20-char strings, Quicksort is 2.6 to 2.7 times
faster than Shellsort. We're talking about milliseconds too.
The second thing you must bear in mind, is that the O() functions do not
care about the scalar coefficient of any polynomial value. They care about
the order because it is a set of possible values of result time. Hence 2n
log n, 3n log n and 4n log n are all in O( n log n ). The scalar factors can
appear to speed it up by a certain amount but it is nothing compared to the
increase due to an order of power. e.g. 30 * 20000 <<< 20000 * 20000.
>We're talking about milliseconds too
This is right, but I was talking about microseconds. One thing you have to
look at is that due to the fact that the ShellSort algo. is a O( n
sqrt( n ) ) gradient curves upwards whilst O( n log n ) curves downwards,
thus there is a point where O( n sqrt( n ) ) member 'a' and O( n
sqrt( n ) ) variable 'b' are equal. draw the graphs of these two functions
superimposed on each other and you will see what I mean. Test it out with
smaller values and you may find a situation where given a certain value of n
and an implimentation of shell sort, the ShellSort algo actually outperforms
quicksort, for a given constant c. As an example, if n = 2 and c = 4 then:
cn log2 n is member of O( n log2 n ) = 8 log2 4 = 16
n sqrt( n ) = 2 sqrt( 4 ) = 4
This is obviously a trivial example, but it makes the point. The scalar
factors depend on the implimentation of the algorithm, processor power,
compilers etc. and is only one member of the set of O( n log n ) for
example. SO your sort of 20000 producing 2.6 - 2.7 times is NOT outside the
bounds of the theoretical, it is a member of it, just like a lot of things
in the practical world that practitioners assume theorists are wasting their
time on.
Thanx for you comments. Much appreciated.
Ethar
"Jan Philips" <jud.mc...@mindspring.com> wrote in message
news:dh4aguofdbn2f9u95...@4ax.com...
I was under the impression that ShellSort was of order O(n log n), but
I am willing to accept that I was wrong. It is actually some years
since I have had any reason to go into these details. However,
someone else has pointed out in a message on another thread that
MergeSort is of order O(n log n), so that might be a solution to the
original problem.
> > P.S. Something I forgot to ask, why are you trying to write your own
> quicksort when there exists one in the VCL?
>
The original message says "I've stumbled across lately is QuickSort,
(the
one in the thread example that comes with Delphi)." That doesn't
sound to me like someone trying to write their own QuickSort!
Michael Davis
No not trying to wrte their own quicksrt, but it is not the same as the VCL
quicksort (at least not exactly)
Anyway, public apology. Sorry. I make a habit of that. Too much time in
front of a computer, thats my problem. This is not a reflection on yourself
or other programmers. It can get a little frustrating at time, but I have to
realise that I have been there too. But I also learned that just coding will
get nowhere. Have to plan what you want to do. Design and engineering are
ways meeting constraints (amongst other things of course)
The whole asymtotic notation for Shellsort can be derived form the algorithm
itself. I use algorithmics on a day to day basis to meet the constraints
specified in the requirements given several dimensionos such as time to
impliment, space requirements, performance etc. and linear programming
becomes a mighty useful tool.
Again sorry for the rant and rave function.
Ethar
"Michael Davis" <miki...@yahoo.co.uk> wrote in message
news:62f60a1.02061...@posting.google.com...
>That is fine, doesn't surprise me, the thing I highlighted in my previous
>e-mail is that the
>notation referres to the average speed of the algorithm being bound from
>above by O(n log n) an from below by something else.
Quicksort is not bounded above by O(n log n), it is O(n^2) in the
worst case. The average is O(n log n) though.
>This is right, but I was talking about microseconds. One thing you have to
>look at is that due to the fact that the ShellSort algo. is a O( n
>sqrt( n ) )
Shell sort is (much) better than that. Depending on the
implementation, it can be n * (log n)^2, n^(5/4), n^(7/6) for
practical versions, or asymptotically n^(1+epsilon) for any epsilon >
0.
>gradient curves upwards whilst O( n log n ) curves downwards,
I'm not sure what you mean here. Do you mean like the slope or
derrivative (gradient)?
>This is obviously a trivial example, but it makes the point. The scalar
>factors depend on the implimentation of the algorithm, processor power,
>compilers etc. and is only one member of the set of O( n log n ) for
>example. SO your sort of 20000 producing 2.6 - 2.7 times is NOT outside the
>bounds of the theoretical,
For 20,000 random data items, Shell sort actually performs fewer
comparisons (but more data moves) than Quicksort.
>Thanx for you comments. Much appreciated.
Nice talking to you.
>I was under the impression that ShellSort was of order O(n log n), but
>I am willing to accept that I was wrong.
Not quite, see my other message.
"Michael Davis" <miki...@yahoo.co.uk> wrote in message
news:62f60a1.02061...@posting.google.com...
> >That is fine, doesn't surprise me, the thing I highlighted in my previous
> >e-mail is that the
> >notation referres to the average speed of the algorithm being bound from
> >above by O(n log n) an from below by something else.
>
> Quicksort is not bounded above by O(n log n), it is O(n^2) in the
> worst case. The average is O(n log n) though.
>
Reading the last sentence of my part you see that we agree on this =8¬) The
average case of quicksort IS bounded from above by O( n log n ), indeed the
top is bounded by n^2. We are agreeing on this one.
> >This is right, but I was talking about microseconds. One thing you have
to
> >look at is that due to the fact that the ShellSort algo. is a O( n
> >sqrt( n ) )
>
> Shell sort is (much) better than that. Depending on the
> implementation, it can be n * (log n)^2, n^(5/4), n^(7/6) for
> practical versions, or asymptotically n^(1+epsilon) for any epsilon >
> 0.
>
True, the implimentation of shell sort is dependant on the sequences used to
compare elements in the array/list, and indeed if the result is
n^(1+epsilon) it is still slower than quicksort.
> >gradient curves upwards whilst O( n log n ) curves downwards,
>
> I'm not sure what you mean here. Do you mean like the slope or
> derrivative (gradient)?
>
Yep
>
> >This is obviously a trivial example, but it makes the point. The scalar
> >factors depend on the implimentation of the algorithm, processor power,
> >compilers etc. and is only one member of the set of O( n log n ) for
> >example. SO your sort of 20000 producing 2.6 - 2.7 times is NOT outside
the
> >bounds of the theoretical,
>
> For 20,000 random data items, Shell sort actually performs fewer
> comparisons (but more data moves) than Quicksort.
>
You are right, I miscommunicated here, when I talked about comparisons I
meant operations on data whch encompas a comparison or a move. My bad.
Should have specified that in more detail.
Ethar
I dont believe you should assume that the data is almost always already
sorted, dangerous assumption that. But indeed there is a point to what you
have said. If it is nearly sorted (or fully sorted) then a bubblesort algo
is bounded from BELOW (Big-Ohmega) by O( n ) believe it or not (given the
list may be already sorted) whilst quicksort would still perform a O( n log
n ) sort.
Ethar
"fat G" <fa...@NO-SPAMemail.si> wrote in message
news:HXxN8.1527$w5.2...@news.siol.net...
This was only the case for the initial implementation of the
quicksort, where the first element was always chosen as the pivot.
Shortly after that problem was discovered, people came up with all
sorts of solutions. A popular one is to take the median of the first,
middle, and last element of the current section, then use that as the
pivot.
> This is very odd behaviour: I know of no other
> sorting algorithm that behaves like this...
The initial implementation of the treesort behaved that way too,
although you can avoid the problem by using a red-black or AVL tree.
>We're told on our faculty never never never to use Shellsort
That's nonsense. Perhaps they meant never use Bubble sort. I use
Shell sort a lot.
> I don't think
>they're talking nonsense, when they say 'don't use it'.
I do think it is nonense to not use Shell.
>True, the implimentation of shell sort is dependant on the sequences used to
>compare elements in the array/list, and indeed if the result is
>n^(1+epsilon) it is still slower than quicksort.
Yes, but it isn't 10 times slower on 20,000 items, as was claimed.
"Jan Philips" <jud.mc...@mindspring.com> wrote in message
news:4kpfgu81vbst74oj4...@4ax.com...
They didn't mean bubblesort. It was definetely shellsort, I've looked it up.
Hope this doesn't sound too aggressive, but just because you are using
it a lot, it doesn't mean it's good, does it?
Not trying to be offensive or anything.
"Beer Hunter" <beer_...@swirve.com> wrote in message
news:95849100.02061...@posting.google.com...
If I remember rightly the Shell sort is used as a Genetic algorithms test
(well task actually) for optimisation, the idea being to optimise the sort
of a particular set or 20 numbers. There are various hill climbing algo's
that have hit an optimized point.
heap sort and bubble sort are indeed very easy to impliment, but I cannot
advocate using them unless you know for sure that the algo's will be in
their element (i.e. they will be performing sorts on nearly sorted data, to
get as close to O( n ) as possible) if that is the case then use them, else
look at a more geeric approach.
The one thing with QuickSort, Radix sort etc. is that you have a fairly
predictable average complexity from each, within a defined range. This does
not apply to Bubble sort which has a complexity between O( n ) and O( n^2 )
which is pretty significant, thus making it sometimes (not all the time)
pretty inconsistent between implimentations.
The Heap and Bubble sorts are not faster than any other algorithm unless
they are working on data that is nearly already sorted. Most of the time the
averages are inthe O( n^2 ) mark hence way below the merge and quicksorts of
O( n log n )
It is amazing to see how heated a discussion on sorting can get =8¬) I am
not even sure that we answered te question that was asked, indeed I am not
sure I remember the original question.
Ethar
"fat G" <fa...@NO-SPAMemail.si> wrote in message
news:FTRN8.1573$w5.2...@news.siol.net...
>They didn't mean bubblesort. It was definetely shellsort, I've looked it up.
>Hope this doesn't sound too aggressive, but just because you are using
>it a lot, it doesn't mean it's good, does it?
It isn't good because I use it, I use it because it is good.
>Well... it might be a matter of opinion, but the people that told me that
>are proffesors on our faculty of CS,
>and are supposed to know these things. I'ma stick with them until proven
>wrong :).
Well I know something too. I've been studying algorithms for more
than 22 years and I've never heard "don't use Shell sort" before, but
I HAVE often heard it about bubblesort.
>If I may ask, why _are_ you using it? Heapsort and quicksort are both faster
>(... in most cases), not to mention
>much simplier to implement?
If you are sorting a few thousand records, the execution time
difference is very small. For sorting 1000 records, Quicksort is on
the order of 0.0001 second faster. For 10,000 records, the difference
is on the order of 0.001 second. For sorting hundreds of thousands or
millions of records, then Quicksort is naturally better.
Shell sort is simplier to implement than heap sort or quick sort. And
good implementations of quicksort get a little trickier. One good
implementation that I sometimes use requires adding a sentinal record
in order to run correctly. If I forget that, I get wrong answers very
quickly...
>The Heap and Bubble sorts are not faster than any other algorithm unless
>they are working on data that is nearly already sorted. Most of the time the
>averages are inthe O( n^2 ) mark hence way below the merge and quicksorts of
>O( n log n )
Actually, heap sort is O(n log n) on the average and in the worst
case. In fact, it was the first sort proven to be O(n log n) in the
worst case. I think heap sort takes about twice as long as quicksort
on the average, but it is guaranteed to be O(n log n), so it could be
a good choice when the data may be pathological.
>How about using iterative quicksort, anyone tried that?
>In theory, it's faster than recursive version.
In my test the non-recursive quicksort is about 0.0001 second faster
than the recursive version when sorting 10,000 records.
>They didn't mean bubblesort. It was definetely shellsort, I've looked it up.
You shouldn't believe everything they say. You should use your own
common sense. In "Algorithms", by Robert Sedgewick, first edition
page 98, "Shellsort is the method of choice for many sorting
applications because it has acceptable running time for even
moderately large files and requires only a very small amount of code,
which is easy to get working. There are more efficient methods, but
they're perhaps only twice as fast (if that much) except for large
files, and significantly more complicated. In short, if you have a
sorting problem, use Shellsort, then determine whether the extra
effort required to replace it with a sophisticated method will be
worthwhile."
And that corresponds to common sense and actual practice.
Regardless of academic qualification I would be leary of anyone who says
that one should never use a particular sort algorithm. One of the reasons
that there are so many sort algorithms is that each one of them has a set of
characteristics that make it ideal for a certain set of circumstances.
Knowing the nature of the data being sorted is a requirement of choosing the
optimal sort algorithm. If one doesn't have this knowledge then choosing
shell sort makes a great deal of sense.
> If I may ask, why _are_ you using it? Heapsort and quicksort are both
faster
> (... in most cases), not to mention
> much simplier to implement?
One of my corollaries to Murphy's law is that if there is a worst case
scenario, it will only appear after a product goes live. IMO "in most cases"
just doesn't cut it.
Ethar
"Jan Philips" <jud.mc...@mindspring.com> wrote in message
news:lbufguklvk5is3ptf...@4ax.com...
Ethar
"fat G" <fa...@NO-SPAMemail.si> wrote in message
news:tXRN8.1574$w5.2...@news.siol.net...
If some of you don't remember me or even know who I am at all, let me
fill you in.... I'm the one posting the original question.
I've been following the discussion and I must admit that I thought it
would bee way simpler to "just sort a list". You really have to know
something about the data in the list, and how it is structured.
I was using Quick Sort and I couldn't figure out why it was kinda
"slow". But then I looked at my list and I discovered that large
sections of it already were sorted!! Well I guess that means bye bye
Quick Sort then!?
Then I can see that some of you suggest the Shell Sort and the Merge
Sort for a list that is partially sorted, as an alternative to, or
even a better solution than Quick Sort. But what about posting some
usable code so I could test it myself? That would really bee nice!!
Also...I've been searching the web for different sorting algorithms,
and I've (once again) stumbled across something interesting. If you'd
take the time to check out this website:
http://www.informatik.uni-stuttgart.de/ifi/ps/Ploedereder/sorter/sortanimation2.html
and try the animation. Could someone please explain what the
distribution sort is and how it works?! It looks fast!! ...and if you
have some code for that one too, you are more than welcome to post
it!!
...thanks for the great discussion, I've really learned something.
|Hope this doesn't sound too aggressive, but just because you are using
|it a lot, it doesn't mean it's good, does it?
Hope this doesn't sound too aggressive, but just because a professor
says something, doesn't make it so! (except in his class)
Phil
"Jan Philips" <jud.mc...@mindspring.com> wrote in message
news:guvfgukrecuau31h1...@4ax.com...
>I agree with Jan, as I mensioned before, the whole principle of selection
>should be based on several factors, including speed, performance, space and
>difficulty.
Yes, I routinely use 4 or 5 sorting algorithms, depending on the
situation.
>True, I shouldn't believe everything. But it's a starting point to start
>building my beliefs. After all, I've just learned all about these sorts,
>while you and many others have years of experience with them.
Even the lowly Bubble Sort has some uses, and it is certainly the
least well regarded of the mainstream sorting methods. It is OK on
nearly sorted data, and that comes up in practice sometimes.
>Good point, I stand corrected, Heap is indeed O( n log n ) making it a good
>choice for speed actually. Sounds good to me. Good call Jan. Being twice as
>slow in terms of complexity is nothing, as long as we are not looking at
>orders of magnitude then fine.
The Algoritmn Design Manual says that a poorly written Quicksort can
be slower than a poorly written Heap sort. I've never experienced
that, but it could be.
Do you mean replacing the second recursive call with iteration, or
both? Some compilers already replace tail recursion with iteration -
if you want to see a difference, you might need to replace both.
"Beer Hunter" <beer_...@swirve.com> wrote in message
news:95849100.02061...@posting.google.com...