
Mergesort: why's it efficient?


Joost Romeu

Nov 18, 1996

I recently read about the mergesort. The book characterized the sort as
"elegant and highly efficient".

The analogy it gave to explain the sort was:
1.
- provide one person with a deck of cards;
- that person divides the deck and distributes each half to another person;
- those persons do the same
...in binary treelike fashion...until every person has a card;
at that point the card in each person's hand is sorted (one card is
an automatic sort)
2.
- and then the merge begins (this part the book cites as the most
involved, requiring an "auxiliary array whose size equals the size of the
original array")

What's so elegant or efficient about an algorithm that takes multiple
passes to divide a set of unsorted items into smaller sets of unsorted
items? It seems that one non-recursive assignment could do the same thing.


Why all the work dividing when what you're left with (at the merge step)
is a bunch of sets that taken in toto are no more sorted than when you
started?

Thanks

--
------------------------------------------
please also reply as email...

DDL

Nov 18, 1996

I don't have my algorithms book in front of me, but I do
remember a few things:

First, you don't have to sort each smaller pile until
"everyone" only holds one card. You only need to sort when
someone has a set of cards which is still unsorted. As an
example, if 26 "people" have two cards each, statistically
half of them will hold two cards already in the correct order.
The same logic works (with fewer matches, but still important)
for people holding three cards, four cards, etc.

The obvious breakdown for this is when the set to be sorted is
in exactly the reverse order. Then you must break down the
entire set.

The big advantage of merge sort is it will *average* a sort
time of N(logN) where N is the number of elements being
sorted. N(logN) is pretty standard for recursive algorithms.

Hope this helps.

DDL

Jude Giampaolo

Nov 18, 1996

In article <56pu92$d...@decius.ultra.net>, DDL <d...@DBRC.com> wrote:

> The big advantage of merge sort is it will *average* a sort
> time of N(logN) where N is the number of elements being
> sorted. N(logN) is pretty standard for recursive algorithms.

Actually merge sort is N(logN) in worst case. Perhaps you are thinking of
quick sort? It is N(logN) on average but N*N in worst case.

--
Jude Charles Giampaolo 'I was lined up for glory, but the
jcg...@psu.edu tickets sold out in advance'
ju...@smellycat.com http://prozac.cwru.edu/jude/JudeHome.html

William D Clinger

Nov 18, 1996, to DDL

DDL <d...@DBRC.com> wrote:
> The big advantage of merge sort is it will *average* a sort
> time of N(logN) where N is the number of elements being
> sorted. N(logN) is pretty standard for recursive algorithms.

The big advantage of merge sort is that it *always* runs in
O(N lg N) time.

Quicksort has an *average* time of O(N lg N) but sometimes
requires O(N^2) time. (It is possible to implement a version
of quicksort that *always* runs in O(N lg N) time, but this
is seldom done because it makes the average case slower by a
large constant factor.)

Purists can read "O" as a capital theta.

Will

David Hanley

Nov 18, 1996

> The big advantage of merge sort is it will *average* a sort
> time of N(logN) where N is the number of elements being
> sorted. N(logN) is pretty standard for recursive algorithms.

Ack! No! The point of merge sort is that it is
*guaranteed* to be n lg n--it cannot do worse. That is very important!

dave

DDL

Nov 18, 1996


I seem to be confusing Quicksort and Mergesort. Strange, I was
positive that Quicksort was "better" than Mergesort for computational
efficiency. I'll whip up some Lisp code this evening and test it out.

DDL

Tim Fischer

Nov 18, 1996

<thinking back to my algorithms classes>

As I recall, both QuickSort and MergeSort are O(n log n) for their average
cases. However, the characteristic of 'O' notation is that you drop all
constant factors. Therefore, one O(n) algorithm could take 1000 times as
long as another O(n) algorithm, and as long as the complexity isn't
dependent on the size of 'n', they're both O(n) algorithms.

The advantage of QSort over MergeSort is that its constant factor is
smaller than MergeSort's, so on the average case it is
faster (but not less complex).

I don't recall whether MergeSort is O(n log n) worst case; if so, that
would be an advantage for it, as QSort degenerates to O(n^2) worst case.

Empirically, I believe, QSort is best, especially if you're somewhat
intelligent about selecting the 'pivot' value. (There's lots of research
on this).

</thinking back to my algorithms classes>

-Tim

--
---------------------------------------
Tim Fischer
Coda Music Technology

The following email address is mangled to prevent automated
unsolicited junk mail. Replace the '_AT_' with an '@':

tfischer_AT_codamusic.com

Marty Hall

Nov 18, 1996

DDL <d...@DBRC.com> writes:
> [MS is O(NlgN) worst case and average case, QS is O(N^2) worst case,
> O(NlgN) average case.]

> I seem to be confusing Quicksort and Mergesort. Strange, I was
> positive that Quicksort was "better" than Mergesort for computational
> efficiency.

Depends on what computational efficiency you are looking at. :-). If
you are looking at the average case, they have the same asymptotic
complexity (big-oh/theta), but Quicksort has smaller constant factors,
at least for random large input.

> I'll whip up some Lisp code this evening and test it out.

Be careful, though. Many a Lisp hacker has tried to use Quicksort on
lists, forgotten to account for the list appending time, and invented
an O(N^2 lg N) sorting algorithm! If you're using arrays, you should be
fine.
- Marty

Lisp Resources: <http://www.apl.jhu.edu/~hall/lisp.html>

Chris Jones

Nov 18, 1996

In article <tfischer-181...@news.minn.net>,

Tim Fischer <tfis...@See.Address.In.Signature> wrote:
>I don't recall whether MergeSort is O(n log n) worst case, if so, that
>would be an advantage of it, as QSort degenerates to O(n^2) worst case.

MergeSort can also sort external data -- I never thought it would
happen to me, but I ended up doing this once! I had a really huge
file that needed sorting, and I had to cut it into lots of little
pieces, and then merge them back together. Used lots of tmpfile()
calls. Really kind of a cool concept, sorting external data.
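
For illustration, a minimal sketch of that kind of external sort (not
Chris's original code): it assumes whitespace-separated ints on stdin,
sorts fixed-size chunks in memory, writes each sorted chunk to a
tmpfile(), and then merges the temporary files pairwise until one is left.

#include <algorithm>
#include <cstdio>
#include <vector>

// Write one sorted chunk of ints to a temporary file and rewind it.
static FILE* write_run(std::vector<int>& chunk)
{
    std::sort(chunk.begin(), chunk.end());
    FILE* f = std::tmpfile();
    for (std::size_t i = 0; i < chunk.size(); ++i)
        std::fprintf(f, "%d\n", chunk[i]);
    std::rewind(f);
    return f;
}

// Merge two sorted temporary files into a third.
static FILE* merge_runs(FILE* a, FILE* b)
{
    FILE* out = std::tmpfile();
    int x, y;
    int hx = std::fscanf(a, "%d", &x);
    int hy = std::fscanf(b, "%d", &y);
    while (hx == 1 && hy == 1) {
        if (x <= y) { std::fprintf(out, "%d\n", x); hx = std::fscanf(a, "%d", &x); }
        else        { std::fprintf(out, "%d\n", y); hy = std::fscanf(b, "%d", &y); }
    }
    while (hx == 1) { std::fprintf(out, "%d\n", x); hx = std::fscanf(a, "%d", &x); }
    while (hy == 1) { std::fprintf(out, "%d\n", y); hy = std::fscanf(b, "%d", &y); }
    std::fclose(a); std::fclose(b);
    std::rewind(out);
    return out;
}

int main()
{
    const std::size_t CHUNK = 100000;          // how many ints fit in memory
    std::vector<FILE*> runs;
    std::vector<int> chunk;
    int v;
    while (std::scanf("%d", &v) == 1) {        // phase 1: sorted runs on disk
        chunk.push_back(v);
        if (chunk.size() == CHUNK) { runs.push_back(write_run(chunk)); chunk.clear(); }
    }
    if (!chunk.empty()) runs.push_back(write_run(chunk));

    while (runs.size() > 1) {                  // phase 2: merge runs pairwise
        std::vector<FILE*> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2)
            next.push_back(merge_runs(runs[i], runs[i + 1]));
        if (runs.size() % 2 == 1) next.push_back(runs.back());
        runs.swap(next);
    }
    if (!runs.empty()) {                       // copy the final run to stdout
        while (std::fscanf(runs[0], "%d", &v) == 1) std::printf("%d\n", v);
        std::fclose(runs[0]);
    }
    return 0;
}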

Chris
--
-------------------------------------------------------------------------------
Chris Jones cjo...@rupert.oscs.montana.edu
Mad scientist in training...
"Is this going to be a stand-up programming session, sir, or another bug hunt?"

David Hanley

Nov 18, 1996

Joost Romeu wrote:

> What's so elegant or efficient about an algorithm that takes multiple
> passes to divide a set of unsorted items into smaller sets of unsorted
> items?

Well, for starters, it doesn't have to; It can divide up the list into
two-item sorted sublists on the first pass.

> It seems that one non-recursive assignment could do the same thing.
>
> Why all the work dividing when what you're left with (at the merge step)
> is a bunch of sets that taken in toto are no more sorted than when you
> started?

They are sorted--when the recursion gets to the base level (one item)
it is sorted, by definition, and the merging produces sorted stuff.

dave

Richard A. O'Keefe

Nov 19, 1996

Marty Hall <ha...@apl.jhu.edu> writes:

>Depends on what computational efficiency you are looking at. :-). If
>you are looking at the average case, they have the same asymptotic
>complexity (big-oh/theta), but Quicksort has smaller constant factors,
>at least for random large input.

Sorry, this is not true, *UNLESS* you are comparing hardware-supported
numbers for which comparisons are extremely cheap, and in _that_ case
you ought to be using a bucket sort anyway, which is O(N) expected time.

>> I'll whip up some Lisp code this evening and test it out.

Your Common Lisp "stable-sort" function probably _is_ a merge sort.

I used to boast about how my straightforward but in no way tweaked
merge sort code in C could run rings around any UNIX qsort() I had
ever met. The Bentley & Someone "Engineering a Sort" code from
Software Practice and Experience *matches* my merge sort code (after
I put in some tweaks to improve _its_ performance) but does not _beat_ it.

Remember the punch-line from Sedgewick's Acta Informatica paper:
he charged a comparison as 1/4 of a load, and if you are for example
comparing strings or records, that just isn't true.

--
Mixed Member Proportional---a *great* way to vote!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.

Richard A. O'Keefe

Nov 19, 1996

jo...@sirius.com (Joost Romeu) writes:
>What's so elegant or efficient about an algorithm that takes multiple
>passes to divide a set of unsorted items into smaller sets of unsorted
>items? It seems that one non-recursive assignment could do the same thing.

So how do you _do_ that?

There are many many members of the merge sort family.
There's also the bottom-up natural merge, for example:
   start by chopping up the input into already-sorted pieces
   that are as long as possible ("runs");

   while you have more than one sequence,
      merge adjacent pairs (A B C D E F G H -> AmB CmD EmF GmH and so on)

>Why all the work dividing when what you're left with (at the merge step)
>is a bunch of sets that taken in toto are no more sorted than when you
>started?

In array-based merge sort, the "dividing" is purely notional; you don't
_do_ anything with the data in the splitting phase, you simply move your
_attention_. On the way back _up_ you move things.
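
For concreteness, a minimal sketch of that bottom-up natural merge on an
array of ints (not from the post): collect the maximal ascending runs,
then merge adjacent pairs of runs until a single run remains.

#include <vector>

// Merge the two consecutive sorted ranges a[lo..mid) and a[mid..hi)
// through an auxiliary buffer, then copy the result back into a.
static void merge_adjacent(std::vector<int>& a, std::size_t lo,
                           std::size_t mid, std::size_t hi,
                           std::vector<int>& aux)
{
    aux.clear();
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi)
        aux.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) aux.push_back(a[i++]);
    while (j < hi)  aux.push_back(a[j++]);
    for (std::size_t k = 0; k < aux.size(); ++k)
        a[lo + k] = aux[k];
}

// Bottom-up natural merge sort: find maximal already-sorted runs, then
// merge adjacent pairs (A B C D -> AmB CmD ...) until one run is left.
void natural_merge_sort(std::vector<int>& a)
{
    if (a.size() < 2) return;
    std::vector<int> aux;
    for (;;)
    {
        std::vector<std::size_t> bounds;       // bounds[k] = start of run k
        bounds.push_back(0);
        for (std::size_t i = 1; i < a.size(); ++i)
            if (a[i] < a[i - 1])
                bounds.push_back(i);
        if (bounds.size() == 1) return;        // a single run: already sorted
        bounds.push_back(a.size());            // sentinel: one past the end

        for (std::size_t k = 0; k + 2 < bounds.size(); k += 2)
            merge_adjacent(a, bounds[k], bounds[k + 1], bounds[k + 2], aux);
    }
}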

Marty Hall

Nov 19, 1996

o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
> Marty Hall <ha...@apl.jhu.edu> writes:
>
> >Depends on what computational efficiency you are looking at. :-). If
> >you are looking at the average case, they have the same asymptotic
> >complexity (big-oh/theta), but Quicksort has smaller constant factors,
> >at least for random large input.
>
> Sorry, this is not true, *UNLESS* you are comparing hardware-supported
> numbers for which comparisons are extremely cheap, and in _that_ case
> you ought to be using a bucket sort anyway, which is O(N) expected time.

Hmm, I must admit this is news to me. I thought articles I read and
tests I performed (both a number of years ago :-) indicated this, but
perhaps my memory (or the articles) are mistaken. Could you cite the
tests that show that mergesort actually performs better in real
life? Is this a standard implementation, or do you have some trick(s)
that we should all know about?

Cheers-

Jens Kilian

Nov 19, 1996

Marty Hall (ha...@apl.jhu.edu) wrote:
> Be careful, though. Many a Lisp hacker has tried to use Quicksort on
> lists, forgotten to account for the list appending time, and invented
> a O(N^2 lgN) sorting algorithm! If you're using arrays, you should be
> fine.
> - Marty

Yep, that's the reason why Mergesort is usually preferred when sorting lists.
Also, Mergesort can exploit existing order in the input, in the best case
achieving O(N) performance.

When sorting arrays, all the shuffling adds up to slower processing.
In that case, a well-written Quicksort is preferable. "Well-written" implies
that it doesn't degenerate to O(N^2) except in pathological cases, which
usually requires that the input is artificially shuffled.

Greetings,

Jens.
--
Internet: Jens_...@bbn.hp.com Phone: +49-7031-14-7698 (TELNET 778-7698)
MausNet: [currently offline] Fax: +49-7031-14-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

David Hanley

Nov 19, 1996

Tim Fischer wrote:

> <thinking back to my algorithms classes>

>

> The advantage of QSort over MergeSort is that it's constant factor is
> smaller than MergeSort's, and so therefore, on the average case, it is
> faster (but not less complex).

Actually, that's not true. The difference between the two is that
mergesort requires extra space, while quicksort can be done 'in place'.
It is pretty easy to show that if extra space is not an issue, mergesort
will win in most cases.

Lou Steinberg

Nov 19, 1996

In article <joost-17119...@news.sirius.com> jo...@sirius.com (Joost Romeu) writes:

   I recently read about the mergesort. The book characterized the sort as
   "elegant and highly efficient".

   What's so elegant or efficient about an algorithm that takes multiple
   passes to divide a set of unsorted items into smaller sets of unsorted
   items? It seems that one non-recursive assignment could do the same thing.

   Why all the work dividing when what you're left with (at the merge step)
   is a bunch of sets that taken in toto are no more sorted than when you
   started?

Perhaps this perspective will help:

Suppose you have a pack of index cards with numbers on them and you
have to add them all up. This job takes time roughly proportional to
the number of cards. Suppose 100 cards take a minute. Then 200 cards
take 2 minutes, 50 cards take half a minute, etc. You could add them all
up by splitting the pack into two 50 card packs and adding them
separately, then adding the results. But this would gain nothing,
since each half takes 30 seconds so doing both would take 30 sec + 30
sec = the 1 minute the full pack would take. In fact you would lose
overall because of the extra time to split the pack and do the final
addition.

But now suppose you were doing something like looking to see if the
same number appeared twice in the pack. For this job, the time is
more than proportional to the number of cards - if 100 cards take a
minute, 200 cards take _more_than_ 2 minutes. More importantly, 50 cards
will take _less_than_ 30 seconds. Say 50 cards take 20 seconds. Then,
if you divide the 100-card pack in two, doing the job on the two
50-card packs takes a total of 40 seconds. If the total of the time
to do the division into two packs and the time to somehow combine the
results of the two packs into a final result is less than 20 seconds,
then overall you've won.

Of course, the same argument holds for solving the problem with
50 cards - its faster to do it with two 25-card packs. And so on,
until you get down to 1-card packs (as long as the dividing-the-problem
time and the combining-the-solutions time doesn't outweigh the saving
from working on 2 smaller problems).

It turns out that sorting is like this second example. If it takes
1 minute to sort 100 numbers, it takes less than 30 seconds to sort
50 numbers. In fact, if you get down to 1-card packs, sorting takes
no time at all because a 1-card pack (by itself) can't be out-of-order.
Since for merge sort, dividing the pack is trivial (just take the top
half of cards and the bottom half), we say that that takes no time either.

But, as you point out, considered _together_ the bunch of 1-card
packs is still not sorted at all. We still have the
combining-solutions time to consider. For merge sort, combining
solutions means taking two packs of cards that are each sorted by
themselves and combining them into one large pack that is still
sorted, i.e. "merging" the packs. First we merge pairs of 1-card
packs into 2-card packs, then merge pairs of 2-card packs into
4-card packs, etc. It turns out that you can show that all this repeated
merging takes less time than it would have to sort the initial 100-card
pack by simpler sorting methods (like bubble sort or insertion sort).

(Actually, what you can show is that it takes fewer comparisons of
one card to another.)

Does this help?
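
For concreteness, a minimal sketch of the algorithm the card analogy
describes (not part of the original post), trading efficiency for
clarity by copying the sub-packs:

#include <vector>

// Top-down merge sort, following the card analogy: split the pack in
// half, hand each half off (recurse), then merge the two sorted halves.
// A pack of one card is already sorted, so the recursion stops there.
std::vector<int> merge_sort(const std::vector<int>& pack)
{
    if (pack.size() <= 1)                  // one card: automatically sorted
        return pack;

    std::size_t half = pack.size() / 2;
    std::vector<int> left  = merge_sort(
        std::vector<int>(pack.begin(), pack.begin() + half));
    std::vector<int> right = merge_sort(
        std::vector<int>(pack.begin() + half, pack.end()));

    // The merge: repeatedly take the smaller of the two front cards.
    std::vector<int> merged;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size())
        merged.push_back(left[i] <= right[j] ? left[i++] : right[j++]);
    while (i < left.size())  merged.push_back(left[i++]);
    while (j < right.size()) merged.push_back(right[j++]);
    return merged;
}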

Jeff Barnett

Nov 19, 1996

In article <56s84c$r...@isoit109.bbn.hp.com>, je...@bbn.hp.com (Jens Kilian) writes:

|> Marty Hall (ha...@apl.jhu.edu) wrote:
|> Also, Mergesort can exploit existing order in the input, in the best case
|> achieving O(N) performance.

I think that merge sort still makes log n passes over the total input
AND each pass will take O(n) time. Thus, the time performance of merge
sort is essentially independent of the original order. However, I think
QUICK sort and many others vary between O(n) and O(N^2) depending on
the original order.

Jeff Barnett

Bruce Hoult

Nov 20, 1996

o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
> Remember the punch-line from Sedgewick's Acta Informatica paper:
> he charged a comparison as 1/4 of a load, and if you are for example
> comparing strings or records, that just isn't true.

What do you mean "1/4 of a load"? Is that the same as "1/4 of a copy"?

If so, then you're correct: in the case of strings or records, a
comparison isn't 1/4 as expensive as a copy -- it's usually far, far
less.

-- Bruce

--
...in 1996, software marketers wore out a record 31,296 copies of Roget's
Thesaurus searching for synonyms to the word "coffee" ...

Christer Ericson

Nov 20, 1996

On Mon, 18 Nov 1996 13:26:01 +0000, William D Clinger <wi...@ccs.neu.edu>
wrote:
>[...]

>Quicksort has an *average* time of O(N lg N) but sometimes
>requires O(N^2) time. (It is possible to implement a version
>of quicksort that *always* runs in O(N lg N) time, but this
>is seldom done because it makes the average case slower by a
>large constant factor.)

How would you do an O(N lg N) worst case quicksort? Median-of-
three is still O(N^2) worst case. If I'm not mistaken, for any
given K, K independent of N, median-of-K is still O(N^2) worst
case.

Knuth mentions a method (which selects K elements, K dependent on
N) by Frazer and McKellar that decreases to O(N lg N) as N->inf.
Is that the method you are referring to?


David Hanley

Nov 20, 1996

Marty Hall wrote:
>
> o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
>
> > Sorry, this is not true, *UNLESS* you are comparing hardware-supported
> > numbers for which comparisons are extremely cheap, and in _that_ case
> > you ought to be using a bucket sort anyway, which is O(N) expected time.
>
> Hmm, I must admit this is news to me. I thought articles I read and
> tests I performed (both a number of years ago :-) indicated this, but
> perhaps my memory (or the articles) are mistaken. You cite your
> tests that show that mergesort actually performs better in real
> life. Is this a standard implementation, or do you have some trick(s)
> that we should all know about?

If you add the number of compares and moves that each of the sorts
perform, in the best case, it comes out to be the same. However,
mergesort will do fewer compares, and quicksort will do fewer moves.
Quicksort will do better if we are moving big chunks of data, or
comparisons are trivial. If we are sorting by
pointers-to-data-structures, mergesort wins big.

And it's difficult to overemphasize that mergesort has a worst case of
n lg n. With the extra code needed to make quicksort avoid the n^2 worst
case behaviour, mergesort will pretty much always win.

Here's a C++ mergesort template I wrote. It beats the built-in
quicksort by a _lot_. That's a bit unfair, since it does inline
comparisons, but it's faster than _any_ quicksort in all but the most
pedantic cases, and it doesn't even try to take advantage of partial
ordering of the data. It's old code--since before booleans. :)


// Note: LessThan( a, b ) is assumed to be supplied by the user for type T.
// <cassert> and <cstring> are needed for assert() and memcpy() below;
// memcpy() also assumes T is bitwise-copyable.
#include <cassert>
#include <cstring>

template< class T >
inline int TestSwap( T &first , T &second )   // by reference, so the swap
{                                             // actually reaches the array
    int rv = 0;
    if ( LessThan( second , first ) )
    {
        T temp; temp = first; first = second; second = temp;
        rv = 1;
    }
    return rv;
}

template< class T >
inline void MergeLists( T List1[] , int Size1 , T List2[] , int Size2 ,
                        T Dest[] )
{
    //The following code could be rewritten faster by making
    //the while an infinite loop, and jumping around, but that would be
    //ugly.

    int Pos1 = 0 , Pos2 = 0 , DestPos = 0 , Going = 1;
    while( Going != 0 )
    {
        if ( LessThan( List1[ Pos1 ] , List2[ Pos2 ] ) )
        {
            Dest[ DestPos++ ] = List1[ Pos1++ ];
            Going = Pos1 < Size1;
        }
        else
        {
            Dest[ DestPos++ ] = List2[ Pos2++ ];
            Going = Pos2 < Size2;
        }
    }
    //should replace the following with memcpy.
    while( Pos1 < Size1 )
        Dest[ DestPos++ ] = List1[ Pos1++ ];
    while( Pos2 < Size2 )
        Dest[ DestPos++ ] = List2[ Pos2++ ];
}

template< class T >
void MergeSortThis( T ar[] , int size , T scratch[] )
{
    switch( size )
    {
        //Handle the special cases for speed:
        case 2:
            TestSwap( ar[ 0 ] , ar[ 1 ] );
            break;

        case 3:
            TestSwap( ar[ 0 ] , ar[ 1 ] );
            if ( TestSwap( ar[ 1 ] , ar[ 2 ] ) )
                TestSwap( ar[ 0 ] , ar[ 1 ] );
            break;

        case 4:
            //Gotta optimize this....
            TestSwap( ar[ 0 ] , ar[ 1 ] );
            TestSwap( ar[ 2 ] , ar[ 3 ] );
            TestSwap( ar[ 0 ] , ar[ 2 ] );
            TestSwap( ar[ 1 ] , ar[ 3 ] );
            TestSwap( ar[ 1 ] , ar[ 2 ] );
            break;

        //Gotta do special case for list of five.

        default:
            //Subdivide the list, recurse, and merge
            {
                int s1 = size / 2;
                int s2 = size - s1;
                T *List1 = scratch;
                T *List2 = scratch + s1;
                MergeSortThis( List1 , s1 , ar );
                MergeSortThis( List2 , s2 , ar + s1 );
                MergeLists( List1 , s1 , List2 , s2 , ar );
            }
    }
}

template< class T >
void MergeSort( T ar[] , int size )
{
    if ( size > 1 )
    {
        T *scratch = new T[ size ];
        if ( scratch != NULL )
        {
            assert( scratch != NULL );
            memcpy( scratch , ar , size * sizeof( T ) );
            MergeSortThis( ar , size , scratch );
            delete [] scratch;
        }
    }
}

Richard Wesley

Nov 20, 1996

In article <E1553...@gremlin.nrtc.northrop.com>,
jbar...@charming.nrtc.northrop.com wrote:

I used to have a heapsort lying around in my C days. Guaranteed n log n
time and no space requirements. Larger constant than quicksort (about a
factor of two I believe). Very nice for sorting large numbers of objects
that are mostly sorted.


- rmgw

http://www.halcyon.com/hawkfish/Index.html

----------------------------------------------------------------------------
Richard Wesley | "I don't know about your dreams
hawk...@punchdeck.com | But mine are sort of hackneyed"
hawk...@electricfish.com | - Laurie Anderson, "Talk Normal"
----------------------------------------------------------------------------

William D Clinger

Nov 20, 1996, to will

chri...@neversoft.com (Christer Ericson) wrote:
> How would you do an O(N lg N) worst case quicksort? Median-of-
> three is still O(N^2) worst case. If I'm not mistaken, for any
> given K, K independent of N, median-of-K is still O(N^2) worst
> case.

That's right. For an O(N lg N) quicksort, you need to find the
median of N elements in O(N) time.

> Knuth mentions a method (which selects K elements, K depedent of
> N) by Frazer and McKellar that decreases to O(N lg N) as N->inf.
> Is that the method you are referring to?

No.

There is an O(N) algorithm for finding the median of N elements.
See Knuth Volume 3 or Cormen, Leiserson, and Rivest (chapter 10)
for details.

The O(N) algorithm has a fairly large constant factor, so using
it to implement an O(N lg N) quicksort leaves you with a quicksort
that is slower than mergesort by a constant factor. The O(N)
algorithm for medians is also considerably more difficult to
implement than a mergesort.

So although it is possible to implement an O(N lg N) quicksort,
hardly anyone does it because mergesort is faster and easier to
implement.
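
For illustration only, a sketch of the idea of partitioning around the
true median every time. It leans on std::nth_element as a stand-in
selector (introselect: linear on average, not the guaranteed-linear
selection algorithm referenced above), so it shows the structure of the
approach rather than the worst-case bound itself.

#include <algorithm>
#include <vector>

// Quicksort that partitions each range [lo, hi) around its exact median.
// With a guaranteed-O(N) selection routine this is O(N lg N) worst case.
void median_quicksort(std::vector<int>& a, int lo, int hi)
{
    if (hi - lo <= 1) return;
    int mid = lo + (hi - lo) / 2;
    // nth_element places the median at 'mid' and partitions around it.
    std::nth_element(a.begin() + lo, a.begin() + mid, a.begin() + hi);
    median_quicksort(a, lo, mid);              // elements <= the median
    median_quicksort(a, mid + 1, hi);          // elements >= the median
}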

jbar...@charming.nrtc.northrop.com (Jeff Barnett) wrote:
> QUICK sort and many others vary between O(n) and O(N^2) depending on
> the original order.

No, the best case performance of quicksort is O(N lg N).

Perhaps a summary table will help:

algorithm                      best case   average case   worst case

mergesort                      N lg N      N lg N         N lg N

quicksort using an O(N)
algorithm to find the median   N lg N      N lg N         N lg N

quicksort as implemented
in practice                    N lg N      N lg N         N^2

insertion sort                 N           N^2            N^2

selection sort                 N^2         N^2            N^2

Will

Jens Kilian

Nov 21, 1996

David Hanley (da...@netright.com) wrote:
> Actaully, that's not true. The difference between the two is that
> mergesort requires extra space, while quicksort can be done 'in place'.

Sorry, not true. Quicksort requires O(log N) extra space because of its
recursive nature, and Mergesort requires exactly as much.
Of course, this is again the *best* case for Quicksort, and the *worst* for
Mergesort. The other way round, Quicksort requires O(N) space, and Mergesort
O(1).

(I'm not considering Mergesort on arrays here, but I see no reason why it
should require more than O(1) extra memory above what it needs for sorting
lists.)

Bye,

Jens Kilian

unread,
Nov 21, 1996, 3:00:00 AM11/21/96
to

Jeff Barnett (jbar...@charming.nrtc.northrop.com) wrote:
> In article <56s84c$r...@isoit109.bbn.hp.com>, je...@bbn.hp.com (Jens Kilian) writes:
> |> Marty Hall (ha...@apl.jhu.edu) wrote:
> |> Also, Mergesort can exploit existing order in the input, in the best case
> |> achieving O(N) performance.

> I think that merge sort still makes log n passes over the total input
> AND each pass will take O(n) time. Thus, the time performance of merge
> sort is extremely independent of the original order. However, I think

> QUICK sort and many others vary between O(n) and O(N^2) depending on
> the original order.

The "natural merge" makes a single pass collecting sorted runs in the original
data:

4561230789 -> 456 123 0789

This can be done in linear time and constant space; if it ends up producing
a single run, the algorithm is finished. Merging multiple runs requires
O(N log N) time and O(log N) space.

(I'm always talking about sorting lists, where the constant factors are low
because the program basically just swaps pointers around.)

Greetings,

Janet Bell

Nov 21, 1996

In my experience, a naive mergesort outperforms the more
complicated algorithms on machines that can handle
the huge amount of garbage created.


Thomas Yan

Nov 21, 1996

In article <5718ua$h...@isoit109.bbn.hp.com>,

Jens Kilian <je...@bbn.hp.com> wrote:
>Sorry, not true. Quicksort requires O(log N) extra space because of its
>recursive nature, and Mergesort requires exactly as much.
>Of course, this is again the *best* case for Quicksort, and the *worst* for
>Mergesort. The other way round, Quicksort requires O(N) space, and Mergesort
>O(1).

actually, it is easy to make quicksort use O(log N) space in the worst
case: after partitioning, first recurse on the smaller partition,
which is therefore at most half the original partition.
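
A minimal sketch of that trick; partition() is an assumed helper (any
standard Hoare/Lomuto partition returning the pivot's final index):

// Recurse on the smaller half, loop on the larger.  The recursion depth
// is at most lg N because each recursive call handles at most half of
// the current range.
void quicksort(int a[], int lo, int hi)        // sorts a[lo..hi] inclusive
{
    while (lo < hi)
    {
        int p = partition(a, lo, hi);          // assumed helper, not shown
        if (p - lo < hi - p) {                 // left side is smaller
            quicksort(a, lo, p - 1);           // recurse on it...
            lo = p + 1;                        // ...and loop on the right
        } else {
            quicksort(a, p + 1, hi);
            hi = p - 1;
        }
    }
}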

also, it is possible and still relatively easy to make quicksort use
O(1) space. you do this by using binary search (!) to recover the
pivot. binary search works because the pivot's value is sandwiched
between values in the lower and upper partition. running this on some
examples showed that its running time was good. this was done by
David Gries and someone else.

- thomas
--
Thomas Yan <ty...@cs.cornell.edu> I don't speak for Cornell.
Computer Science Department \\ Cornell University \\ Ithaca, NY 14853

David Hanley

Nov 21, 1996

Jens Kilian wrote:
>
> David Hanley (da...@netright.com) wrote:
> > Actaully, that's not true. The difference between the two is that
> > mergesort requires extra space, while quicksort can be done 'in place'.
>
> Sorry, not true. Quicksort requires O(log N) extra space because of its
> recursive nature,

Yes, but this does not detract from the fact that quicksort can
overwrite the data that it is sorting, while mergesort cannot.

> and Mergesort requires exactly as much.

Sorry, nope. Mergesort (on arrays) requires temporary extra space
equal to the array being sorted. Quicksort on lists also doesn't
require (much) extra space, though on lists, mergesort doesn't require
too much extra either.


>
> (I'm not considering Mergesort on arrays here, but I see no reason why it
> should require more than O(1) extra memory above what it needs for sorting
> lists.)

How will you merge the array [[123][456]]?

dave

Woodrow Yeung

Nov 21, 1996

I'm surprised that no one has mentioned the most important use of merge
sort. It is the ideal sorting algorithm for external sorting, that is
when you can't fit all of your data in RAM. Use merge sort if you need to
sort big files.

Merge sort belongs in the class of divide and conquer algorithms just like
quicksort. If you want the fastest sort use an optimized quicksort cause
it has the smallest constant factor. If you want to have a good internal
sort, but don't want to write too much code or are space constrained, use
heap sort. Heap sort has the nice property that the worst case is still
O(NlogN), and that it does not require any extra working space.

By the way, David Hanley's merge sort example is a hybrid sort. It
switches to an N^2 sorting algorithms (It looks like a bubble sort!) when
the partition size goes down to 4 elements. This is really where his
merge sort beats the library Quicksort. The classic way to optimize
Quicksort is to use a different sort algorithm at a threshold point which
is what David did with his merge sort. Bubble sort would do quite well as
the replacement algorithm, although my choice is Shell sort.

Woody Yeung
ye...@reed.edu

Mukesh Prasad

Nov 21, 1996

I found that Knuth's recommendation for quicksort with a
lower bound (stop recursing when less than N elements
are left to work on, then use another sort on the final
end result), works quite well. Knuth also has a method
for figuring out the best N for a given processor. For
an 80386, I think N comes out to about 9 or so.

Jens Kilian

Nov 22, 1996

Thomas Yan (ty...@cs.cornell.edu) wrote:
> actually, it is easy to make quicksort use O(log N) space in the worst
> case: after partitioning, first recurse on the smaller partition,
> which is therefore at most half the original partition.

I stand corrected (I seem to remember this argument from some lecture long ago,
so I should have known better.)

> also, it is possible and still relatively easy to make quicksort use
> O(1) space. you do this by using binary search (!) to recover the

> pivot. [...]

Interesting - I never heard of that technique.

Steve Heller

Nov 22, 1996

ye...@reed.edu (Woodrow Yeung) wrote:

>Merge sort belongs in the class of divide and conquor algorithms just like
>quicksort. If you want the fastest sort use an optimized quicksort cause
>it has the smallest constant factor.

This is incorrect or at least misleading. Distribution counting and
its relatives have the fastest big-Oh performance, at O(n). Otherwise,
your analysis is fine.


Steve Heller, author and software engineer
http://ourworld.compuserve.com/homepages/steve_heller


Ben Bullock

Nov 22, 1996

Steve Heller wrote:
> ye...@reed.edu (Woodrow Yeung) wrote:

> >Merge sort belongs in the class of divide and conquor algorithms just like
> >quicksort. If you want the fastest sort use an optimized quicksort cause
> >it has the smallest constant factor.

> This is incorrect or at least misleading. Distribution counting and
> its relatives have the fastest big-Oh performance, at O(n). Otherwise,
> your analysis is fine.

Steve Heller should state that distribution counting or any other
`O(n)' sorting method is only valid under the assumptions that

(1) One knows the minimum and maximum values of the input

(2) The data is evenly distributed between those two values.

--
Ben Bullock @ Cavendish Laboratory, University of Cambridge, Britain
email: b...@pcae.hep.phy.cam.ac.uk (If mailing from a .com address, please
put "not junk" in the subject line or your mail may be auto-ignored.)
www: http://www.hep.phy.cam.ac.uk/theory/ben/index.html
Lenny's page: http://www.hep.phy.cam.ac.uk/theory/ben/lenny/lenny.html

Jordan Zimmerman

Nov 22, 1996

In article <yeung-21119...@ccp2-2n-76-1.mrg.uswest.com>,
ye...@reed.edu (Woodrow Yeung) wrote:

> I'm suprised that no one has mentioned the most important use of merge
> sort. It is the ideal sorting algorithm for external sorting, that is
> when you can't fit all of your data in RAM. Use merge sort if you need to
> sort big files.

Actually, if you have a large number (>10) of external files that you're
sorting, a straight merge sort can be unnecessarily slow. This is because
of the large numbers of comparisons that need to be made.

A kind of merge sort, called a K-Way merge, is the way to go. Here's an
explanation of it (I have the code if anyone wants it):

An implementation of "k-Way Merging" as described in "Fundamentals of
Data Structures" by Horowitz/Sahni.

The idea is to merge k sorted arrays limiting the number of
comparisons. A binary tree is built containing the results of comparing
the heads of each array. The top most node is always the smallest entry.
Then, its corresponding leaf in the tree is refilled and the tree is
processed again. It's easier to see in the following example:

Imagine 4 arrays:
{5, 10, 15, 20}
{10, 13, 16, 19}
{2, 19, 26, 40}
{18, 22, 23, 24}

The initial tree looks like this:

          2
        /   \
      2       5
     / \     / \
   18   2  10   5

The '/' and '\' represent links. The bottom row are the leaves and they
contain the heads of the arrays. The rows above the leaves represent the
smaller of the two child nodes. Thus, the top node is the smallest. To
process the next iteration, the top node gets popped and its leaf gets
refilled. Then, the new leaf's associated nodes are processed. So, after
the 2 is taken, it is filled with 19 (the next head of its array). After
processing, the tree looks like this:

          5
        /   \
     18       5
     / \     / \
   18  19  10   5

So, you can see how the number of comparisons is reduced.
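
A minimal sketch of the same idea (not the code offered above), using a
min-heap (std::priority_queue) in place of the tree: the heap plays the
same role of keeping the smallest current head on top, at O(lg k)
comparisons per element output.

#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merge of k sorted vectors.  The heap holds one (value, source)
// entry per input array, so choosing the next output element costs
// O(lg k) comparisons instead of k - 1.
std::vector<int> kway_merge(const std::vector<std::vector<int> >& runs)
{
    typedef std::pair<int, std::pair<std::size_t, std::size_t> > Entry;
    // Entry = (value, (run index, position within that run))
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
            heap.push(Entry(runs[r][0], std::make_pair(r, std::size_t(0))));

    std::vector<int> out;
    while (!heap.empty())
    {
        Entry e = heap.top();
        heap.pop();
        out.push_back(e.first);
        std::size_t r = e.second.first, i = e.second.second + 1;
        if (i < runs[r].size())                // refill from the same array
            heap.push(Entry(runs[r][i], std::make_pair(r, i)));
    }
    return out;
}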

--
Jordan Zimmerman
Altura Software, Inc.
home page: http://www.altura.com/jordanz

Don't blame me, I voted for Harry Browne!
http://www.harrybrowne96.org 1 (800) 682 1776

David Hanley

Nov 22, 1996

Woodrow Yeung wrote:
>
> I'm suprised that no one has mentioned the most important use of merge
> sort. It is the ideal sorting algorithm for external sorting, that is
> when you can't fit all of your data in RAM. Use merge sort if you need to
> sort big files.

This is true!

>
> Merge sort belongs in the class of divide and conquor algorithms just like
> quicksort. If you want the fastest sort use an optimized quicksort cause
> it has the smallest constant factor.

I'm sorry, but this is simply not the case! Each algorithm does *only*
compares, moves, and recursive calls. Adding them up, they do the same
amount of each. Quicksort does more compares, mergesort does more moves.
Typically, compares are more expensive than moves, so mergesort comes
out ahead.


> If you want to have a good internal
> sort, but don't want to write too much code or is space constrained use
> heap sort. Heap sort has the nice property that the worst case is still
> O(NlogN), and that it does not require any extra working space.

Heap sort is a good sort, but its constant is bigger than quicksort's or
merge sort's. Still, it is not a bad choice for most cases.

>
> By the way, David Hanley's merge sort example is a hybrid sort. It
> switches to an N^2 sorting algorithms (It looks like a bubble sort!) when
> the partition size goes down to 4 elements. This is really where his
> merge sort beats the library Quicksort.

Uh, no. The library quicksort does this too, as does any comparison
sort in which speed is essential. In fact, I erred on the side of
writing less code--it would have been better to switch at about 7
elements. You usually drop down to an easier sort between 5-10 elements
when coding a fast sort. And, in fact, the sort I drop down to is a
sort tree, which makes things really fast.

> The classic way to optimize
> Quicksort is to use a different sort algorithm at a threshold point which
> is what David did with his merge sort. Bubble sort would do quite well as
> the replacement algorithm, although my choice is Shell sort.

Shell sort will be slower than bubble sort for small sets, and a lot
slower than a sort tree.

dave

Thomas Yan

Nov 22, 1996

In article <574kg4$q...@lyra.csx.cam.ac.uk>,
Ben Bullock <b...@pcae.hep.phy.cam.ac.uk> wrote:
> [...] distribution counting or any other

>`O(n)' sorting method is only valid under the assumptions that
>
>(1) One knows the minimum and maximum values of the input
>
>(2) The data is evenly distributed between those two values.

i suggest (2) be rephrased as:

the number of items is AT LEAST a constant fraction of the
difference of the minimum and maximum values.

this is because even if all data is the same value, if you've got lots
of them, the cost for scanning empty buckets is still small on average.

David Hanley

Nov 22, 1996

Ben Bullock wrote:
>
> Steve Heller wrote:
> > ye...@reed.edu (Woodrow Yeung) wrote:

> > This is incorrect or at least misleading. Distribution counting and
> > its relatives have the fastest big-Oh performance, at O(n). Otherwise,
> > your analysis is fine.
>

> Steve Heller should state that distribution counting or any other


> `O(n)' sorting method is only valid under the assumptions that
>
> (1) One knows the minimum and maximum values of the input
>
> (2) The data is evenly distributed between those two values.

No; this is not necessarily true for radix sort. In fact, #2 is not
even true for a bucket sort. It is only true that the distribution must
be known beforehand.

dave

Dann Corbit

Nov 23, 1996

Counting sort is linear time, and the distribution
does not have to be known (except min and max for
each subset of the key counted). It is essentially
a bucket sort, with one counting pass beforehand.

It is even possible to sort in exactly n operations
if you know the min and max. You need to be able
to create an array of linked lists with as many entries as there are
possible key values. (Example -- a short unsigned integer map
returns values 0 to 65535 -- exactly equal to the input.
This gives an array of linked lists with 65536 entries).
Now for each call to map, attach the pointer to the key
to the appropriate linked list. One single scan of the
data, and the list is sorted.
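
One way to realize that counting-pass-plus-distribution idea, a minimal
sketch for single-byte keys (counts, then prefix sums for the starting
offsets, then one placement pass):

#include <vector>

// Counting sort for unsigned-char keys: one pass to count each key value,
// a prefix sum to turn counts into starting offsets, and one pass to
// place each element.  Two linear scans of the data, so O(n) overall.
std::vector<unsigned char> counting_sort(const std::vector<unsigned char>& in)
{
    int count[256] = {0};
    for (std::size_t i = 0; i < in.size(); ++i)     // pass 1: histogram
        ++count[in[i]];

    int offset[256];
    int sum = 0;
    for (int k = 0; k < 256; ++k) {                 // start of each key's block
        offset[k] = sum;
        sum += count[k];
    }

    std::vector<unsigned char> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)     // pass 2: distribute
        out[offset[in[i]]++] = in[i];
    return out;
}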

David Hanley <da...@netright.com> wrote in article <329632...@netright.com>...

Michael Schuerig

Nov 23, 1996

Richard Wesley <hawk...@punchdeck.com> wrote:

> I used to have a heapsort lying around in my C days. Guaranteed n log n
> time and no space requirements. Larger constant than quicksort (about a
> factor of two I believe). Very nice for sorting large numbers of objects
> that are mostly sorted.

I just tried the following with CodeWarrior 10 & STL (MSL variant) on a
PowerMac 7500/100 running Sys 7.5.5:

- generate a random int vector
- sort it with the regular "sort", an optimized quicksort that does an
insertion sort for vector sizes <= 16.

- generate another random int vector of the same size
- sort it with "make_heap" & "sort_heap"

Optimization was set to the highest level, instruction scheduling for
PPC 601.

Guess what, starting with a vector size around 17000 the heapsort was
faster.

I'll have a look at the code as I guess more speed can be squeezed out
of the heapsort.
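
A minimal sketch of that comparison, assuming a current
<algorithm>/<random>/<chrono> toolchain rather than the CodeWarrior/MSL
setup described above:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Time std::sort against make_heap + sort_heap on identical random data.
int main()
{
    const std::size_t n = 100000;
    std::mt19937 gen(12345);
    std::uniform_int_distribution<int> dist(0, 1 << 30);

    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = dist(gen);
    std::vector<int> b = a;                        // same data for both runs

    auto t0 = std::chrono::steady_clock::now();
    std::sort(a.begin(), a.end());                 // introsort in practice
    auto t1 = std::chrono::steady_clock::now();
    std::make_heap(b.begin(), b.end());            // heapsort, in two steps
    std::sort_heap(b.begin(), b.end());
    auto t2 = std::chrono::steady_clock::now();

    std::printf("sort:      %lld us\n", (long long)
        std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    std::printf("heap sort: %lld us\n", (long long)
        std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
    return 0;
}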

Michael

---
Michael Schuerig
mailto:uzs...@uni-bonn.de
http://www.rhrz.uni-bonn.de/~uzs90z/

Paul Keister

Nov 23, 1996

Jordan Zimmerman <jor...@altura.com> wrote in article
<jordanz-2211...@204.147.232.52>...

> The idea is to merge k sorted arrays limiting the number of
> comparisons. A binary tree is built containing the results of comparing
> the heads of each array. The top most node is always the smallest entry.
> Then, its corresponding leaf in the tree is refilled and the tree is
> processed again. It's easier to see in the following example:
>
> Imagine 4 arrays:
> {5, 10, 15, 20}
> {10, 13, 16, 19}
> {2, 19, 26, 40}
> {18, 22, 23, 24}
>
> The initial tree looks like this:
>
>           2
>         /   \
>       2       5
>      / \     / \
>    18   2  10   5
>
This is a special type of binary tree called a heap, isn't it? I would
describe this algorithm as a combination of merge sort and heap sort, and a
good one, at that. IMHO heap sort is generally underrated.

--
-- Be There!
-- Aloha.
///Paul Keister\\\\
\\\\ http://www.crl.com/~diyall/husband

Christopher Oliver

Nov 23, 1996

: > The initial tree looks like this:

: >
: >           2
: >         /   \
: >       2       5
: >      / \     / \
: >    18   2  10   5
: >
: This is a special type of binary tree called a heap, isn't it? I would
: discribe this algorythm as a combination of merge sort and heap sort, and a
: good one, at that. IMHO heap sort is generally underated.

No, though it's easy to confuse this. I think this particular beast is
called a 'loser' tree. The value of the parent node is the minimum value
of its two children.

--
Christopher Oliver Traverse Communications
Systems Coordinator 223 Grandview Pkwy, Suite 108
oli...@traverse.com Traverse City, Michigan, 49684
Is "Windows for Dummies" another case of "this sentence no verb?"

Steve Heller

Nov 24, 1996

"Dann Corbit" <a-c...@microsoft.com> wrote:

>Counting sort is linear time, and the distribution
>does not have to be known (except min and max for
>each subset of the key counted). It is essentially
>a bucket sort, with one counting pass beforehand.

Thank you for straightening out the confusion on counting sort, so I
don't have to!

Paul Keister

Nov 24, 1996

Christopher Oliver <oli...@co.traverse.com> wrote in article
<577esc$t...@bert.traverse.com>...

> : This is a special type of binary tree called a heap, isn't it? I would
> : discribe this algorythm as a combination of merge sort and heap sort,
and a
> : good one, at that. IMHO heap sort is generally underated.
>
> No, though it's easy to confuse this. I think this particular beast is
> called a 'loser' tree. The value of the parent node is the minimum value
> of its two children.

Are you saying that its not a heap because the parent node is a minimum
rather than a maximum? According to Budd,
_Classic_Data_Structures_in_C++_,

"A heap is a binary tree in which every node possesses the
property that the value associated with the node is smaller
than or equal to the value associated with either child node."

Personally, I don't think the issue of min/max is relevant to the operation
of the data structure. If you're looking for maximum values and used the
opposite rule, in which the parent is maximum, I would still call it a
heap.

I thought a 'loser' tree was one that owned no Microsoft stock<g>.

Bruce Hoult

Nov 25, 1996

je...@bbn.hp.com (Jens Kilian) writes:
> David Hanley (da...@netright.com) wrote:
> > Actaully, that's not true. The difference between the two is that
> > mergesort requires extra space, while quicksort can be done 'in place'.
>
> Sorry, not true. Quicksort requires O(log N) extra space because of its
> recursive nature, and Mergesort requires exactly as much.
> Of course, this is again the *best* case for Quicksort, and the *worst* for
> Mergesort. The other way round, Quicksort requires O(N) space, and Mergesort
> O(1).

No, the O(log N) extra space for Quicksort is the *worst* case, provided
only that you always sort the smaller of the two partitions first and then
iterate (rather than recurse) to sort the larger partition. I doubt anyone
uses code that doesn't do this, as it's zero cost.

Christopher Oliver

Nov 25, 1996

Paul Keister (pa...@envmgr.com) wrote:
: Christopher Oliver <oli...@co.traverse.com> wrote in article
: > No, though it's easy to confuse this. I think this particular beast is

: > called a 'loser' tree. The value of the parent node is the minimum value
: > of its two children.

: Are you saying that its not a heap because the parent node is a minimum
: rather than a maximum?

No. I'm saying that because in a heap, each node corresponds to a different
object. With the loser tree, we have a full binary tree where the value of
a parent is the minimum value of its children. See the third volume of
Knuth pages 251 to 253 for a discussion.

: I thought a 'loser' tree was one that owned no Microsoft stock<g>.

"Microsoft, a rather new corporation, may not have matured to the
position where it understands how it should act with respect to
the public interest and the ethics of the market place."
- Justice Stanley Sporkin

David Gillies

Nov 25, 1996

William D Clinger wrote:
>
> chri...@neversoft.com (Christer Ericson) wrote:
> > How would you do an O(N lg N) worst case quicksort? Median-of-
> > three is still O(N^2) worst case. If I'm not mistaken, for any
> > given K, K independent of N, median-of-K is still O(N^2) worst
> > case.
>
> That's right. For an O(N lg N) quicksort, you need to find the
> median of N elements in O(N) time.
[some stuff deleted]

It's pointed out in Numerical Recipes that a quick and dirty random
number generator to select the partitioning element avoids the
O(N^2) problem for in-order arrays. I like Quicksort, but for small
to medium sorts I use Shell's sort, and for bigger ones I generally
use Heapsort.

Refs:

Press, Teukolsky, Flannery and Vetterling

Numerical Recipes in C, 2nd Ed.
Cambridge University Press, 1992
ISBN 0-521-43108-5

Also Knuth: Sorting and Searching, of course.
--
______________________________________________________________________
David A. G. Gillies (dagg...@vader.brad.ac.uk)
University of Bradford, Bradford, West Yorkshire, England
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

jcus...@grove.ufl.edu

Nov 25, 1996, to Joost Romeu

> What's so elegant or efficient about an algorithm that takes multiple

I always thought that was called a QuickSort, but anyway...

it's efficient because it's of the order log2 (N), meaning if you double
the number of entries in the list to be sorted, you only need one extra
iteration. This is characteristic of successively dividing the array
into halves. Incidentally, a binary search is also of the order log2 (N).

If you don't believe that it's really efficient, try it. Write a
QuickSort (merge sort, whatever) algorithm, and compare it to insertion
sorts, etc. and you'll find for most arrays, it beats them hands down.
(It's possible that for very small arrays, quicksort may not be as
efficient as the other sorting methods, which is why it's been suggested
to divide the array into subarrays of four or so entries).

--------------------------------------------------------------------------
Jim Cushing | Visit Jim Cushing's EgoWeb for Macintosh
University of Florida | files, political commentary, and more.
jcus...@grove.ufl.edu | http://grove.ufl.edu/~jcushing
AKA cir...@grove.ufl.edu |
--------------------------------------------------------------------------


Richard A. O'Keefe

Nov 26, 1996

b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) writes:

>Steve Heller should state that distribution counting or any other
>`O(n)' sorting method is only valid under the assumptions that

>(1) One knows the minimum and maximum values of the input

This can be discovered in O(n) time using the standard 3N/2-comparison
method.

>(2) The data is evenly distributed between those two values.

*This* one is completely false.
Single bucketing is O(N) under a wide range of distributions.
More precisely, let "f" be the probability density,
then single bucketing takes linear expected time if and only if
integral(f**2) < infinity.

Double bucketing does even better.

A good reference is
Lecture Notes on Bucket Algorithms
Luc Devroye
Birkh\"auser 1986, ISBN 0-8176-3328-6.
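
For concreteness, a minimal sketch of single bucketing (n buckets spread
uniformly over [min, max], each bucket then sorted on its own --
std::sort here for brevity where an insertion sort would usually do):

#include <algorithm>
#include <vector>

// Single bucketing: n keys, n buckets spanning [min, max] uniformly.
// Each key goes into the bucket covering its value; the buckets are then
// sorted individually and concatenated back into the input array.
void bucket_sort(std::vector<double>& a)
{
    const std::size_t n = a.size();
    if (n < 2) return;

    double lo = *std::min_element(a.begin(), a.end());
    double hi = *std::max_element(a.begin(), a.end());
    if (lo == hi) return;                          // all keys equal

    std::vector<std::vector<double> > bucket(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t b = (std::size_t)((a[i] - lo) / (hi - lo) * (n - 1));
        bucket[b].push_back(a[i]);
    }

    std::size_t out = 0;
    for (std::size_t b = 0; b < n; ++b) {
        std::sort(bucket[b].begin(), bucket[b].end());
        for (std::size_t k = 0; k < bucket[b].size(); ++k)
            a[out++] = bucket[b][k];
    }
}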

--
Mixed Member Proportional---a *great* way to vote!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.

Ben Bullock

Nov 26, 1996

Ben Bullock wrote:

Thanks to everyone who fixed my mistake. If only I could understand
your answers... Well, never mind, I'll go and read about it somewhere.

Ben Bullock

Nov 26, 1996

Richard A. O'Keefe wrote:
> b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) writes:

> >Steve Heller should state that distribution counting or any other
> >`O(n)' sorting method is only valid under the assumptions that

> >(1) One knows the minimum and maximum values of the input

> This can be discovered in O(n) time using the standard 3N/2-comparison
> method.

Yes, this seems to be rather a glaring error in my post.

> >(2) The data is evenly distributed between those two values.

> *This* one is completely false.

Yes, again you are right, but the other one, number (1), was
completely false as well. As far as I can see we need to know the
distribution though. There was one post that said that we didn't, but
it was incomprehensible to me.

> Single bucketing is O(N) under a wide range of distributions.
> More precisely, let "f" be the probability density,
> then single bucketing takes linear expected time if and only if
> integral(f**2) < infinity.

I am really lost trying to understand the last sentence here.

Dann Corbit

Nov 26, 1996

Ben Bullock <b...@pcae.hep.phy.cam.ac.uk> wrote in article
<57fjrk$5...@lyra.csx.cam.ac.uk>...
[snip]

> Yes, again you are right, but the other one, number (1), was
> completely false as well. As far as I can see we need to know the
> distribution though. There was one post that said that we didn't, but
> it was incomprehensible to me.
You do not have to know the distribution. You take 1 pass for each
chunk of key that you process and count it first. Let's suppose that
you are counting a single char. On the first pass, count the chars
by each category (values 0 through 255 on most machines). Now we know
the distribution. On the second pass, bucket sort them.

Now suppose we are going to try to sort floats in this way. We will
take 5 counting passes and 5 distribution phases. Pass 1 counts the
sign bit for the number. Then we bucketize by the sign. Pass 2 counts
the exponents. Then we bucketize by the exponents. Pass 3/4/5 counts
the mantissas (24 bits would take 16 million buckets -- therefore we
do it the easy way and count one byte of the mantissa at a time).
Then we bucketize the mantissas.

10 passes down the array and we are done. If we use larger buckets,
we can sort faster. It would be easy to combine the sign+exponent into
a single pass.

10*n = O(n).
[snip]


Steve Heller

Nov 27, 1996

b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) wrote:

>Ben Bullock wrote:

>Thanks to everyone who fixed my mistake. If only I could understand
>your answers... Well, never mind, I'll go and read about it somewhere.

It's covered in great detail in my book "Efficient C/C++
Programming". I'll be happy to provide details of the book by email if
you're interested.

Ben Bullock

Nov 27, 1996

Dann Corbit wrote:

> You do not have to know the distribution. You take 1 pass for each
> chunk of key that you process and count it first. Let's suppose that
> you are counting a single char. On the first pass, count the chars
> by each category (values 0 through 255 on most machines). Now we know
> the distribution. On the second pass, bucket sort them.

This just doesn't make any sense at all. We count the characters
and then sort them?

Show me an algorithm that can sort the following data without any
knowledge of the distribution:

0
0.9
0.99
0.999
0.9999
0.99999
0.999999
0.9999999
0.99999999
0.999999999
...
1-(0.1)^n
...
1

It just isn't possible to do it in O(n) time.

> Now suppose we are going to try to sort floats in this way. We will
> take 5 counting passes and 5 distribution phases. Pass 1 counts the
> sign bit for the number. Then we bucketize by the sign.

> Pass 2 counts
> the exponents. Then we bucketize by the exponents. Pass 3/4/5 counts
> the mantissas (24 bits would take 16 million buckets -- therefore we
> do it the easy way and count one byte of the mantissa at a time).
> Then we bucketize the mantissas.

It's nonsense to describe this as an O(n) sort.

If you do a sort this way you are doing an O(n log(n)) sort because
you make log(n) passes at the data for your buckets.

> 10 passes down the array and we are done. If we use larger buckets,
> we can sort faster.

And then what goes into the buckets has to be sorted, at O(n log(n)),
again.

There is absolutely no way to make a sort faster than O(n log(n))
without knowledge of the distribution of data. I was surprised by the
way that many people who corrected my mistake don't know this.

You should try it on my example distribution above until you are
satisfied if you don't believe it.

Ben Bullock

Nov 27, 1996

Steve Heller wrote:
> b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) wrote:

> >Ben Bullock wrote:

> >Thanks to everyone who fixed my mistake. If only I could understand
> >your answers... Well, never mind, I'll go and read about it somewhere.

> It's covered in great detail in my book "Efficient C/C++
> Programming". I'll be happy to provide details of the book by email if
> you're interested.

Thanks very much but I already have some books about it. I didn't
understand how it was supposed to be possible to sort in O(n) time
without knowledge of the distribution.

By now I am very sure that it isn't possible, so the corrections to my
mistake contained mistakes too.

I thought the mistake in my point (1) about needing to know the
max/min was much worse than the mistake in (2) about having an even
distribution. The first mistake cannot be fixed to make more sense,
since we can find the max/min in O(n) anyway, whereas the error in (2)
can be easily fixed by replacing "even" with "known".

Anyway that is my final word on the subject. I am sorry to have
created this confusion, but glad to have an opportunity to clear some
up in the end.

David Hanley

Nov 27, 1996

Ben Bullock wrote:
>
> Dann Corbit wrote:
>
> > You do not have to know the distribution. You take 1 pass for each
> > chunk of key that you process and count it first. Let's suppose that
> > you are counting a single char. On the first pass, count the chars
> > by each category (values 0 through 255 on most machines). Now we know
> > the distribution. On the second pass, bucket sort them.
>
> This just doesn't make any sense at all. We count the characters
> and then sort them?

We don't actually *need* to count first. But Mr. Corbit is right in
essence.

>
> Show me an algorithm that can sort the following data without any
> knowledge of the distribution:
>
> 0
> 0.9
> 0.99
> 0.999
> 0.9999
> 0.99999
> 0.999999
> 0.9999999
> 0.99999999
> 0.999999999
> ...
> 1-(0.1)^n
> ...
> 1
>
> It just isn't possible to do it in O(n) time.

It is trivial to do in O(n) time.

Say that the numbers are represented as floats.

1) Create 256 buckets.
2) Place into buckets based on the value of the least significant byte.
3) Coalesce the buckets into a list again, in order.
4) Move up to the next more significant byte, and repeat step 2 until we
have bucketed on all the bytes.
5) Your list is sorted.

This is order O(n).
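
A minimal sketch of those steps, using unsigned 32-bit keys rather than
floats (a float key would additionally need its sign/exponent encoding
handled):

#include <vector>

// LSD radix sort: four passes, one per byte, least significant byte
// first.  Each pass distributes into 256 buckets and then coalesces them
// back into one list, preserving the order established by earlier passes.
void radix_sort(std::vector<unsigned int>& a)
{
    for (int shift = 0; shift < 32; shift += 8)
    {
        std::vector<std::vector<unsigned int> > bucket(256);
        for (std::size_t i = 0; i < a.size(); ++i)           // step 2
            bucket[(a[i] >> shift) & 0xFF].push_back(a[i]);

        std::size_t out = 0;                                 // step 3
        for (int b = 0; b < 256; ++b)
            for (std::size_t k = 0; k < bucket[b].size(); ++k)
                a[out++] = bucket[b][k];
    }
}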

>
> And then what goes into the buckets has to be sorted, at O(n log(n)),
> again.

No, this is not true.

>
> There is absolutely no way to make a sort faster than O(n log(n))
> without knowledge of the distribution of data.

Please study algorithms before making such sweeping statements! It
just makes you look silly.

dave

Thomas A. Russ

Nov 27, 1996

In article <57hnqi$a...@lyra.csx.cam.ac.uk> b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) writes:

>> You do not have to know the distribution. You take 1 pass for each
>> chunk of key that you process and count it first. Let's suppose that
>> you are counting a single char. On the first pass, count the chars
>> by each category (values 0 through 255 on most machines). Now we know
>> the distribution. On the second pass, bucket sort them.
>
> This just doesn't make any sense at all. We count the characters
> and then sort them?

I think this sorting method also goes by the name of radix sort. The
true complexity of such a sort is O(k*n) where k is the size of the
sorting key and n is the number of elements to be sorted. In most cases
where this type of sorting algorithm is appropriate, k is a constant, so
the complexity is properly described as O(n).

> Show me an algorithm that can sort the following data without any
> knowledge of the distribution:
>
> 0
> 0.9
> 0.99
> 0.999
> 0.9999
> 0.99999
> 0.999999
> 0.9999999
> 0.99999999
> 0.999999999
> ...
> 1-(0.1)^n
> ...
> 1
>
> It just isn't possible to do it in O(n) time.

The algorithm described above will work on any fixed size key
representation. Since in a computer, there are only finitely many real
numbers, they can, in fact, be sorted in O(n) time. The tricky part of
the algorithm is that you need to be careful to preserve the order of
elements from the previous sorting step and that you need to start
sorting at the low end of the radix. To use a simple example with four
digit numbers:

1000
1001
2000
2011
2001

Pass one looks at digit ___X and results in

Bucket 0: 1000 2000
Bucket 1: 1001 2011 2001
Bucket 2: (empty)

Current order: 1000 2000 1001 2011 2001

Pass two looks at digit __X_ and results in

Bucket 0: 1000 2000 1001 2001
Bucket 1: 2011
Bucket 2: (empty)

Current order: 1000 2000 1001 2001 2011

Pass three looks at digit _X__ and results in no change, since that
position is the same for all numbers in this particular example.

Current order: 1000 2000 1001 2001 2011

Pass four looks at digit X___ and results in

Bucket 0: (empty)
Bucket 1: 1000 1001
Bucket 2: 2000 2001 2011

Final order: 1000 1001 2000 2001 2011


The number of passes through the data set is determined solely by the
length of the key used for sorting. It is independent of the number of
elements to be sorted, but each pass processes all N sorting elements.
(Verifying that you only need four passes with, say, 20 numbers instead
of just 5 is left as an exercise for the reader :)

This makes it particularly well suited to sorting records where the key
length is much smaller than the number of items to be sorted. If k is
roughly equal to n, then the actual running time will be O(n^2). This
is a good algorithm to use for sorting numeric records or records with a
fixed (relatively short) length string key. It will not work on
arbitrary string keys.

The key observation in sorting real numbers in a computer is that they
have finite representations, typically 4 or 8 bytes in length. That
means that one can sort them using this method in 4 or 8 passes (4*n or
8*n steps in total) using 256 buckets. Of course the space requirements in the worst case
can be k*n if fixed size buckets are used.
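
One wrinkle the byte-wise view glosses over: the raw bit pattern of an
IEEE-754 float only orders correctly for non-negative values. A small,
commonly used bit transformation fixes that; a sketch in C (the helper
name float_key and the 32-bit IEEE-754 assumption are mine):

#include <stdint.h>
#include <string.h>

/* Map a 32-bit IEEE-754 float to a uint32_t whose unsigned ordering
 * matches the float ordering (NaNs excluded). */
uint32_t float_key(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);            /* reinterpret the bits           */
    if (u & UINT32_C(0x80000000))
        u = ~u;                          /* negative: flip everything      */
    else
        u |= UINT32_C(0x80000000);       /* non-negative: set the sign bit */
    return u;
}

Radix sort the transformed keys (carrying the original records along)
and the floats come out in numeric order.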
--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu

Dann Corbit

unread,
Nov 27, 1996, 3:00:00 AM11/27/96
to

Ben Bullock <b...@pcae.hep.phy.cam.ac.uk> wrote in article
<57hnqi$a...@lyra.csx.cam.ac.uk>...
[snip]

>
> There is absolutely no way to make a sort faster than O(n log(n))
> without knowledge of the distribution of data. I was surprised by the
> way that many people who corrected my mistake don't know this.
>
> You should try it on my example distribution above until you are
> satisfied if you don't believe it.
Not only is it possible, I do it all the time.
Please read "Sorting and Sort Systems" by Lorin.
Pay particular attention to algorithm "Counting Sort"
Several linear time sorts are presented in that book.

It is not possible to do comparison sorting faster than n log n.
Therefore, no linear time sort does comparisons between one key
and the next. It must operate directly on the key.
Radix sort requires no knowledge of the key whatsoever (except
its length in bits). It is linear time. Counting sort is also
linear time. If you cannot understand my explanation, please buy
a book on sorting and read it. You can even find information on
the web for these sorts. Knuth has one of the best books on the
subject. He explains linear time sorting in that book.

You don't have to believe it if you don't want to, but linear time
sorting has been known for years and years. Many UNIX implementations
have a RADIX sort implementation. If you use UNIX, and your
implementation has radixsort, try reading the man page for it.
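
A minimal counting-sort sketch for single-byte keys, just to make the
idea concrete (my own illustration, not code from Lorin or Knuth):

#include <stddef.h>

/* Counting sort for an array of unsigned char: one pass to build the
 * histogram, one pass to write the values back out in order. */
void counting_sort_bytes(unsigned char *a, size_t n)
{
    size_t count[256] = {0};
    size_t i, k = 0;
    unsigned v;

    for (i = 0; i < n; i++)              /* pass 1: count each key value */
        count[a[i]]++;
    for (v = 0; v < 256; v++)            /* pass 2: emit keys in order   */
        for (i = 0; i < count[v]; i++)
            a[k++] = (unsigned char)v;
}

Two linear passes, and no comparisons between keys anywhere.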

Steve Heller

unread,
Nov 27, 1996, 3:00:00 AM11/27/96
to

b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) wrote:

>Thanks very much but I already have some books about it. I didn't
>understand how it was supposed to be possible to sort in O(n) time
>without knowledge of the distribution.

>By now I am very sure that it isn't possible, so the corrections to my
>mistake contained mistakes too.

>I thought the mistake in my point (1) about needing to know the
>max/min was much worse than the mistake in (2) about having an even
>distribution. The first mistake cannot be fixed to make more sense,
>since we can find the max/min in O(n) anyway, whereas the error in (2)
>can be easily fixed by replacing "even" with "known".

>Anyway that is my final word on the subject. I am sorry to have
>created this confusion, but glad to have an opportunity to clear some
>up in the end.

Actually, I find this message confusing. Are you convinced that it
is possible to sort doubles (for example) in O(n) without knowing
their distribution?

Richard A. O'Keefe

unread,
Nov 28, 1996, 3:00:00 AM11/28/96
to

b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) writes:
>I didn't understand how it was supposed to be possible to sort in O(n)
>time without knowledge of the distribution.

>By now I am very sure that it isn't possible, so the corrections to my
>mistake contained mistakes too.

I repeat:

Lecture Notes on Bucket Algorithms
Luc Devroye
Birkh\"auser 1986

ISBN 0-8176-3328-6

V6 in the "Progress Computer Science" series.

Bucket sorting is a way of sorting that achieves O(N) expected time.
The distribution must _have_ certain properties for that to be true,
but the algorithm doesn't need to _know_ anything about the
distribution in advance to achieve that cost.
Double bucketing works very well in practice.

What's more, with a bucket sort, you don't _have_ to use a quadratic
algorithm to sort the buckets: double bucketing uses another round
of bucket sort, but you can use merge sort or even (gag) "quick" sort,
so the worst case can be O(NlgN).

If you know in advance that you are sorting
- integers of a known _fixed_ size
- floats of a known _fixed_ size
- strings of a known _fixed_ size
then you can sort an array of N such things in O(N) worst case time
using radix sort _whatever_ the distribution of the data.
Radix sort works brilliantly for 32-bit numbers.


Anyone who is sure that O(N) expected time sorting cannot be done
for realistic problems, in the absence of prior knowledge of the
distribution, is certainly wrong.

--
Fear most of all to be in error. -- Kierkegaard, quoting Socrates.

Richard A. O'Keefe

unread,
Nov 29, 1996, 3:00:00 AM11/29/96
to

David Gillies <dagg...@vader.brad.ac.uk> writes:
>It's pointed out in Numerical Recipes that a quick and dirty random
>number generator to select the partitioning element avoids the
>O(N^2) problem for in-order arrays.

Numerical Recipes has been savaged in review, but surely they can't
have got it _that_ wrong!

You can use a random number generator all you want, and the algorithm
will STILL be O(N**2) worst case.
Using a random number generator guarantees you (if you do it right, and
you turn out to need a much better random number generator than you
might expect) that the worst case will be extremely *unlikely*, but
it is still *possible*.

>I like Quicksort, but for small
>to medium sorts I use Shell's sort, and for bigger ones I generally
>use Heapsort.

You don't say _which_ Heapsort you use.
There are several, including one that is very close to 1.0 NlgN
comparisons in the worst case. Look in back issues of
The Computer Journal to find the code.
--
Suddenly I realised with a shock that others don't think of
the Egyptian New Kingdom as "modern".

Andrew Koenig

unread,
Nov 30, 1996, 3:00:00 AM11/30/96
to

In article <57j5dh$sh3$1...@goanna.cs.rmit.edu.au> o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:

> Anyone who is sure that O(N) expected time sorting cannot be done
> for realistic problems, in the absence of prior knowledge of the
> distribution, is certainly wrong.

For realistic problems, there is little difference between O(N)
and O(N log N), because there is usually a constant upper bound
on N (for example, the number of possible values of a pointer).
--
--Andrew Koenig
a...@research.att.com

Steve C

unread,
Dec 1, 1996, 3:00:00 AM12/1/96
to

In article <ymi3exv...@hobbes.isi.edu>, "Thomas A. Russ"
<t...@ISI.EDU> writes

>
>1000
>1001
>2000
>2011
>2001
>
>Pass one looks at digit ___X and results in
>
>Final order: 1000 1001 2000 2001 2011
<lots snipped>

Just one to say I found this thread MOST interesting and revealing.
I posted recently on other sort methods (I only knew ripple/bubble) and
have now implemented a bucket sort in my application, with most
EXCELLENT results.

Ta big-time !

(BTW, my app was 1000+ long strings sorted by 3 particular chars in the
string, so it suited just nicely)
--
Steve C

Richard A. O'Keefe

unread,
Dec 2, 1996, 3:00:00 AM12/2/96
to

b...@pcae.hep.phy.cam.ac.uk (Ben Bullock) writes:

>Yes, again you are right, but the other one, number (1), was
>completely false as well. As far as I can see we need to know the
>distribution though. There was one post that said that we didn't, but
>it was incomprehensible to me.

>> Single bucketing is O(N) under a wide range of distributions.


>> More precisely, let "f" be the probability density,
>> then single bucketing takes linear expected time if and only if
>> integral(f**2) < infinity.

>I am really lost trying to understand the last sentence here.

Assume numbers to be sorted are drawn from the interval [0,1].
(Any interval [Min,Max] will do.)

The probability density f is the function which tells you how
likely it is that one of these randomly distributed numbers
will fall into any particular interval:

    Probability(X element-of [a,b]) = \int_a^b f(x) dx


"probability density" is a standard technical term in probability and
statistics; if you know what a "distribution" is, you know what this is.
Ok, the requirement is that

    \int_0^1 f(x)^2 dx < infinity
(More generally, integrate from Min to Max, whatever is appropriate
for the distribution. And not knowing Min and Max in advance is not
a problem; if you estimate Min and Max from the data you will get
_more_ buckets doing useful work and the result is _better_, except
for the overhead of finding Min and Max. Devroye chapter 2 shows
that the O(n) expected cost applies even if f is non-zero on an
infinite domain.)

If this is true, single bucketing (one pass of bucket sort, with the
buckets sorted using a quadratic cost method such as insertion sort)
will take O(n) expected time to sort n numbers drawn from this
distribution.

This will be true whether you *KNOW* what the distribution is or not,
as long as the distribution *IS* one with this property, which is a
lot of them.
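
A concrete sketch of single bucketing, under simplifying assumptions of
my own (n keys strictly in [0,1), n equal-width buckets, and a single
insertion-sort pass standing in for per-bucket sorting):

#include <stdlib.h>
#include <string.h>

/* Scatter n doubles in [0,1) into n buckets (kept in bucket order),
 * then finish with insertion sort.  Only inversions inside a bucket
 * remain after the scatter, so the expected total cost is O(n) when
 * integral(f^2) is finite. */
void bucket_sort01(double *a, size_t n)
{
    size_t *start = calloc(n, sizeof *start);
    double *tmp   = malloc(n * sizeof *tmp);
    size_t i, b;

    if (start == NULL || tmp == NULL) {
        free(start);
        free(tmp);
        return;
    }
    for (i = 0; i < n; i++)              /* count keys per bucket      */
        start[(size_t)(a[i] * n)]++;
    for (b = 1; b < n; b++)              /* prefix sums = bucket ends  */
        start[b] += start[b - 1];
    for (i = n; i-- > 0; )               /* scatter, grouped by bucket */
        tmp[--start[(size_t)(a[i] * n)]] = a[i];
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);
    free(start);

    for (i = 1; i < n; i++) {            /* insertion sort cleanup     */
        double x = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > x) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = x;
    }
}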

Read Devroye's book! It's a little gem!
--
Govt saves money by cutting legal aid, guilty plea rates soar;
poverty is a crime! (See also recent Sci.Am.)

Rolf Czedzak

unread,
Dec 2, 1996, 3:00:00 AM12/2/96
to

Andrew Koenig wrote: <E1ouw...@research.att.com>

AK> In article <57j5dh$sh3$1...@goanna.cs.rmit.edu.au>
AK> o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
AK>
AK> > Anyone who is sure that O(N) expected time sorting cannot be done
AK> > for realistic problems, in the absence of prior knowledge of the
AK> > distribution, is certainly wrong.
AK>
AK> For realistic problems, there is little difference between O(N)
AK> and O(N log N), because there is usually a constant upper bound
AK> on N (for example, the number of possible values of a pointer).

Your argument pushes _any_ 'realistic problem' into class O(1).

AK> --Andrew Koenig
AK> a...@research.att.com

Rolf

Steve Heller

unread,
Dec 2, 1996, 3:00:00 AM12/2/96
to

Steve C <St...@mjrigby.demon.co.uk> wrote:

>Just one to say I found this thread MOST interesting and revealing.
>I posted recently on other sort methods (I only new ripple/bubble) and
>have now implemented a bucket sort to my application, with most
>EXCELLENT results.

>Ta big-time !

>(BTW, my app was 1000+ long strings sorted by 3 paticular chars in the
>string, so suited just nice)

I've had the experience of fixing a program that sorted 3000
integers using a bubble sort(!) by replacing that algorithm with a
bucket sort. The run time went from 15 minutes to 8 seconds.

Steve C

unread,
Dec 2, 1996, 3:00:00 AM12/2/96
to

In article <57uoeb$1...@news.utdallas.edu>, Steve Heller
<hel...@utdallas.edu> writes

> I've had the experience of fixing a program that sorted 3000
>integers using a bubble sort(!) by replacing that algorithm with a
>bucket sort. The run time went from 15 minutes to 8 seconds.
>
>
>Steve Heller, author and software engineer
I can well believe it. Mine went from about 10mins+ to 45 secs, but then
my first attempt was probably well inefficient.

BTW, how does double-bucketing work?

And for that matter, would someone explain the mergesort (in English, like
the 1001, 2001, etc. bucket example) please?
--
Steve C

Erik J. Rogers

unread,
Dec 3, 1996, 3:00:00 AM12/3/96
to

I know this might belong in another thread (or group for that matter),
but I think here it targets those interested more closely, and it does
relate to the original topic...

* What is a good book/text on such theory (search/sort & other basics)?
I could easily find something suitable, but perhaps there are some that
come with stronger recommendations than others.

Thanks,

Erik J. Rogers

Xiaobo Ma

unread,
Dec 3, 1996, 3:00:00 AM12/3/96
to

Hi, there:

I got a question that may be interesting to you.

Why doesn't the use of standard ".h" include files in C
provide an adequate data abstraction capacity for C programs?

What is your answer?

Paul

Chris

unread,
Dec 3, 1996, 3:00:00 AM12/3/96
to

Because including all the headers you like will not change the language.
Anyway, it sounds a little bit like homework...

--
Chris, drunk philosoph and bad programmer
------------------------------------------------------------------
"The nail pulling up calls the hammer"
zen proverb

Xiaobo Ma <m...@cs.uwindsor.ca> wrote in article
<32A3D5...@cs.uwindsor.ca>...

Andrew Koenig

unread,
Dec 3, 1996, 3:00:00 AM12/3/96
to

In article <6M5-4...@09.viking.ruhr.com> r...@viking.ruhr.com (Rolf Czedzak) writes:

> Your argument pushes _any_ 'realistic problem' into class O(1).

Not at all. Admittedly the argument is entirely subjective, but
in practice, computers have the property that if you ask them to store
N objects, N will be subjectively large but log N will not be.

Suppose, for example, that we limit N to 2^100 (which is much larger
than the memory of any computer available today) and talk about
log base 2 (to make the contrasts as extreme as we can). Then

O(N) will be much bigger than O(1)
O(N log N) will not be much bigger than O(N)
[well, it might be 100 times as large, but that's
much, much less than 2^100]
O(N^2) will be much bigger than O(N log N)
O(N^3) will be much bigger than O(N^2)

and so on. Of course there might be constant factors involved, but
they're almost surely unimportant compared to the size of N.

To make things more concrete, consider N=1,000,000, and look at four
programs with different performance characteristics:

A program that runs in 1 microsecond takes 1 microsecond :-)
A program that runs in N microseconds takes 1 second
A program that runs in N log N microseconds takes about 20 seconds
A program that runs in N^2 microseconds takes 11.6 days.
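
A quick back-of-the-envelope check of those figures, assuming one
elementary operation per microsecond and C99's log2 (link with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double n  = 1e6;                     /* one million elements        */
    double us = 1e-6;                    /* one microsecond, in seconds */

    printf("N        : %.1f seconds\n", n * us);
    printf("N log2 N : %.1f seconds\n", n * log2(n) * us);
    printf("N^2      : %.1f days\n",    n * n * us / 86400.0);
    return 0;
}

It prints roughly 1.0 seconds, 19.9 seconds and 11.6 days.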

So what I am claiming is that, for practical purposes, a program that
runs in 1 second is nearly equivalent to one that runs in 20 seconds, when you
are comparing them with programs that run in 1 microsecond or in 11.6 days.

Again: O(N log N) is almost as good as O(N). It is not almost as good
as O(1), and is much better than O(N^2).
--
--Andrew Koenig
a...@research.att.com

a...@laphroig.mch.sni.de

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to

If you limit N to below a fixed value, then it doesn't make any sense
to speak about O notation, because that make sense only in the limit:
to say f(n) = O(g(n)) means lim f(n)/g(n) < c for some constant c.

Andreas Eder

David Gillies

unread,
Dec 4, 1996, 3:00:00 AM12/4/96
to
The Bible:

Knuth Vol. 3, Sorting and Searching.

And I know some people don't like it, but I go back to Numerical
Recipes (Press, Flannery, Teukolsky and Vetterling) more often
than to any other collection of algorithms.
--
______________________________________________________________________
David A. G. Gillies (dagg...@vader.brad.ac.uk)
University of Bradford, Bradford, West Yorkshire, England
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

Rolf Czedzak

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

Andrew Koenig wrote: <E1uDz...@research.att.com>

AK> In article <6M5-4...@09.viking.ruhr.com> r...@viking.ruhr.com (Rolf
AK> Czedzak) writes:
AK>
AK> > Your argument pushes _any_ 'realistic problem' into class O(1).
AK>
AK> Not at all.

Whenever you agree in upper limits, you at once get constants dominating
_any_ function of n.

AK> Admittedly the argument is entirely subjective, but
AK> in practice, computers have the property that if you ask them to
AK> store N objects, N will be subjectively large but log N will not be.

Which is hardly a computer property. ;-)

AK> To make things more concrete, consider N=1,000,000, and look at four
AK> programs with different performance characteristics:
AK>
AK> A program that runs in 1 microsecond takes 1 microsecond :-)
AK> A program that runs in N microseconds takes 1 second
AK> A program that runs in N log N microseconds takes about 20 seconds
AK> A program that runs in N^2 microseconds takes 11.6 days.

If you want me to read that as
'A program that runs in O(?) takes ... microseconds',
I agree, not even insisting that the first program might take 20 days
and still belong to O(1).
A factor of 20 between two applications makes the second one
unusable: look back some 10 years and you get approximately that difference in
(hardware) computing power. Now make me believe that today's applications
can be run on (almost) yesterday's hardware. Or imagine the O(n) and
O(n log n) applications are not just write-only ones, but run a few
thousand times a month. It's no longer the same class of programs.

AK> So what I am claiming is that, for practical purposes, a program that
AK> runs in 1 second is nearly equivalent to one that runs in 20 seconds,
AK> when you are comparing them with programs that run in 1 microsecond
AK> or in 11.6 days.

An Apple ][ is almost as good as a Pentium based computer comparing
them with no computer at all.

AK> Again: O(N log N) is almost as good as O(N). It is not almost as
AK> good as O(1), and is much better than O(N^2).

It's incomparable. O(n) might easily be outscored by O(N^2) for all
practical problems.

AK> --Andrew Koenig

Rolf

Dann Corbit

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

Rolf Czedzak <r...@viking.ruhr.com> wrote in article
<6MIHn...@09.viking.ruhr.com>...
[mega-snip]

> AK> Again: O(N log N) is almost as good as O(N). It is not almost as
> AK> good as O(1), and is much better than O(N^2).
>
> It's incomparable. O(n) might easily be outscored by O(N^2) for all
> practical problems.
False.

If the problem is tiny, it may not matter what sort of algorithm,
since all will finish in a blink. If the problem is huge, then
it matters enormously, because for some problem size, O(N*log(N))
will always be better than O(n^2). There are algorithms that are
O(N!), which are pretty much incomputable for all but tiny problem
sets. If you implement such an algorithm, you deserve whatever
blasting you get (unless there are no alternatives).

Of course, the real-life, optimal solution for a problem may be
a combination of several algorithms with different behaviors,
and the optimal mix is determined by benchmarking.

But in any case, we need to know the average and worst case
behavior of algorithms we implement so we know how the program
will behave when the problem set expands (and it will!).

Andrew Koenig

unread,
Dec 5, 1996, 3:00:00 AM12/5/96
to

Actually it does, because you don't necessarily know the value in advance.
--
--Andrew Koenig
a...@research.att.com

Mike Rubenstein

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

"Dann Corbit" <dco...@solutionsiq.com> wrote:

No. This shows a misunderstanding of what O(f(n)) means.

For one thing, O(f(n)) means that the time (or other quantity of
interest) is smaller than A*f(n) for some constant A. It does not
imply that it is not much smaller.

Thus heapsort is O(n!) and bubble sort is O(n^2). This does not mean
that bubble sort is better than heapsort.

More to the point, O(f(n)) means that if n is sufficiently large the
quantity is smaller than A*f(n). But performance may be quite good if
n is not sufficiently large. Consider two sort algorithms:

1. Use heapsort if the number of records is less than 10^20
and bubble sort if it is larger.

2. Use bubble sort if the number of records is less than
10^20 and heapsort if it is larger.

The first is O(n^2) time and the second is O(n log n). Which would
you choose?

Finally, even if the proportionality is exact, the constant must be
considered. Suppose we have two sort algorithms one in which the time
is 10^-30 * n^2 hours and the second in which the time is 100 n log n
hours. The first is O(n^2) and the second O(n log n). Which is
better for practical sized problems?


Michael M Rubenstein

Erik Naggum

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

* Dann Corbit

| If the problem is tiny, it may not matter what sort of algorithm, since
| all will finish in a blink. If the problem is huge, then it matters
| enormously, because for some problem size, O(N*log(N)) will always be
| better than O(n^2).

it appears that many people don't understand the O notation. O indicates
complexity, not execution speed. the execution speed depends on a constant
factor that is insignificant compared to the variable factor as it grows
without bound. a function of complexity O(n log n) may have a constant
factor (k) that makes execution time (or other resources) _higher_ than a
function of complexity O(n^2) with a much smaller constant (c) as long as
the relation

        k / c  >>  n / log n

holds. however, your statement that O(n log n) will _always_ be better
than O(n^2) does not hold, precisely because of the possibility of the
above relation.

since Andrew Koenig surprised many by limiting n and then arguing how
insignificant log n is compared to n, as if this is anything but obvious,
others have followed in his muddled footsteps and also conveniently forget
the constant factor or the actual meaning of the O notation. I'm frankly
amazed that this is possible for even half-educated computer scientists.

the O notation does _not_ indicate execution time. the O notation is the
_simplest_ function that yields an upper bound on the _relation_ between
the length of the input and the execution time (or space).

I recommend that Andrew Koenig and others who have been confused by him
read Alfred V. Aho and Jeffrey D. Ullman's excellent book "Foundations of
Computer Science" (Computer Science Press, 1992; ISBN 0-7167-8233-2), which
has grown out of the course CS109 -- Introduction to Computer Science at
Stanford University. the topic of chapter 3 is the running time of
programs. section 3.10 specifically analyses merge sort. it should be
instructive to more than the target audience of students, and any
programmer worth his salt _should_ be able to understand this chapter with
little effort. in particular, their treatment of constants and of the
simplification involved in the O notation is fundamental knowledge and
_should_ not come as a surprise to anyone who has been following these
discussions. _please_ take the time to look up this book. (Dr. Aho and
Dr. Ullman have both worked at Bell Labs, which is why I chose this
textbook over many other candidates. this will hopefully make it harder
for Andrew Koenig to ignore them, as he is wont to do for most everything
else shown to correct or contradict his inaccurate statements.)

#\Erik
--
Please address private replies, only, to "erik". Junk mail, spam,
stupid flames, courtesy copies, etc, should be sent to "nobody".

Rolf Czedzak

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

Dann Corbit wrote: <01bbe2fb$c692d1c0$c761...@DCorbit.solutionsiq.com>

DC> Rolf Czedzak <r...@viking.ruhr.com> wrote in article
DC> <6MIHn...@09.viking.ruhr.com>...
DC> [mega-snip]
DC> > AK> Again: O(N log N) is almost as good as O(N). It is not almost
DC> > AK> as good as O(1), and is much better than O(N^2).
DC> >
DC> > It's incomparable. O(n) might easily be outscored by O(N^2) for all
DC> > practical problems.
DC> False.

Hi Dann,

I wrote 'incomparable' and 'might' + 'practical problems' in reply to
Andrew's statement about real life problems (= upper limits for N)
and comparability of O-Classes, which in fact are classifiers of
asymptotic behaviour.
What could be wrong with this?

Rolf

Tim Fischer

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

In article <32b06793...@nntp.ix.netcom.com>, mik...@ix.netcom.com
(Mike Rubenstein) wrote:
[references snipped]

> No. This shows a misunderstanding of what O(f(n)) means.

[snip]

> Thus heapsort is O(n!) and bubble sort is O(n^2). This does not mean
> that bubble sort is better than heapsort.

[snip]

I think most people here are using O when they really mean Theta.
Although this is a bit sloppy in terminology, in practice, I've found most
people do this, especially online, where it's not convenient to represent
a 'Theta'.


-Tim

--
---------------------------------------
Tim Fischer
Coda Music Technology

The following email address is mangled to prevent automated
unsolicited junk mail. Replace the '_AT_' with an '@':

tfischer_AT_codamusic.com

Erik Naggum

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

* Tim Fischer

| I think most people here are using O when they really mean Theta.

we mean Omega, sometimes called "big-oh".

textbooks on this topic abound. get one. read it.

Graham Hughes

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to


Erik Naggum <nob...@naggum.no> writes:

>we mean Omega, sometimes called "big-oh".

Not always. When people call quicksort O(n log n), they really do mean
Theta.
--
Graham Hughes (graham...@resnet.ucsb.edu)
alt.PGPlike-key."gra...@A-abe.resnet.ucsb.edu".finger.look.examine
alt.homelike-page."http://A-abe.resnet.ucsb.edu/~graham/".search.browse.view
alt.silliness."http://www.astro.su.se/~robert/aanvvv.html".look.go.laugh


Dann Corbit

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

Mike Rubenstein <mik...@ix.netcom.com> wrote in article
<32b06793...@nntp.ix.netcom.com>...

[snip]
> Thus heapsort is O(n!) and bubble sort is O(n^2). This does not mean
> that bubble sort is better than heapsort.
In what sense is heapsort O(n!)?

See also my follow-up to Erik Naggum.
[snip]


Dann Corbit

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

Erik Naggum <nob...@naggum.no> wrote in article
<30588247...@naggum.no>...

> * Dann Corbit
> | If the problem is tiny, it may not matter what sort of algorithm, since
> | all will finish in a blink. If the problem is huge, then it matters
> | enormously, because for some problem size, O(N*log(N)) will always be
> | better than O(n^2).
[snip]

>
> the O notation does _not_ indicate execution time. the O notation is the
> _simplest_ function that yields an upper bound on the _relation_ between
> the length of the input and the execution time (or space).
But there is a relationship between big O notation and execution time.
You notice that I prefaced my statement "for some problem size", meaning
that given a large enough input set, an O(n) algorithm WILL outperform
(for the average case) an O(n^2) algorithm.


[ASCII plot: TIME on the vertical axis versus input set size (n) on the
horizontal axis, marked at f(0), f(i) and f(k); a straight line and a
parabola of asterisks cross, with the parabola above the line beyond f(k).]

The asterisks are a parabolic curve y0 = a*x^2 + b
The diagonal lines are some linear equation y1 = m*x + b
Early on, because of startup and whatever, the algorithm
may not clearly fit big O notation behavior by examination.
By the time our algorithm f() takes an input of size i,
from that point forward, the running time of the algorithm
will always lie below the graphs for some functions y0 and y1.
Further, at some input set size f(k), the graph of y0, being
parabolic, will always be above the graph of y1, being linear.
Of course, a specific case may run in linear time (or even 0
time) for the n^2 algorithm, but its average behavior is
more accurately defined as being limited by the parabola for
some constants a and b.

I also mentioned benchmarking as being necessary to make the
proper choice. None the less, there is clearly a relationship
between complexity and running time. In the case of Big O
notation applied as average case behavior (Vs worst case as
an alternative) we can statistically predict how long the
routine will take to complete on average. We cannot predict
how long a single run will take, except as a function of
probability.
[snip]

Mike Rubenstein

unread,
Dec 6, 1996, 3:00:00 AM12/6/96
to

a...@laphroig.mch.sni.de wrote:

> If you limit N to below a fixed value, then it doesn't make any sense
> to speak about O notation, because that make sense only in the limit:
> to say f(n) = O(g(n)) means lim f(n)/g(n) < c for some constant c.

Not quite. f(n) = O(g(n)) means that for sufficiently large n f(n) /
g(n) < c. There is no requirement that there be a limit. For
example, sin(n) = O(1) even though sin(n) / 1 does not approach a
limit.
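
In symbols, the usual textbook definition being appealed to here
(stated for reference, not quoted from any particular book):

    f(n) = O(g(n))  \iff  \exists c > 0, n_0 :
                          |f(n)| \le c |g(n)|  for all  n \ge n_0.

Under that definition sin(n) = O(1) holds simply because
|sin(n)| <= 1 for every n; no limit is required.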

Michael M Rubenstein

Steve Heller

unread,
Dec 7, 1996, 3:00:00 AM12/7/96
to

"Dann Corbit" <dco...@solutionsiq.com> wrote:

>I also mentioned benchmarking as being necessary to make the
>proper choice. None the less, there is clearly a relationship
>between complexity and running time. In the case of Big O
>notation applied as average case behavior (Vs worst case as
>an alternative) we can statistically predict how long the
>routine will take to complete on average. We cannot predict
>how long a single run will take, except as a function of
>probability.
>[snip]

That's another reason I like distribution counting (at least if you
have enough physical memory): its run time for a given number (and
size) of elements is essentially independent of the distribution or
arrangement of the keys and can therefore be predicted accurately.


Steve Heller, author and software engineer

http://ourworld.compuserve.com/homepages/steve_heller


Andrew Koenig

unread,
Dec 7, 1996, 3:00:00 AM12/7/96
to

In article <30588247...@naggum.no> Erik Naggum <nob...@naggum.no> writes:

> since Andrew Koenig surprised many by limiting n and then arguing how
> insignificant log n is compared to n, as if this is anything but obvious,
> others have followed in his muddled footsteps and also conveniently forget
> the constant factor or the actual meaning of the O notation. I'm frankly
> amazed that this is possible for even half-educated computer scientists.

Notation is usually defined by usage and implicit understanding.
When a complexity theorist uses O notation in formal writing, that
means something slightly different from what the practical programmers
I've seen mean when they use it in informal conversation.

For example, I understand that, formally speaking, if I can describe
f(n) as O(n), I can also describe it as O(n^2). However, that property
of the notation is not terribly useful in the context of my previous posting.

I don't think that anyone who wants to understand what I was saying
will have the slightest difficulty doing so.
--
--Andrew Koenig
a...@research.att.com

Reini Urban

unread,
Dec 8, 1996, 3:00:00 AM12/8/96
to

David Gillies <dagg...@vader.brad.ac.uk> recommended:

>Knuth Vol. 3, Sorting and Searching.
>Numerical Recipes (Press, Flannery, Teukolsky and Vetterling)

Sedgewick: Algorithms
is quite complete too and has nice illustrations; that's why I prefer
that one.
---
Reini Urban, TU Graz, Architecture & X-RAY
Attention! From: header is garbled on purpose!
<rur...@sbox.tu-graz.ac.at> http://xarch.tu-graz.ac.at/autocad/

Keir Spilka

unread,
Dec 14, 1996, 3:00:00 AM12/14/96
to

Sorry, I just got in on this discussion, but here's the straight goods.

On average, Quicksort is actually faster than Mergesort, but Mergesort has
a couple of nice features:

Mergesort GUARANTEES O(nlogn) runtime. By the same token, it always
performs all of the steps. Ie. There are no skipped steps or
advantageous cases in most codings of the algorithm. Unless you drop out
of the mergesort at a sufficiently small size of arrays to use an
insertion sort or something else which works well on small problems.

Furthermore, Mergesort is a stable algorithm, i.e. it keeps identical
elements in their original relative order. This is
advantageous in certain conditions. E.g. imagine a contest for guessing
the number of jelly beans in a jar. If a bunch of people have the correct
answer, you want the prize to go to the first person who submitted his
entry.....You get the picture.


The main problem is extra space requirements, but its guaranteed runtime
is a big plus!
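
Since the thread started with mergesort, here is a minimal top-down
sketch with the auxiliary array, on int keys (my own illustration; the
<= in the merge step is exactly what makes it stable):

#include <stddef.h>
#include <string.h>

/* Merge a[lo..mid) and a[mid..hi), both already sorted, via tmp. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* <= keeps it stable */
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

static void msort(int *a, int *tmp, size_t lo, size_t hi)
{
    size_t mid;

    if (hi - lo < 2)                     /* 0 or 1 elements: done */
        return;
    mid = lo + (hi - lo) / 2;
    msort(a, tmp, lo, mid);              /* sort left half        */
    msort(a, tmp, mid, hi);              /* sort right half       */
    merge(a, tmp, lo, mid, hi);          /* merge the two halves  */
}

/* Caller supplies tmp, an auxiliary array of at least n ints. */
void merge_sort(int *a, int *tmp, size_t n)
{
    msort(a, tmp, 0, n);
}

Every level of recursion touches each element a constant number of
times, and there are about lg n levels, hence the guaranteed N log N.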

Keir
--
------------------------------------------------------------------------------
Keir Spilka knsp...@undergrad.math.uwaterloo.ca
2B CS/EEE Major http://www.undergrad.math.uwaterloo.ca/~knspilka
University of Waterloo

Marc Schwarz

unread,
Dec 15, 1996, 3:00:00 AM12/15/96
to

mik...@ix.netcom.com (Mike Rubenstein) writes:

> For one thing, O(f(n)) means that the time (or other quantity of
> interest) is smaller that A*f(n) for some constant A. It does not
> imply that it is not much smaller.
>

> Thus heapsort is O(n!) and bubble sort is O(n^2). This does not mean
> that bubble sort is better than heapsort.


You might be confusing heapsort with something else.
According to Sedgewick's book, "Algorithms," heapsort is, on
average, O(N log N); O(N * N) worst case.

To heapify the first element is order O(1).

To downheap each individual element is order O(log N), on
average, for a balanced heap, with a worst-case of order O(N)
for a "maximally" unbalanced heap.

A benefit of mergesort (over heapsort) is that it is NOT data
dependent; mergesort is order O(N log N) for all countable data
sets of finite length N.

I suggest you take a look at either Sedgewick, or the book,
"Intro to Algorithms" by Corman, Leiserson, and Rivest. If
you prefer "the classics" then take a look at "The Design
and Analysis of Computer Algorithms," by Aho, Hopcroft,
and Ullman. Keep in mind though that the Aho, Hopcroft,
and Ullman book doesn't handle Red/Black Trees or Quicksort
as well as Sedgewick. A **MUST** purchase for those interested
in an intro on the foundations of computer science is the
upcoming Principles of Algorithms text by Siegel and Cole
(the preferred algorithms text at NYU).

Marc.
-=-
sch...@cs.nyu.edu

Marty Hall

unread,
Dec 16, 1996, 3:00:00 AM12/16/96
to

knsp...@undergrad.math.uwaterloo.ca (Keir Spilka) writes:
> On average, Quicksort is actually faster than Mergesort

I made the same claim about 100 years ago in this same thread, and was
gently but immediately given compelling evidence that this is false
for the vast majority of situations.
- Marty

(proclaim '(inline skates))

sl...@lehigh.edu

unread,
Dec 17, 1996, 3:00:00 AM12/17/96
to

In article <591rqs$h...@news.nyu.edu>, sch...@barney.cs.nyu.edu (Marc Schwarz) writes:

>You might be confusing heapsort with something else.
>According to Sedgewick's book, "Algorithms," heapsort is, on
>average, O(N log N); O(N * N) worst case.

In fact, heapsort is quite robust. It is quicksort which is O(N^2) in the
worst case. Look at Computer Algorithms by Sara Baase page 80. The table
there gives heapsort as O(nlogn) in the general case and 2(n-1)logn in the
worst case. It also gives mergesort as O(nlogn) in general and nlogn in the
worst case. (The theta used there is a somewhat more stringent form of the
O.) A disadvantage of mergesort is the extra space requirement. For very
large lists which have to be kept on disk mergesort may be the only
practical sort. Here the disk overhead becomes a disadvantage.

S. L. Gulden
sl...@lehigh.edu

Christopher Oliver

unread,
Dec 17, 1996, 3:00:00 AM12/17/96
to

Marc Schwarz (sch...@barney.cs.nyu.edu) wrote:
: You might be confusing heapsort with something else.

: According to Sedgewick's book, "Algorithms," heapsort is, on
: average, O(N log N); O(N * N) worst case.

Heh? What?

: I suggest you take a look at either Sedgewick, or the book,


: "Intro to Algorithms" by Corman, Leiserson, and Rivest.

Citing the above (pp 44 & 145-149), heap building runs in O(N)
time, and the extraction from the heap runs in O(N log N). I
get the same sort of noise from Knuth too.

--
Christopher Oliver Traverse Communications
Systems Coordinator 223 Grandview Pkwy, Suite 108
oli...@traverse.com Traverse City, Michigan, 49684
The loop macro: because no language is complete without a little COBOL.

Jeff Barnett

unread,
Dec 20, 1996, 3:00:00 AM12/20/96
to

In article <595atr$b...@bert.traverse.com>, oli...@co.traverse.com (Christopher Oliver) writes:
|> Marc Schwarz (sch...@barney.cs.nyu.edu) wrote:
|> : You might be confusing heapsort with something else.
|> : According to Sedgewick's book, "Algorithms," heapsort is, on
|> : average, O(N log N); O(N * N) worst case.
|>
|> Heh? What?
|>
|> : I suggest you take a look at either Sedgewick, or the book,
|> : "Intro to Algorithms" by Corman, Leiserson, and Rivest.
|>
|> Citing the above (pp 44 & 145-149), heap building runs in O(N)

Unless you or CL+R or SEDg have made a real breakthrough, heap
building takes NlogN just like extraction. BTW, if you want to
be anal about the whole thing, the times actually more resemble
\sum_{i=1}^{N} \log i.

|> time, and the extraction from the heap runs in O(N log N). I
|> get the same sort of noise from Knuth too.

Jeff Barnett

Christopher Oliver

unread,
Dec 21, 1996, 3:00:00 AM12/21/96
to

Jeff Barnett (jbar...@nrtc.northrop.com) wrote:
: |> Citing the above (pp 44 & 145-149), heap building runs in O(N)

: Unless you or CL+R or SEDg have made a real breakthrough, heap
: building takes NlogN just like extraction. BTW, if you want to
: be anal about the whole thing, the times actually more resemble
: \sum_{i=1}^{N} \log i.

According to CL&R (I'm sorry for the ambiguity):

n \sum_{h=0}^{\lfloor lg n \rfloor} \frac{h}{2^h}

They bound the right term of the expression by letting n (and consequently
lg n) go to infinity and demonstrating it's equivalent to the derivative of
a geometric series so its value is bounded by a constant.

I.e. as n gets large, this term approaches 2.

See CL&R (p. 145) for a discussion of the algorithm (bottom up) and bounds
derivation.
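
Spelling the bound out (a standard geometric-series manipulation, not a
quote from CL&R): since \sum_{h \ge 0} h x^h = x/(1-x)^2 for |x| < 1,

    \sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1 - 1/2)^2} = 2,

so

    n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^h} \le 2n = O(n).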
