quick sort and O(nlgn) worst case bound

mohangupta13

unread,

Nov 28, 2009, 10:23:35 AM11/28/09

to

hello all ,
Using median of median or just median to choose the pivot in quick
sort it can be deterministically proved that the worst case running
time of quicksort is O(nlgn) and not O(n^2) .

So why is that quick sort is still blamed to be having a worst case of
O(n^2).??
Is there some problem with the above mentioned methods of choosing the
pivot ?

thanks
Mohan

Willem

unread,

Nov 28, 2009, 10:35:01 AM11/28/09

to

mohangupta13 wrote:
) hello all ,
) Using median of median or just median to choose the pivot in quick
) sort it can be deterministically proved that the worst case running
) time of quicksort is O(nlgn) and not O(n^2) .
)
) So why is that quick sort is still blamed to be having a worst case of
) O(n^2).??
) Is there some problem with the above mentioned methods of choosing the
) pivot ?

Finding the median.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

mohangupta13

unread,

Nov 28, 2009, 10:58:02 AM11/28/09

to

On Nov 28, 8:35 pm, Willem <wil...@stack.nl> wrote:
> mohangupta13 wrote:
>
> ) hello all ,
> ) Using median of median or just median to choose the pivot in quick
> ) sort it can be deterministically proved that the worst case running
> ) time of quicksort is O(nlgn) and not O(n^2) .
> )
> ) So why is that quick sort is still blamed to be having a worst case of
> ) O(n^2).??
> ) Is there some problem with the above mentioned methods of choosing the
> ) pivot ?
>
> Finding the median.

we have well defined algorithm that can find the median in linear time
worst case and that linear time can be absorbed in the partition step
of quicksort(which is also linear time) giving a overall worst case
running time of O(nlgn)

Willem

unread,

Nov 28, 2009, 11:12:14 AM11/28/09

to

mohangupta13 wrote:
) On Nov 28, 8:35�pm, Willem <wil...@stack.nl> wrote:
)> Finding the median.
) we have well defined algorithm that can find the median in linear time
) worst case and that linear time can be absorbed in the partition step
) of quicksort(which is also linear time) giving a overall worst case
) running time of O(nlgn)

Yes, but practically speaking that well defined algorithm is quite slow.

mohangupta13

unread,

Nov 28, 2009, 1:26:36 PM11/28/09

to

On Nov 28, 9:12 pm, Willem <wil...@stack.nl> wrote:
> mohangupta13 wrote:
>
> ) On Nov 28, 8:35 pm, Willem <wil...@stack.nl> wrote:
> )> Finding the median.
> ) we have well defined algorithm that can find the median in linear time
> ) worst case and that linear time can be absorbed in the partition step
> ) of quicksort(which is also linear time) giving a overall worst case
> ) running time of O(nlgn)
>
> Yes, but practically speaking that well defined algorithm is quite slow.

But here I was referring theoretical consideration , if we see
practically then quicksort despite of it O(n^2) worst case is better
than O(dn) radix sort all other
sorting algorithms out there like merge sort and heap sort . (the
randomized quick sort of-course)
Even in all theoretical books i have seen them calling worst case
quick sort be O(n^2) .

Mohan

Willem

unread,

Nov 28, 2009, 1:53:56 PM11/28/09

to

mohangupta13 wrote:
) On Nov 28, 9:12�pm, Willem <wil...@stack.nl> wrote:
)> Yes, but practically speaking that well defined algorithm is quite slow.
)
) But here I was referring theoretical consideration

All practical implementations of quicksort don't use the median, so
that is the version of quicksort that people will be talking about,
in theoretical considerations also.

James Dow Allen

unread,

Nov 29, 2009, 3:06:11 AM11/29/09

to

On Nov 28, mohangupta13 <mohangupt...@gmail.com> wrote:
> Using median of median or just median to choose the pivot in quick
> sort it can be deterministically proved that the worst case running
> time of quicksort is O(nlgn) and not O(n^2) .

There's an algorithm to find medians with *average* O(n) cost,
and that leads directly to ordinary quicksort. You're
speaking of the *guaranteed* O(n) median finder, which is, I
think, significantly slower *on average* than the simple
median finder.

> So why is that quick sort is still blamed to be having a worst case of
> O(n^2).??

I believe Willem's responses were correct, but response from a
different perspective might have been less confusing.

The sort you imply -- call it mohansort -- would indeed be a
O(n log n) sort based on quick-sort. But there are other
sorts that are O(n log n) that would be faster on average,
because you've sacrificed the big advantages of quicksort:
coding simplicity and fast *average* speed.

(Perhaps a sorting algorithm should detect that it's encountered
pathological input (e.g. size after 4 divisions > 1/2 original
size when expected size is 1/16) and switch to a guaranteed
O(n log n) method....)

A few days ago, Ben gave a link to a very interesting paper
by Bentley which discussed the tuning of quicksort, and fixes
for various worst-case inputs. (I suppose Bentley-sort itself
still has some worst-case, O(n^2)-yielding input; perhaps
the black hats have engineered such data to use in a
denial-of-service attack!)

James

robert...@yahoo.com

unread,

Nov 30, 2009, 5:33:44 PM11/30/09

to

On Nov 29, 2:06 am, James Dow Allen <jdallen2...@yahoo.com> wrote:
> (Perhaps a sorting algorithm should detect that it's encountered
> pathological input (e.g. size after 4 divisions > 1/2 original
> size when expected size is 1/16) and switch to a guaranteed
> O(n log n) method....)

That would be Introsort. Basically uses Quicksort until a run of bad
(as in not very productive) partitioning operations is detected (using
a fairly straight-forward method of comparing the partition size and
the depth of the partition tree - basically what you suggested), then
switches to heap sort for that partition. David Musser's original
Introsort paper is worth reading just for the analysis of just how
easy it is to actually generate datasets with pathological
partitioning characteristics.

Stephen Howe

unread,

Dec 2, 2009, 3:11:22 PM12/2/09

to

Yes, they are O(N). All those O(N) calls add up.
Most pivots for Quicksort are chosen in O(1)

But this is academic. If you use Musser's hybrid Introsort, it wil lbe Quicksort with O(1) pivots and Heapsort where it blows
up. That guarantees O(N * lg N) performance in worst case and close to Quicksort in the average case.

Stephen Howe

unread,

Dec 2, 2009, 3:19:22 PM12/2/09

to

On Sun, 29 Nov 2009 00:06:11 -0800 (PST), James Dow Allen <jdall...@yahoo.com> wrote:

>A few days ago, Ben gave a link to a very interesting paper
>by Bentley which discussed the tuning of quicksort, and fixes
>for various worst-case inputs. (I suppose Bentley-sort itself
>still has some worst-case, O(n^2)-yielding input; perhaps
>the black hats have engineered such data to use in a
>denial-of-service attack!)
>
>James

Yes. See
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

which is an engineered comparison function for feeding into C's qsort()
If the C's qsort() is based on quicksort, even well-engineered quicksort (like Bentley's), it guarantees quadratic behaviour.

I think it is proven that all quicksorts that choose a pivot in O(1) will have a worst-case.

Stephen Howe

pete

unread,

Dec 21, 2009, 7:55:22 AM12/21/09

to

Outside of introsort, I'm unfamiliar with any quicksort
which is truly O(N * log(N)) for worst case.

A qsort type quicksort function called mqsort,
has a worst case (reverse ordered array) that takes
1.450 times as many comparison calls as its average case
when N equals 1024 and
1.468 times as many comparison calls as its average case
when N equals 32768.
That seems pretty close to O(N * log(N)) worst case, but not exactly there.

A very interesting sort is leapfrogging sample sort,
which has the amazing property of making the same number of compar
calls on a randomized array as mergesort does.
However it is O(N * log(N) * log(N)) worst case.
http://www.springerlink.com/content/p70564506802n575/
mergesort does theoretical lower information bound comparisons.
sfsort, which is a qsort type leapfrogging sample sort function
taken from the code here:
http://www.adnu.edu.ph/ccs/NCITE08/KeynoteLecture_Albacea_NewSortingLowerBound.pdf

Here in sfsort_test.c are:
mqsort,
sfsort,
and also included for contrast are:
m_sort, (a mergesort )
q_sort, (a very simple quicksort)

/* BEGIN sfsort_test.c */

#include <stdio.h>
#include <stdlib.h>
/*
** This program compares various qsort type
** array sorting functions, by counting compar calls.
*/
#define ARRAY_ELEM (32768 / 32)
#define TRIALS 10
/*
** Make sure that SEQUENCE and SEQUENCE_STR match.
*/
#define SEQUENCE RANDOM
#define SEQUENCE_STR "RANDOM"
/*
** #define SEQUENCE RANDOM
** #define SEQUENCE FORWARD
** #define SEQUENCE REVERSE
*/
/*
** Decrease KEY_RANGE, if more equal keys are wanted.
*/
#define KEY_RANGE 0XFFFFFFFFLU /* 0X1LU, 0XFFFFFFFFLU */
/* sfsort */ /*
** Leap frog sample quicksort
*/
/* m_sort */ /*
** stable mergesort,
** Number of comparisons:
** best case: N * log2(N) / 2
** worst case: N * log2(N) - N + 1
*/
/* q_sort */ /*
** Simple Lomuto qsort
*/
/* qsort */ /*
** Standard library qsort
** defined by your C implementation
*/
#define FUNCTIONS_AND_STRINGS { \
{sfsort, "sfsort"}, \
{mqsort, "mqsort"}, \
{m_sort, "m_sort"}, \
{q_sort, "q_sort"}, \
{qsort, " qsort"} \
}
#define BYTE_SWAP(A, B, S) \
{ \
p1 = (A); \
p2 = (B); \
end = p2 + (S); \
do { \
const unsigned char swap = *p1; \
\
*p1++ = *p2; \
*p2++ = swap; \
} while (p2 != end); \
}
#define NUM_LEN 12
#define SEED_RAND 123456789
#define RANDOM (lus_seed = LU_RAND(lus_seed) % KEY_RANGE)
#define FORWARD element
#define REVERSE (0LU - 1 - element)
#define LU_RAND(S) ((S) * 69069LU + 362437 & 0XFFFFFFFFLU)
#define NMEMB(A) (sizeof A / sizeof *A)
#define str(s) # s
#define xstr(s) str(s)

void
sfsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
void
m_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
void
q_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
void
mqsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
void
free_arrays(long unsigned **array, size_t nmemb);
int
comparison(const void *arg1, const void *arg2);
int
comparray(long unsigned *s1, long unsigned *s2, size_t nmemb,
int (*compar)(const void *, const void *));

static void
sfrog(size_t s1, size_t ss,
void *base, size_t u,
size_t size, int (*compar)(const void *, const void *));
static void
merg(unsigned char *buff, unsigned char *base,
size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
static void
m2sort(size_t sorted, void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
static void
block_swap(void *first, void *middle,
void *last, size_t size);
static void
reverse(unsigned char *first, unsigned char *last, size_t size);

long unsigned Comps;

int main(void)
{
size_t sort, element, n;
struct {
void (*func)(void *, size_t, size_t,
int (*)(const void *, const void *));
char *name;
} F[] = FUNCTIONS_AND_STRINGS;
long unsigned *array[NMEMB(F)];
long unsigned trial,
totalC[NMEMB(F)], CompCounter[NMEMB(F)];
long unsigned lus_seed = SEED_RAND;

for (n = 0; n != NMEMB(array); ++n) {
array[n] = malloc(ARRAY_ELEM * sizeof *array[n]);
if (array[n] == NULL) {
free_arrays(array, n);
fputs("\nmalloc() array failure\n", stderr);
exit(EXIT_FAILURE);
}
totalC[n] = CompCounter[n] = 0;
}
puts("/* BEGIN sfsort_test.c output */\n");
puts("Counting compar calls "
"for different qsort type functions,");
printf("sorting arrays of %lu keys.\n\n"
, (long unsigned)ARRAY_ELEM);
puts("SEQUENCE is " SEQUENCE_STR "\n");
printf(" ");
for (n = 0; n != NMEMB(array); ++n) {
printf(" %" xstr(NUM_LEN) "s", F[n].name);
}
printf("\n Trial:\n");
for (trial = 0; trial != TRIALS; ++trial) {
element = ARRAY_ELEM;
while (element-- != 0) {
array[0][element] = SEQUENCE;
for (sort = 1; NMEMB(F) > sort; ++sort) {
array[sort][element] = array[0][element];
}
}
for (n = 0; n != NMEMB(array); ++n) {
Comps = 0;
F[n].func(array[n], ARRAY_ELEM
, sizeof *array[n], comparison);
CompCounter[n] = Comps;
totalC[n] += CompCounter[n];
}
printf("%6lu:", trial + 1);
for (n = 0; n != NMEMB(array); ++n) {
printf(" %" xstr(NUM_LEN) "lu", CompCounter[n]);
}
putchar('\n');
for (sort = 1; NMEMB(F) > sort; ++sort) {
if (comparray(array[0], array[sort]
, ARRAY_ELEM, comparison))
{
fputs("\nSort discrepancy!!!\n", stderr);
free_arrays(array, n);
exit(EXIT_SUCCESS);
}
}
}
free_arrays(array, NMEMB(array));
printf("\n%lu Trial%s, %lu keys, "
"%lu possible different key value%s.\n\n"
, trial, trial == 1 ? "" : "s"
, (long unsigned)ARRAY_ELEM, KEY_RANGE
, KEY_RANGE == 1 ? "" : "s");
for (n = 0; n != NMEMB(array); ++n) {
printf("total %s comparisons: %" xstr(NUM_LEN) "lu\n"
, F[n].name, totalC[n]);
}
puts("\n/* END sfsort_test.c output */");
return 0;
}

void
sfsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
{
size_t s;

if (nmemb > 1) {
for (s = 1; nmemb - s > s; s += s + 1) {
sfrog(0, s - 1, base, s + s, size, compar);
}
sfrog(0, s - 1, base, nmemb - 1, size, compar);
}
}

static void
sfrog(size_t s1, size_t ss,
void *base, size_t u, size_t size,
int (*compar)(const void *, const void *))
{
unsigned char *p1, *p2, *end;
unsigned char *const A = base;

if ((size_t)(s1 - ss) != 1) {
if (u > ss) {
const size_t sm = (ss + s1) / 2;
const size_t sms = sm * size;
size_t j = (u + 1) * size;
size_t i = ss * size;

do {
i += size;
} while (j > i && compar(A + sms, A + i) > 0);
do {
j -= size;
} while (j > i && compar(A + j, A + sms) > 0);
while (j > i) {
BYTE_SWAP(A + j, A + i, size);
do {
i += size;
} while (j > i && compar(A + sms, A + i) > 0);
do {
j -= size;
} while (j > i && compar(A + j, A + sms) > 0);
}
if (j == i) {
j -= size;
}
i = ss * size;
if (j > i) {
size_t k = j;

while (i + size > sms) {
BYTE_SWAP(A + i, A + k, size);
k -= size;
i -= size;
}
j /= size;
} else {
j = ss;
}
sfrog(s1, sm - 1, A, sm + j - ss - 1, size, compar);
sfrog(sm + j - ss + 1, j, A, u, size, compar);
}
} else {
sfsort(A + s1 * size, u - ss, size, compar);
}
}

void
m_sort(void *base,
size_t nmemb,
size_t size,
int (*compar)(const void *, const void *))
{
if (nmemb > 1) {
void *const buff = malloc(nmemb / 2 * size);

if (buff != NULL) {
merg(buff, base, nmemb, size, compar);
free(buff);
} else {
puts("\nmalloc failure in m_sort function.\n");
}
}
}

static void
merg(unsigned char *buff,
unsigned char *base,
size_t nmemb,
size_t size,
int (*compar)(const void *, const void *))
{
if (nmemb > 1) {
const void *const after = nmemb * size + base;
const size_t half_nmemb = nmemb / 2;
const size_t half_bytes = half_nmemb * size;
unsigned char *midd = base + half_bytes;

merg(buff, midd, nmemb - half_nmemb, size, compar);
merg(buff, base, half_nmemb, size, compar);
do {
if (compar(base, midd) > 0) {
const unsigned char *end = midd;

buff += half_bytes;
do {
*--buff = *--end;
} while (end != base);
end += size;
do {
*base++ = *midd++;
} while (base != end);
do {
if (midd != after) {
end += size;
if (compar(buff, midd) > 0) {
do {
*base++ = *midd++;
} while (base != end);
} else {
do {
*base++ = *buff++;
} while (base != end);
}
} else {
do {
*base++ = *buff++;
} while (base != midd);
}
} while (base != midd);
} else {
base += size;
}
} while (base != midd);
}
}

void
q_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
{
unsigned char *p1, *p2, *end;
unsigned char *left;
size_t nmemb_right, middle, last, right;

left = base;
while (nmemb-- > 1) {
right = nmemb * size;
last = middle = 0;
do {
middle += size;
if (compar(left, left + middle) > 0) {
last += size;
BYTE_SWAP(left + middle, left + last, size);
}
} while (middle != right);
BYTE_SWAP(left, left + last, size);
nmemb = last / size;
nmemb_right = (right - last) / size;
if (nmemb_right > nmemb) {
q_sort(left, nmemb, size, compar);
left += last + size;
nmemb = nmemb_right;
} else {
q_sort(left + last + size, nmemb_right, size, compar);
}
}
}

void
mqsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
{
m2sort(1, base, nmemb, size, compar);
}

static void
m2sort(size_t sorted, void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
{
if (sorted == 0) {
sorted = 1;
}
if (nmemb > sorted) {
size_t pivot;
size_t sorted_bytes;
size_t midd;
unsigned char *const left = base;
const size_t right = (nmemb - 1) * size;
size_t last = right;

while (nmemb - sorted > sorted) {
m2sort(sorted, base, sorted * 2, size, compar);
sorted *= 2;
}
pivot = sorted / 2 * size;
sorted_bytes = sorted * size;
midd = sorted_bytes;
while (last > midd && compar(left + pivot, left + midd) > 0) {
midd += size;
}
while (last >= midd && compar(left + last, left + pivot) > 0) {
last -= size;
}
while (last > midd) {
unsigned char *p1, *p2, *end;

BYTE_SWAP(left + midd, left + last, size);
do {
midd += size;
} while (midd != last && compar(left + pivot, left + midd)
> 0);
do {
last -= size;
} while (last >= midd && compar(left + last, left + pivot)
> 0);
}
if (last >= sorted_bytes) {
size_t nmemb_right;

block_swap(left + pivot, left + sorted_bytes, left + last,
size);
pivot += last - sorted_bytes + size;
nmemb_right = (right - pivot) / size;
nmemb = pivot / size;
m2sort(sorted / 2,
left, nmemb, size, compar);
m2sort((sorted - 1) / 2,
left + pivot + size, nmemb_right, size, compar);
} else {
m2sort((sorted - 1) / 2,
left + pivot + size, nmemb - sorted / 2 - 1, size, compar);
}
}
}

static void
block_swap(void *first, void *middle,
void *last, size_t size)
{
unsigned char *const m = middle;

reverse(first, m - size, size);
reverse(middle, last, size);
reverse(first, last, size);
}

static void
reverse(unsigned char *first, unsigned char *last, size_t size)
{
unsigned char *p1, *p2, *end;

while (last > first) {
BYTE_SWAP(first, last, size);
first += size;
last -= size;
}
}

int
comparison(const void *arg1, const void *arg2)
{
++Comps;
return *(const long unsigned *)arg2
> *(const long unsigned *)arg1 ? -1
: *(const long unsigned *)arg2
!= *(const long unsigned *)arg1;
}

void
free_arrays(long unsigned **array, size_t nmemb)
{
while (nmemb-- != 0) {
free(array[nmemb]);
}
}

int
comparray(long unsigned *s1, long unsigned *s2, size_t nmemb,
int (*compar)(const void *, const void *))
{
int rc = 0;

while (nmemb--) {
rc = compar(s1, s2);
if (rc != 0) {
break;
}
++s1;
++s2;
}
return rc;
}

/* END sfsort_test.c */

--
pete

Willem

unread,

Dec 21, 2009, 1:29:21 PM12/21/09

to

pete wrote:
) Stephen Howe wrote:
)> Yes, they are O(N). All those O(N) calls add up.
)> Most pivots for Quicksort are chosen in O(1)
)>
)> But this is academic. If you use Musser's hybrid Introsort, it wil lbe Quicksort with O(1) pivots and Heapsort where it blows
)> up. That guarantees O(N * lg N) performance in worst case and close to Quicksort in the average case.
)
) Outside of introsort, I'm unfamiliar with any quicksort
) which is truly O(N * log(N)) for worst case.

There is an O(N) algorithm to find the median of an array.
Using that, you can get O(N * log(N)) quicksort.
But that is purely academical, because with that algorithm, the
constant factor becomes much much worse than other O(NlgN) sorts.

John Slimick

unread,

Dec 21, 2009, 5:20:17 PM12/21/09

to

On 2009-12-21, Willem <wil...@stack.nl> wrote:

> pete wrote:
>
> There is an O(N) algorithm to find the median of an array.
> Using that, you can get O(N * log(N)) quicksort.
> But that is purely academical, because with that algorithm, the
> constant factor becomes much much worse than other O(NlgN) sorts.
>
>
> SaSW, Willem

My recollection is that older implementations of QS
were O(n*lg n) but that worst cases (sorting a constant
array) were more like O(n ^ 2). This is why for years
people depended on binary merge since the time was
almost always the same -- also O(n*lg n) -- for
the same size arrays.

john slimick
university of pittsburgh at bradford
sli...@pitt.edu

Stephen Howe

unread,

Jan 3, 2010, 1:55:14 PM1/3/10

to

On Mon, 21 Dec 2009 07:55:22 -0500, pete <pfi...@mindspring.com> wrote:

>> But this is academic. If you use Musser's hybrid Introsort, it wil lbe Quicksort with O(1) pivots and Heapsort where it blows
>> up. That guarantees O(N * lg N) performance in worst case and close to Quicksort in the average case.
>
>Outside of introsort, I'm unfamiliar with any quicksort
>which is truly O(N * log(N)) for worst case.

Then feed your sort functions the adversary from
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

I think you will find they blow up (anything that is using a O(1) method to get a pivot).

Stephen Howe

Richard Harter

unread,

Jan 3, 2010, 2:10:25 PM1/3/10

to

I hope that if you reread the material you commented on you will
see the true value of your comment.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

Daniel Pitts

unread,

Jan 3, 2010, 3:01:47 PM1/3/10

to

Richard Harter wrote:
> On Sun, 03 Jan 2010 18:55:14 +0000, Stephen Howe
> <sjhoweATdialDOTpipexDOTcom> wrote:
>
>> On Mon, 21 Dec 2009 07:55:22 -0500, pete <pfi...@mindspring.com> wrote:
>>
>>>> But this is academic. If you use Musser's hybrid Introsort, it wil lbe Quicksort with O(1) pivots and Heapsort where it blows
>>>> up. That guarantees O(N * lg N) performance in worst case and close to Quicksort in the average case.
>>> Outside of introsort, I'm unfamiliar with any quicksort
>>> which is truly O(N * log(N)) for worst case.
>> Then feed your sort functions the adversary from
>> http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
>>
>> I think you will find they blow up (anything that is using a O(1) method to get a pivot).
>
> I hope that if you reread the material you commented on you will
> see the true value of your comment.

I misread the context myself. I'm not sure if it was because it was a
hidden double negative ("Outside of" + "unfamiliar"), or just the way
the line wraps, but the message was easily misinterpreted as "quicksort
have worst case (N*log(N))"

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Jon Harrop

unread,

Jan 5, 2010, 11:22:19 AM1/5/10

to

I'm confused by this discussion. Quicksort with median pivot is O(n log n),
right?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Ben Bacarisse

unread,

Jan 5, 2010, 12:28:32 PM1/5/10

to

Jon Harrop <j...@ffconsultancy.com> writes:

Yes, given a sound definition of all the terms, but the paper uses a
non-standard view of the input (and, consequently, of the whole
problem of comparison sorting). From the abstract:

"Quicksort can be made to go quadratic by constructing input on the
fly in response to the sequence of items compared."

I.e. there is no know input until the end of the sort! You can't
apply the O(1) median algorithm in such circumstances (actually I am
guessing here -- I assume you can't but I have not checked).

--
Ben.

Willem

unread,

Jan 5, 2010, 12:40:15 PM1/5/10

to

Ben Bacarisse wrote:
) Yes, given a sound definition of all the terms, but the paper uses a
) non-standard view of the input (and, consequently, of the whole
) problem of comparison sorting). From the abstract:
)
) "Quicksort can be made to go quadratic by constructing input on the
) fly in response to the sequence of items compared."
)
) I.e. there is no know input until the end of the sort! You can't
) apply the O(1) median algorithm in such circumstances (actually I am
) guessing here -- I assume you can't but I have not checked).

Median pivot selection is O(N), isn't it ?

Constructing input in the fly seems nonsense anyway, the very first
division will compare (and thus fix) all the elements, so you can't
influence any of the following divisions.

Richard Harter

unread,

Jan 5, 2010, 12:39:48 PM1/5/10

to

It is confusing. Briefly:

(1) Pete said (paraphrased) that no quicksort (except introsort)
was O(n log n) worst case - at least as far as he knew.

(2) Stephen Howe misread what Pete wrote. He understood Pete to
be asserting that quicksort was O(n log n) worst case. He
pointed to a process for generating a counter example for pivot
methods that are O(1).

(3) Richard Harter (that's me) pointed out that Stephen had
misread what Pete had written.

(4) Stephen Howe agreed that he had misread Pete - quite
understandably, I will add. Pete's wording was a trifle obscure.

(5) Jon Harrop (that's you) pointed out that quicksort using a
median pivot is O(n log n).

-----

There is a nomenclature problem here. The term, quicksort,
denotes a family of algorithms defined by a general pattern, to
wit:

(a) Use some method for selecting one element as a pivot.
(b) Compare each element with the pivot.
(c) Use the comparison result to divide the data into segments.
There can be 2 or 3 segments, depending on variant.
(d) Order the segments.
(e) Apply the procedure recursively to each segment not equal to
the pivot.

It is well known that the worst case behaviour of quicksort
algorithms depends upon the method used for pivot selection.
Thus the worst case is O(n^2) if the pivot selection method is
O(1). It is O(n log n) if (A) the method is O(n) at worst and
(B) there is some constant theta such that the segment with the
fewest elements has at least theta*n elements. (The wording gets
more complicated for fat pivots).

HTH. HAND.

Ben Bacarisse

unread,

Jan 5, 2010, 12:56:27 PM1/5/10

to

Willem <wil...@stack.nl> writes:

> Ben Bacarisse wrote:
> ) Yes, given a sound definition of all the terms, but the paper uses a
> ) non-standard view of the input (and, consequently, of the whole
> ) problem of comparison sorting). From the abstract:
> )
> ) "Quicksort can be made to go quadratic by constructing input on the
> ) fly in response to the sequence of items compared."
> )
> ) I.e. there is no know input until the end of the sort! You can't
> ) apply the O(1) median algorithm in such circumstances (actually I am
> ) guessing here -- I assume you can't but I have not checked).
>
> Median pivot selection is O(N), isn't it ?

Yes. I was thinking about the bounds quoted in the paper[1] and wrote
that instead! So, quite apart from what I wrote, any algorithm for
picking the median that is not O(1) is excluded so the paper does
apply to all quicksorts.

[1] The first restriction on the algorithms considered is: "Pick a
data item as pivot. We assume that this phase uses O(1) comparisons"

<snip>
--
Ben.

Richard Harter

unread,

Jan 5, 2010, 1:02:52 PM1/5/10

to

On Tue, 5 Jan 2010 17:40:15 +0000 (UTC), Willem <wil...@stack.nl>
wrote:

>Ben Bacarisse wrote:
>) Yes, given a sound definition of all the terms, but the paper uses a
>) non-standard view of the input (and, consequently, of the whole
>) problem of comparison sorting). From the abstract:
>)
>) "Quicksort can be made to go quadratic by constructing input on the
>) fly in response to the sequence of items compared."
>)
>) I.e. there is no know input until the end of the sort! You can't
>) apply the O(1) median algorithm in such circumstances (actually I am
>) guessing here -- I assume you can't but I have not checked).
>
>Median pivot selection is O(N), isn't it ?

Correct, sort of. (There are O(N) median algorithms but they are
impractical.)

>
>Constructing input in the fly seems nonsense anyway, the very first
>division will compare (and thus fix) all the elements, so you can't
>influence any of the following divisions.

Not at all. The idea is very simple. All you really know is
that the elements in the "small" segment are smaller than the
pivot and the ones in the "large" segment are larger than the
pivot. I am the oracle. You use an O(1) algorithm to pick the
pivot. I produce one element that is "small". All of the rest
are "large" but you don't get to see them. Pivot selection looks
at k elements for some fixed k. At each step I only need to
produce k+1 elements. When we are all done we have generated a
data set that will be O(n^2).

It's a standard oracle argument.

Willem

unread,

Jan 5, 2010, 1:24:55 PM1/5/10

to

Richard Harter wrote:
) On Tue, 5 Jan 2010 17:40:15 +0000 (UTC), Willem <wil...@stack.nl>
) wrote:
)>Median pivot selection is O(N), isn't it ?
)
) Correct, sort of. (There are O(N) median algorithms but they are
) impractical.)

Practicality is not an issue. The question is if median pivot selection is
in O(1), which it clearly isn't.

)>Constructing input in the fly seems nonsense anyway, the very first
)>division will compare (and thus fix) all the elements, so you can't
)>influence any of the following divisions.
)
) Not at all. The idea is very simple. All you really know is
) that the elements in the "small" segment are smaller than the
) pivot and the ones in the "large" segment are larger than the
) pivot. I am the oracle. You use an O(1) algorithm to pick the
) pivot. I produce one element that is "small". All of the rest
) are "large" but you don't get to see them.

Oh I see. You only get to know the result of the comparison, and not
the actual element itself, which can then be changed later (as long
as you don't change the results of any of the comparisons).

Daniel Pitts

unread,

Jan 5, 2010, 3:38:12 PM1/5/10

to

Did you read the document linked above?

If the pivot is chosen in O(1) time, then a data-set could cause a
pathological condition and quicksort's performance would degrade to near
n^2 time complexity, instead of n log n.

If you make pivot selection an O(N) operation, then quicksort
automatically becomes O(N^2) (actually becomes worse than selection sort).

Richard Harter

unread,

Jan 5, 2010, 4:38:26 PM1/5/10

to

This is incorrect. A moment's reflection should show you why.
Hint: What is the cost of partitioning.

Daniel Pitts

unread,

Jan 5, 2010, 7:04:06 PM1/5/10

to

Ah, I see. I was forgetting the order of operations. Partitioning is
O(N). So, if picking the "best" pivot is O(N), you end up with the loop
being N+N, which is still O(N). Then you still have divide and conquer,
so O(N log N). You end up with about the same constant multiplier as
heapsort, which is approximately 2 * N log N.

I wonder what the time overhead for building a balanced tree would be.
Would that be about the same as 2 * N log N? I know the space required
would increase.

Pascal J. Bourguignon

unread,

Jan 5, 2010, 6:22:23 PM1/5/10

to

Daniel Pitts <newsgroup....@virtualinfinity.net> writes:

> If the pivot is chosen in O(1) time, then a data-set could cause a
> pathological condition and quicksort's performance would degrade to
> near n^2 time complexity, instead of n log n.

The pivot could be choose randomly (in O(1)). However, the input set
could still be (2 2 2 2 2 2 ... 2), so that a blind quick sort could
still be O(n�).

> If you make pivot selection an O(N) operation, then quicksort
> automatically becomes O(N^2) (actually becomes worse than selection
> sort).

--
__Pascal Bourguignon__ http://www.informatimago.com/

Jon Harrop

unread,

Jan 6, 2010, 8:16:14 PM1/6/10

to

Ben Bacarisse wrote:
> Yes. I was thinking about the bounds quoted in the paper[1] and wrote
> that instead! So, quite apart from what I wrote, any algorithm for
> picking the median that is not O(1) is excluded so the paper does
> apply to all quicksorts.

The paper does not apply to all quicksorts.

Alf P. Steinbach

unread,

Jan 6, 2010, 7:17:38 PM1/6/10

to

* Jon Harrop:

> Ben Bacarisse wrote:
>> Yes. I was thinking about the bounds quoted in the paper[1] and wrote
>> that instead! So, quite apart from what I wrote, any algorithm for
>> picking the median that is not O(1) is excluded so the paper does
>> apply to all quicksorts.
>
> The paper does not apply to all quicksorts.
>

There was a paper once by I think it was Doug McIlroy, showing how to
dynamically foil any quicksort, forcing it to qudratic time. :-)

Cheers,

- Alf

Moi

unread,

Jan 6, 2010, 7:28:00 PM1/6/10

to

Yes. A link was posted in this thread a few days ago.
(Steven Howe, Sunday 18:55 UTC)

HTH,
AvK

pete

unread,

Jan 6, 2010, 9:09:38 PM1/6/10

to

Yes.
It also happens to be the paper under discussion.
And as Jon Harrop has stated,
the paper does not apply to all quicksorts.

--
pete

pete

unread,

Jan 12, 2010, 8:02:39 AM1/12/10

to

Pascal J. Bourguignon wrote:
> Daniel Pitts <newsgroup....@virtualinfinity.net> writes:
>
>
>>If the pivot is chosen in O(1) time, then a data-set could cause a
>>pathological condition and quicksort's performance would degrade to
>>near n^2 time complexity, instead of n log n.
>
>
> The pivot could be choose randomly (in O(1)). However, the input set
> could still be (2 2 2 2 2 2 ... 2), so that a blind quick sort could
> still be O(n�).

That would depend on the looping style
with which quicksort was implemented.

It would be O(N*N) for Lomuto style loops, as in q_sort,
but less than N*log2(N) for Sedgewick style loops,
as in q1sort.

/* BEGIN q1sort_test.c output */

Counting compar calls for different qsort type functions,

sorting arrays of 16384 keys.

SEQUENCE is RANDOM

q_sort q1sort
Trial:
1: 134209536 212993

1 Trial, 16384 keys, 1 possible key value.

total q_sort comparisons: 134209536
total q1sort comparisons: 212993

/* END q1sort_test.c output */

/* BEGIN q1sort_test.c */

#include <stdio.h>
#include <stdlib.h>
/*
** This program compares various qsort type
** array sorting functions, by counting compar calls.
*/

#define ARRAY_ELEM (32768 / 2)
#define TRIALS 1

/*
** Make sure that SEQUENCE and SEQUENCE_STR match.
*/
#define SEQUENCE RANDOM
#define SEQUENCE_STR "RANDOM"
/*
** #define SEQUENCE RANDOM
** #define SEQUENCE FORWARD
** #define SEQUENCE REVERSE
*/
/*
** Decrease KEY_RANGE, if more equal keys are wanted.
*/

#define KEY_RANGE 0X1LU /* 0X1LU, 0XFFFFFFFFLU */

/* q_sort */ /*
** Simple Lomuto qsort
*/

/* q1sort */ /*
** Sedgewick style qsort
*/
#define FUNCTIONS_AND_STRINGS { \
{q_sort, "q_sort"}, \
{q1sort, "q1sort"} \

}
#define BYTE_SWAP(A, B, S) \
{ \
p1 = (A); \
p2 = (B); \
end = p2 + (S); \
do { \
const unsigned char swap = *p1; \
\
*p1++ = *p2; \
*p2++ = swap; \
} while (p2 != end); \
}
#define NUM_LEN 12
#define SEED_RAND 123456789
#define RANDOM (lus_seed = LU_RAND(lus_seed) % KEY_RANGE)
#define FORWARD element
#define REVERSE (0LU - 1 - element)
#define LU_RAND(S) ((S) * 69069LU + 362437 & 0XFFFFFFFFLU)
#define NMEMB(A) (sizeof A / sizeof *A)
#define str(s) # s
#define xstr(s) str(s)

void

q_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
void

q1sort(void *base, size_t nmemb, size_t size,

int (*compar)(const void *, const void *));
void
free_arrays(long unsigned **array, size_t nmemb);
int
comparison(const void *arg1, const void *arg2);
int
comparray(long unsigned *s1, long unsigned *s2, size_t nmemb,
int (*compar)(const void *, const void *));

long unsigned Comps;

int main(void)
{
size_t sort, element, n;

long unsigned lus_seed = SEED_RAND;

const long unsigned arrar_elem = ARRAY_ELEM;

struct {
void (*func)(void *, size_t, size_t,
int (*)(const void *, const void *));
char *name;
} F[] = FUNCTIONS_AND_STRINGS;
long unsigned *array[NMEMB(F)];
long unsigned trial,
totalC[NMEMB(F)], CompCounter[NMEMB(F)];

for (n = 0; n != NMEMB(array); ++n) {
array[n] = malloc(arrar_elem * sizeof *array[n]);

if (array[n] == NULL) {
free_arrays(array, n);
fputs("\nmalloc() array failure\n", stderr);
exit(EXIT_FAILURE);
}
totalC[n] = CompCounter[n] = 0;
}

puts("/* BEGIN q1sort_test.c output */\n");

puts("Counting compar calls "
"for different qsort type functions,");

printf("sorting arrays of %lu keys.\n\n", arrar_elem);

puts("SEQUENCE is " SEQUENCE_STR "\n");
printf(" ");
for (n = 0; n != NMEMB(array); ++n) {
printf(" %" xstr(NUM_LEN) "s", F[n].name);
}
printf("\n Trial:\n");
for (trial = 0; trial != TRIALS; ++trial) {

element = arrar_elem;

while (element-- != 0) {
array[0][element] = SEQUENCE;
for (sort = 1; NMEMB(F) > sort; ++sort) {
array[sort][element] = array[0][element];
}
}
for (n = 0; n != NMEMB(array); ++n) {
Comps = 0;

F[n].func(array[n], arrar_elem

, sizeof *array[n], comparison);
CompCounter[n] = Comps;
totalC[n] += CompCounter[n];
}
printf("%6lu:", trial + 1);
for (n = 0; n != NMEMB(array); ++n) {
printf(" %" xstr(NUM_LEN) "lu", CompCounter[n]);
}
putchar('\n');
for (sort = 1; NMEMB(F) > sort; ++sort) {
if (comparray(array[0], array[sort]

, arrar_elem, comparison))

{
fputs("\nSort discrepancy!!!\n", stderr);
free_arrays(array, n);
exit(EXIT_SUCCESS);
}
}
}
free_arrays(array, NMEMB(array));
printf("\n%lu Trial%s, %lu keys, "

"%lu possible key value%s.\n\n"

, trial, trial == 1 ? "" : "s"

, arrar_elem, KEY_RANGE

, KEY_RANGE == 1 ? "" : "s");
for (n = 0; n != NMEMB(array); ++n) {
printf("total %s comparisons: %" xstr(NUM_LEN) "lu\n"
, F[n].name, totalC[n]);
}

puts("\n/* END q1sort_test.c output */");
return 0;
}

void

q_sort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
{
unsigned char *p1, *p2, *end;
unsigned char *left;
size_t nmemb_right, middle, last, right;

left = base;
while (nmemb-- > 1) {
right = nmemb * size;
last = middle = 0;
do {
middle += size;
if (compar(left, left + middle) > 0) {
last += size;
BYTE_SWAP(left + middle, left + last, size);
}
} while (middle != right);
BYTE_SWAP(left, left + last, size);
nmemb = last / size;
nmemb_right = (right - last) / size;
if (nmemb_right > nmemb) {
q_sort(left, nmemb, size, compar);
left += last + size;
nmemb = nmemb_right;
} else {
q_sort(left + last + size, nmemb_right, size, compar);
}
}
}

void q1sort(void *base, size_t nmemb, size_t size,

int (*compar)(const void*, const void*))

{

unsigned char *left;
size_t nmemb_right, middle, last, right;

unsigned char *p1, *p2, *end;

left = base;
while (nmemb-- > 1) {
last = right = nmemb * size;
middle = size;
while (middle != last && compar(left, left + middle) > 0) {
middle += size;
}
while (compar(left + last, left) > 0) {
last -= size;
}
while (last > middle) {

BYTE_SWAP(left + middle, left + last, size);

do {
middle += size;
} while (compar(left, left + middle) > 0);
do {
last -= size;
} while (compar(left + last, left) > 0);

}
BYTE_SWAP(left, left + last, size);
nmemb = last / size;
nmemb_right = (right - last) / size;
if (nmemb_right > nmemb) {

q1sort(left, nmemb, size, compar);

left += last + size;
nmemb = nmemb_right;
} else {

q1sort(left + last + size, nmemb_right, size, compar);
}
}
}

int
comparison(const void *arg1, const void *arg2)

{
const long unsigned a = *(const long unsigned *)arg1;
const long unsigned b = *(const long unsigned *)arg2;

++Comps;
return b > a ? -1 : b != a;
}

void
free_arrays(long unsigned **array, size_t nmemb)
{
while (nmemb-- != 0) {
free(array[nmemb]);
}
}

int
comparray(long unsigned *s1, long unsigned *s2, size_t nmemb,
int (*compar)(const void *, const void *))
{
int rc = 0;

while (nmemb--) {
rc = compar(s1, s2);
if (rc != 0) {
break;
}
++s1;
++s2;
}
return rc;
}

/* END q1sort_test.c */

--
pete

Jon Harrop

unread,

Jan 20, 2010, 12:13:29 PM1/20/10

to

Pascal J. Bourguignon wrote:
> Daniel Pitts <newsgroup....@virtualinfinity.net> writes:
>> If the pivot is chosen in O(1) time, then a data-set could cause a
>> pathological condition and quicksort's performance would degrade to
>> near n^2 time complexity, instead of n log n.
>
> The pivot could be choose randomly (in O(1)). However, the input set
> could still be (2 2 2 2 2 2 ... 2), so that a blind quick sort could
> still be O(n²).

That kind of "blind" quicksort is always going to be O(n^2) if a proportion
of the input values are the same but the solution is easy: just subdivide
into xs<ys<zs and only quicksort the xs and zs.

pete

unread,

Jan 21, 2010, 6:54:03 PM1/21/10

to

Jon Harrop wrote:
> Pascal J. Bourguignon wrote:
>> Daniel Pitts <newsgroup....@virtualinfinity.net> writes:
>>> If the pivot is chosen in O(1) time, then a data-set could cause a
>>> pathological condition and quicksort's performance would degrade to
>>> near n^2 time complexity, instead of n log n.
>> The pivot could be choose randomly (in O(1)). However, the input set
>> could still be (2 2 2 2 2 2 ... 2), so that a blind quick sort could
>> still be O(n²).
>
> That kind of "blind" quicksort is always going to be O(n^2) if a proportion
> of the input values are the same but the solution is easy: just subdivide
> into xs<ys<zs and only quicksort the xs and zs.

I think it's much easier just to use Sedgewick style loops
and to swap array elements which compare equal to the pivot.

--
pete

Stephen Howe

unread,

Jan 29, 2010, 10:55:21 AM1/29/10

to

>It also happens to be the paper under discussion.
>And as Jon Harrop has stated,
>the paper does not apply to all quicksorts.

Yes does the paper apply to quicksorts where the pivot selection is O(1) ?
If the pivot selection is O(N) you can gurantee to get the median every time.

I think I am saying that all quicksorts will have at least 1 worst case if the pivot selection is O(1)

Pardon me for misunderstanding you :-)

Cheers

Stephen Howe

Richard Heathfield

unread,

Jan 29, 2010, 11:10:02 AM1/29/10

to

Stephen Howe wrote:
>> It also happens to be the paper under discussion.
>> And as Jon Harrop has stated,
>> the paper does not apply to all quicksorts.
>
> Yes does the paper apply to quicksorts where the pivot selection is O(1) ?
> If the pivot selection is O(N) you can gurantee to get the median every time.
>
> I think I am saying that all quicksorts will have at least 1 worst case if the pivot selection is O(1)

Um, did you mean what you just said? Surely *all* algorithms have at
least 1 worst case?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Stephen Howe

unread,

Jan 29, 2010, 11:35:44 AM1/29/10

to

>> I think I am saying that all quicksorts will have at least 1 worst case if the pivot selection is O(1)
>
>Um, did you mean what you just said? Surely *all* algorithms have at
>least 1 worst case?

All right, rephrasing that:

I think I am saying that all quicksorts will have at least 1 worst case that is O(N * log N) if the pivot selection is O(1)

Stephen Howe

Ben Bacarisse

unread,

Jan 29, 2010, 1:52:18 PM1/29/10

to

Stephen Howe <sjhoweATdialDOTpipexDOTcom> writes:

That needs re-phrasing as well.

First off, did you mean O(N * log N) there? I suspect you want to say
that with O(1) pivot selection all quicksorts are Theta(n * n).
(Theta is both an upper and a lower bound so it is more helpful when
describing how bad things can be. An O(n * log n) algorithm is also
O(n * n) and O(n * n * n) etc.)

The other problem is that there is no meaning of big O for a single
case. No single bad case can have any effect on the asymptotic
behaviour of an algorithm. You probably meant that there is some
pattern of cases that uncovers that bad performance.

--
Ben.

Stephen Howe

unread,

Jan 29, 2010, 2:57:15 PM1/29/10

to

>> All right, rephrasing that:
>>
>> I think I am saying that all quicksorts will have at least 1 worst
>> case that is O(N * log N) if the pivot selection is O(1)
>
>That needs re-phrasing as well.

Yes, my mistake.
I think I am saying that for all quicksort algorithm where the pivot selection is O(1), there exists a family of distributions
as N tends to infinity such that sorting is O(N *N).

>First off, did you mean O(N * log N) there? I suspect you want to say
>that with O(1) pivot selection all quicksorts are Theta(n * n).

Yes. I have just been thinking about the existence of a lower bound.
But it is only applied for the distributions that blow up.
Otherwise it is not a lower bound.

>The other problem is that there is no meaning of big O for a single
>case. No single bad case can have any effect on the asymptotic
>behaviour of an algorithm. You probably meant that there is some
>pattern of cases that uncovers that bad performance.

Yes. I am saying there is a at least 1 distribution as N => infinity if pivot selection is O(1)

Stephen Howe

Ben Bacarisse

unread,

Jan 29, 2010, 4:37:59 PM1/29/10

to

Stephen Howe <sjhoweATdialDOTpipexDOTcom> writes:

>>> All right, rephrasing that:
>>>
>>> I think I am saying that all quicksorts will have at least 1 worst
>>> case that is O(N * log N) if the pivot selection is O(1)
>>
>>That needs re-phrasing as well.
>
> Yes, my mistake. I think I am saying that for all quicksort
> algorithm where the pivot selection is O(1), there exists a family
> of distributions as N tends to infinity such that sorting is O(N*N).

Sorry to bang on about it, but O(f) is an upper bound. Let's assume
that the worst input case provokes quicksort into using no more than
n*log(n) comparisons. It would *still* be correct to describe it as
O(n*n) because very f in O(n*log(n)) is also in O(n*n).

>>First off, did you mean O(N * log N) there? I suspect you want to say
>>that with O(1) pivot selection all quicksorts are Theta(n * n).
>
> Yes. I have just been thinking about the existence of a lower bound.
> But it is only applied for the distributions that blow up.
> Otherwise it is not a lower bound.

If all you want is to express the lower bound, use Omega instead of
Theta.

<snip>
--
Ben.

Stephen Howe

unread,

Jan 29, 2010, 6:16:12 PM1/29/10

to

>Sorry to bang on about it, but O(f) is an upper bound. Let's assume
>that the worst input case provokes quicksort into using no more than
>n*log(n) comparisons. It would *still* be correct to describe it as
>O(n*n) because very f in O(n*log(n)) is also in O(n*n).

But you can also say f is also in O(n * n * n) and every f is in O(n * n * n * n)
and so on. All very true but not useful.
You want tight asymptotic upper bounds that correctly describes the behaviour for all cases.

Cheers

Stephen Howe

Ben Bacarisse

unread,

Jan 29, 2010, 7:46:10 PM1/29/10

to

Stephen Howe <sjhoweATdialDOTpipexDOTcom> writes:

This is pretty much what I've been saying, but you've managed to make
it sounds as if *I* was pushing the use of these unhelpful upper bounds!

--
Ben.

websnarf

unread,

Jan 29, 2010, 10:09:57 PM1/29/10

to

On Dec 21 2009, 10:29 am, Willem <wil...@stack.nl> wrote:
> pete wrote:
> ) Stephen Howe wrote:
>
> )> Yes, they are O(N). All those O(N) calls add up.
> )> Most pivots for Quicksort are chosen in O(1)
> )>
> )> But this is academic. If you use Musser's hybrid Introsort, it wil lbe Quicksort with O(1) pivots and Heapsort where it blows
> )> up. That guarantees O(N * lg N) performance in worst case and close to Quicksort in the average case.
> )
> ) Outside of introsort, I'm unfamiliar with any quicksort
> ) which is truly O(N * log(N)) for worst case.
>
> There is an O(N) algorithm to find the median of an array.
> Using that, you can get O(N * log(N)) quicksort.
> But that is purely academical, because with that algorithm, the
> constant factor becomes much much worse than other O(NlgN) sorts.

That's what I thought at first, but upon reflection I am not so sure.

If you arrange it right, the median of median select algorithm itself
ends up performing the exact same thing as the partitioning step does
as a side effect of trying to determine the median element. So at the
very least the partition step "comes for free". The median of median
algorithm also does operations that are close to sorting themselves.
So the array becomes closer to sorted faster than the naive
partitioning as it is doing its work. On the other hand the naive
quicksort doesn't do especially well at sorting mostly sorted
elements.

Let me be clear -- in what follows I assume you understand how the
basic median of medians of 5 SELECT algorithm works. The median of
median algorithm is a recursive algorithm where each pass reduces the
number of elements in the array by epsilon less than 20%. If you
remember your geometric series, that means the total number of element
accesses is O(n). Now in the top most pass, you only require that the
pivot be in the middle 3/5s somewhere, not the exact median. In other
words you just break the n elements into groups of 5 (each spaced ceil
(n/5) away), brute force sort the median to the middle of each, then
recursively find the real median of the medians and pick that as your
final pivot. Then just do the relevant shifts to get you 3/5 of the
partitioning "for free" (the steps leading up to this point should
make clear which 3/5 of the elements you already know the pivot-
relative comparison for), and then perform the final raw partitioning
which only needs to be applied to the final 2/5 entries whose relation
to the pivot is unknown. This would complete the "find pivot, then
split the partition into those element less than, equal to or greater
than the pivot" part of the quicksort algorithm, after which you just
recurse. Its guaranteed to be O(n*log(n)) because the partitioning is
guaranteed to be no worse than 1/5 : 4/5, while on average it will be
much better.

I think it would remain to be seen if it was truly slower, but I have
not actually measured and tried it myself. Its not obvious to me
anyways.
But someone on USENET who challenged me some years ago on this (who
claimed my quicksort code was crap but then had to backtrack when they
realized they could not seriously improve upon it) was apparently
aware of at least some of this, and it may be resolved as to what the
true state is.

The upshot in my opinion is that people like Sedgewick and Bently have
said that Quicksort is just as fast as Heapsort, but only back it up
from an academic point of view. Because their ultimate goal is to
publish papers, they might limit their analysis to that level instead
of going for the optimization I suggest above (which makes no
theoretical difference, but may make a serious practical difference.)
But I have not read their papers in too much detail either.

So the question is, does this challenge Introsort or Heapsort in the
L1-cache with the Intel compiler? Its not clear to me, and I don't
really have that much free time to go find out. Perhaps if someone
was interested they could try to go figure this out, or if this is
already known and someone could point me in the right direction that
would be great.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jon Harrop

unread,

Jan 30, 2010, 10:40:53 AM1/30/10

to

Ben Bacarisse wrote:
> I suspect you want to say that with O(1) pivot selection all quicksorts
> are Theta(n * n).

Probably not given that sorting a sequence of identical values should be
O(n) even with O(1) pivot selection.

Ben Bacarisse

unread,

Jan 30, 2010, 5:19:42 PM1/30/10

to

Jon Harrop <j...@ffconsultancy.com> writes:

> Ben Bacarisse wrote:
>> I suspect you want to say that with O(1) pivot selection all quicksorts
>> are Theta(n * n).
>
> Probably not given that sorting a sequence of identical values should be
> O(n) even with O(1) pivot selection.

Hmm... not sure if I get your point.

The function being described is, presumably, the greatest number of
comparisons used taken over all inputs of size n. The existence of
simple-to-sort cases does not stop this function being in Theta(n*n).

--
Ben.

Jon Harrop

unread,

Jan 31, 2010, 2:04:31 AM1/31/10

to

Ben Bacarisse wrote:
> Jon Harrop <j...@ffconsultancy.com> writes:
>> Ben Bacarisse wrote:
>>> I suspect you want to say that with O(1) pivot selection all quicksorts
>>> are Theta(n * n).
>>
>> Probably not given that sorting a sequence of identical values should be
>> O(n) even with O(1) pivot selection.
>
> Hmm... not sure if I get your point.

I'd say it was Omega(1) and O(n^2).

> The function being described is, presumably, the greatest number of
> comparisons used taken over all inputs of size n.

Why the greatest?

> The existence of simple-to-sort cases does not stop this function being in
> Theta(n*n).

Unless all cases are simple-to-sort.

Ben Bacarisse

unread,

Jan 31, 2010, 12:28:07 PM1/31/10

to

Jon Harrop <j...@ffconsultancy.com> writes:

> Ben Bacarisse wrote:
>> Jon Harrop <j...@ffconsultancy.com> writes:
>>> Ben Bacarisse wrote:
>>>> I suspect you want to say that with O(1) pivot selection all quicksorts
>>>> are Theta(n * n).
>>>
>>> Probably not given that sorting a sequence of identical values should be
>>> O(n) even with O(1) pivot selection.
>>
>> Hmm... not sure if I get your point.
>
> I'd say it was Omega(1) and O(n^2).
>
>> The function being described is, presumably, the greatest number of
>> comparisons used taken over all inputs of size n.
>
> Why the greatest?

From the context. I was trying to re-phrase what someone else seemed
to be trying to say. He wanted to state how bad a class of sort can
get using O(f) notation which, of course, does not work and I was
suggesting using a lower bound on the worst case for each size. The
original statement talked about the "worst case" which I took to be
the maximum no. of comparisons. In a sub-thread all about clarity, it
was probably not clear enough.

The simplest way to say that some function of an algorithm is worse
than O(f) is simple to state that it is not O(f), but the poster
wanted to say something stringer than that.

<snip>
--
Ben.

pete

unread,

Feb 11, 2010, 12:23:29 PM2/11/10

to

Stephen Howe wrote:
> I think I am saying that for all quicksort algorithm where the pivot selection is O(1),
> there exists a family of distributions
> as N tends to infinity such that sorting is O(N *N).

I also think that there is.
And that was the point of the paper which had been mentioned.

--
pete