Aug 28, 1995
Bob Buckley <cis...@icsea.levels.unisa.edu.au> wrote:

>I've been playing with sorting algorithms using Gofer. Here are some of
>my results. nmsort at the bottom seems to be the best overall!
>...

When I saw this subject line, I was expecting something similar to Simon B
Jones' recent "How useful is laziness?" post, i.e. how useful is laziness
in sorting? Most of the time, laziness plays no role -- the language might
as well be eager. However, there is an interesting case where laziness
*does* play an important role: when only some prefix (of unknown size) of
the resulting sorted list will be needed. A good example of when this
happens is Kruskal's minimum spanning tree algorithm. First you sort the
edges, then you process the edges (in sorted order) only until the
resulting graph is connected. This will usually require processing only a
small fraction of the edges.
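To make the Kruskal example concrete, here is a minimal sketch. The names, the edge representation, and the list-of-lists stand-in for union-find are all my own illustration, not from any post in this thread; the point is only that `go` stops as soon as one component remains, so under lazy evaluation the tail of `sort edges` is never fully produced.

```haskell
import Data.List (sort)

-- Edges are (weight, u, v); vertices are 1..n.
-- Components are kept as lists of vertices (a naive stand-in for union-find).
kruskal :: Int -> [(Int, Int, Int)] -> [(Int, Int, Int)]
kruskal n edges = go [[v] | v <- [1..n]] (sort edges)
  where
    go [_] _  = []                      -- one component left: the tree is complete
    go _   [] = []                      -- out of edges (graph was not connected)
    go comps ((w, u, v) : es)
      | cu == cv  = go comps es         -- u and v already connected; skip edge
      | otherwise = (w, u, v) : go ((cu ++ cv) : others) es
      where
        comp x = head [c | c <- comps, x `elem` c]
        cu     = comp u
        cv     = comp v
        others = [c | c <- comps, c /= cu, c /= cv]
```

With a real union-find the `comp` lookups drop from O(n) to near-constant, but the lazy-consumption structure is the same.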

So, let me rephrase the sorting problem: given n elements, generate the
sorted list containing the k smallest elements. Furthermore, suppose you
don't know k ahead of time. Now, how would you program a sorting routine to
take advantage of the fact that k may be much less than n? Ideally, we
would like an algorithm that runs in O(n + k log n) time.
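As a minimal sketch of what "take advantage of" means here (the name `smallestK` is mine, not from the post): under lazy evaluation, composing `take k` with an ordinary sort demands only the prefix, so on random input the textbook quicksort does roughly O(n + k log n) work on average.

```haskell
-- A lazy quicksort; "take k" of its result demands only the first k elements.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort [x | x <- xs, x < p] ++ (p : qsort [x | x <- xs, x >= p])

smallestK :: Ord a => Int -> [a] -> [a]
smallestK k = take k . qsort
```

Under eager evaluation the same composition would sort the entire list first, which is exactly the distinction this post is about.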

It turns out that many sorting routines "just work right" under lazy
evaluation, especially quicksort and heapsort (but see below).
Interestingly, only heapsort seems suitable for the same problem under
eager evaluation.

Functional heaps have been described several times in this newsgroup in
recent months. However, most of these posts described top-down heap
construction. Note that to support the desired O(n + k log n) time, it is
better to build the initial heap bottom-up than top-down. Top-down heap
construction (i.e. repeated inserts) requires O(n log n) time whereas
bottom-up heap construction requires only O(n) time.

This might be written in Gofer as

>data Heap a = Empty | Node a (Heap a) (Heap a)
>
>heapify :: Ord a => [a] -> Heap a
>heapify xs = fst (h (length xs) xs)
>  where h 0 xs     = (Empty, xs)
>        h n (x:xs) = (siftDown x a b, zs)
>          where m       = n `div` 2
>                (a, ys) = h m xs
>                (b, zs) = h (n-m-1) ys
>
>siftDown :: Ord a => a -> Heap a -> Heap a -> Heap a
>siftDown x Empty Empty = Node x Empty Empty
>siftDown x (Node y a b) Empty
>  | x <= y = Node x (Node y a b) Empty
>  | y <= x = Node y (siftDown x a b) Empty
>siftDown x (Node y a b) (Node z c d)
>  | x <= y && x <= z = Node x (Node y a b) (Node z c d)
>  | y <= x && y <= z = Node y (siftDown x a b) (Node z c d)
>  | z <= x && z <= y = Node z (Node y a b) (siftDown x c d)

(Note that I have written the above more for clarity than raw speed.
There is lots of room for improvement here...)
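For completeness, here is one hedged sketch of the extraction side that the post leaves out, restating the declarations above so it stands alone. The `meld`-based delete-min is my own simplification in the pairing-heap style, not from the post; it is heap-order-correct but does not guarantee the O(log n) worst-case bound per deletion that a leftist heap or a proper siftDown-based deletion would.

```haskell
data Heap a = Empty | Node a (Heap a) (Heap a)

heapify :: Ord a => [a] -> Heap a            -- bottom-up, O(n), as in the post
heapify xs = fst (h (length xs) xs)
  where h 0 ys     = (Empty, ys)
        h n (y:ys) = (siftDown y a b, zs)
          where m         = n `div` 2
                (a, rest) = h m ys
                (b, zs)   = h (n-m-1) rest

siftDown :: Ord a => a -> Heap a -> Heap a -> Heap a
siftDown x Empty Empty = Node x Empty Empty
siftDown x l@(Node y a b) Empty
  | x <= y    = Node x l Empty
  | otherwise = Node y (siftDown x a b) Empty
siftDown x l@(Node y a b) r@(Node z c d)
  | x <= y && x <= z = Node x l r
  | y <= z           = Node y (siftDown x a b) r
  | otherwise        = Node z l (siftDown x c d)
siftDown x Empty r = siftDown x r Empty      -- unreachable for heapify's shapes

heapsort :: Ord a => [a] -> [a]
heapsort = toList . heapify
  where
    toList Empty        = []
    toList (Node x a b) = x : toList (meld a b)
    -- meld two heap-ordered trees: smaller root on top
    meld Empty h = h
    meld h Empty = h
    meld l@(Node x a b) r@(Node y c d)
      | x <= y    = Node x (meld a b) r
      | otherwise = Node y l (meld c d)
```

Because `toList` is lazy, `take k (heapsort xs)` performs the O(n) heapify plus only k deletions, which is the O(n + k log n) behavior being advertised (modulo the caveat about this particular `meld`).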

Chris

--

-------------------------------------------------------------------------------

| Chris Okasaki | Life is NOT a zero-sum game! |

| coka...@cs.cmu.edu | http://foxnet.cs.cmu.edu/people/cokasaki/home.html |

-------------------------------------------------------------------------------

Aug 29, 1995

coka...@cs.cmu.edu (Chris Okasaki) writes:

>So, let me rephrase the sorting problem: given n elements, generate the sorted
>list containing the k smallest elements. Furthermore, suppose you don't know
>k ahead of time. Now, how would you program a sorting routine to take
>advantage of the fact that k may be much less than n? Ideally, we would like
>an algorithm that ran in O(n + k log n) time.
>
>It turns out that many sorting routines "just work right" under lazy
>evaluation, especially quicksort and heapsort (but see below).

<nifty sorting algorithm deleted>

The usual versions of functional quicksort are quadratic in the worst case.
Nevertheless, one might hope/expect that it takes O(nk) to compute the
first k elements. But in fact it is still quadratic in n in the worst case,
even for k=1. (A rigorous analysis of this particular example is found in
[*].) Did you have an "average" case analysis in mind? A better example
would be mergesort [**], which I believe has the desired O(n + k log n). I
don't know what it is like in practice.
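To illustrate the point with the textbook lazy quicksort (this example is my own, not from the analysis in [*]): on reverse-sorted input every element falls into the "smaller" partition, so demanding even the head walks a filter over n-1 elements, then n-2, and so on -- O(n^2) work for k=1.

```haskell
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort [x | x <- xs, x < p] ++ (p : qsort [x | x <- xs, x >= p])

-- On input like [n, n-1 .. 1], each recursion level filters the whole
-- remainder against the pivot before the head can be produced, so even
-- "head . qsort" is quadratic in n.
minViaQsort :: Ord a => [a] -> a
minViaQsort = head . qsort
```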

Cheers,

Dave.

[*]

@Article{Sands:Naive,
  semno     = "D-173",
  author    = "D. Sands",
  title     = "{A Na\"{\i}ve Time Analysis and its Theory of Cost Equivalence}",
  journal   = "The Journal of Logic and Computation",
  year      = "199X",
  publisher = "Oxford University Press",
  note      = "Accepted, to appear
               (Preliminary version available as TOPPS report D-173, 1993,
               ftp://ftp.diku.dk/diku/semantics/papers/D-173.ps.Z)",
}

[**] This is the kind of mergesort I had in mind (some old bit of Miranda(TM)
that I dug up). There is nothing special about it.

mergesort x
 = ms x (#x)
   where
   ms y n = y, if n <= 1                      || invariant: length y = n
          = merge sort_floor sort_ceil, otherwise
            where
            k          = n div 2
            sort_floor = ms us k
            sort_ceil  = ms vs (k + (n mod 2))
            (us,vs)    = split k y            || invariant: k <= length y
                         where
                         split 0 ys = ([],ys)
                         split (m+1) (y:ys) = (y:u, v)
                                              where (u,v) = split m ys
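For readers without Miranda, here is a hedged Haskell transliteration of the same mergesort. The `merge` itself is not shown in the post; what follows is the standard lazy merge, which is what makes `take k` of the result cheap.

```haskell
-- Standard lazy merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

msort :: Ord a => [a] -> [a]
msort xs = ms xs (length xs)
  where
    ms ys n
      | n <= 1    = ys                            -- invariant: length ys = n
      | otherwise = merge (ms us k) (ms vs (n - k))
      where
        k        = n `div` 2
        (us, vs) = splitAt k ys
```

Because `merge` only forces its arguments one head at a time, demanding a k-element prefix of `msort xs` leaves most of the recursive merges partially evaluated.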

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

David Sands Email: da...@diku.dk

DIKU, University of Copenhagen Phone: +45 35 321 408

Universitetsparken 1 fax: +45 35 321 401

DK-2100 Copenhagen East, DENMARK W3: http://www.diku.dk/users/dave/

Aug 29, 1995

I <coka...@cs.cmu.edu> wrote:

>So, let me rephrase the sorting problem: given n elements, generate the sorted
>list containing the k smallest elements. Furthermore, suppose you don't know
>k ahead of time. Now, how would you program a sorting routine to take
>advantage of the fact that k may be much less than n? Ideally, we would like
>an algorithm that ran in O(n + k log n) time.
>
>It turns out that many sorting routines "just work right" under lazy
>evaluation, especially quicksort and heapsort (but see below).

David Sands <da...@diku.dk> wrote:

>The usual versions of functional quicksort are quadratic in the worst
>case. Nevertheless, one might hope/expect that it takes O(nk) to
>compute the first k elements. But in fact it is still quadratic in n
>in the worst case, even for k=1 (A rigorous analysis of this
>particular example is found in [*]). Did you have an "average"
>case-analysis in mind? A better example would be mergesort [**],
>which I believe has the desired O(n + k log n). I don't know what it
>is like in practice.

You're right, I should have explicitly mentioned that the usual caveats
about quicksort being quadratic in the worst case apply. (While it is
certainly possible to use a linear-time median-finding algorithm to choose
the pivot -- making quicksort O(n log n) in the worst case -- no one ever
does.)

Still, it is very easy to visualize the behavior of quicksort when only
some prefix of the resulting list is demanded. What happens is that within
"most" of the calls to quicksort, the second recursive call is ignored.

Mergesort does indeed also have the desired O(n + k log n) behavior, but it
is much harder to visualize. Consider the following picture that illustrates
what might happen for k=4:

                1:2:3:4:_
               /         \
         1:3:6:_          2:4:_
         /     \          /   \
    1:3:nil    6:_     2:4:_   5:_
     |   |     | |      | |    | |
     1   3     7 6      2 4    8 5

Each interior node represents a call to merge. The underscores indicate the
portions of the merged lists that are never demanded (and hence are never
computed). Each of the first k elements participates in O(log n) of the
interior nodes, accounting for O(k log n) time. In addition, almost every
interior node generates one extra element, accounting for an extra O(n).
This, together with the O(n) overhead of splitting the original list and
setting up the calls to merge in the first place, yields a total of
O(n + k log n).

Chris

Aug 29, 1995

cis...@icsea.levels.unisa.edu.au (Bob Buckley) writes:

|I've been playing with sorting algorithms using Gofer. Here are some of
|my results. nmsort at the bottom seems to be the best overall!

[Stuff deleted.]

|-- natural merge-sort (see Knuth vol. 2, like Paulson's SAM-sort)
|-- this version of runs seems to perform very well in Gofer
|-- and may suit foldr optimisation in GHC-0.26?
|-- in Gofer, this beats all the other sorting algorithms I've tested.
|runs xs = foldr op [] xs
|  where op z xss@(xs@(x:_):xss') | z<=x      = (z:xs):xss'
|                                 | otherwise = [z]:xss
|        op z xss = [z]:xss
|
|nmsort :: (Ord a) => [a] -> [a]
|nmsort [] = []          -- (foldb f []) is undefined
|nmsort xs = foldb merge (runs xs)

You can do a bit better than this by crossing over to insertion sort on
small input. (Well, my natural merge sort did anyway. Yours may be
quicker.) On small lists there's a lot of overhead in merging.
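A minimal sketch of the crossover idea, grafted onto a plain top-down mergesort rather than Bert's natural merge sort; the names and the cutoff value of 16 are my own assumptions and worth tuning empirically.

```haskell
import Data.List (insert)

cutoff :: Int
cutoff = 16   -- assumed threshold; tune by measurement

msortCutoff :: Ord a => [a] -> [a]
msortCutoff xs = ms xs (length xs)
  where
    ms ys n
      | n <= cutoff = foldr insert [] ys   -- insertion sort on small input
      | otherwise   = merge (ms us k) (ms vs (n - k))
      where
        k        = n `div` 2
        (us, vs) = splitAt k ys
    merge [] ys = ys
    merge zs [] = zs
    merge (x:zs) (y:ys)
      | x <= y    = x : merge zs (y:ys)
      | otherwise = y : merge (x:zs) ys
```

The win is purely constant-factor: insertion sort on a handful of elements avoids the allocation and call overhead of several levels of tiny merges.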

Bert.

----------------------------------------------------------------------

Bert Thompson a...@cs.mu.oz.au

----------------------------------------------------------------------
