
Jan 24, 2012, 4:54:24 AM

to parallel...@googlegroups.com

Dear all, I have this straightforward mergesort program using par/pseq:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) =
  if x < y then x : merge xs (y:ys)
           else y : merge (x:xs) ys

split :: [a] -> ([a],[a])
split xs = splitAt ( div ( length xs ) 2 ) xs

msort :: Ord a => [a] -> [a]
msort [] = []
msort [x] = [x]
msort xs =
  let ( here, there ) = split xs
      mshere  = msort here
      msthere = msort there
  in  par ( last mshere )
    $ pseq ( last msthere )
           ( merge mshere msthere )

and it gives reasonable speed-ups with +RTS -N1,2,4,8

(OK, going from 4 to 8 does not help).

Now - what is the Right Way to write this with strategy annotations?

(`using` rpar? ...)

I want to tell my students 1. "mergesort already *is* an inherently parallel

algorithm" and 2. "you can execute it as such - you just need to annotate

the source text".

Thanks - J.W.

Jan 24, 2012, 5:22:28 AM

to gclhp-paral...@lo.gmane.org, parallel...@googlegroups.com

Hi Johannes,

The problem with trying to parallelise merge sort (or quicksort) is that the algorithms are heavily sequential: quicksort at the pivot point, and merge sort at the merge point. Because of this, neither algorithm parallelises all that well.

What kind of speedups did you get with your solution?

I played around with merge sort a few months back and came up with a solution using balanced binary trees. The advantage of this is that you eliminate as much sequential processing as you can from the sort, so that it parallelises more easily. You also need a threshold to turn off the parallelism when the granularity of the tasks gets too low, as the overhead of creating a task and communicating with it then outweighs the benefit of parallelising it.

Kevin Hammond and I are in the process of writing a book on Strategies and Parallel Haskell programming in general. We will talk about merge sort in much more detail there.

I've attached a version of merge sort that I remember working *reasonably* well on 8 cores. To run it, I recall you have to make the heap bigger, otherwise GHC just spends all its time GCing.

I hope this helps.

Chris.

-- NB: the imports and the Tree declaration below were not in the original
-- attachment; they are assumed here so the snippet compiles on its own.
import Control.Parallel (par, pseq)
import Control.Parallel.Strategies (Strategy, using)
import Control.DeepSeq (NFData (..))
import Data.List (sort)
import System.Random (mkStdGen, randomRs)

data Tree a = Leaf [a] | Node (Tree a) (Tree a)

instance NFData a => NFData (Tree a) where
  rnf (Node l r) = rnf l `pseq` rnf r
  rnf (Leaf l)   = rnf l

threshold = 1000 :: Int

createTree l n
  | n < threshold = (Leaf h, t)
  | otherwise     = (Node lists1 lists2, rest2)
  where
    (lists1, rest1) = createTree l n'
    (lists2, rest2) = createTree rest1 n''
    n'  = n `div` 2
    n'' = n - n'
    (h, t) = splitAt n l

mergeSort (Leaf a) = sort a
mergeSort (Node x y) =
  let x2 = mergeSort x
      y2 = mergeSort y
      r1 = merge x2 y2
  in  r1 `using` strat x2 y2
  where strat x2 y2 res = x2 `par` rnf y2 `pseq` return res

merge :: Ord a => [a] -> [a] -> [a]
merge x [] = x
merge [] x = x
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

forceList []     = ()
forceList (x:xs) = x `pseq` forceList xs

size :: Int
size = 100000

depth :: Int
depth = 8

main = do
  let input = take size (randomRs (0, 100000) (mkStdGen 42)) :: [Int]
  let (r, []) = createTree input (length input)
  pseq r (return ())
  let f = mergeSort r
  pseq f (return ())
  putStrLn ("length of sort: " ++ show (length f))

Jan 24, 2012, 8:54:14 AM

to parallel...@googlegroups.com

Christopher Brown <chrisbrown.guitar@...> writes:

> [...] merge sort at the merge point.

Yes. - My focus (for this example) is "mild" parallelism,

that is, just enough to keep 4 .. 8 cores busy -

contrary to "massive" parallelism where the number of cores

would be comparable to the number of elements to be processed.

The topmost merge is Theta(n). This is only 1/log(n)

of the total work, so it will not dominate the execution time -

and the loss in concurrency (because other cores are idle

when we execute the final merge) should not hurt too much either?
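As a quick sanity check of that estimate (standard mergesort bounds, sketched here for completeness):

```latex
T_{\text{total}}(n) = \Theta(n \log n), \qquad
T_{\text{top merge}}(n) = \Theta(n), \qquad
\frac{T_{\text{top merge}}(n)}{T_{\text{total}}(n)}
  = \Theta\!\left(\tfrac{1}{\log n}\right)
```

For instance, at n = 10^6 we have log2(n) ≈ 20, so the final sequential merge accounts for only about 5% of the total work.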

Best regards, Johannes.

PS: My complete code for this example is here:

http://www.imn.htwk-leipzig.de/~waldmann/edu/ss11/skpp/code/haskell/msort/

and there are some measurements at the start of the file

PS: I would be interested in obtaining a draft of your book (chapter)

Jan 26, 2012, 1:03:02 PM

to chrisbro...@googlemail.com, gclhp-paral...@lo.gmane.org, parallel...@googlegroups.com

> The problem with trying to parallelise merge sort (or quick sort) is that the algorithm is heavily sequential. Quicksort is sequential at the pivot point, and merge sort at the merge point.

Actually, it is possible to do a parallel divide-and-conquer for the merge phase too.

Take a look at "cilkmerge" in this Cilk sorting code:

They cite a 1987 paper.
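For illustration, that divide-and-conquer merge could be sketched in Haskell roughly as follows. This is hypothetical code, not from the thread or the Cilk source: `pMerge` and its threshold of 100 are made up for the example, and `par`/`pseq` are taken from `GHC.Conc` in base (they are also exported by `Control.Parallel` in the parallel package).

```haskell
-- Hypothetical sketch of the cilkmerge idea: split the longer sorted
-- list at its midpoint, partition the other list around that pivot,
-- and merge the two halves in parallel.
import GHC.Conc (par, pseq)

pMerge :: Ord a => [a] -> [a] -> [a]
pMerge xs ys
  | length xs < length ys = pMerge ys xs    -- keep the longer list first
  | null xs               = ys              -- then both must be empty
  | length xs < 100       = merge xs ys     -- arbitrary sequential threshold
  | otherwise =
      let mid        = length xs `div` 2
          (xs1, xs2) = splitAt mid xs
          pivot      = head xs2
          (ys1, ys2) = span (< pivot) ys    -- ys1 < pivot <= ys2
          left       = pMerge xs1 ys1
          right      = pMerge xs2 ys2
      in  left `par` (right `pseq` (left ++ right))

merge :: Ord a => [a] -> [a] -> [a]
merge x [] = x
merge [] x = x
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

main :: IO ()
main = print (pMerge [1,3..399] [2,4..400] == ([1..400] :: [Int]))  -- prints True
```

The repeated `length` calls make this a sketch rather than production code; in practice you would track lengths alongside the lists.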

Cheers,

-Ryan

Jan 27, 2012, 8:06:42 PM

to gclhp-paral...@m.gmane.org, parallel...@googlegroups.com

Hi Johannes,

no one's appeared to answer your actual question about how to write

beautiful code with Strategies. So here's my attempt - I'll only

repeat the non-trivial case of 'merge', that's the only place that

needs change.

import Control.Parallel.Strategies

msort xs =
  let ( here, there ) = split xs
      mshere  = msort here  `using` rpar `dot` spineList
      msthere = msort there `using` spineList
  in  merge mshere msthere

spineList :: Strategy [a]
spineList = evalList r0

What do the 'using' annotations do? Two things. For one, they force

the spines of the lists mshere and msthere (thereby forcing complete

evaluation of their thunks) as soon as their first elements are

demanded. Plus, the "rpar `dot`" sparks the forcing of the spine of

mshere in parallel.

I get about the same speedups as your original version - it's not the

ideal showcase for parallelism. And it generates way too many sparks -

you might want to introduce a threshold in the merge function to limit

parallelism to the first few levels in the call tree.
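Such a depth threshold could look like the sketch below, written in the par/pseq style of the original post rather than with Strategies. This is hypothetical code, not from the thread: `msortD`, its depth argument, and the choice to go fully sequential below depth 0 are all illustrative.

```haskell
-- Hypothetical depth-thresholded mergesort: only the top `d` levels of
-- the call tree are sparked, so at most 2^d sparks are created.
import GHC.Conc (par, pseq)

msortD :: Ord a => Int -> [a] -> [a]
msortD _ []  = []
msortD _ [x] = [x]
msortD d xs
  | d <= 0    = merge (msortD 0 here) (msortD 0 there)  -- sequential below threshold
  | otherwise =
      let mshere  = msortD (d - 1) here
          msthere = msortD (d - 1) there
      in  par (forceList mshere)
        $ pseq (forceList msthere)
        $ merge mshere msthere
  where (here, there) = splitAt (length xs `div` 2) xs

-- force the spine of a list (and each element to WHNF)
forceList :: [a] -> ()
forceList []     = ()
forceList (y:ys) = y `pseq` forceList ys

merge :: Ord a => [a] -> [a] -> [a]
merge x [] = x
merge [] x = x
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

main :: IO ()
main = print (msortD 4 (reverse [1..50]) == ([1..50] :: [Int]))  -- prints True
```

With `d = 4` you get at most 16 sparks regardless of list size, which keeps the spark pool small while still feeding 8 cores.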

There's one other drawback: The above version thrashes through more

memory than the original one, and it requires more stack space (linear

in the size of the lists). This is because spineList isn't tail

recursive (see Section 4.7 of the 'Seq no more' paper). The remedy is

to replace spineList with a lifted version of a tail-recursive

sequential Strategy.

import qualified Control.Seq as Seq

spineList :: Strategy [a]
spineList = evalSeq (Seq.seqList Seq.r0)  -- Seq.seqList Seq.r0 is tail-recursive

This version has about the same runtime behaviour as the previous one

but memory use appears to be roughly where your original version was.

Cheers,

Patrick

PS: I'd be keen to hear how your students got along with Strategies.

Jan 30, 2012, 4:59:09 AM

to c.patri...@gmail.com, gclhp-paral...@m.gmane.org, parallel...@googlegroups.com

On 28/01/2012 01:06, Patrick Maier wrote:

> Hi Johannes,

>

> no one's appeared to answer your actual question about how to write

> beautiful code with Strategies. So here's my attempt - I'll only

> repeat the non-trivial case of 'merge', that's the only place that

> needs change.

>

>

> import Control.Parallel.Strategies

>

> msort xs =
>   let ( here, there ) = split xs
>       mshere  = msort here `using` rpar `dot` spineList
>       msthere = msort there `using` spineList
>   in  merge mshere msthere
>
> spineList :: Strategy [a]
> spineList = evalList r0


And here's a version using monad-par:

-- assumed context (not in the original mail): {-# LANGUAGE BangPatterns #-},
-- import Control.Monad.Par, import Control.Monad (liftM); run with runPar
msort :: Ord a => [a] -> Par [a]
msort []  = return []
msort [x] = return [x]
msort xs = do
  let !( as, bs ) = split xs
  av <- spawn_ (liftM spineList $ msort as)
  bv <- spawn_ (liftM spineList $ msort bs)
  a  <- get av
  b  <- get bv
  return (merge a b)

Use Patrick's spineList, or define your own like this:

spineList :: [a] -> [a]
spineList xs = go xs `seq` xs
  where go []     = ()
        go (x:xs) = go xs

Cheers,

Simon

Jan 30, 2012, 1:55:14 PM

to parallel...@googlegroups.com

Patrick Maier <c.patrick.maier@...> writes:

> msort xs =
>   let ( here, there ) = split xs
>       mshere  = msort here `using` rpar `dot` spineList
>       msthere = msort there `using` spineList
>   in  merge mshere msthere

Thanks, that is exactly what I wanted.

> I get about the same speedups as your original version - it's not the

> ideal showcase for parallelism.

I think it is - it illustrates the method of parallelization by annotations ONLY (if you drop these, you get the original program, not some transformed or monadified version), so it is fine for teaching.

Thanks - J.W.
