how to write this with Control.Conurrent.Strategies ?

68 views
Skip to first unread message

Johannes Waldmann

unread,
Jan 24, 2012, 4:54:24 AM1/24/12
to parallel...@googlegroups.com
Dear all, I have this straightforward mergesort program
using par/pseq:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys ; merge xs [] = xs
merge (x:xs) (y:ys) =
if x < y then x : merge xs (y:ys)
else y : merge (x:xs) ys

split :: [a] -> ([a],[a])
split xs = splitAt ( div ( length xs ) 2 ) xs

msort :: Ord a => [a] -> [a]
msort [] = [] ; msort [x] = [x]
msort xs =
let ( here, there ) = split xs
mshere = msort here
msthere = msort there
in par ( last mshere )
$ pseq ( last msthere )
( merge mshere msthere )

and it gives reasonable speed-ups with +RTS -N1,2,4,8
(OK, going from 4 to 8 does not help).

Now - what is the Right Way to write this with strategy annotations?
(`using` rpar? ...)

I want to tell my students 1. "mergesort already *is* an inherently parallel
algorithm" and 2. "you can execute it as such - you just need to annotate
the source text".

Thanks - J.W.


Christopher Brown

unread,
Jan 24, 2012, 5:22:28 AM1/24/12
to gclhp-paral...@lo.gmane.org, parallel...@googlegroups.com
Hi Johannes,

The problem with trying to parallelise merge sort (or quick sort) is that the algorithm is heavily sequential. Quicksort is sequential at the pivot point, and merge sort at the merge point. Because of this, both algorithms really don't parallelise that well, in fact!

What kind of speedups did you get with your solution?

I played around with merge sort a few months back and came up with a solution using balanced binary trees. The advantage to this is that you eliminate as much sequential processing as you can from the sort so that it can be parallelised easier. You also need a threshold to turn off the parallelism when granularity of tasks is too low, as the overhead of creating the task and communicating with it is more than the benefit of parallelising it.

Kevin Hammond and I are in the process of writing a book on Strategies and Parallel Haskell programming in general. We will talk about merge sort in much more detail there.

I've attached a version of merge sort I remember to work *reasonably* well on 8 cores. In order to run this I remember you have to make the heap bigger, otherwise GHC just spends all its time GCing.

I hope this helps.
Chris.

instance NFData a => NFData (Tree a) where
rnf (Node l r) = rnf l `pseq` rnf r
rnf (Leaf l) = rnf l

threshold = 1000 :: Int

createTree l n | n < threshold = (Leaf h, t)


| otherwise = (Node lists1 lists2, rest2)
where
(lists1, rest1) = createTree l n'
(lists2, rest2) = createTree rest1 n''
n' = n `div` 2
n'' = n - n'
(h,t) = splitAt n l

mergeSort (Leaf a) = sort a
mergeSort (Node x y)
= let x2 = mergeSort x
y2 = mergeSort y
r1 = merge x2 y2
in r1 `using` strat x2 y2
where strat x2 y2 res = x2 `par` rnf y2 `pseq` return res

merge :: Ord a => [a] -> [a] -> [a]

merge x [] = x
merge [] x = x


merge (x:xs) (y:ys)

| x < y = x : merge xs (y:ys)
| otherwise = y : merge (x:xs) ys

forceList [] = ()
forceList (x:xs) = x `pseq` forceList xs


size :: Int
size = 100000

depth :: Int
depth = 8

main = do
let input = take size (randomRs (0,100000) (mkStdGen 42)) :: [Int]
let (r,[]) = createTree input (length input)
pseq r (return ())
let f = mergeSort r
pseq f (return ())
putStrLn ("length of sort: " ++ show (length f))

Johannes Waldmann

unread,
Jan 24, 2012, 8:54:14 AM1/24/12
to parallel...@googlegroups.com
Christopher Brown <chrisbrown.guitar@...> writes:

> [...] merge sort at the merge point.

Yes. - My focus (for this example) is "mild" parallelism,
that is, just enough to keep 4 .. 8 cores busy -
contrary to "massive" parallelism where the number of cores
would be comparable to the number of elements to be processed.

The topmost merge is Theta(n). This is only 1/log(n)
of the total work, so it will not dominate the execution time -
and the loss in concurrency (because other cores are idle
when we execute the final merge) should not hurt too much either?

Best regards, Johannes.

PS: My complete code for this example is here:
http://www.imn.htwk-leipzig.de/~waldmann/edu/ss11/skpp/code/haskell/msort/
and there are some measurements at the start of the file

PS: I would be interested in obtaining a draft of your book (chapter)

Ryan Newton

unread,
Jan 26, 2012, 1:03:02 PM1/26/12
to chrisbro...@googlemail.com, gclhp-paral...@lo.gmane.org, parallel...@googlegroups.com
The problem with trying to parallelise merge sort (or quick sort) is that the algorithm is heavily sequential. Quicksort is sequential at the pivot point, and merge sort at the merge point. 

Actually, it is possible to do a parallel divide-and-conquer for the merge phase too.

Take a look at "cilkmerge" in this Cilk sorting code:


They cite a 1987 paper.

Cheers,
  -Ryan

 

Patrick Maier

unread,
Jan 27, 2012, 8:06:42 PM1/27/12
to gclhp-paral...@m.gmane.org, parallel...@googlegroups.com
Hi Johannes,

no one's appeared to answer your actual question about how to write
beautiful code with Strategies. So here's my attempt - I'll only
repeat the non-trivial case of 'merge', that's the only place that
needs change.


import Control.Parallel.Strategies

msort xs =
let ( here, there ) = split xs

mshere = msort here `using` rpar `dot` spineList
msthere = msort there `using` spineList
in merge mshere msthere

spineList :: Strategy [a]
spineList = evalList r0

What do the 'using' annotations do? Two things. For one, they force
the spines of the lists mshere and msthere (thereby forcing complete
evaluation of their thunks) as soon as their first elements are
demanded. Plus, the "rpar `dot`" sparks the forcing of the spine of
mshere in parallel.

I get about the same speedups as your original version - it's not the
ideal showcase for parallelism. And it generates way too many sparks -
you might want to introduce a threshold in the merge function to limit
parallelism to the first few levels in the call tree.

There's one other drawback: The above version thrashes through more
memory than the original one, and it requires more stack space (linear
in the size of the lists). This is because spineList isn't tail
recursive (see Section 4.7 of the 'Seq no more' paper). The remedy is
to replace spineList with a lifted version of a tail-recursive
sequential Strategy.


import qualified Control.Seq as Seq

spineList :: Strategy [a]
spineList = evalSeq (Seq.seqList Seq.r0) -- Seq.seqList Seq.r0 is
tail-recursive


This version has about the same runtime behaviour as the previous one
but memory use appears to be roughly where your original version was.

Cheers,
Patrick

PS: I'd be keen to hear how your students got along with Strategies.

Simon Marlow

unread,
Jan 30, 2012, 4:59:09 AM1/30/12
to c.patri...@gmail.com, gclhp-paral...@m.gmane.org, parallel...@googlegroups.com
On 28/01/2012 01:06, Patrick Maier wrote:
> Hi Johannes,
>
> no one's appeared to answer your actual question about how to write
> beautiful code with Strategies. So here's my attempt - I'll only
> repeat the non-trivial case of 'merge', that's the only place that
> needs change.
>
>
> import Control.Parallel.Strategies
>
> msort xs =
> let ( here, there ) = split xs
> mshere = msort here `using` rpar `dot` spineList
> msthere = msort there `using` spineList
> in merge mshere msthere
>
> spineList :: Strategy [a]
> spineList = evalList r0

And here's a version using monad-par:

msort :: Ord a => [a] -> Par [a]
msort [] = return []; msort [x] = return [x]
msort xs = do
let !( as, bs ) = split xs
av <- spawn_ (liftM spineList $ msort as)
bv <- spawn_ (liftM spineList $ msort bs)
a <- get av
b <- get bv
return (merge a b)

use Patrick's spineList, or define your own like this:

spineList :: [a] -> [a]
spineList xs = go xs `seq` xs
where go [] = ()
go (x:xs) = go xs

Cheers,
Simon

Johannes Waldmann

unread,
Jan 30, 2012, 1:55:14 PM1/30/12
to parallel...@googlegroups.com
Patrick Maier <c.patrick.maier@...> writes:

> msort xs =
> let ( here, there ) = split xs
> mshere = msort here `using` rpar `dot` spineList
> msthere = msort there `using` spineList
> in merge mshere msthere

Thanks, that is exactly what I wanted.

> I get about the same speedups as your original version - it's not the
> ideal showcase for parallelism.

I think it is - it does illustrate the method of parallelization
by annotations ONLY (if you drop these, you get the original program -
not some transformed or monadified version) so it is fine for teaching.

Thanks - J.W.


Reply all
Reply to author
Forward
0 new messages