
[Haskell-beginners] Performance of parallel mergesort


Jon Harrop

Dec 23, 2009, 3:00:30 PM
to begi...@haskell.org

I took the parallel merge sort described here:

http://martin.sulzmann.googlepages.com/AMP-Spring2008-intro.pdf

and added a simple test:

main :: IO ()
main = do
  putStrLn $ show $ sum $ mergesort $ [sin x | x <- [1.0 .. 1000000.0]]
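
For reference, the sort from those notes is roughly as follows (a sketch
rather than the exact code; the helper definitions here are approximations):

import Control.Parallel (par, pseq)

-- Roughly the structure under discussion: both sorted halves are sparked,
-- and the merge is started at the same time.
mergesort :: Ord a => [a] -> [a]
mergesort []  = []
mergesort [x] = [x]
mergesort xs  = forceList sleft `par` forceList sright `par` merge sleft sright
  where
    (left, right) = split (length xs `div` 2) xs
    sleft         = mergesort left
    sright        = mergesort right

-- Evaluate every element of the list to weak head normal form.
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `pseq` forceList xs

-- split n xs = (first n elements, the rest).
split :: Int -> [a] -> ([a], [a])
split = splitAt

-- Merge two sorted lists into one sorted list.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys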

I compiled it with GHC 6.8.2 from Debian testing, ran it with various
+RTS -N<n> arguments to measure its performance, and obtained the following
results on an 8 core:

$ ghc --make -O2 -threaded mergesort.hs -o mergesort
[1 of 1] Compiling Main ( mergesort.hs, mergesort.o )
Linking mergesort ...
$ time ./mergesort +RTS -N1
-0.117109518058233

real 0m9.723s
user 0m9.461s
sys 0m0.232s
$ time ./mergesort +RTS -N2
-0.117109518058233

real 0m13.574s
user 0m15.225s
sys 0m0.140s
$ time ./mergesort +RTS -N3
-0.117109518058233

real 0m13.185s
user 0m15.529s
sys 0m0.184s
$ time ./mergesort +RTS -N4
-0.117109518058233

real 0m13.251s
user 0m15.829s
sys 0m0.336s
$ time ./mergesort +RTS -N5
-0.117109518058233

real 0m13.093s
user 0m16.929s
sys 0m0.164s
$ time ./mergesort +RTS -N6
Segmentation fault

real 0m5.711s
user 0m7.408s
sys 0m0.136s

That segfault must be due to a bug in GHC, so I thought perhaps a newer version
of GHC might fix it, but having installed GHC 6.10.4 from Sid I now
get:

$ ghc --make -O2 -threaded mergesort.hs -o mergesort
Binary: Int64 truncated to fit in 32 bit Int
ghc: panic! (the 'impossible' happened)
(GHC version 6.10.4 for i386-unknown-linux):
Prelude.chr: bad argument

Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug

I'll try the haskell-platform package next. Are these known problems?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
_______________________________________________
Beginners mailing list
Begi...@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

Antoine Latter

Dec 23, 2009, 7:16:35 PM
to Jon Harrop, begi...@haskell.org
Hi Jon,

You're more likely to get a useful response over on the
glasgow-haskell-users mailing list, as it looks like this is a GHC
specific bug: http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Running this at -N6 doesn't want to complete for me, so I'm not sure
if it would segfault. I'm on a 64-bit dual-core Intel Mac running OS X,
using GHC 6.12.

With such a small example as this your best bet would be to file a bug report:

http://hackage.haskell.org/trac/ghc/

You'll need to use the login guest/guest to file a new ticket.

Antoine

Antoine Latter

Dec 23, 2009, 7:22:12 PM
to Jon Harrop, begi...@haskell.org
On Thu, Dec 24, 2009 at 12:16 AM, Antoine Latter <asla...@gmail.com> wrote:
> Hi Jon,
>
> You're more likely to get a useful response over on the
> glasgow-haskell-users mailing list, as it looks like this is a GHC
> specific bug: http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
> Running this at -N6 doesn't want to complete for me, so I'm not sure
> if it would segfault. I'm on a 64-bit Intel Mac OS X using GHC 6.12,
> dual core.
>
> With such a small example as this your best bet would be to file a bug report:
>
> http://hackage.haskell.org/trac/ghc/
>
> You'll need to use the login guest/guest to file a new ticket.
>
> Antoine
>

Here's a paste-bin link for the code in question:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14871#a14871

Antoine Latter

Dec 23, 2009, 7:23:17 PM
to Jon Harrop, begi...@haskell.org
On Thu, Dec 24, 2009 at 12:22 AM, Antoine Latter <asla...@gmail.com> wrote:
> On Thu, Dec 24, 2009 at 12:16 AM, Antoine Latter <asla...@gmail.com> wrote:
>> Hi Jon,
>>
>> You're more likely to get a useful response over on the
>> glasgow-haskell-users mailing list, as it looks like this is a GHC
>> specific bug: http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>
>> Running this at -N6 doesn't want to complete for me, so I'm not sure
>> if it would segfault. I'm on a 64-bit Intel Mac OS X using GHC 6.12,
>> dual core.
>>
>> With such a small example as this your best bet would be to file a bug report:
>>
>> http://hackage.haskell.org/trac/ghc/
>>
>> You'll need to use the login guest/guest to file a new ticket.
>>
>> Antoine
>>
>
> Here's a paste-bin link for the code in question:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14871#a14871
>

-N6 finally completed. I'm using GHC 6.12 on an Intel dual-core
MacBook. I'm not sure how comparable my setup is to what you tried.

Antoine

Jon Harrop

Dec 23, 2009, 8:10:21 PM
to begi...@haskell.org

For some reason all traffic from this beginners list stopped reaching me after
the 13th of December.

Thanks for the response, Antoine. I've tried to install more recent versions
of GHC (6.10 and 6.12) but I cannot get either of them to work at all.

GHC 6.8 generates code that segfaults and the GHC 6.10 from Debian testing
cannot compile anything for me. So I thought I'd try GHC 6.12. That isn't in
any repo so I decided to build it from source. I uninstalled all other GHCs
to give it a fresh start. The GHC page says "Stop, install the Haskell
Platform" but the Haskell Platform says you must install GHC first, so I've
got a nice circular dependency right from the start!

So I tried installing GHC 6.12 from source but that requires GHC to be
installed. So I installed GHC 6.8 (the one that segfaults) using apt again
from Debian. That seems to have built a GHC 6.12 that I can run from the
command line but it cannot compile that program because it doesn't have
parallel stuff. So I thought I'd install the Haskell Platform.

Unfortunately, the Haskell Platform configure script gives the nonsensical
error that I must "upgrade" from GHC 6.12 down to GHC 6.10. I was getting
pretty run down by this point so I just told it to sod off using
the --enable-unsupported-ghc-version command line option. The Haskell
Platform's configure script then completed but building it failed with:

[21 of 21] Compiling Graphics.UI.GLUT ( Graphics/UI/GLUT.hs,
dist/build/Graphics/UI/GLUT.p_o )
Registering GLUT-2.1.1.2...
Setup: GLUT-2.1.1.2: dependency
"OpenGL-2.2.1.1-182b091280ce0de861295bc592bae77c" doesn't exist (use --force
to override)

Error:
Building the GLUT-2.1.1.2 package failed
make: *** [build.stamp] Error 2

I have glut and use it all the time from OCaml without a hitch so I've no idea
what the problem is here.

Oh well, I just discovered that installing GHC 6.12 has at least fixed GHC
6.10 so it can now compile and run that mergesort. Oh FFS, spoke too soon:

$ time ./mergesort +RTS -N8
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.

real 0m38.801s
user 4m0.851s
sys 0m1.308s

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Antoine Latter

Dec 23, 2009, 9:55:44 PM
to Jon Harrop, begi...@haskell.org
On Thu, Dec 24, 2009 at 2:24 AM, Jon Harrop <j...@ffconsultancy.com> wrote:
>
> For some reason all traffic from this beginners list stopped reaching me after
> the 13th of December.
>
> Thanks for the response, Antoine. I've tried to install more recent versions
> of GHC (6.10 and 6.12) but I cannot get either of them to work at all.
>
> GHC 6.8 generates code that segfaults and the GHC 6.10 from Debian testing
> cannot compile anything for me. So I thought I'd try GHC 6.12. That isn't in
> any repo so I decided to build it from source. I uninstalled all other GHCs
> to give it a fresh start. The GHC page says "Stop, install the Haskell
> Platform" but the Haskell Platform says you must install GHC first, so I've
> got a nice circular dependency right from the start!
>
> So I tried installing GHC 6.12 from source but that requires GHC to be
> installed. So I installed GHC 6.8 (the one that segfaults) using apt again
> from Debian. That seems to have built a GHC 6.12 that I can run from the
> command line but it cannot compile that program because it doesn't have
> parallel stuff. So I thought I'd install the Haskell Platform.
>

The Haskell Platform is currently the recommended way to go, but it
won't be up on GHC 6.12 until some time in the new year (I don't know
their timelines).

For me, I didn't have a working GHC 6.12 setup until this morning. The
compiler is ready, but we're still working on moving the libraries
over to the new major version.

Maybe compiling your own 6.10.4 would be the best choice until the
next platform release.

If you're willing to dive in, the version of cabal-install that works
for GHC 6.12.1 is out now, but it does require some boot-strapping to
get going, as GHC no longer ships libraries that aren't required to
build GHC.

But back to the GHC panic, it might be worth posting that error to the
GHC-users list I mentioned earlier.

Antoine

Tom Davie

Dec 24, 2009, 4:09:56 AM
to Jon Harrop, begi...@haskell.org
The mergesort example given here has a rather nasty problem – it creates
sparks for merging the two lists at the same time as it creates the lists,
so there's always at least one task blocking waiting for its producers.

This variation on a theme goes roughly twice as fast for me with -N1, and
*does* get a speedup from -N<somethinghigher>:

mergesort []    = []
mergesort [x]   = [x]
mergesort [x,y] = if x < y then [x,y] else [y,x]
mergesort xs    = (forceList sleft) `par`
                  (forceList sright) `pseq`
                  merge sleft sright
  where
    (left,right) = split (length xs `div` 2) xs
    sleft        = mergesort left
    sright       = mergesort right

Note the `pseq` not `par` before the merge. I also added one more
short-circuit case – we want to avoid spawning threads for tiny tasks like
sorting 1 element lists.
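
Schematically, the difference the `pseq` makes (using the same forceList and
merge as above; the function names here are just for illustration):

-- With `par` on both sides the parent thread falls straight through to the
-- merge, which demands both lists, so the parent largely evaluates them
-- itself and the sparks fizzle or duplicate that work.
combineBothSparked :: Ord a => [a] -> [a] -> [a]
combineBothSparked sleft sright =
  forceList sleft `par` forceList sright `par` merge sleft sright

-- With `pseq` the parent forces one half itself while the spark forces the
-- other, and the merge only starts once the parent's half is evaluated.
combineHalfSparked :: Ord a => [a] -> [a] -> [a]
combineHalfSparked sleft sright =
  forceList sleft `par` (forceList sright `pseq` merge sleft sright)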

Here's my results on a dual core running OS X:

LappyBob:~ tatd2$ time ./test +RTS -N1
-0.117109518058233

real 0m4.608s
user 0m4.400s
sys 0m0.189s
LappyBob:~ tatd2$ time ./test +RTS -N2
-0.117109518058233

real 0m3.648s
user 0m6.360s
sys 0m0.220s
LappyBob:~ tatd2$ time ./test +RTS -N3
-0.117109518058233

real 0m50.679s
user 1m24.235s
sys 0m0.620s

The last result is still off the wall, but it's a lot better.

Bob

On Thu, Dec 24, 2009 at 2:24 AM, Jon Harrop <j...@ffconsultancy.com> wrote:

Chaddaï Fouché

Dec 24, 2009, 8:24:52 AM
to Tom Davie, begi...@haskell.org
On Thu, Dec 24, 2009 at 10:09 AM, Tom Davie <tom....@gmail.com> wrote:
> The mergesort example given here has a rather nasty problem – it creates
> sparks for merging the two lists at the same time as it creates the lists,
> so there's always at least one task blocking waiting for its producers.

Don't worry too much about this thread; here is why:

- Jon Harrop is definitely not a beginner in Haskell land and has no
business asking this question on this list
- Jon Harrop is very well known on established functional programming
mailing lists and forums as a very disagreeable troll (not only on
Haskell). He loves to trash whatever language doesn't have his favour at the
moment (I think right now he likes F#, which is a very interesting
language nonetheless).
- He apparently made an effort to include approximately all the elements
that could hurt Haskell's reputation in this thoughtful gift to the
community in this Christmas period:
** As you pointed out, the initial code he chose basically doesn't
parallelize very well.
** He tried to compile it with GHC 6.8, which was an important step
towards better internals for handling parallelism in GHC but was also from a
time of big upheaval in this field and so not as stable as one could
wish.
** He then installed GHC 6.10 and did not even clean up his previous
compilation attempts before declaring that it didn't work at all (the error
message he cited is usually observed when GHC encounters an interface
file created by a previous version of GHC or on another architecture),
without testing on some other clean sources.
** He then apparently decided that using a binary package for the 6.12
version was beneath his dignity and went on to install it from source
(generally not recommended except for GHC developers who need to
work on the code or for porting to unusual architectures) and claimed
that there was a "circularity problem" since his 6.10 could not
compile it (which it obviously could, as he discovered and qualified
as "miraculous", not even considering that he had a working 6.8
before).

I could choose to believe that Harrop really is as incompetent as he
sounds and really did encounter all those problems (which are all
possible, if unlikely to be met in quick succession) but the simple
fact that he decided to post this message on the Haskell-Beginners list
instead of the Haskell-cafe list, where he's already well known and an
occasional poster, makes it that much more unlikely. The other
interpretation is that JdH is up to his usual tricks, just sinking to
a new low by posting in a venue where he's less likely to be
recognized and more likely to worry unwary would-be Haskell beginners.

Merry Christmas!
--
Jedaï

Jon Harrop

Dec 24, 2009, 4:36:14 PM
to Tom Davie, begi...@haskell.org
On Thursday 24 December 2009 09:09:41 you wrote:
> The mergesort example given here has a rather nasty problem – it creates
> sparks for merging the two lists at the same time as it creates the lists,
> so there's always at least one task blocking waiting for its producers.

I see. So the sparks either contend for locks on the thunks to force that list
or they repeat the same work.

> This variation on a theme goes roughly twice as fast for me with -N1, and
> *does* get a speedup from -N<somethinghigher>:
>
> mergesort [] = []
> mergesort [x] = [x]
> mergesort [x,y] = if x < y then [x,y] else [y,x]
> mergesort xs = (forceList sleft) `par`
> (forceList sright) `pseq`
> merge sleft sright
> where
> (left,right) = split (length xs `div` 2) xs
> sleft = mergesort left
> sright = mergesort right
>
> Note the `pseq` not `par` before the merge.

Can you explain why that helps?

> I also added one more short-circuit case – we want to avoid spawning threads
> for tiny tasks like sorting 1 element lists.

This is something that concerns me. Lots of discussions of parallelism,
including the Haskell literature I have read, neglect this critical problem
of making sure that more time is spent doing useful work than spawning tasks
(sparks). How is this solved in Haskell? Do you just put magic numbers in
that work on the machine you're currently using?

Also, how much work does a Haskell thread usually have to do to make it worth
sparking?

> Here's my results on a dual core running OS X:
>
> LappyBob:~ tatd2$ time ./test +RTS -N1
> -0.117109518058233
>
> real 0m4.608s
> user 0m4.400s
> sys 0m0.189s
> LappyBob:~ tatd2$ time ./test +RTS -N2
> -0.117109518058233
>
> real 0m3.648s
> user 0m6.360s
> sys 0m0.220s
> LappyBob:~ tatd2$ time ./test +RTS -N3
> -0.117109518058233
>
> real 0m50.679s
> user 1m24.235s
> sys 0m0.620s
>
> The last result is still off the wall, but it's a lot better.

That's another question: why did you get such awful performance for N=3, and
why do I get similar results for N=8 on this 8 core?

Jon Harrop

Dec 24, 2009, 4:39:53 PM
to Antoine Latter, begi...@haskell.org
On Thursday 24 December 2009 02:55:33 you wrote:
> The Haskell Platform is currently the recommended way to go, but it
> won't be up on GHC 6.12 until some time in the new year (I don't know
> their timelines).

I see, thanks.

> For me, I didn't have a working GHC 6.12 setup until this morning. The
> compiler is ready, but we're still working on moving the libraries
> over to the new major version.
>
> Maybe compiling your own 6.10.4 would be the best choice until the
> next platform release.

Is that better than using the 6.10.4 from Debian (and the haskell-platform
package)?

> If you're willing to dive in, the version of cabal-install that works
> for GHC 6.12.1 is out now, but it does require some boot-strapping to
> get going, as GHC no longer ships libraries that aren't required to
> build GHC.

Yeah, I tried installing cabal from source as well but couldn't get it to work
either.

> But back to the GHC panic, it might be worth posting that error to the
> GHC-users list I mentioned earlier.

I cannot reproduce it with GHC 6.10 so it has probably long since been fixed
(6.8 is two years old!).

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop

Dec 24, 2009, 6:29:32 PM
to Chaddaï Fouché, begi...@haskell.org
On Thursday 24 December 2009 13:24:30 you wrote:
> ** As you pointed the initial code he chose doesn't parallelize very
> well basically

I Googled for "parallel mergesort haskell" and found that code in lecture
notes by an associate university professor who has pages of published papers
on his specialist topic of parallel programming in Haskell. I had no idea it
was bad code.

> ** He tried to compile it with GHC 6.8 which was an important step
> towards better internals to handle parallelism in GHC but was also a
> time of big upheaval in this field and so not as stable as one could
> wish.

I didn't know that.

> ** He then installed GHC 6.10 and did not even clean up his previous
> compilations attempts before declaring that it didn't work (the error
> message he cited is usually observed when GHC encounters an interface
> file created by a previous version of GHC or on another architecture)
> at all without testing on some other clean sources.

Looks like 6.12 fixed that problem.

> ** He then apparently decided that using a binary package for the 6.12
> version was beneath its dignity and went on to install it from source
> (generally not recommended except for GHC developers that needs to
> work on the code or for port to unusual architectures)

I had no idea that building GHC from source was such a big deal. I'll use the
binaries.

> and claimed
> that there was a "circularity problem" since its 6.10 could not
> compile it (which it obviously could, as he discovered and qualified
> as "miraculous", not even considering that he had a working 6.8
> before).

The circularity problem is that the GHC and Haskell Platform download pages
both say that you must install the other first:

"Stop! For most users, we recommend installing the Haskell Platform instead
of GHC." -
http://www.haskell.org/ghc/download_ghc_6_12_1.html

"You only need GHC installed to get started:" -
http://hackage.haskell.org/platform/

I wasn't sure how to proceed so I guessed and apparently got it wrong.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Tommy M. McGuire

Dec 26, 2009, 1:11:05 PM
to Antoine Latter, begi...@haskell.org
Antoine Latter wrote:
>> Here's a paste-bin link for the code in question:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14871#a14871


I am not familiar with Control.Parallel at all, so someone please correct me
if I'm wrong...

...but that parallel mergesort looks very wrong to me.

In the first place, the split is using the length of the list, which is going
to force the spine of the list, leaving a bazillion thunks unless the runtime
evaluates the numerical value at each element.

More importantly, it seems to be running the forceList in parallel with the
merge; forceList is not going to do anything useful after the first pass over
each element, and the merge has a data dependency on the split sub-lists,
which is going to limit parallelism quite a bit.

Am I wrong here?
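
For what it's worth, one way around the length-based split is to deal the
elements into two piles without measuring the list first; something like this
(a sketch, not the code from the paste):

-- Split a list into two halves by dealing elements alternately, so neither
-- the length nor the whole spine has to be forced up front.
halve :: [a] -> ([a], [a])
halve []       = ([], [])
halve [x]      = ([x], [])
halve (x:y:zs) = (x : xs, y : ys)
  where (xs, ys) = halve zs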


(Yep, Jedaï, I'm familiar with Jon and the reason he posted this here rather
than the cafe.)

--
Tommy M. McGuire
mcg...@crsr.net

Jon Harrop

Dec 26, 2009, 1:30:34 PM
to Tom Davie, begi...@haskell.org
On Thursday 24 December 2009 09:09:41 Tom Davie wrote:
> Here's my results on a dual core running OS X:
>
> LappyBob:~ tatd2$ time ./test +RTS -N1
> -0.117109518058233
>
> real 0m4.608s
> user 0m4.400s
> sys 0m0.189s
> LappyBob:~ tatd2$ time ./test +RTS -N2
> -0.117109518058233
>
> real 0m3.648s
> user 0m6.360s
> sys 0m0.220s
> LappyBob:~ tatd2$ time ./test +RTS -N3
> -0.117109518058233
>
> real 0m50.679s
> user 1m24.235s
> sys 0m0.620s

Here are my results on a dual quad-core:

$ ghc-6.10.4 --make -O2 -threaded mergesort.hs -o mergesort


[1 of 1] Compiling Main ( mergesort.hs, mergesort.o )
Linking mergesort ...

jdh30@leper:~/Programs/mine/haskell/mergesort$ time ./mergesort
+RTS -K100000000 +RTS -N1
-0.117109518058233

real 0m7.664s
user 0m7.528s
sys 0m0.136s
$ time ./mergesort +RTS -K100000000 +RTS -N2
-0.117109518058233

real 0m7.224s
user 0m9.773s
sys 0m0.108s
$ time ./mergesort +RTS -K100000000 +RTS -N3
-0.117109518058233

real 0m7.293s
user 0m11.637s
sys 0m0.168s
$ time ./mergesort +RTS -K100000000 +RTS -N4
-0.117109518058233

real 0m7.333s
user 0m12.689s
sys 0m0.360s
$ time ./mergesort +RTS -K100000000 +RTS -N5
-0.117109518058233

real 0m7.449s
user 0m14.757s
sys 0m0.548s
$ time ./mergesort +RTS -K100000000 +RTS -N6
-0.117109518058233

real 0m7.043s
user 0m14.985s
sys 0m0.252s
$ time ./mergesort +RTS -K100000000 +RTS -N7
-0.117109518058233

real 0m8.018s
user 0m19.325s
sys 0m1.012s
$ time ./mergesort +RTS -K100000000 +RTS -N8
-0.117109518058233

real 0m55.963s
user 5m55.654s
sys 0m1.492s

As you can see, there is still no significant speedup from parallelism.

Jon Harrop

Dec 26, 2009, 1:42:08 PM
to begi...@haskell.org
On Saturday 26 December 2009 18:00:49 Tommy M. McGuire wrote:
> Antoine Latter wrote:
> >> Here's a paste-bin link for the code in question:
> >
> > http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14871#a14871
>
> I am not familiar with Control.Parallel at all, so someone please correct
> me if I'm wrong....
>
> ...but that parallel mergesort looks very wrong to me.

>
> In the first place, the split is using the length of the list, which is
> going to force the spine of the list, leaving a bazillion thunks unless the
> runtime will evaluate the numerical value at each element.
>
> More importantly, it seems to be running the forcelist in parallel with the
> merge; forcelist is not going to do anything useful after the first pass
> over each element, and the merge has a data dependency on the split
> sub-lists, which is going to limit parallelism quite a bit.
>
> Am I wrong here?

The parallelism is obviously not obtaining any real speedup so something is
wrong. But can anyone fix it?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Stephen Blackheath [to Haskell-Beginners]

Dec 27, 2009, 3:57:08 PM
to Jon Harrop, begi...@haskell.org
Jon,

Jon Harrop wrote:
> This is something that concerns me. Lots of discussions of parallelism,
> including the Haskell literature I have read, neglect this critical
problem
> of making sure that more time is spent doing useful work than spawning
tasks
> (sparks). How is this solved in Haskell? Do you just put magic numbers in
> that work on the machine you're currently using?

It is simply not true that Haskell literature neglects the question of
spark granularity - this is very basic and is always covered. Read
"Real World Haskell" (available free online). There's no 'magic
number'. You must explicitly write your code to give the right granule
size. I don't know the exact cost of sparking, but in my experience it
is quite small, so as long as your granule is doing *some* real work,
it should speed up.

Jon Harrop wrote:
> The parallelism is obviously not obtaining any real speedup so something is
> wrong. But can anyone fix it?

I've spent a bit of time measuring parallel speedup on real commercial
projects, and this is what I've found:

1. ghc-6.12 performs significantly better than ghc-6.10, and has now
been released, therefore don't use ghc-6.10.

2. The defaults generally work better than giving huge heap sizes. Your
-K100000000 - maximum stack size per thread - will either do nothing or
cause an artificial slowdown (I have observed this with the minimum heap
size option). Don't use it, unless experimentation proves it makes
things better.

3. +RTS -s is well worth using. It breaks the time down into MUT
(mutator) and GC (garbage collector).

4. MUT parallelization is excellent, but the parallel GC is not so good.
If your code is GC-heavy it can spend around half of its time garbage
collecting, which doesn't parallelize so well, and this eats into the
speedup.

5. There seems to be a scalability problem with the parallel gc for
larger numbers of cores (namely 8). I am guessing somewhat, but my
experiments tend to confirm the issue raised in Simon Marlow's (the
implementor of GHC parallelization) recent paper that it's to do with
"stopping the world" for gc.

If GHC's lack of perfection at this point in time makes Haskell "look
bad" I don't mind. I am not selling anything, so the reader at least
knows they're getting the truth. I see this as one of the great
advantages of open source.

Progress on GHC has been very rapid in the last couple of years, and so
I know we'll continue to see the speed of GHC's parallelism improving in
leaps and bounds. It's actually still quite a new area, considering the
difficulty of some of the technical issues and how recent it is that
multicores are widely available on consumer hardware. I know you
OCaml/F# guys are making great progress too.


Steve

Jon Harrop

Dec 27, 2009, 9:05:55 PM
to Stephen Blackheath [to Haskell-Beginners], begi...@haskell.org
On Sunday 27 December 2009 20:56:51 you wrote:
> Jon Harrop wrote:
> > This is something that concerns me. Lots of discussions of parallelism,
> > including the Haskell literature I have read, neglect this critical problem
> > of making sure that more time is spent doing useful work than spawning tasks
> > (sparks). How is this solved in Haskell? Do you just put magic numbers in
> > that work on the machine you're currently using?
>
> It is simply not true that Haskell literature neglects the question of
> spark granularity - this is very basic and is always covered. Read
> "Real World Haskell" (available free online). There's no 'magic
> number'. You must explicitly write your code to give the right granule
> size.

There is no "right granule" size. That's the whole point: the optimum is a
function of the machine. If you hardcode the granularity then your code isn't
future proof and isn't portable.

From chapter 24 of Real World Haskell on sorting:

"At this fine granularity, the cost of using par outweighs any possible
usefulness. To reduce this effect, we switch to our non-parallel sort after
passing some threshold."

From the Sorting.hs file, parSort2 accepts a threshold "d":

parSort2 :: (Ord a) => Int -> [a] -> [a]
parSort2 d list@(x:xs)
  | d <= 0    = sort list
  | otherwise = force greater `par` (force lesser `pseq`
                                     (lesser ++ x:greater))
  where lesser  = parSort2 d' [y | y <- xs, y < x]
        greater = parSort2 d' [y | y <- xs, y >= x]
        d'      = d - 1
parSort2 _ _ = []

From the SortMain.hs file, it is always invoked with the magic number "2":

testFunction = parSort2 2

Moreover, their approach of subdividing a fixed number of times is suboptimal
because it inhibits load balancing.

Later, about parallelized IO, they give the code:

chunkedReadWith :: (NFData a) =>
                   ([LB.ByteString] -> a) -> FilePath -> IO a
chunkedReadWith func path =
    withChunks (lineChunks (numCapabilities * 4)) func path

where "4" is one magic number that gets multiplied by the magic number the
user supplied via the +RTS -N<n> command-line option.

They make no attempt to adapt the granularity to the machine at all and rely
entirely upon magic numbers. Consequently, their parallel sort that got a 25%
speedup on two cores achieves a 30% slowdown on my 8 core.
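
One way to avoid baking the constant in, for what it's worth: derive the
depth from the core count at runtime (a sketch, assuming parSort2 as quoted
above; numCapabilities comes from GHC.Conc):

import GHC.Conc (numCapabilities)

-- Subdivide roughly log2(cores) levels deep, plus some slack so the
-- scheduler has spare sparks for load balancing, instead of hardcoding "2".
autoDepth :: Int
autoDepth = ceiling (logBase 2 (fromIntegral numCapabilities) :: Double) + 2

testFunction :: Ord a => [a] -> [a]
testFunction = parSort2 autoDepth

That still says nothing about how much work each spark gets, but at least the
number of sparks scales with the machine rather than being fixed in the
source.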

> I don't know the exact cost of sparking, but in my experience it
> is quite small - so - as long as your granule is doing *some* real work,
> it should speed up.

Can you quantify it, e.g. How many FLOPS?

> Jon Harrop wrote:
> > The parallelism is obviously not obtaining any real speedup so something
> > is wrong. But can anyone fix it?
>
> I've spent a bit of time measuring parallel speedup on real commercial
> projects, and this is what I've found:
>
> 1. ghc-6.12 performs significantly better than ghc-6.10, and has now
> been released, therefore don't use ghc-6.10.

Ok.

> 2. The defaults generally work better than giving huge heap sizes. Your
> -K100000000 - maximum heap size per thread - will either do nothing or
> cause an artificial slowdown (I have observed this with the minimum heap
> size option). Don't use it, unless experimentation proves it makes
> things better.

On the contrary, the Haskell program dies without it:

$ time ./mergesort +RTS -N8
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.

[1]+ Done kwrite mergesort.hs

real 0m33.320s
user 3m29.397s
sys 0m0.592s

I had to add that -K command line option just to get the program to run to
completion.

> 3. +RTS -s is well worth using. It breaks the time down into MUT
> (mutator) and GC (garbage collector).

It says 78.5% GC time (with GHC 6.10).

> 4. MUT parallelization is excellent, but the parallel GC is not so good.
> If your code is GC-heavy it can spend around half of its time garbage
> collecting, which doesn't parallelize so well, and this eats into the
> speedup.

Ok.

> 5. There seems to be a scalability problem with the parallel gc for
> larger numbers of cores (namely 8). I am guessing somewhat, but my
> experiments tend to confirm the issue raised in Simon Marlow's (the
> implementor of GHC parallelization) recent paper that it's to do with
> "stopping the world" for gc.

Do you mean this bug:

http://hackage.haskell.org/trac/ghc/ticket/3553

> If GHC's lack of perfection at this point in time makes Haskell "look
> bad" I don't mind. I am not selling anything, so the reader at least
> knows they're getting the truth. I see this as one of the great
> advantages of open source.

I'm sure we'd all rather see speedups. :-)

> Progress on GHC has been very rapid in the last couple of years, and so
> I know we'll continue to see the speed of GHC's parallelism improving in
> leaps and bounds. It's actually still quite a new area, considering the
> difficulty of some of the technical issues and how recent it is that
> multicores are widely available on consumer hardware.

My original intent was to test a claim someone made: that mergesort in Haskell
is competitively performant and trivially parallelized.

> I know you OCaml/F# guys are making great progress too.

F# is production ready but OCaml is dead in the water when it comes to
multicore. I'm working on an OCaml replacement called HLVM but it is early
days yet.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
