Can't gain any speedup benefits from parallelizing

Kir

Jan 26, 2012, 7:53:51 AM
to parallel-haskell
Hello,
I've tried the first two examples from this tutorial:
http://research.microsoft.com/pubs/74058/parallel_haskell2.pdf
But when I launched the programs with the +RTS -N2 flags, the execution
time was longer than without this flag. I compiled the examples
with -O2; when I compiled the first example without this flag, I
saw some speedup, but even then the second example was faster in
serial mode. What did I do wrong? Is there some limitation on
combining optimization and parallelism?

Ran on Windows 7, Haskell Platform 2011.4.0.0, Intel Core i5.
Compilation command: ghc --make -O2 -rtsopts -threaded test.hs
Run command: test.exe +RTS -N2 -s

Thanks!

Mikolaj Konarski

Jan 26, 2012, 8:37:16 AM
to kolodyazh...@gmail.com, parallel-haskell
Hi Kir,

There is a step-by-step tutorial about diagnosing
parallel performance problems here:

http://www.haskell.org/haskellwiki/ThreadScope_Tour

It focuses on ThreadScope and problems in the tested code,
not in GHC or library code, but it can still prove general enough
to help you see what's going on. And of course you are welcome
to actually use ThreadScope, with our help if needed.
Any feedback about the tutorial and/or TS would be appreciated.

Kir

Jan 26, 2012, 8:44:23 AM
to parallel...@googlegroups.com
Thanks for the advice, but there is a problem with the current version of ThreadScope under Windows (I did not try under Linux): it does not compile on my system. The compiler says that widgetSetCanFocus is out of scope in Gtk2hs 0.12.2 (I can't install a previous version because of missing dependencies). I installed ThreadScope from source and removed the widgetSetCanFocus call, but now ThreadScope crashes every time I try to open the event log.

Mikolaj Konarski

Jan 26, 2012, 8:57:34 AM
to kolodyazh...@gmail.com, parallel...@googlegroups.com
Too bad about TS compilation. Still, there is one chapter
of the tutorial that does not depend on TS at all.
I wonder how your numbers compare to those:

http://www.haskell.org/haskellwiki/ThreadScope_Tour/Statistics

> Thanks for the advice, but there is a problem with the current version of ThreadScope
> under Windows (I did not try under Linux)

I think we've had no problems under Linux with GHC 7.0.4,
though I'm not sure we've tested with exactly the HP set of libraries.

> - it does not compile on my
> system. The compiler says that widgetSetCanFocus is out of scope in Gtk2hs
> 0.12.2

That happens when installing Gtk2hs alone, too, doesn't it?

> (I can't install a previous version because of missing dependencies).

0.12.1? 0.12.0? Or a previous version of TS?

> I installed ThreadScope from source and removed the widgetSetCanFocus call,
> but now ThreadScope crashes every time I try to open the event log.

You mean you've removed the line
"widgetSetCanFocus drawArea True"?
The removal should not cause any problems,
so the crash must have some other cause.
No backtrace of the crash, by any chance?

Kir

Jan 26, 2012, 9:08:10 AM
to parallel...@googlegroups.com, kolodyazh...@gmail.com
1. Of course I've checked the statistics, and they show that all sparks were converted. Here is my output:
                      MUT time (elapsed)       GC time  (elapsed)
Task  0 (worker) :    0.00s    (  2.43s)       0.00s    (  0.00s)
Task  1 (worker) :    1.34s    (  2.43s)       0.00s    (  0.00s)
Task  2 (bound)  :    2.14s    (  2.43s)       0.03s    (  0.00s)
Task  3 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)

SPARKS: 1 (1 converted, 0 pruned)

INIT  time    0.00s  (  0.00s elapsed)
MUT   time    3.48s  (  2.43s elapsed)
GC    time    0.03s  (  0.00s elapsed)
EXIT  time    0.00s  (  0.00s elapsed)
Total time    3.51s  (  2.43s elapsed)

2. Gtk2hs was installed without any problems; the demos compiled and launched successfully.
3. Yes, I removed the line "widgetSetCanFocus drawArea True" in EventsView.hs.
The backtrace is:
_cairo_win32_scaled_font_ucs4_to_index:GetGlyphIndicesW: Unknown GDI error>threadscope: user error (out of memory)

Christopher Brown

Jan 26, 2012, 9:11:54 AM
to kolodyazh...@gmail.com, parallel...@googlegroups.com
According to that, you are getting 1 spark converted. That's not enough to get any parallelism.

It would help if you could show us the exact code for your example. Perhaps you typed something in wrong?

Chris.
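Chris's point is that a single spark gives the runtime almost nothing to distribute. One common way to get many sparks, not taken from this thread, is to evaluate the list inside sumEuler in chunks and spark each chunk with parListChunk from Control.Parallel.Strategies (package "parallel"). The sketch below assumes that package; the name parSumEuler and the chunk size of 100 are arbitrary choices for illustration.

```haskell
import Control.Parallel.Strategies (parListChunk, rseq, using)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

-- Hypothetical parallel variant of sumEuler: the list of euler values is
-- evaluated in chunks of 100 elements, one spark per chunk, so idle cores
-- have many sparks to pick up instead of just one.
parSumEuler :: Int -> Int
parSumEuler n = sum (map euler (mkList n) `using` parListChunk 100 rseq)

main :: IO ()
main = print (parSumEuler 5300)
```

Compile with -threaded and run with +RTS -N2 -s as in the original post; the SPARKS line should then report many sparks rather than one.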

Kir

Jan 26, 2012, 9:15:31 AM
to parallel...@googlegroups.com
I took the example from the tutorial:

import System.Time        -- getClockTime, ClockTime(TOD); from the old-time package
import Control.Parallel   -- par, pseq; from the parallel package

-- Naive doubly recursive Fibonacci.
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

mkList :: Int -> [Int]
mkList n = [1..n-1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

-- Number of k in [1..n-1] that are relatively prime to n.
euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . (map euler) . mkList

-- Spark f, evaluate e on the current thread, then combine the two results.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (f + e))
  where f = fib a
        e = sumEuler b

-- Elapsed time in seconds between two ClockTime readings.
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2) = fromInteger (psecs2 - psecs1) / 1e12 + fromInteger (secs2 - secs1)

r1 :: Int
r1 = parSumFibEuler 40 5300

main :: IO ()
main = do
    t4 <- getClockTime
    pseq r1 (return ())   -- force r1 before reading the clock again
    t5 <- getClockTime
    putStrLn ("sum: " ++ show r1)
    putStrLn ("time: " ++ show (secDiff t4 t5) ++ " seconds")

I also measured the times of the "fib" and "sumEuler" functions separately; on my system each took about one second.
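Timing each function on its own can be sketched as below. This is an assumption about how one might do it, not the measurement Kir actually ran: it uses getCurrentTime and diffUTCTime from the "time" package instead of the older System.Time, and forces each result with Control.Exception.evaluate between two clock readings. The helper name timed is hypothetical, and the result is only a rough wall-clock figure.

```haskell
import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Hypothetical helper: force the Int (WHNF is its full value) between two
-- clock readings and print the elapsed wall-clock time.
timed :: String -> Int -> IO ()
timed label x = do
  t0 <- getCurrentTime
  r <- evaluate x
  t1 <- getCurrentTime
  putStrLn (label ++ " = " ++ show r ++ ", took " ++ show (diffUTCTime t1 t0))

main :: IO ()
main = do
  timed "fib 40" (fib 40)
  timed "sumEuler 5300" (sumEuler 5300)
```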

Mikolaj Konarski

Jan 26, 2012, 4:11:48 PM
to kolodyazh...@gmail.com, parallel...@googlegroups.com
Hello Kir,

I've run your code and I've got no speedup either.
On GHC 7.4.x I've got the following sparks summary:

SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 0 fizzled)

which shows there was no real parallelism at all,
neither between several sparks (as there is only one)
nor even between the one spark and the main execution thread,
because the single spark proved unnecessary and was garbage
collected before it got executed. I haven't yet analyzed why the code
exhibits such behaviour.

It's possible we actually have the same spark profile,
but 7.4 is more accurate or less buggy when it reports sparks.
Do you get any speedup on the sudoku test
from the TS tutorial?

> 2. Gtk2hs was installed without any problems; the demos compiled and
> launched successfully.
> 3. Yes, I removed the line "widgetSetCanFocus drawArea True" in EventsView.hs.
> The backtrace is:
> _cairo_win32_scaled_font_ucs4_to_index:GetGlyphIndicesW: Unknown GDI
> error>threadscope: user error (out of memory)

Thank you. I will file two bug reports with this data.
The first problem may be caused by conditional compilation in gtk2hs,
which on your OS switches off some functions
that TS happens to use. Otherwise I have no explanation,
because TS compiles ok for us. Can I confirm again that
gtk2hs is 0.12.2 and TS is 0.2.1?

The second looks like an OS-specific gtk2hs crash,
but it's hard to tell anything more until an interested
gtk2hs hacker with access to your OS is found.
Which version of GTK do you use on your OS?

Mikolaj Konarski

Jan 26, 2012, 5:20:55 PM
to kolodyazh...@gmail.com, parallel...@googlegroups.com
I've experimented a bit with the code and I think there may be
a problem in the RTS, but I won't go any further without help
from a more experienced person.

When I exchange what 'f' and 'e' compute as follows

    where f = sumEuler b
          e = fib a

I get

SPARKS: 1 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

So perhaps you've also got one converted spark, after all.
But anyway, there is no speedup and TS shows it's because
the 'f' spark is executed only after the 'e' computation finishes.

OTOH, when I make both 'f' and 'e' compute the same thing
(with similar arguments), I get the expected speedup
and TS shows both cores are really busy.

It seems the GHC 7.* RTS does not like to compute such functions
as fib and sumEuler in parallel. It can be a bug or it can even
be intentional and beneficial for real life examples with many sparks,
where RTS can afford to be picky about what to run in parallel.
But it's certainly awkward that the example from the par tutorial
no longer works as intended.

Johannes Waldmann

Jan 27, 2012, 3:38:12 AM
to parallel...@googlegroups.com

> the 'f' spark is executed only after the 'e' computation finishes.

I am getting the same behaviour here (x86_64, linux, ghc-7.2.2)

My guess is that this program does no allocation (mkList is fused?),
so it does not context-switch while executing the tight inner loops.

A fun observation is that if you compile with "-O0" (no fusion then?),
you get a speedup (on my machine -N1: 19 sec, -N2: 11 sec).

J.W.


Simon Marlow

Jan 27, 2012, 3:59:02 AM
to kolodyazh...@gmail.com, parallel...@googlegroups.com

GHC will optimise the fib function so that it does no allocation, and
computations that do no allocation cause problems for the RTS because
the scheduler never gets to run and do load-balancing (which is
necessary for parallelism).
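Current GHC releases document a -fno-omit-yields flag (we have not checked which version introduced it). By default GHC omits heap checks in code that does not allocate, which is what keeps a thread in a tight loop like fib from being descheduled; -fno-omit-yields keeps those checks. The sketch below simply turns the flag on for the tutorial program via an OPTIONS_GHC pragma. Whether it actually restores the speedup here is an assumption we have not measured.

```haskell
{-# OPTIONS_GHC -fno-omit-yields #-}
-- Assumption: a GHC that accepts -fno-omit-yields. The flag keeps heap checks
-- in functions that do not allocate (such as fib compiled with -O2), so a
-- thread running them can still be descheduled and the scheduler can run.
import Control.Parallel (par, pseq)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Same par/pseq structure as the tutorial example; only the flag differs.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (f + e))
  where
    f = fib a
    e = sumEuler b

main :: IO ()
main = print (parSumFibEuler 40 5300)
```

Build and run it as in the original post (ghc -O2 -threaded -rtsopts, then +RTS -N2 -s) and compare the SPARKS line and elapsed time with and without the pragma.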

Regarding this tutorial, I strongly recommend using my tutorial instead,
which is more up to date and has examples that work:

http://community.haskell.org/~simonmar/par-tutorial.pdf
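The tutorial linked above expresses this kind of two-way parallelism with the Eval monad (runEval, rpar, rseq) from Control.Parallel.Strategies in the "parallel" package. A sketch of parSumFibEuler rewritten in that style follows; it only restates the par/pseq version and, on its own, does not address the non-allocation problem described above.

```haskell
import Control.Parallel.Strategies (rpar, rseq, runEval)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Eval-monad version of parSumFibEuler.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = runEval $ do
  f <- rpar (fib a)       -- create a spark for fib a; an idle core may run it
  e <- rseq (sumEuler b)  -- evaluate sumEuler b on this thread and wait for it
  return (f + e)          -- if the spark was never run, f is evaluated here

main :: IO ()
main = print (parSumFibEuler 40 5300)
```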

Cheers,
Simon

Kir

Jan 27, 2012, 4:34:19 AM
to parallel...@googlegroups.com
Yes, I used gtk2hs 0.12.2, TS 0.2.1, and GTK 2.24.8 installed on Windows 7.

Mikolaj Konarski

Jan 27, 2012, 5:51:40 AM
to kolodyazh...@gmail.com, parallel...@googlegroups.com
Kir, thank you very much for the feedback and the bug details.

Tickets are at
http://trac.haskell.org/ThreadScope/ticket/21
and
http://trac.haskell.org/ThreadScope/ticket/22

Kir

Jan 27, 2012, 9:21:11 AM
to parallel...@googlegroups.com
Thanks for the answer, your tutorial really helped me solve parallelization problems in my program.

Mikolaj Konarski

Jan 27, 2012, 9:33:10 AM
to kolodyazh...@gmail.com, Eric Kow, parallel...@googlegroups.com
On Fri, Jan 27, 2012 at 15:21, Kir <kolodyazh...@gmail.com> wrote:
> Thanks for the answer, your tutorial really helped me solve
> parallelization problems in my program.

I'm sure Eric Kow of Well-Typed, the author of the tutorial, will be glad
to hear that. Generally, we desperately need more feedback about
the tutorial and ThreadScope, so, parallel hackers, please give them a try
and let us know!

http://www.haskell.org/haskellwiki/ThreadScope_Tour

Kir

Jan 27, 2012, 9:52:12 AM
to parallel...@googlegroups.com, Eric Kow
I've successfully repeated all sudoku examples from this source https://github.com/simonmar/par-tutorial.git, and they showed parallel speedup.