There is a step-by-step tutorial about diagnosing
parallel performance problems here:
http://www.haskell.org/haskellwiki/ThreadScope_Tour
It focuses on ThreadScope and problems in the tested code,
not in GHC or library code, but it can still prove general enough
to help you see what's going on. And of course you are welcome
to actually use ThreadScope, with our help if needed.
Any feedback about the tutorial and/or TS would be appreciated.
http://www.haskell.org/haskellwiki/ThreadScope_Tour/Statistics
> Thanks for the advice, but there is a problem with the current version of ThreadScope
> under Windows (I did not try under Linux)
I think we've had no problems under Linux with GHC 7.0.4,
though I'm not sure we've tested with exactly the HP set of libraries.
> - it does not compile on my
> system. The compiler says that widgetSetCanFocus is out of scope in Gtk2hs
> 0.12.2
That happens when installing Gtk2hs alone, too, doesn't it?
> (I can't install previous version because of missing dependencies).
0.12.1? 0.12.0? Or a previous version of TS?
> I've installed ThreadScope from sources and removed the widgetSetCanFocus call,
> but now ThreadScope crashes every time I try to open the event log.
You mean you've removed the line
"widgetSetCanFocus drawArea True"?
The removal should not cause any problems,
so the crash must have some other cause.
No backtrace of the crash, by any chance?
It would help if you could show us the exact code for your example. Perhaps you typed something in wrong?
Chris.
I've run your code and I've got no speedup either.
On GHC 7.4.x I've got the following sparks summary:
SPARKS: 1 (0 converted, 0 overflowed, 0 dud, 1 GC'd, 0 fizzled)
which shows there was no real parallelism at all,
either between several sparks (as there is only one)
or even between the one spark and the main execution thread,
because the single spark proved unnecessary and was garbage
collected before it got executed. I haven't yet analyzed why the code
exhibits such behaviour.
It's possible we actually have the same spark profile,
but 7.4 is more accurate or less buggy when it reports sparks.
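For anyone following along without the original program, here is a minimal
sketch of the kind of code under discussion. It is an assumption reconstructed
from the names used in this thread (fib, sumEuler, mkList, 'f', 'e'), not the
poster's exact source. The arguments 30 and 3000 are arbitrary, and par/pseq
are taken from GHC.Conc in base so that no extra package is needed. The SPARKS
line quoted above is part of the summary the RTS prints with +RTS -s.

```haskell
-- Sketch only: reconstructed from names mentioned in this thread,
-- not the original poster's exact program.
-- Build: ghc -O2 -threaded -rtsopts Par.hs
-- Run:   ./Par +RTS -N2 -s    (the SPARKS line is in the -s summary)
import GHC.Conc (par, pseq)

-- Naive Fibonacci; GHC can compile this to a non-allocating loop.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = fib (n - 1) + fib (n - 2)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

-- Counts the numbers below n that are coprime to n.
euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Spark 'f', evaluate 'e' on the main thread, then combine the two.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (e + f))
  where
    f = fib a
    e = sumEuler b

main :: IO ()
main = print (parSumFibEuler 30 3000)
```

If the main thread reaches (e + f) before any other core has picked up the
'f' spark, it evaluates 'f' itself and the spark does no useful parallel work.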
Do you get any speedup on the sudoku test
from the TS tutorial?
> 2. Gtk2hs was installed without any problems, demos were compiled and
> launched successfully
> 3. Yes I've removed line "widgetSetCanFocus drawArea True" in EventsView.hs
> Backtrace is:
> _cairo_win32_scaled_font_ucs4_to_index:GetGlyphIndicesW: Unknown GDI error
> threadscope: user error (out of memory)
Thank you. I will file two bug reports with this data.
The first problem may be caused by conditional compilation in gtk2hs,
which on your OS switches off some functions
that TS happens to use. Otherwise I have no explanation,
because TS compiles ok for us. Can I confirm again that
gtk2hs is 0.12.2 and TS is 0.2.1?
The second looks like an OS-specific gtk2hs crash,
but it's hard to tell anything more until an interested
gtk2hs hacker with access to your OS is found.
Which version of GTK for your OS do you use?
When I exchange what 'f' and 'e' compute as follows
  where f = sumEuler b
        e = fib a
I get
SPARKS: 1 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
So perhaps you've also got one converted spark, after all.
But anyway, there is no speedup and TS shows it's because
the 'f' spark is executed only after the 'e' computation finishes.
OTOH, when I make both 'f' and 'e' compute the same thing
(with similar arguments), I get the expected speedup
and TS shows both cores are really busy.
It seems the GHC 7.* RTS does not like to compute such functions
as fib and sumEuler in parallel. It can be a bug or it can even
be intentional and beneficial for real life examples with many sparks,
where the RTS can afford to be picky about what to run in parallel.
But it's certainly awkward that the example from the par tutorial
no longer works as intended.
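The variant mentioned above, in which both 'f' and 'e' compute the same thing
with similar arguments, might look roughly like the sketch below. The name
parSumEulerTwice and the arguments 3000 and 3001 are made up for illustration,
and the helper definitions are assumed from the names used in this thread.
The -eventlog and -ls flags in the comments are the ones GHC 7.x needs to
produce an event log for ThreadScope; newer compilers may not require the
link-time flag.

```haskell
-- Sketch only: both the spark and the main thread run sumEuler.
-- Build: ghc -O2 -threaded -rtsopts -eventlog ParSame.hs
-- Run:   ./ParSame +RTS -N2 -s -ls   (writes ParSame.eventlog for ThreadScope)
import GHC.Conc (par, pseq)

mkList :: Int -> [Int]
mkList n = [1 .. n - 1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length (filter (relprime n) (mkList n))

sumEuler :: Int -> Int
sumEuler = sum . map euler . mkList

-- Hypothetical name: sparks one sumEuler while evaluating another.
parSumEulerTwice :: Int -> Int -> Int
parSumEulerTwice a b = f `par` (e `pseq` (e + f))
  where
    f = sumEuler a
    e = sumEuler b

main :: IO ()
main = print (parSumEulerTwice 3000 3001)
```

Comparing the elapsed time reported by -s for -N1 and -N2 is a quick way to
check whether the spark really ran on a second core.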
I am getting the same behaviour here (x86_64, linux, ghc-7.2.2).
My guess is that this program does not allocate (mkList is fused) (?)
so it does not want to context-switch while executing the tight inner loops.
A fun observation is that if you compile with "-O0" (no fusion then?)
then you get a speedup (on my machine -N1 : 19 sec, -N2 : 11 sec).
J.W.
GHC will optimise the fib function so that it does no allocation, and
computations that do no allocation cause problems for the RTS because
the scheduler never gets to run and do load-balancing (which is
necessary for parallelism).
Regarding this tutorial, I strongly recommend using my tutorial instead
which is more up to date and has examples that work:
http://community.haskell.org/~simonmar/par-tutorial.pdf
Cheers,
Simon
Tickets are at
http://trac.haskell.org/ThreadScope/ticket/21
and
http://trac.haskell.org/ThreadScope/ticket/22
I'm sure Eric Kow of Well-Typed, the author of the tutorial, will be glad
to hear that. Generally, we desperately need more feedback about
the tutorial and ThreadScope so, parallel hackers, please give them a try
and let us know!