exhaustivity and reachability: perhaps amenable to task-parallelism?

36 views
Skip to first unread message

Miguel Garcia

unread,
May 30, 2012, 8:24:24 AM5/30/12
to scala-i...@googlegroups.com

Thinking aloud, in case exhaustivity and reachability can wait till typing is done, looks like they could run task-parallel style, ie much like the prototype "parallel dead code elimination" [1]

For DCE, the main limiting factor was locking upon invoking Tree.tpe or Symbol.info . Perhaps exhaustivity and reachability can be broken down into two steps (SAT setup and SAT solving), where only the first asks typer. Provided SAT solving dominates, then the sub-linear speedup might still be profitable (different working threads will lock on typer non-overlapping).


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/


[1] discussion, https://groups.google.com/d/topic/scala-internals/m8yBC7XxLSs/discussion
    sources     https://github.com/magarciaEPFL/scala/tree/parallelDCE


Adriaan Moors

unread,
May 30, 2012, 9:24:57 AM5/30/12
to scala-i...@googlegroups.com
Hi Miguel,

Yes! I think that's a great idea. I've been profiling the code and adding statistics, taking care of the obvious hot spots, but unreachability is still quite expensive.

Today is the last time I can work on this, though, since I need to finish other stuff that's on the critical path to M4.

My current plan is to merge what I have in trunk, but implement a cop out command-line switch for those who don't want the overhead. (Note: you won't see this overhead in interactive uses any way, since eclipse stops after typers.)
I think these analyses are too useful to be off by default. (Of course i'd say that, since I've been slaving over them over the last weeks.)
Happy to hear you opinion.

Here's a breakdown for compiling the top-20 users of pattern matching in the compiler sources (decided by grepping for 'match')

for reference:
[scalacfork] time spent typechecking    : 1 spans, 19.033845000s (100.0%)

[scalacfork] *** Cumulative statistics at phase patmat
[scalacfork] #typechecked identifiers : 132479
[scalacfork] #typechecked selections  : 92802
[scalacfork] #typechecked applications: 67565

[scalacfork] time spent in patmat       : 1989 spans,   3.005118000s
[scalacfork]        of which DPLL       : 32140 spans,   0.492350000s (16.4%)
[scalacfork] of which variable equality : 2899 spans,  0.323131000s (10.8%)  -- this is the most expensive part of setting up the propositions, translating T <: C constraints into boolean propositions
[scalacfork] of which in exhaustivity   : 338 spans,    0.190358000s (6.3%)
[scalacfork] of which in unreachability : 1298 spans,  1.302371000s (43.3%)

Miguel Garcia

unread,
May 30, 2012, 9:46:39 AM5/30/12
to scala-i...@googlegroups.com, adriaa...@epfl.ch

Yes, I didn't mean anything for M4. Besides, developing a task-parallel version is easier once a sequential version is available.

I'll take a look at M4 (of course :) As usual, the first step towards parallelizing is untangling (now shared) mutable state. Hopefully the SAT solver is already thread-safe.


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

Adriaan Moors

unread,
May 30, 2012, 6:53:05 PM5/30/12
to Miguel Garcia, scala-internals
I took a first stab at parallelizing the functional bits of the analyses (i.e., those that don't interact with types/symbols)

on my machine at home it doesn't help much, but it doesn't hurt either
[quick.comp.timer: 4:11.571 sec] -Xno-patmat-analysis
[quick.comp.timer: 5:10.987 sec] with future-based warning generation (https://github.com/adriaanm/scala/commit/07c04b1b5a?w=1)
[quick.comp.timer: 5:20.596 sec] before (https://github.com/adriaanm/scala/commit/82485108a6?w=1)

(-optimise was on both for quick.comp and locker.comp)

On Wed, May 30, 2012 at 3:50 PM, Adriaan Moors <adriaa...@epfl.ch> wrote:
As usual, the first step towards parallelizing is untangling (now shared) mutable state. Hopefully the SAT solver is already thread-safe.
it's all functional :-) 

i think most of the gain would be gotten from parallellizing unreachableCase -- it has to explore quite a few possibilities (none of which mutate shared state)


Miguel Garcia

unread,
May 31, 2012, 5:29:15 AM5/31/12
to scala-i...@googlegroups.com, Miguel Garcia, adriaa...@epfl.ch

That's a useful data point. After M4 I'll give a try to tackling this problem. The rules of thumb I follow:

(a) feedback on per-thread CPU utilization via jvisualvm [1] or similar

(b) for CPU intensive tasks, the following no-frills setup has worked best for me:
      - priority queue (containing units for work, longest works first, as coarse-grained jobs as possible); and
      - fixed-size pool for worker threads (not larger than available cores).

For (b) I've used plain java.util.concurrent (as per Goetz et al book), and the resulting scheduling seems just right (not sure how to tune that when using Futures).


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/


References,

[1] http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/index.html
Reply all
Reply to author
Forward
0 new messages