On Tue, Oct 25, 2011 at 4:15 AM, Simon Marlow <marlo...@gmail.com> wrote:
> Ok, so clearly the sparks scheduler has much lower overhead - as we expect,
> given that it is basically the Eval monad. However, the sparks scheduler
> has subtly different semantics than the Trace scheduler. For example, try
> this with the Sparks and the Trace schedulers:
> main = print (runPar (spawn_ (error "help!") >> return 42))
> The Sparks semantics is perfectly fine (better even), but it is not
> implementable in the Trace scheduler.
> Also, the Sparks scheduler will be affected by the fixed-size spark pools.
> On the other hand, the new spark tracing in ThreadScope can be used for
> debugging performance issues when using the Sparks scheduler.
> Cheers,
> Simon
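A minimal sketch of why the two semantics can differ, assuming (as Simon says above) that the Sparks scheduler behaves essentially like the Eval monad. It uses only `par` from `GHC.Conc` in base rather than monad-par itself, so it illustrates spark behaviour and not the library's actual implementation. The name `ignoredSpark` is made up for this note.

```haskell
import GHC.Conc (par)

-- Hypothetical name, for illustration only.
-- `par` merely records `boom` as a spark (a hint that it may be evaluated
-- in parallel) and then returns its second argument. Nothing here ever
-- demands the value of `boom`.
ignoredSpark :: Int
ignoredSpark = boom `par` 42
  where
    boom :: Int
    boom = error "help!"

main :: IO ()
main = print ignoredSpark
```

Because the result never depends on the sparked thunk, the main thread does not force the error and the program is expected to print 42. A scheduler that runs every forked computation as a task has to execute the failing child, which is presumably why the Trace scheduler cannot offer the same behaviour.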
> On 25/10/2011 00:41, Ryan Newton wrote:
>> FYI, my prior comments about a bug resulting from the merge weren't
>> really true. It was just a problem with the parfib benchmark itself.
>> It's fixed and I did a little regression testing (attached below).
>> Because there was no performance regression at all on parfib I went
>> ahead and merged branch "nested" into "master". If there are any
>> objections I'll roll it back.
>> -Ryan
>> [2011.10.24] {Timing nested scheduler version}
>> ----------------------------------------------
>> Checking for performance regression. This is on a 3.1 GHz Westmere
>> with hyperthreading disabled. First a plain fib on the nested branch:
>> Data Schema: User, system, productivity, alloc
>> fib(38) 1 thread : 20.2 19.7 94.1% 82GB -- TraceNested
>> fib(38) 4 threads: 6.23 24.2 90.6% 85GB -- TraceNested
>> And for argument's sake with a cutoff of 10:
>> fib(42) 1 thread : 5.5 5.5 89.2% 8.2GB -- TraceNested
>> fib(42) 4 threads: 1.72 6.38 87.5% 8.4GB -- TraceNested
>> And with the Sparks scheduler:
>> fib(38) 1 thread : 2.2
>> fib(38) 4 threads: .75 2.7 69.0% 7.5GB
>> fib(42) 1 thread : 14.8 14.5 82.8% 52GB
>> fib(42) 4 threads: 4.7 18.3 71.1% 52GB
>> fib(42) 4 threads: 1.0 3.8 100% 11MB -- cutoff 10
>> And the plain par/pseq version:
>> fib(42) 1 thread : 8.7 8.6 86.2% 17GB
>> fib(42) 4 threads: 2.8 10.5 73.9% 17GB
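For readers without the benchmark source at hand, here is a rough sketch of a par/pseq parfib with a sequential cutoff, in the spirit of the rows above. It is not the code from the monad-par repository: the names `fib` and `parfib`, the base case, and the small problem size in `main` are assumptions chosen for illustration.

```haskell
import GHC.Conc (par, pseq)

-- Plain sequential fib, used below the cutoff.
-- The base case (fib 0 = fib 1 = 1) is an assumption.
fib :: Int -> Integer
fib n
  | n < 2     = 1
  | otherwise = fib (n - 1) + fib (n - 2)

-- Spark the first recursive call, evaluate the second on the current
-- thread, then combine. At or below `cutoff` no sparks are created.
parfib :: Int -> Int -> Integer
parfib cutoff n
  | n <= cutoff = fib n
  | otherwise   = x `par` (y `pseq` (x + y))
  where
    x = parfib cutoff (n - 1)
    y = parfib cutoff (n - 2)

main :: IO ()
main = print (parfib 10 25 == fib 25)  -- parallel and sequential results agree
```

Below the cutoff the work is purely sequential, so the number of sparks (and the per-spark overhead) drops sharply. To use several cores, build with `-threaded -rtsopts` and run with `+RTS -N4`.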
>> And then for regression testing the ORIGINAL Trace scheduler (no nesting
>> support):
>> fib(38) 1 thread : 22.1 21.5 93.8% 97GB -- TraceOrig
>> fib(38) 4 threads: 7.5 28.6 90.4% 97GB -- TraceOrig
>> Indeed, rather than regression, it would seem that Daniel improved the
>> parfib performance!
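The comparison can be made concrete with a little arithmetic on the fib(38) rows quoted above. This sketch assumes the first column is elapsed seconds, which is an interpretation on my part (the schema line labels it "User"). The record type `Row` and the helper names are hypothetical.

```haskell
import Text.Printf (printf)

-- Hypothetical record for one benchmark row; only the first timing
-- column is used, and it is ASSUMED to be elapsed seconds.
data Row = Row
  { label       :: String
  , oneThread   :: Double
  , fourThreads :: Double
  }

-- fib(38) figures copied from the tables in this message.
traceNested, traceOrig :: Row
traceNested = Row "TraceNested" 20.2 6.23
traceOrig   = Row "TraceOrig"   22.1 7.5

-- Speedup of the 4-thread run over the 1-thread run of the same scheduler.
selfSpeedup :: Row -> Double
selfSpeedup r = oneThread r / fourThreads r

main :: IO ()
main = do
  mapM_ (\r -> printf "%-12s 4-thread speedup: %.2fx\n" (label r) (selfSpeedup r))
        [traceNested, traceOrig]
  printf "TraceOrig / TraceNested, 1 thread : %.2f\n"
         (oneThread traceOrig / oneThread traceNested)
  printf "TraceOrig / TraceNested, 4 threads: %.2f\n"
         (fourThreads traceOrig / fourThreads traceNested)
```

Both ratios in the last two lines come out above 1, meaning the nested branch is the faster of the two on these rows, which matches Ryan's reading that there is no regression.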
>> Super-nested parfib:
>> -----------------------
>> And the perversely Nested parfib:
>> nfib(38) 1 thread : 3.3 3.2 82.7% 12GB -- nested but Sparks.hs
>> nfib(38) 4 threads: 1.1 4.1 70.9% 12.9GB -- nested but Sparks.hs
>> Oops! That was with the sparks scheduler! Here's the actual
>> Trace/nested:
>> nfib(30) 4 threads: 1.3 4.8 93.5% 7GB -- super nested fib / trace
>> nfib(32) 4 threads: 3.26 11.7 92.9% 18GB
>> nfib(42) 4 threads: 6.5 23.5 94.7% 29.6GB -- cutoff 10
>> (Note, those only used 376% cpu.)
>> Finally, this is the original Trace scheduler on the perversely nested
>> parfib:
>> nfib(30) 1 thread : 1.8 1.8 92.1% 5GB
>> nfib(32) 1 thread : 4.9 4.8 91.7% 14.9GB
>> nfib(32) 4 threads: -- memory explosion
>> nfib(28) 4 threads: 9.7 37.2 33.8% 5.8GB -- 2GB ram usage
>> One interesting consequence here is that while the Sparks scheduler
>> has an 8X advantage over Trace (and par/pseq an additional 60%
>> advantage, 13.8X total), that advantage widens to over 256X in the
>> case of the perversely nested parfib!!!
>> On Mon, Oct 24, 2011 at 11:30 AM, Simon Marlow <marlo...@gmail.com> wrote:
>> > On 24/10/2011 15:43, Ryan Newton wrote:
>> >> Ah, if only the only problem with version control merges were the
>> >> explicit conflicts (and not the non-conflicting bug introductions).
>> >> The nested branch seems to have worked on "parfib nested" before pulling
>> >> new changes from master this morning. Now it works for non-nested
>> >> tests, but it gets a stack overflow on parfib nested 2 (yes even of 2!).
>> >> I'm just putting this out there in case anyone else wants to take a
>> >> look. After it's fixed it would be nice to run benchmark.hs on the
>> >> nested branch to look for performance regressions.
>> >> Speaking of performance regressions we've been seeing some pretty bad
>> >> results on older Intel and AMD architectures (see results/). Simon M's
>> >> 24 core machine always did well with monad-par -- what was its
>> >> configuration again? It would be nice to get results from there in the
>> >> results/ collection.
>> > 4x Intel Xeon E7450 (2.4GHz), Windows Server 2008.
>> > I think I used +RTS -A1m
>> > Make sure you're using at least GHC 7.2.1, because there's a little
>> > optimisation in runPar_internal that affects the initial scheduling of
>> > workers to OS threads.
>> > Cheers,
>> > Simon
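Simon's two pieces of advice (GHC 7.2.1 or later, and running with `+RTS -A1m`) can be sanity-checked from inside a benchmark driver. The following is only a sketch using base: `checkSetup` is a hypothetical helper, and because `System.Info.compilerVersion` may report only the major and minor components, it tests for 7.2 rather than 7.2.1.

```haskell
import Control.Concurrent (getNumCapabilities)
import Data.Version (Version (..), showVersion)
import System.Info (compilerName, compilerVersion)

-- Oldest compiler series recommended in the thread (7.2.1, checked as 7.2).
minimumGhc :: Version
minimumGhc = Version [7, 2] []

-- Hypothetical helper: report the compiler and how many capabilities
-- (the value given to +RTS -N) the program is actually running with,
-- and say whether the compiler meets the recommended minimum.
checkSetup :: IO Bool
checkSetup = do
  caps <- getNumCapabilities
  putStrLn (compilerName ++ " " ++ showVersion compilerVersion
            ++ ", capabilities: " ++ show caps)
  return (compilerVersion >= minimumGhc)

main :: IO ()
main = do
  ok <- checkSetup
  if ok
    then putStrLn "compiler is recent enough"
    else putStrLn "warning: older than GHC 7.2"
```

Built with `-threaded -rtsopts`, a binary like this (call it `./bench`, a placeholder name) can be run as `./bench +RTS -N4 -A1m`, where `-N` sets the number of capabilities and `-A` the size of the allocation area.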