AMD Bulldozer modules and Haskell parallelism

Herbert Valerio Riedel

Oct 13, 2011, 4:48:52 AM
to parallel-haskell
Hello Parallel Haskellers,

With the recently released Bulldozer processors, I'm wondering
how well GHC's parallel runtime is able to exploit the 8 cores
contained in e.g. an FX-8120 processor, as it seems it would provide an
affordable playground for experimenting with Haskell's parallelism
features.

The reason I'm a bit worried, though, is that the 8 cores are not
as independent as in previous multi-core processors, but are grouped in
pairs[1]:

> AMD has introduced a new microarchitecture building block called module. In terms of hardware complexity and functionality, a module is midway between a dual-core processor (in which each core is fully independent) and a single processor core that has two SMT threads (in which each thread shares most of the hardware resources with the other thread).

Does this pose any (significant) runtime penalties (due to resource
contention) for Haskell's HECs if I want to run 8 HECs on such
an 8-core FX-8120, or does a Bulldozer-based 8-core processor
behave well enough to count as a conventional 8-core system as far as
Haskell's RTS is concerned?


PS: Or to put it differently: should I buy an FX-8120 if I expect to
get an 8-core platform to use for developing Haskell applications
exploiting up to 8 cores? Or does it just get me a 4-core system
with slightly better SMT capability (which I assume Haskell's RTS
isn't tuned for), in which case I might as well stay with the
quad-core processor system I already have?

[1]: http://en.wikipedia.org/wiki/Bulldozer_%28processor%29
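
[Editor's note: a quick way to probe this empirically is a CPU-bound parallel benchmark run at increasing `-N` values. The sketch below is illustrative, not from the thread; it assumes GHC with the `parallel` package installed, and the file name `Scaling.hs` is made up.]

```haskell
-- A minimal CPU-bound scaling probe. Compile with:
--   ghc -O2 -threaded -rtsopts Scaling.hs
-- then compare wall-clock times of, e.g.:
--   ./Scaling +RTS -N4 -s    vs.    ./Scaling +RTS -N8 -s
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Naive Fibonacci as pure integer work (one unit per spark).
nfib :: Int -> Integer
nfib n | n < 2     = 1
       | otherwise = nfib (n - 1) + nfib (n - 2)

main :: IO ()
main = print (sum (parMap rdeepseq nfib (replicate 8 30)))
```

If wall-clock time keeps dropping from `-N4` to `-N8`, the extra Bulldozer cores are paying off for this kind of workload.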

Simon Marlow

Oct 13, 2011, 6:31:46 AM
to parallel...@googlegroups.com

The answer is I have no idea. Someone needs to get hold of a box with one of these and run some benchmarks.

One issue is that you want the scheduler (both the OS scheduler and the Haskell scheduler) to know about the architecture, so that it can preferentially use cores from distinct pairs first. We haven't bothered doing this for hyperthreading so far, because in most cases trying to use hyperthreaded cores with GHC doesn't work well, so we use real cores only and assume that the OS scheduler gives us real cores in preference (which it usually does).

Cheers,
Simon
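
[Editor's note: to experiment with placement from the Haskell side, `forkOn` pins a thread to a particular capability, and running with `+RTS -qa` asks the RTS to set OS affinity so that capability k stays on CPU k. Whether capabilities 0, 2, 4, 6 then land on distinct Bulldozer modules depends on how the OS numbers the cores, which is an assumption here; the file name `Pin.hs` and the helper are illustrative.]

```haskell
import Control.Concurrent (forkOn, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Pure integer work, parameterised by capability number.
busyWork :: Int -> Int
busyWork cap = sum [cap .. cap + 1000000]

-- Run one unit of work pinned to each capability; with
--   ghc -O2 -threaded -rtsopts Pin.hs
--   ./Pin +RTS -N8 -qa
-- capability k should stay on CPU k (OS numbering permitting).
main :: IO ()
main = do
  n   <- getNumCapabilities
  mvs <- mapM (\cap -> do
            mv <- newEmptyMVar
            _  <- forkOn cap (evaluate (busyWork cap) >>= putMVar mv)
            return mv)
          [0 .. n - 1]
  results <- mapM takeMVar mvs
  print (sum results)
```

Restricting the run to capabilities on distinct modules (e.g. `-N4` with affinity) versus filling whole modules (`-N8`) would show how much the pairing costs.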

Duncan Coutts

Oct 13, 2011, 6:47:57 AM
to parallel...@googlegroups.com
On 13 October 2011 11:31, Simon Marlow <simo...@microsoft.com> wrote:
>> With the recently made available Bulldozer processors, I'm wondering how
>> well GHC's parallel runtime is able to exploit the 8-cores contained in
>> e.g. a FX-8120 processor, as it seems it would provide an affordable
>> playground for experimenting with Haskell's parallelism features.
>>
>> The reason I'm a bit worried though, is that the 8-cores are not as
>> independent as in previous multi-core processors, but grouped in
>> pairs[1]:
>
> The answer is I have no idea.  Someone needs to get hold of a box with one of these and run some benchmarks.
>
> One issue is that you want the scheduler (both the OS scheduler and the Haskell scheduler) to know about the architecture, so that it can preferentially use cores from distinct pairs first.  We don't bother doing this with hyperthreading so far, because in most cases it seems that trying to use hyperthreaded cores with GHC doesn't work well, so we use real cores only and assume that the OS scheduler gives us real cores in preference (which it usually does).

My understanding from reading about the architecture over the last few
months is that it's all fine. It's not at all like hyperthreading.
Each core in a module has its own integer pipelines and L1 data
cache. The shared resources are the FP units, the instruction
fetch/decode front end, and the L1 instruction cache. In fact I
understand that a good strategy for the OS scheduler is to fill up both
cores of a module, especially if the two threads share memory.

So yes, of course someone needs to run benchmarks, but I think the
prospects are pretty positive. More worrying than the module sharing
are the increased cache and memory latencies and the deeper pipelines,
with their greater branch-misprediction penalties (but those just make
it slower in a single-threaded way; they don't affect scaling across
threads).

Duncan
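
[Editor's note: one way to test the shared-FPU point directly is to compare how an integer-only kernel and an FP-heavy kernel scale from `-N4` to `-N8`. This is a sketch under the assumption that the shared FP units are the relevant bottleneck; the kernel names are made up, and the multiplier is just an arbitrary odd constant to keep the integer unit busy.]

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Integer-only kernel: each Bulldozer core has its own integer
-- pipelines, so this should keep scaling across all 8 cores.
intWork :: Int -> Int
intWork n = go n 0
  where
    go 0 acc = acc
    go k acc = go (k - 1) $! acc + k * 2654435761

-- FP-heavy kernel: both cores of a module share the FP units,
-- so this may stop scaling past one thread per module.
fpWork :: Int -> Double
fpWork n = go n 0
  where
    go 0 acc = acc
    go k acc = go (k - 1) $! acc + sqrt (fromIntegral k)

-- Compile with ghc -O2 -threaded -rtsopts and time each print
-- separately at -N4 and -N8; a gap in scaling between the two
-- kernels would point at the shared FP units.
main :: IO ()
main = do
  print (sum (parMap rdeepseq intWork (replicate 8 5000000)))
  print (sum (parMap rdeepseq fpWork  (replicate 8 5000000)))
```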
