Functional programming considered bad for parallelism

Jon Harrop

unread,

Nov 15, 2009, 11:39:23 AM11/15/09

to

Avoiding mutation is widely hailed as a design of choice for parallel
programming and this is used to advocate the adoption of functional
programming. However, functional programming seems to have some serious
disadvantages in the context of shared-memory parallelism for multicores.

Firstly, functional data structures are more allocation intensive that
imperative data structures. Allocation is access to a global mutable
resource (the shared heap). Even with a mostly-concurrent collector, any
thread allocating heavily degrades the performance of all other threads and
destroys scalability across the entire system.

Secondly, functional data structures are more memory intensive than
imperative data structures. This places more stress on memory bandwidth
which, again, is a shared resource and stressing it degrades scalability
across the entire system.

So it seems the eschewing mutation may be a fundamentally bad idea in the
context of parallelism.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Nobody

unread,

Nov 15, 2009, 1:37:45 PM11/15/09

to

On Sun, 15 Nov 2009 16:39:23 +0000, Jon Harrop wrote:

> Avoiding mutation is widely hailed as a design of choice for parallel
> programming and this is used to advocate the adoption of functional
> programming. However, functional programming seems to have some serious
> disadvantages in the context of shared-memory parallelism for multicores.
>
> Firstly, functional data structures are more allocation intensive that
> imperative data structures.

What's a "functional data structure"?

Or are you conflating existing implementations of functional languages
with functional languages and/or functional programming in general?

> Allocation is access to a global mutable
> resource (the shared heap). Even with a mostly-concurrent collector, any
> thread allocating heavily degrades the performance of all other threads and
> destroys scalability across the entire system.

There's no reason why you can't give each thread a separate arena.

> Secondly, functional data structures are more memory intensive than
> imperative data structures. This places more stress on memory bandwidth
> which, again, is a shared resource and stressing it degrades scalability
> across the entire system.

This claim cannot be addressed without explaining what you consider to be
a functional data structure.

> So it seems the eschewing mutation may be a fundamentally bad idea in the
> context of parallelism.

Even if it's true in some cases, which cases depend upon the type of
parallelism you're talking about. With NUMA, a functional approach
with higher total memory bandwidth distributed evenly between banks may
beat an imperative approach with a lower total bandwidth concentrated on a
single bank.

Jon Harrop

unread,

Nov 15, 2009, 6:24:19 PM11/15/09

to

Nobody wrote:
> On Sun, 15 Nov 2009 16:39:23 +0000, Jon Harrop wrote:
>> Avoiding mutation is widely hailed as a design of choice for parallel
>> programming and this is used to advocate the adoption of functional
>> programming. However, functional programming seems to have some serious
>> disadvantages in the context of shared-memory parallelism for multicores.
>>
>> Firstly, functional data structures are more allocation intensive that
>> imperative data structures.
>
> What's a "functional data structure"?

Immutable:

http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

> Or are you conflating existing implementations of functional languages
> with functional languages and/or functional programming in general?

I am basing this on what purely functional data structures are currently
available.

>> Allocation is access to a global mutable
>> resource (the shared heap). Even with a mostly-concurrent collector, any
>> thread allocating heavily degrades the performance of all other threads
>> and destroys scalability across the entire system.
>
> There's no reason why you can't give each thread a separate arena.

Threads do have their own nursery generations in production quality
collectors but heavy allocation still kills scalability.

>> So it seems the eschewing mutation may be a fundamentally bad idea in the
>> context of parallelism.
>
> Even if it's true in some cases, which cases depend upon the type of
> parallelism you're talking about. With NUMA, a functional approach
> with higher total memory bandwidth distributed evenly between banks may
> beat an imperative approach with a lower total bandwidth concentrated on a
> single bank.

Can you give a concrete example?

Vend

unread,

Nov 15, 2009, 8:00:38 PM11/15/09

to

On Nov 16, 12:24 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Threads do have their own nursery generations in production quality
> collectors but heavy allocation still kills scalability.

If have n cores with one thread per core, and you divide the heap in n
areas, one per thread, how does heavy allocation affect scalability?

Jon Harrop

unread,

Nov 15, 2009, 8:48:34 PM11/15/09

to

Vend wrote:
> On Nov 16, 12:24 am, Jon Harrop <j...@ffconsultancy.com> wrote:

>> Nobody wrote:

>>> Jon Harrop wrote:
>>>> Allocation is access to a global mutable resource (the shared heap).
>>>

>>> ...

>>
>> Threads do have their own nursery generations in production quality
>> collectors but heavy allocation still kills scalability.
>
> If have n cores with one thread per core, and you divide the heap in n
> areas, one per thread, how does heavy allocation affect scalability?

You are describing distributed parallelism without a shared heap which is a
different scenario. In that case, allocation does scale but communication
does not. Specifically, the advantage of a shared heap is that one thread
can pass data to another thread by reference. Without a shared heap you
must resort to copying the data (usually into a message), which stresses
memory bandwidth and introduces contention for the thread that owns the
data and both kill scalability.

In essence, that Erlang-style design is great for distributed parallelism
but fails to take advantage of the shared-memory nature of a multicore.
Consequently, it is very slow in absolute terms.

Vend

unread,

Nov 16, 2009, 4:01:13 AM11/16/09

to

On Nov 16, 2:48 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Vend wrote:
> > On Nov 16, 12:24 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> >> Nobody wrote:
> >>> Jon Harrop wrote:
> >>>> Allocation is access to a global mutable resource (the shared heap).
>
> >>> ...
>
> >> Threads do have their own nursery generations in production quality
> >> collectors but heavy allocation still kills scalability.
>
> > If have n cores with one thread per core, and you divide the heap in n
> > areas, one per thread, how does heavy allocation affect scalability?
>
> You are describing distributed parallelism without a shared heap which is a
> different scenario.

No.
Maybe I didn't express myself clearly, but I was thinking of threads
sharing the same heap but each allocating into a different areas.

> introduces contention for the thread that owns the
> data

What do you mean?

> In essence, that Erlang-style design is great for distributed parallelism
> but fails to take advantage of the shared-memory nature of a multicore.
> Consequently, it is very slow in absolute terms.

I suppose that in Erlang when you send messages between threads
running on the same vm, it actually passes references, doesn't it?

Torben Ægidius Mogensen

unread,

Nov 16, 2009, 8:09:42 AM11/16/09

to

Jon Harrop <j...@ffconsultancy.com> writes:

> Avoiding mutation is widely hailed as a design of choice for parallel
> programming and this is used to advocate the adoption of functional
> programming. However, functional programming seems to have some serious
> disadvantages in the context of shared-memory parallelism for multicores.
>
> Firstly, functional data structures are more allocation intensive that
> imperative data structures. Allocation is access to a global mutable
> resource (the shared heap). Even with a mostly-concurrent collector, any
> thread allocating heavily degrades the performance of all other threads and
> destroys scalability across the entire system.

You can solve this in various ways. One is to use both local and shared
heaps and allocate shared data structures on the shared heap. You need
to know which data structures are shared, which you can do in several
ways:

1. Copy data to the shared heap when it becomes shared (i.e., when you
pass a pointer to another thread). This costs, but only once.
Since data structures are immutable, the local threads can use their
own local copies even after copying to the shared heap. You can do
the copying lazily by having nodes in the shared heap which are
"pointers" into local heaps and then request copying of more nodes
when accessed. This is most easily done in languages that already
support lazy evaluation.

2. Use a program analysis to predict sharing. This requires
whole-program analysis and can be fragile.

3. Let types distinguish shared and local data. This is simple and
efficient, and you can let type inference handle most of the
analysis, except in module interfaces.

> Secondly, functional data structures are more memory intensive than
> imperative data structures.

How so? A functional array does not take more space than an imperative
ditto, nor does a functional struct take up more space than an
imperative ditto.

> So it seems the eschewing mutation may be a fundamentally bad idea in the
> context of parallelism.

Shared memory doesn't scale, so you are essentally looking at a model
that will go away in the future anyway.

In any case, you have "forgotten" the benefits of immutability to
parallelism:

- You can have multiple local copies of a data structure without
worrying about inconsistency.

- Evaluation order is much more flexible.

Torben

Nobody

unread,

Nov 16, 2009, 12:31:08 PM11/16/09

to

On Mon, 16 Nov 2009 01:48:34 +0000, Jon Harrop wrote:

>>> Threads do have their own nursery generations in production quality
>>> collectors but heavy allocation still kills scalability.
>>
>> If have n cores with one thread per core, and you divide the heap in n
>> areas, one per thread, how does heavy allocation affect scalability?
>
> You are describing distributed parallelism without a shared heap which is a
> different scenario. In that case, allocation does scale but communication
> does not. Specifically, the advantage of a shared heap is that one thread
> can pass data to another thread by reference. Without a shared heap you
> must resort to copying the data (usually into a message), which stresses
> memory bandwidth and introduces contention for the thread that owns the
> data and both kill scalability.

Having multiple memory banks tied to distinct processors doesn't mean that
you can't access a bank from another processor, just that it may be less
efficient (but it's unlikely to be less efficient than unconditionally
copying the data).

Similarly, if you have a single CPU and a single memory bank but give each
thread its own arena, you can pass data between threads while avoiding
contention for allocation.

> In essence, that Erlang-style design is great for distributed parallelism
> but fails to take advantage of the shared-memory nature of a multicore.
> Consequently, it is very slow in absolute terms.

Avoiding in-place modification may reduce cache coherence, but nothing
prohibits a functional language implementation from using in-place
modification when it can detect that this is safe. The "Clean" language
explicitly supports this through uniqueness typing, but it's still
considered to be a functional language.

OTOH, even in imperative languages, it's common to use "functional" data
structures to simplify atomic updates. I.e. rather than locking a complex
data structure while it is updated in-place, create a modified version
then update a pointer (which can be done atomically).

Ultimately, it's entirely common for parallelisation to incur a
performance penalty (i.e. N cores don't provide an N-fold speed increase),
whether due to locking, communication, duplication of effort or whatever.

Jon Harrop

unread,

Nov 16, 2009, 9:22:39 PM11/16/09

to

Vend wrote:
> On Nov 16, 2:48 am, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Vend wrote:
>> > On Nov 16, 12:24 am, Jon Harrop <j...@ffconsultancy.com> wrote:
>> >> Nobody wrote:
>> >>> Jon Harrop wrote:
>> >>>> Allocation is access to a global mutable resource (the shared heap).
>>
>> >>> ...
>>
>> >> Threads do have their own nursery generations in production quality
>> >> collectors but heavy allocation still kills scalability.
>>
>> > If have n cores with one thread per core, and you divide the heap in n
>> > areas, one per thread, how does heavy allocation affect scalability?
>>
>> You are describing distributed parallelism without a shared heap which is
>> a different scenario.
>
> No.
> Maybe I didn't express myself clearly, but I was thinking of threads
> sharing the same heap but each allocating into a different areas.

In what sense is that a shared heap?

>> introduces contention for the thread that owns the data
>
> What do you mean?

If many threads want to read that data then they all have to ask permission
from the owner (who sends the message containing the data) which is
contention on a single global shared resource (the owner) and that doesn't
scale.

>> In essence, that Erlang-style design is great for distributed parallelism
>> but fails to take advantage of the shared-memory nature of a multicore.
>> Consequently, it is very slow in absolute terms.
>
> I suppose that in Erlang when you send messages between threads
> running on the same vm, it actually passes references, doesn't it?

No, it copies everything. Indeed, that is precisely why Erlang is so slow
and unsuitable for parallel programming.

Paul Rubin

unread,

Nov 16, 2009, 9:35:56 PM11/16/09

to

Jon Harrop <j...@ffconsultancy.com> writes:
> > I suppose that in Erlang when you send messages between threads
> > running on the same vm, it actually passes references, doesn't it?
>
> No, it copies everything. Indeed, that is precisely why Erlang is so slow
> and unsuitable for parallel programming.

I thought that HIPE optimizes this.

Jon Harrop

unread,

Nov 16, 2009, 10:55:37 PM11/16/09

to

Torben Ægidius Mogensen wrote:
> You can solve this in various ways. One is to use both local and shared
> heaps and allocate shared data structures on the shared heap. You need
> to know which data structures are shared, which you can do in several
> ways:
>
> 1. Copy data to the shared heap when it becomes shared (i.e., when you
> pass a pointer to another thread). This costs, but only once.
> Since data structures are immutable, the local threads can use their
> own local copies even after copying to the shared heap. You can do
> the copying lazily by having nodes in the shared heap which are
> "pointers" into local heaps and then request copying of more nodes
> when accessed. This is most easily done in languages that already
> support lazy evaluation.
>
> 2. Use a program analysis to predict sharing. This requires
> whole-program analysis and can be fragile.
>
> 3. Let types distinguish shared and local data. This is simple and
> efficient, and you can let type inference handle most of the
> analysis, except in module interfaces.

Very interesting, thank you.

>> Secondly, functional data structures are more memory intensive than
>> imperative data structures.
>
> How so? A functional array does not take more space than an imperative
> ditto, nor does a functional struct take up more space than an
> imperative ditto.

Assuming you mean an immutable array, it doesn't offer the same
functionality. Specifically, fast update. A purely functional data
structure with fast update (e.g. a balanced binary tree) would not be an
array and would take up more space than an array.

>> So it seems the eschewing mutation may be a fundamentally bad idea in the
>> context of parallelism.
>
> Shared memory doesn't scale, so you are essentally looking at a model
> that will go away in the future anyway.

I don't think so. Imagine a future where we have millions of cores and a
dozen levels of caches. They will still be hierarchical and there will
still be clusters of local cores wherein shared memory will be the most
efficient solution for many problems within each cluster. This is not
unlike a cluster of multicore machines today.

> In any case, you have "forgotten" the benefits of immutability to
> parallelism:
>
> - You can have multiple local copies of a data structure without
> worrying about inconsistency.
>
> - Evaluation order is much more flexible.

Why are those beneficial?

Michael Ekstrand

unread,

Nov 16, 2009, 11:11:58 PM11/16/09

to

The first is beneficial because it lets you cache without worrying about
cache coherency. Forbidding mutations means the cache will never be
invalidated by the actions of another thread/task/process.

One benefit of flexible evaluation order is the potential for
micro-optimization and perhaps parallelization. If the language doesn't
guarantee evaluation order, then code also shouldn't notice that
expressions happen to be evaluated in parallel. This would happen if
you combine flexible evaluation order with the way lazy values are
implemented in Mozart/Oz - forcing a lazy value spawns a thread to
compute the value and blocks on the resulting computation. If you force
all parameters in parallel, a fully concurrent runtime could then
compute them in parallel and get away with it because there are not
evaluation order problems.

With the caveat of array mutations, I think Torben makes a good case for
a parallel/concurrent programming method built around immutable state.
In fact, it follows very closely with some of my own thoughts on how to
build such a system.

- Michael

Jon Harrop

unread,

Nov 16, 2009, 11:16:37 PM11/16/09

to

Nobody wrote:
> On Mon, 16 Nov 2009 01:48:34 +0000, Jon Harrop wrote:
>>>> Threads do have their own nursery generations in production quality
>>>> collectors but heavy allocation still kills scalability.
>>>
>>> If have n cores with one thread per core, and you divide the heap in n
>>> areas, one per thread, how does heavy allocation affect scalability?
>>
>> You are describing distributed parallelism without a shared heap which is
>> a different scenario. In that case, allocation does scale but
>> communication does not. Specifically, the advantage of a shared heap is
>> that one thread can pass data to another thread by reference. Without a
>> shared heap you must resort to copying the data (usually into a message),
>> which stresses memory bandwidth and introduces contention for the thread
>> that owns the data and both kill scalability.
>
> Having multiple memory banks tied to distinct processors doesn't mean that
> you can't access a bank from another processor, just that it may be less
> efficient (but it's unlikely to be less efficient than unconditionally
> copying the data).
>
> Similarly, if you have a single CPU and a single memory bank but give each
> thread its own arena, you can pass data between threads while avoiding
> contention for allocation.

Sure but communicating updates to those data structures through the owning
thread is now the source of contention.

>> In essence, that Erlang-style design is great for distributed parallelism
>> but fails to take advantage of the shared-memory nature of a multicore.
>> Consequently, it is very slow in absolute terms.
>
> Avoiding in-place modification may reduce cache coherence, but nothing
> prohibits a functional language implementation from using in-place
> modification when it can detect that this is safe.

I appreciate the dream but even state-of-the-art FPL implementations are
nowhere near reaching it. Just the two line idiomatic Haskell "quicksort"
is 6,000x slower than it should be precisely because GHC is too dumb to
work out how to use in-place modification.

> The "Clean" language explicitly supports this through uniqueness typing,
> but it's still considered to be a functional language.

Aren't uniqueness types very limited in what they can express? For example,
could you reimplement one of the performant scalable concurrent Java hash
tables touted by Azul in Clean using uniqueness types?

> OTOH, even in imperative languages, it's common to use "functional" data
> structures to simplify atomic updates. I.e. rather than locking a complex
> data structure while it is updated in-place, create a modified version
> then update a pointer (which can be done atomically).

Yes.

> Ultimately, it's entirely common for parallelisation to incur a
> performance penalty (i.e. N cores don't provide an N-fold speed increase),
> whether due to locking, communication, duplication of effort or whatever.

Sure.

Nobody

unread,

Nov 17, 2009, 1:55:48 PM11/17/09

to

On Tue, 17 Nov 2009 04:16:37 +0000, Jon Harrop wrote:

>> Similarly, if you have a single CPU and a single memory bank but give each
>> thread its own arena, you can pass data between threads while avoiding
>> contention for allocation.
>
> Sure but communicating updates to those data structures through the owning
> thread is now the source of contention.

To the extent that's true, it's an argument against mutable data
structures and in favour of immutable ones.

With a "functional" structure, the writer creates a new version of the
structure then communicates the updated "root" to the reader(s). With a
mutable structure, the readers have to be locked out while modification is
in progress to prevent them from seeing an intermediate state.

Jon Harrop

unread,

Nov 17, 2009, 8:07:49 PM11/17/09

to

Nobody wrote:
> On Tue, 17 Nov 2009 04:16:37 +0000, Jon Harrop wrote:
>>> Similarly, if you have a single CPU and a single memory bank but give
>>> each thread its own arena, you can pass data between threads while
>>> avoiding contention for allocation.
>>
>> Sure but communicating updates to those data structures through the
>> owning thread is now the source of contention.
>
> To the extent that's true, it's an argument against mutable data
> structures and in favour of immutable ones.
>
> With a "functional" structure, the writer creates a new version of the
> structure then communicates the updated "root" to the reader(s).

How do you do that without having pointers between your heaps and without
having contention on the thread that owns the heap containing the data
structure?

> With a mutable structure, the readers have to be locked out while
> modification is in progress to prevent them from seeing an intermediate
> state.

Not true in general. Concurrent hash tables are a counter example.

Vend

unread,

Nov 17, 2009, 6:56:08 PM11/17/09

to

On Nov 17, 3:22 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Vend wrote:
> > On Nov 16, 2:48 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> >> Vend wrote:
> >> > On Nov 16, 12:24 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> >> >> Nobody wrote:
> >> >>> Jon Harrop wrote:
> >> >>>> Allocation is access to a global mutable resource (the shared heap).
>
> >> >>> ...
>
> >> >> Threads do have their own nursery generations in production quality
> >> >> collectors but heavy allocation still kills scalability.
>
> >> > If have n cores with one thread per core, and you divide the heap in n
> >> > areas, one per thread, how does heavy allocation affect scalability?
>
> >> You are describing distributed parallelism without a shared heap which is
> >> a different scenario.
>
> > No.
> > Maybe I didn't express myself clearly, but I was thinking of threads
> > sharing the same heap but each allocating into a different areas.
>
> In what sense is that a shared heap?

Any thread can read from objects allocated on the heap by any other
thread.

> >> introduces contention for the thread that owns the data
>
> > What do you mean?
>
> If many threads want to read that data then they all have to ask permission
> from the owner (who sends the message containing the data) which is
> contention on a single global shared resource (the owner) and that doesn't
> scale.

I don't get it. If shared data is immutable, there is no concept of
'ownership', at least not intuitively defined.

I think that message-based programs usually are designed so that
threads wakeup when new data is available (some sort of dataflow
design), rather than actively asking for data and waiting for the
answer.

> >> In essence, that Erlang-style design is great for distributed parallelism
> >> but fails to take advantage of the shared-memory nature of a multicore.
> >> Consequently, it is very slow in absolute terms.
>
> > I suppose that in Erlang when you send messages between threads
> > running on the same vm, it actually passes references, doesn't it?
>
> No, it copies everything. Indeed, that is precisely why Erlang is so slow
> and unsuitable for parallel programming.

Then it is a flaw of the implementation rather than the language,
since it would be quite easy and obvious to pass only references when
sending immutable data between threads running on the same node.

Vend

unread,

Nov 17, 2009, 7:04:23 PM11/17/09

to

On Nov 16, 6:31 pm, Nobody <nob...@nowhere.com> wrote:
> Avoiding in-place modification may reduce cache coherence, but nothing
> prohibits a functional language implementation from using in-place
> modification when it can detect that this is safe. The "Clean" language
> explicitly supports this through uniqueness typing, but it's still
> considered to be a functional language.

Yes, but obviously shared structures can't be unique (single-
referenced).

Jon Harrop

unread,

Nov 18, 2009, 1:50:38 PM11/18/09

to

Vend wrote:
>> If many threads want to read that data then they all have to ask
>> permission from the owner (who sends the message containing the data)
>> which is contention on a single global shared resource (the owner) and
>> that doesn't scale.
>
> I don't get it. If shared data is immutable, there is no concept of
> 'ownership', at least not intuitively defined.

Who is responsible for freeing the data? That is the owner. If you have
separate "arenas" for each thread then surely that thread must be
responsible for freeing data in its own arena, otherwise you have a shared
heap.

>> > I suppose that in Erlang when you send messages between threads
>> > running on the same vm, it actually passes references, doesn't it?
>>
>> No, it copies everything. Indeed, that is precisely why Erlang is so slow
>> and unsuitable for parallel programming.
>
> Then it is a flaw of the implementation rather than the language,
> since it would be quite easy and obvious to pass only references when
> sending immutable data between threads running on the same node.

That is only easy and obvious from the programmers point of view. From the
language implementors point of view, it is probably the single most
difficult challenge they face: writing a concurrent garbage collector that
can collect values that are not reachable from any threads in the face of
programs that pass references between threads.

Vend

unread,

Nov 19, 2009, 6:14:54 AM11/19/09

to

On Nov 18, 7:50 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> Vend wrote:
> >> If many threads want to read that data then they all have to ask
> >> permission from the owner (who sends the message containing the data)
> >> which is contention on a single global shared resource (the owner) and
> >> that doesn't scale.
>
> > I don't get it. If shared data is immutable, there is no concept of
> > 'ownership', at least not intuitively defined.
>
> Who is responsible for freeing the data? That is the owner. If you have
> separate "arenas" for each thread then surely that thread must be
> responsible for freeing data in its own arena, otherwise you have a shared
> heap.

I'm not an expert on concurrent and parallel garbage collection, but I
suppose that algorithms that scale with the number of cores, at least
in average case scenarios, exist.

Anyway, if you use mutable shared memory, you still have the same
problem, unless you strictly separe local heap from shared heap (using
processes instead of threads, for instance) and manage the shared heap
manually.

Jon Harrop

unread,

Nov 19, 2009, 11:14:14 AM11/19/09

to

Vend wrote:
> On Nov 18, 7:50 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Who is responsible for freeing the data? That is the owner. If you have
>> separate "arenas" for each thread then surely that thread must be
>> responsible for freeing data in its own arena, otherwise you have a
>> shared heap.
>
> I'm not an expert on concurrent and parallel garbage collection, but I
> suppose that algorithms that scale with the number of cores, at least
> in average case scenarios, exist.

Allocating heavily on any thread stalls allocation on all other threads
with .NET. I haven't tested other production-quality GCs but I assume they
are the same.

> Anyway, if you use mutable shared memory, you still have the same
> problem, unless you strictly separe local heap from shared heap (using
> processes instead of threads, for instance) and manage the shared heap
> manually.

Purely functional data structures fill the heap with huge numbers of
pointers compared to imperative code, and that places far more stress on
the GC.

Vend

unread,

Nov 19, 2009, 11:31:17 AM11/19/09

to

On Nov 19, 5:14 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> Vend wrote:
> > On Nov 18, 7:50 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> >> Who is responsible for freeing the data? That is the owner. If you have
> >> separate "arenas" for each thread then surely that thread must be
> >> responsible for freeing data in its own arena, otherwise you have a
> >> shared heap.
>
> > I'm not an expert on concurrent and parallel garbage collection, but I
> > suppose that algorithms that scale with the number of cores, at least
> > in average case scenarios, exist.
>
> Allocating heavily on any thread stalls allocation on all other threads
> with .NET. I haven't tested other production-quality GCs but I assume they
> are the same.

What gc algorithm does .NET use?
The Sun JVM offers several, with different theoretical performances.

I can't test them since I don't have access to a multicore machine.
But, if I understand correctly, parallel scavenge should scale
properly in the scenario you describe.

> > Anyway, if you use mutable shared memory, you still have the same
> > problem, unless you strictly separe local heap from shared heap (using
> > processes instead of threads, for instance) and manage the shared heap
> > manually.
>
> Purely functional data structures fill the heap with huge numbers of
> pointers compared to imperative code, and that places far more stress on
> the GC.

I'm not sure this is correct:

Using immutable data structures causes more allocation, but unless
threads keep references to old versions of the same structure, the
overhead in number of reachable objects, and thus pointers, should be
linear in the number of threads.

Moreover immutable data structures perform better under generational
collection since they can't have old-to-young pointers.

Vend

unread,

Nov 19, 2009, 12:49:10 PM11/19/09

to

On Nov 18, 2:07 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> > With a mutable structure, the readers have to be locked out while
> > modification is in progress to prevent them from seeing an intermediate
> > state.
>
> Not true in general. Concurrent hash tables are a counter example.

The problem with shared mutable data structures is that atomicity of
operations doesn't compose:

Say you have two concurrent shared hash tables.
You want to remove a (key, value) pair from one table and insert in
the other one, atomicily (no other thread should see a state with the
pair in both tables or in none).
How do you do that without using locks or transactional memory?

Casey Hawthorne

unread,

Nov 19, 2009, 5:20:21 PM11/19/09

to

On Thu, 19 Nov 2009 16:14:14 +0000, Jon Harrop <j...@ffconsultancy.com>
wrote:

>Purely functional data structures fill the heap with huge numbers of
>pointers compared to imperative code, and that places far more stress on
>the GC.

I think this is a problem with linked data structures in general;
however, in Haskell one can use immutable (unboxed) arrays, where the
ordering between the elements is implicit, and therefore has no
pointer overhead.

I would include an immutable array as a purely functional data
structure.

--
Regards,
Casey