STM and useful concurrency


Mark Volkmann

unread,
Mar 22, 2009, 10:12:20 PM3/22/09
to clo...@googlegroups.com
I'm trying to understand the degree to which Clojure's STM provides
more concurrency than Java's blocking approach. I know it's difficult
to make generalizations and that specific applications need to be
measured, but I'll give it a go anyway.

Clearly using STM (dosync with Refs) makes code easier to write than
using Java synchronization because you don't have to determine up
front which objects need to be locked. In the Clojure approach,
nothing is locked. Changes in the transaction happen to in-transaction
values and there is only a small amount of blocking that occurs at the
end of the transaction when changes are being committed. Score one for
Clojure!
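To make the contrast concrete, here is a hypothetical Java sketch of the optimistic read/work/commit cycle described above. It is only an analogy: Clojure's STM coordinates many Refs with MVCC snapshots, while this models a single cell with a compare-and-set commit; `OptimisticCell` and its methods are invented names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical analogy for STM's read/work/commit cycle over ONE cell.
// Clojure's real STM coordinates many Refs; this only models the
// "no locks held, validate at commit, retry on conflict" idea.
class OptimisticCell<T> {
    private final AtomicReference<T> cell;

    OptimisticCell(T initial) { cell = new AtomicReference<>(initial); }

    T get() { return cell.get(); }

    // Work on an in-"transaction" value; commit only if nobody else
    // committed in the meantime, otherwise retry from a fresh snapshot.
    T update(UnaryOperator<T> f) {
        while (true) {
            T snapshot = cell.get();        // read without locking
            T proposed = f.apply(snapshot); // in-transaction value
            if (cell.compareAndSet(snapshot, proposed)) {
                return proposed;            // commit succeeded
            }
            // conflict: another thread committed first; work is discarded
        }
    }
}
```

The discarded-work concern raised below lives in the comment on the failed `compareAndSet` branch: the function was applied, but its result is thrown away.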

What concerns me though is how often the work done in two transactions
running in separate threads turns out to be useful work. It seems that
it will typically be the case that when two concurrent transactions
access the same Refs, one of them will commit and the other will
retry. The retry will discard the in-transaction changes that were
made to the Refs, essentially rendering the work it did of no value.
So there was increased concurrency, but not useful concurrency.

Of course there is a chance that the transaction contains some
conditional logic that makes it so the Refs to be accessed aren't
always the same, but my speculation is that that's a rare
occurrence. It's probably more typical that a transaction always
accesses the same set of Refs every time it executes.

This makes it seem that Java's locking approach isn't so bad. Well,
it's bad that I have to identify the objects to lock, but it's good
that it doesn't waste cycles doing work that will just be thrown away.

I hope I'm missing some important details and will be set straight by someone!

--
R. Mark Volkmann
Object Computing, Inc.

e

unread,
Mar 22, 2009, 11:29:32 PM3/22/09
to clo...@googlegroups.com
i think it would be awesome if someone worked up a scenario . . . started using regular old java  . . . and we could walk into a few gotchas, publicly/transparently, where a serious java guy could chime in ... and really see what the stm buys.  Then we'd have an even richer appreciation for the STM model.  Specifically, how learning STM is easier than learning concurrency as taught in a typical cs curriculum.

Cosmin Stejerean

unread,
Mar 23, 2009, 12:19:19 PM3/23/09
to clo...@googlegroups.com
On Sun, Mar 22, 2009 at 9:12 PM, Mark Volkmann <r.mark....@gmail.com> wrote:

> I'm trying to understand the degree to which Clojure's STM provides
> more concurrency than Java's blocking approach. I know it's difficult
> to make generalizations and that specific applications need to be
> measured, but I'll give it a go anyway.
>
> Clearly using STM (dosync with Refs) makes code easier to write than
> using Java synchronization because you don't have to determine up
> front which objects need to be locked. In the Clojure approach,
> nothing is locked. Changes in the transaction happen to in-transaction
> values and there is only a small amount of blocking that occurs at the
> end of the transaction when changes are being committed. Score one for
> Clojure!
>
> What concerns me though is how often the work done in two transactions
> running in separate threads turns out to be useful work. It seems that
> it will typically be the case that when two concurrent transactions
> access the same Refs, one of them will commit and the other will
> retry. The retry will discard the in-transaction changes that were
> made to the Refs, essentially rendering the work it did of no value.
> So there was increased concurrency, but not useful concurrency.

In the case where two transactions need to modify the same Ref they definitely need to be serialized, either by explicitly using locks in Java, or by letting Clojure automatically retry one of them. In either case roughly the same thing happens: transaction A starts and finishes, then transaction B starts and finishes.
 
> Of course there is a chance that the transaction contains some
> conditional logic that makes it so the Refs to be accessed aren't
> always the same, but my speculation is that that's a rare
> occurrence. It's probably more typical that a transaction always
> accesses the same set of Refs every time it executes.

Which Refs your transactions modify will depend heavily on the specific application you are working on. For example, I can imagine that in an application dealing with bank accounts and transferring money between them, the probability of two transactions concurrently hitting the same account is pretty low. In other applications, where a lot of transactions modify the same global state, the chances of conflict are much higher.
 

> This makes it seem that Java's locking approach isn't so bad. Well,
> it's bad that I have to identify the objects to lock, but it's good
> that it doesn't waste cycles doing work that will just be thrown away.

There's a reason concurrent programming is notoriously hard in most languages: it takes a lot of effort and skill to get right. Between having to correctly identify which objects need to be locked and trying to avoid deadlocks, dealing with explicit locks can be pretty messy and dangerous. That doesn't mean Java's approach is bad; after all, the internals of Clojure are implemented using Java locks. But explicit management of locks is often too low-level and unnecessarily complex, and Clojure provides a higher-level way of dealing with concurrency that makes it easier and safer to work with most of the time.
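To illustrate the "identify which objects to lock" burden, here is a hypothetical sketch of the standard lock-ordering discipline for the bank-transfer case; the `Account` class and its fields are invented for illustration.

```java
// Hypothetical sketch: with explicit locks, deadlock between transfer(a, b)
// and a concurrent transfer(b, a) is avoided by always acquiring the two
// monitors in a fixed global order (here, ascending account id).
class Account {
    final long id;
    private long balance;

    Account(long id, long balance) { this.id = id; this.balance = balance; }

    synchronized long balance() { return balance; }

    static void transfer(Account from, Account to, long amount) {
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to   : from;
        synchronized (first) {       // lock order is the same for both
            synchronized (second) {  // directions of the transfer
                from.balance -= amount;
                to.balance   += amount;
            }
        }
    }
}
```

Without the ordering discipline, two threads running `transfer(a, b)` and `transfer(b, a)` can each hold one monitor while waiting on the other, which is exactly the deadlock hazard a Ref-based `dosync` removes from the programmer's plate.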


--
Cosmin Stejerean
http://offbytwo.com

Mark Volkmann

unread,
Mar 23, 2009, 3:36:43 PM3/23/09
to clo...@googlegroups.com

I don't think the same thing happens. In the case of Clojure, both
transactions A and B start. Suppose A finishes first and commits. Then
transaction B retries, finishes, and commits. That's what I was
referring to as non-useful work. I'm not saying it's the wrong
approach, but it is different.

>> Of course there is a chance that the transaction contains some
>> conditional logic that makes it so the Refs to be accessed aren't
>> always the same, but my speculation is that that's a rare
>> occurrence. It's probably more typical that a transaction always
>> accesses the same set of Refs every time it executes.
>
> Which Refs your transactions modify will depend heavily based on the
> specific application you are working on. For example I can imagine that an
> application dealing with bank accounts and transferring money between them
> the probability of two transactions concurrently hitting the same account is
> pretty low. In other applications where a lot of transactions modify the
> same global state the chances of conflicts are much higher.

Agreed.

>> This makes it seem that Java's locking approach isn't so bad. Well,
>> it's bad that I have to identify the objects to lock, but it's good
>> that it doesn't waste cycles doing work that will just be thrown away.
>
> There's a reason concurrent programming is notoriously hard in most
> languages, because it takes a lot of effort and skill to get right. Between
> having to correctly identify which objects need to be locked and trying to
> avoid deadlocks dealing with explicit locks can be pretty messy and
> dangerous. That doesn't mean Java's approach is bad, after all the internals
> of Clojure are implemented using Java locks. But explicit management of
> locks is often too low level and unnecessarily complex, and Clojure provides
> a higher level way of dealing with concurrency that makes it easier and
> safer to work with most of the time.

I agree that Clojure makes the programming much easier, but is there a
downside? Maybe the downside is performance. If I found out that a
particular transaction was commonly being retried many times, is that
a sign that I need to write the code differently? How would I find out
that was happening? I know I could insert my own code to track that,
but it seems like this may be a commonly needed tool for Clojure to
detect excessive conflicts/retries in transactions. Maybe we could set
a special variable like *track-retries* that would cause Clojure to
produce a text file that describes all the transaction retries that
occurred in a particular run of an application. If such a tool isn't
needed or wouldn't be useful, I'd like to understand why.
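Clojure itself doesn't expose such a retry log, but the bookkeeping a *track-retries* mode would need can be sketched by hand. A hypothetical Java analogy, counting how many optimistic commit attempts were discarded (all names are invented):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of the bookkeeping a *track-retries* mode could do:
// count every commit attempt that was thrown away because of a conflict.
class RetryCounter {
    private final AtomicLong value = new AtomicLong();
    private final AtomicLong retries = new AtomicLong();

    long update(LongUnaryOperator f) {
        while (true) {
            long snapshot = value.get();
            long proposed = f.applyAsLong(snapshot);
            if (value.compareAndSet(snapshot, proposed)) return proposed;
            retries.incrementAndGet(); // this attempt's work was wasted
        }
    }

    long value()   { return value.get(); }
    long retries() { return retries.get(); }
}
```

A periodic dump of `retries()` per hot spot would be the text file described above; attributing each retry to the specific conflicting Ref would need support from the STM internals.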

Raoul Duke

unread,
Mar 23, 2009, 3:41:18 PM3/23/09
to clo...@googlegroups.com
> that was happening? I know I could insert my own code to track that,
> but it seems like this may be a commonly needed tool for Clojure to
> detect excessive conflicts/retries in transactions. Maybe we could set
> a special variable like *track-retries* that would cause Clojure to
> produce a text file that describes all the transaction retries that
> occurred in a particular run of an application. If such a tool isn't
> needed or wouldn't be useful, I'd like to understand why.

given what i've heard about Azul running threaded Java code (Clojure
might be different, of course!), i think there are insufficient
guarantees to make such tools useful. running the same threaded Java
code on a different machine has a not-insignificant chance of having
quite different scheduling.

sincerely.

Cosmin Stejerean

unread,
Mar 23, 2009, 3:49:55 PM3/23/09
to clo...@googlegroups.com
On Mon, Mar 23, 2009 at 2:36 PM, Mark Volkmann <r.mark....@gmail.com> wrote:

> In the case where two transactions need to modify the same Ref they
> definitely need to be serialized, either by explicitly using locks in Java, or by
> letting Clojure automatically retry one of them. In either case roughly the
> same thing happens. Transaction A starts and finishes, then Transaction B
> starts and finishes.

> I don't think the same thing happens. In the case of Clojure, both
> transactions A and B start. Suppose A finishes first and commits. Then
> transaction B retries, finishes, and commits. That's what I was
> referring to as non-useful work. I'm not saying it's the wrong
> approach, but it is different.

The fact that B is tried once concurrently with A, and is then aborted and retried, is in my opinion the same as transaction B being stuck waiting on a lock while A is being processed. But I can see how trying B concurrently with A the first time might waste more resources, and how for certain applications locks might therefore have better performance.
 
> I agree that Clojure makes the programming much easier, but is there a
> downside? Maybe the downside is performance. If I found out that a
> particular transaction was commonly being retried many times, is that
> a sign that I need to write the code differently? How would I find out
> that was happening? I know I could insert my own code to track that,
> but it seems like this may be a commonly needed tool for Clojure to
> detect excessive conflicts/retries in transactions. Maybe we could set
> a special variable like *track-retries* that would cause Clojure to
> produce a text file that describes all the transaction retries that
> occurred in a particular run of an application. If such a tool isn't
> needed or wouldn't be useful, I'd like to understand why.

I can imagine how in certain situations a profile mode where Clojure keeps track of transaction retries, and maybe even the reason why they happened, might be useful.

Luc Prefontaine

unread,
Mar 23, 2009, 7:27:29 PM3/23/09
to clo...@googlegroups.com

On Mon, 2009-03-23 at 14:49 -0500, Cosmin Stejerean wrote:


> The fact that B is tried once concurrently with A, and is then aborted and retried is in my opinion the same as transaction B being stuck waiting on a lock while A is being processed, but I can see how trying B concurrently with A the first time might waste more resources, and that perhaps for certain applications locks might have better performance.

Coming from years of intensive use of libthread, if I use it as a comparison basis, a mutex waiting for a resource access to be granted essentially spins.... waiting. Your thread is not doing anything useful. Of course the spinning itself consumes CPU cycles...

Assuming you need to lock several resources, you may wait a significant amount of time getting each mutex on
each resource, AND the more resources you have to lock (the more complex your transaction is),
the greater your deadlock chances are as concurrency increases.

If you hit a deadlock situation in a concurrent application written in the usual libthread model, what are your options?
Abort or... retry on behalf of the user/event/whatever started the transaction.

Now would you rather do this yourself or have a model that takes care of that for you ?

Guess what, I prefer someone else to deal with this transparently.

Clojure has big advantages over the classical libthread model:

a) It makes everything immutable by default. You do not have to care about the whole universe, just about the
      stuff your transaction alters. Fewer bugs to discover, and a simpler STM implementation.
      Other languages look like single-legged runners...

b) It's clear in the code which resources are part of a transaction, and it allows you finer control over them. Changing stuff
    is a lot simpler. Compare that with changing the granularity level of your locks when using libthread, or even in Java...

c) You are isolated from the code that deals with contention, deadlocks, restarts, ... The implementation can change
    to accommodate different environmental conditions while keeping your code sane.

You have two choices: the empirical approach, which I went through (with some experience you eventually learn the pitfalls),
OR a "canned" approach, which removes all these concerns from you.

Which one will be more efficient with the hardware? These days we have much more powerful hardware than in the 1980s/1990s.
In those times we had to be very careful about resource usage, and code was hand-crafted very carefully.
We had 2/4/6-processor SMP machines and we were already facing concurrency problems with the libthread model.

The absence of a simple concurrency model had a severe impact for years on our ability to spit out concurrent applications.
Too complex for the average programmer, hard to test, ...
The disproportion between the number of concurrent applications and single-threaded ones did not allow
us to learn much about scaling to environments with hundreds of CPUs. It's been done for some mathematical models and other
bleeding-edge applications, but those are very narrow application domains.

What if STM was "wasting" cycles given the hardware power we have today? What if each transaction had to be retried twice?
What if for some application concurrency situations there was a glass ceiling preventing us from scaling above 100 CPUs?
Would that be a hindrance if concurrent applications became common because of a simpler model?

I do not think so; the implementations could help with these problems. You should have the ability to tune the concurrency outside
of your application. Better implementations will be created over time.
It's a bit like garbage collectors: they have existed for a long time and have been improved over several iterations.

Having a simpler model is essential before anything else can be done.



> I can imagine how in certain situations a profile mode where Clojure keeps track of transaction retries, and maybe even the reason why they happened might be useful.

That kind of tool should become as common as today's profiler over time.

Frankly, I would not be too concerned with the efficiency of STM today and all the discussions you can read about it
on the Web. Without experience with real, complex concurrent applications written with STM and Clojure, speculations about behaviour
are as reliable as your daily horoscope.

Clojure is a key factor here; I do not think you can compare STM applications written in other languages
with their Clojure equivalents. It looks much more complex to implement and use STM in other languages because of the
lack of the default immutability rule that Clojure provides.

I apologize to those who believe very hard in their daily horoscope :))))

Luc





Luc Préfontaine

Armageddon was yesterday, today we have a real problem...

Mark Volkmann

unread,
Mar 24, 2009, 7:43:12 AM3/24/09
to clo...@googlegroups.com
As was described in a "Software Engineering Radio" podcast on
transactional memory, there are many parallels between garbage
collection and transactional memory. Some parallels that aren't in
place yet though include:

1) There are tools to measure the impact of GC on runs of an
application, but we don't yet have similar tools for TM.

2) The operation of GC can be tuned through command-line options, but
no such options are available yet for TM.

I don't know if these are really needed for TM, but I suspect they
would be useful and would at least make people feel more comfortable
about building large applications using TM.

I believe everything you say about programming with STM being much
easier than programming with locks. My concern is only about
performance measurement and tuning, either through options or code
refactoring.

--

cliffc

unread,
Mar 24, 2009, 8:56:57 AM3/24/09
to Clojure
Some generic STM bits of wisdom:

- STMs in "standard" languages (e.g. C, C++) suffer from having to
track all memory references. This overhead means the STM code
typically starts 3x to 10x slower than non-STM code, which is a pretty
stiff penalty to recover from. If you're only getting a 3x speedup on
your 4-core machine, but you're starting 3x down, then you're not
ahead.
- Abort/retry rates can commonly exceed 90% (e.g. 1-in-10 attempts
succeeds)
- There are very few languages w/real TM support, so "good" TM writing
styles are still being figured out.
- There are very few TM implementations, so "good" STM implementations
are still being figured out - even the "good" feature set isn't really
well known.

- Clojure dodges several of these bullets (no need to track
everything; already paying a high abstraction penalty (commonly), so
the STM slowdown isn't really interestingly large - e.g. it's 3x over
some base speed, not 3x on top of Clojure).

- I DO have a Clojure program where the retry/fail rate approaches
99.99%
- There are NOT good tools to figure out why TM's fail - if you have a
high fail rate it's often hard to figure out even that!

Cliff

Bradbev

unread,
Mar 24, 2009, 2:27:47 PM3/24/09
to Clojure


On Mar 24, 5:56 am, cliffc <cli...@acm.org> wrote:
> Some generic STM bits of wisdom:
>
> - STMs in "standard" languages (e.g. C, C++) suffer from having to
> track all memory references.  This overhead means the STM code
> typically starts 3x to 10x slower than non-STM code, which is a pretty
> stiff penalty to recover from.  If you're only getting a 3x speedup on
> your 4-core machine, but you're starting 3x down then you're not
> ahead.
> - Abort/retry rates can commonly exceed 90% (e.g. 1-in-10 attempts
> succeeds)
> - There are very few languages w/real TM support, so "good" TM writing
> styles are still being figured out
> - There are very few TM implementations, so "good" STM implementations
> are still being figured out - even the "good" feature set isn't really
> well known.
>
> - Clojure dodges several of these bullets (no need to track
> everything; already paying a high abstraction penalty (commonly), so
> the STM slowdown isn't really interestingly large (e.g. it's 3x over
> some base speed, not 3x on top of Clojure)
I don't think I quite understand the abstraction penalty here. Are
you saying that Clojure is on the order of 3x slower than Java and the
STM contributes little to that? Of the overhead, which parts do you
think are the most significant (i.e., dynamic typing, forced
immutability, etc.)?

>
> - I DO have a Clojure program where the retry/fail rate approaches
> 99.99%
Would you mind describing the scenario a little for this? I guess it
must be a highly contended transaction, but is the transaction large?
Is it possible to optimize it, or are you at the stage where 99% retry
is the best you can get?

I also wanted to comment on the inefficiencies of retrying a
transaction: if you have more hardware threads than logical threads,
retrying is probably free (though memory contention might make the
whole system slow down, so maybe not). If you have more logical
threads than hardware threads, then the work you lose from having to
retry may not be free: if you were to block instead of retry, another
logical thread would be allowed to run. I can imagine that if I were
tuning transactions I'd like the ability to mark a transaction as
exclusive, i.e. if I had a heavily contended large transaction and
lots of logical threads that wanted to run, it might be better to
explicitly lock that guy, or perhaps only allow a couple of active
threads in the transaction.
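The "exclusive transaction" idea above can be approximated today by gating entry to the hot section with a counted permit; a hypothetical Java sketch using `java.util.concurrent.Semaphore` (class and method names invented):

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the "exclusive transaction" idea: cap how many
// threads may even attempt a heavily contended section, so less work is
// started only to be retried and thrown away.
class GatedSection {
    private final Semaphore gate;

    GatedSection(int maxConcurrent) { gate = new Semaphore(maxConcurrent); }

    void run(Runnable body) throws InterruptedException {
        gate.acquire();      // block here instead of optimistically retrying
        try {
            body.run();
        } finally {
            gate.release();
        }
    }
}
```

With `maxConcurrent` set to 1 this degenerates to an explicit lock; with a small number it allows a couple of active threads into the transaction, as suggested above.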

Cheers,
Brad

Tom Davies

unread,
Mar 24, 2009, 6:31:22 PM3/24/09
to Clojure
I think Cliff may mean that each STM-participating memory access is 3x
slower, but in Clojure, because of immutability, a very small
proportion of accesses are Ref accesses, whereas in C/C++ *all* memory
accesses would be.

I agree that there's lots of room for adding instrumentation to
Clojure's STM, and having done some STM in Haskell I'd like to see
'retry' and 'orElse' available too.

'retry' says: "The prerequisites for running this tx haven't been met,
so retry -- but wait until at least one of the Refs I've read so far
has changed."
'orElse t1 t2' attempts t1, and if it retries, doesn't retry it but
runs t2 instead -- if t2 retries, t1 is tried again, and so on.

Howard Lewis Ship

unread,
Mar 24, 2009, 6:40:15 PM3/24/09
to clo...@googlegroups.com
A relevant question is: what is the relative cost of locking and
blocking (in the pure Java approach) vs. the cost of retrying (in the
Clojure/STM approach).

I don't want to go out on a limb, having not looked at the Clojure STM
implementation. However, I would bet that the costs are roughly equal.
Even if Clojure was 50% slower, or 100% slower, the knowledge that you
can spin up a large number of threads and not worry about deadlocks is
ultimately more valuable.

On Mon, Mar 23, 2009 at 12:36 PM, Mark Volkmann


--
Howard M. Lewis Ship

Creator of Apache Tapestry and Apache HiveMind

Dan

unread,
Mar 24, 2009, 7:49:36 PM3/24/09
to clo...@googlegroups.com
> I don't want to go out on a limb, having not looked at the Clojure STM
> implementation. However, I would bet that the costs are roughly equal.
> Even if Clojure was 50% slower, or 100% slower, the knowledge that you
> can spin up a large number of threads and not worry about deadlocks is
> ultimately more valuable.

It really depends on the problem. For instance, in a Python program I wrote, a genetic algorithm was doing some evolution as fast as it could. The algorithm was heavily optimized; things that profiled slow were converted to C through Cython.

I had a GUI that would display the current best individual every 5 seconds (not every generation because there is a rendering cost). To keep the GUI responsive, the evolution and the GUI would each run in their own thread.

Only 2 locks were required to implement this. A stop (lock-protected) variable would be set to True if the user requested, through the GUI, that the evolution stop. The evolution thread would look at this variable before computing the next generation (it would be unsafe to stop the evolution in a half-finished state). A best (also lock-protected) variable would be set to the best individual after each new generation was computed.

A profiling run reveals that a whopping 25% of the execution time is spent acquiring locks! Had I coded this in Clojure, there would not even be a transaction: best would be asynchronously set and read, and stop would be useless, as I could easily discard a half-finished computation of immutable data.

I haven't converted the code to Clojure yet, but it seems obvious that Clojure's concurrency model would have been much faster.
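The handshake described here maps naturally onto a lock-free design: a volatile stop flag plus atomic publication of an immutable "best" snapshot. A hypothetical Java sketch of that shape (names are invented; the real program was Python):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the evolution/GUI handshake without locks:
// the worker publishes an immutable "best" snapshot; the GUI thread
// reads it whenever it wants to repaint.
class Evolution {
    // volatile suffices for a one-way stop signal
    private volatile boolean stop = false;
    // swapping the reference is atomic; the snapshot itself never mutates
    private final AtomicReference<String> best = new AtomicReference<>("none");

    void requestStop()   { stop = true; }
    String currentBest() { return best.get(); }

    void run(int maxGenerations) {
        int generation = 0;
        while (!stop && generation < maxGenerations) {
            generation++;
            // ... compute the next generation here ...
            best.set("individual-from-generation-" + generation);
        }
    }
}
```

Because each published snapshot is immutable, the GUI can never observe a half-finished generation, which is exactly why the stop lock becomes unnecessary.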

Christian Vest Hansen

unread,
Mar 24, 2009, 7:50:40 PM3/24/09
to clo...@googlegroups.com
On Tue, Mar 24, 2009 at 11:40 PM, Howard Lewis Ship <hls...@gmail.com> wrote:
>
> A relevant question is: what is the relative cost of locking and
> blocking (in the pure Java approach) vs. the cost of retrying (in the
> Clojure/STM approach).

I can't think, off the top of my head, of a fair way to compare the
two. You can have a single global lock, a lock per object, or lock
striping. Likewise, a transaction can touch many or few refs, and the
changes can be cheap or expensive to calculate versus the cost of
committing a transaction.

If runtime performance is that important, then I think a carefully
crafted lock scheme is the winner. However, this is weighed against
the difficulty of implementing it correctly: without deadlocks, and
without under-synchronized windows. Also, another problem with locking
is that, depending on the problem, it may just be too difficult to do
correctly with the skill that is available to a given project.
--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

cliffc

unread,
Mar 25, 2009, 10:57:01 AM3/25/09
to Clojure
A bunch of separate real concerns here:

---------------
On the issue of STM Slowdown:

- Clojure is typically slower than Java by an interesting amount when
doing Plain Olde Java Stuff, unless great care is taken (it's possible
to take great care). Such a slowdown matters less than the ability to
quickly produce functioning code in many cases.
- STM'd C/C++ code is *typically* slower than normal code by 3x to 10x
(depends on the STM) because of the need to track *all* variables.
- Clojure does not need to track all variables, just Refs, so its
"STM slowdown" factor is probably 10x reduced from the typical C/C++
slowdown. This, coupled with a typically lower base speed, means
Clojure XTNs don't slow down Clojure programs just for existing.


---------------
On the issue of 99.9% retry:

Clojure's STM suffers from being "obstruction free" and not "lock
free" (I think; I haven't paid real close attention to the details).
This means that conflicting XTNs can obstruct one another when
committing. This problem is typical of STMs, BTW.

Basically, I wrote a tiny microbenchmark with endless conflicting
transactions. As I add *real CPUs* (not just threads), total
throughput *drops*. With a locking implementation, total throughput
would drop some, then hit some lower bound and stay there. Note that
this is *total* throughput, not just throughput-per-CPU or throughput-
per-thread. It's a performance inversion problem: adding more CPUs
hurts! Until this is fixed, multi-threaded performance on Clojure
will remain fragile.
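The "endless conflicting transactions" setup can be approximated with a contended compare-and-set loop that counts total commit attempts against successful commits; a hypothetical Java sketch (not Clojure's STM, just the conflict pattern; all names invented):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an "endless conflicting transactions" benchmark:
// every thread increments the same cell, so commit attempts collide and
// (attempts - commits) measures the work that was thrown away.
class ContendedBench {
    static long[] run(int threads, int commitsPerThread) throws InterruptedException {
        AtomicLong cell = new AtomicLong();
        AtomicLong attempts = new AtomicLong();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int c = 0; c < commitsPerThread; c++) {
                    while (true) {
                        long snap = cell.get();
                        attempts.incrementAndGet();
                        if (cell.compareAndSet(snap, snap + 1)) break; // committed
                        // conflict: discard and retry, like an aborted XTN
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return new long[] { cell.get(), attempts.get() };
    }
}
```

On a machine with more real CPUs, the attempts-to-commits ratio of a loop like this tends to climb, which is one way to observe the fragility being described.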

The problem is generic to STMs - so really I'm saying that STMs, in
general, have a more fragile performance profile than locks and this
is one of the reasons that STMs have had problems getting traction in
the Real World.

Now, just to be a pessimist, I'll point out that the
java.util.concurrent locks are *also* using obstruction-free
algorithms, and when an Azul box cuts loose with a few hundred threads
there's *always* a CPU to spare to "obstruct" somebody, so we hit
livelock real quick. Given 32-way and 64-way x86 boxes rapidly
approaching, I'm hopeful Doug Lea will finally have to do something
about this.

Cliff