STM criticisms from Bryan Cantrill (Sun)

149 views
Skip to first unread message

srnm

unread,
Nov 4, 2008, 2:50:14 AM11/4/08
to Clojure
http://blogs.sun.com/bmc/entry/concurrency_s_shysters

I'm new to clojure... and like what I see.

Just passing this blog post on as a discussion point...

Thanks!

Christian Vest Hansen

unread,
Nov 4, 2008, 3:47:51 AM11/4/08
to clo...@googlegroups.com
Just reading through the first CACM article, they raise a number of
issues that I think are more addressed in Clojure than they give
credit to TMs in general based on whatever implementation they were
investigating.

I present my feeble attempt to join the TM debate:

> TM introduces a variety of programming issues that
> are not present in lock-basedmutual exclusion.
> For example, semantics are muddled by:
> * Interaction with non-transactional codes,
> including access to shared data from outside
> of a transaction (tolerating weak atomicity)
> and the use of locks inside a transaction
> (breaking isolation to make locking operations
> visible outside transactions);

The referencing part is at least clearly defined: outside a
transaction, the refs are read-only.

Locking is a different beast. While true that it is a side-effect that
will creep to the outside world, I don't see how a lock inside a
transaction is bad in any way that a lock outside a transaction is
not. (But they might have yet to tell me since still not through with
the paper.)

> * Exceptions and serializability: how to handle
> exceptions and propagate consistent exception
> information from within a transactional context,
> and how to guarantee that transactional execution
> respects a correct ordering of operations;

A consistent snapshot of the world is presented inside a transaction.
Exceptions and serialization will be based on that world view.

I don't get what they mean by correct ordering of operations. Ordering
inside a transaction would matter just as much as it does in a
single-threaded scenario, would it not?

> * Interaction with code that cannot be
> transactionalized, due to either communication
> with other threads or a requirement barring
> speculation;

Just don't do that. Transactions are not a silver bullet. Use an agent instead.

> * Livelock, or the system guarantee that all
> transactions make progress even in the
> presence of conflicts.

True, TMs don't solve this for you. Just like locking doesn't solve
deadlocks for you.

A key assumption of the paper seems to be that TMs are the solution to
all things concurrency. But perhaps this is exactly the myth they are
trying to debunk?

My view is that TMs are another tool in the box, just like CAS, locks,
agents, volatile, thread-locals, immutables, atomics and all the rest
of java.util.concurrent. And I believe that people who think that
magical mechanism to solve all concurrency woes will drop in their
lap, are wrong and will (with high probability) be proven so
(hopefully before their systems go live).

--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Stuart Halloway

unread,
Nov 4, 2008, 5:26:02 AM11/4/08
to clo...@googlegroups.com
Interesting stuff. If you follow through to the linked paper, it seems
to me that many of the issues are much less relevant for Clojure. In
particular, the sections about weak atomicity, privatization, and
memory reclamation (p.42) seem not to be a problem.

On the other hand, interactions with non-TX code and livelock (p.41)
strike me as potential problems.

Bryan Cantrill concludes that "STM is a dog." That may be true, but I
am *sure* that locking is a bitch. :-)

Stuart

Kelsin

unread,
Nov 4, 2008, 6:43:22 AM11/4/08
to Clojure
On the same line as these other comments it seems like the paper picks
on many problems that immutability and functional programming solve
for us. Being a recent viewer of many of the clojure screencasts it
now makes more sense how much emphasis is placed on these two items
instead of the actual STM system when discussing and presenting
clojure's concurrency support.

Chris G

Rich Hickey

unread,
Nov 4, 2008, 8:15:17 AM11/4/08
to Clojure


On Nov 4, 3:47 am, "Christian Vest Hansen" <karmazi...@gmail.com>
wrote:
Exactly my problem with these articles - they sum up to "STM is bad
because it's not a panacea".

I don't actually see many people claiming it is.

Furthermore, STM is not a single thing and more than is GC. There will
be many flavors with different tradeoffs and properties.

> My view is that TMs are another tool in the box, just like CAS, locks,
> agents, volatile, thread-locals, immutables, atomics and all the rest
> of java.util.concurrent.

Precisely.

RIch

J. McConnell

unread,
Nov 4, 2008, 8:46:19 AM11/4/08
to clo...@googlegroups.com
I just started reading through that article a couple days ago and a
couple things jumped out at me right away with respect to Clojure. The
most important, it seems, is that the STM system in Clojure is not the
concurrency hammer for the language, immutable data structures are.
All of the articles I've read in opposition to TM focus on the current
performance overhead that they all tack on, which is fair. (Although
it is clearly still relatively early days in the research and
dismissing TM now because of its current performance characteristics
is just foolish ... more research is necessary.) However, comparing
the performance of a mutable state, lock-based system and a mutable
state, TM-based system has nothing whatsoever to do with Clojure.

If Clojure's TM system tacks on a 20% overhead (note, I just made that
number up), that can pretty safely be ignored. The reason is that you
are only using TM for 20% of your concurrency support, if that. The
immutable data structures do the rest of the work for you with great
performance characteristics. Immutable data structures are the jack
hammer of Clojure's concurrency support. The STM is more like a
ball-peen hammer [1], taking care of the cases where shared state is,
indeed, necessary.

Regards,

- J.

[1] http://en.wikipedia.org/wiki/Ball-peen_hammer

Raoul Duke

unread,
Nov 4, 2008, 12:35:59 PM11/4/08
to clo...@googlegroups.com
> On the other hand, interactions with non-TX code and livelock (p.41)
> strike me as potential problems.

the scary thing to me about the change from a world of possible
deadlock to a work of possible livelock is that the former is *really
easy to see* when it happens. :-)

Christian Vest Hansen

unread,
Nov 4, 2008, 12:51:30 PM11/4/08
to clo...@googlegroups.com

I think this one comes down to the maturity of the implementations.
Deadlocks are only easy to see (in Java land) because we have tools
that make it so; ie. you can send SIGQUIT to the JVM and have all the
relevant information show up on stdout. Similarly, TMs could have some
sort of mechanism by which you could get a list of transactions with
high retry counts, and their most volatile (in the non-java-keyword
sense of the word) references.

Though I don't know much about TMs, I will grant that I have not
actually seen such tools for TMs, so your argument probably still
holds, but I don't think it will hold forever.

Raoul Duke

unread,
Nov 4, 2008, 1:22:10 PM11/4/08
to clo...@googlegroups.com
hi,

> Though I don't know much about TMs, I will grant that I have not
> actually seen such tools for TMs, so your argument probably still
> holds, but I don't think it will hold forever.

it is an interesting thing: the ecosystem of tools and things in the
crappy shared mutable state world is large and advanced, especially if
you have $$$ to throw at the problem (e.g. Azul's tools). which might
make it harder for the new idea to get fast and wide adoption. still,
one can hope that somehow the time + energy + resources somehow get
applied to making TM tools.

sincerely.

p.s. in java you can write a little thread which detects deadlock in
any other threads, so you can see it right away. we have such a thing
in our server code at the moment.

Rich Hickey

unread,
Nov 4, 2008, 1:53:20 PM11/4/08
to Clojure
Once detected, a deadlock can still be a bear to reproduce/debug, and
often does not appear until the worst possible time - production.
What's even more insidious is the memory not accessed under a lock
that should have been, and the ensuing corruption.

Contrast this with STM+refs, where there is no avoiding using
transactions for writes. As far as livelock, if you get it to happen
it is usually due to long-running transactions competing with short
ones, and you can readily see/reproduce it under test loads.

Rich

Krukow

unread,
Nov 5, 2008, 4:08:41 AM11/5/08
to Clojure
I agree. However, in pure Java the locks, volatility, immutability,
atomics, java.util.concurrent are all accessible and can all interact,
e.g., you can create your own synchronizer based on locks and
volatiles, combined java.util.concurrent classes. Is this the case in
Clojure?

In Clojure the locks are not accessible to the user which is generally
a good thing. However the 'toolbox argument' is saying: "OK,
*sometimes* you do want locks," and saying "STM is no silver bullet so
go ahead and use locks in those (rare cases)." Now, I have yet to see
an actual example that the STM is not good enough;^1 however, for this
particular argument to hold we need to consider how to combine these
tools in an actual Clojure application.

So really, I am wondering whether it is possible combine all these
concurrency tools in a Clojure program. For example you can (from pure
Clojure) create a ConcurrentHashMap and hammer on it from different
threads -- works out of the box, no need for locks or Java classes.
But what about volatiles and acquiring locks? Can you have a decent
Clojure program that uses locks in one component, together with STM
and agents in others? What about interaction between such components?

/krukow

^1 Even STM opponent and concurrency guru Cliff Click seems to be
impressed with Clojure ;-)
http://blogs.azulsystems.com/cliff/2008/09/jvm-language-su.html

Comment: [[
I'm not doing Clojure enough justice here. [...] The purely functional
data-structures also look really nice; and the STM scaled surprisingly
well. I definitely need to do some coding with it. [...]

Cliff
]]

cliffc

unread,
Nov 5, 2008, 12:40:32 PM11/5/08
to Clojure


On Nov 4, 10:53 am, Rich Hickey <richhic...@gmail.com> wrote:
> Once detected, a deadlock can still be a bear to reproduce/debug, and
> often does not appear until the worst possible time - production.

So far, my experience (both direct & observed from others) with
deadlocks has been:
- indeed you get nailed in production
- the post-mortem stack traces are good enough to figure out what
happened
- the fix forces you to think through the concurrency aspects of your
code, but if you manage to do that then the bug stays fixed (if you
fail to think it thru, e.g. "I'll just yank this lock" you are
generally in for a world of hurt)
- HotSpot is a ~500KLoC highly concurrent program with about ~100
named unique locks; we use aggressive lock-ranking asserts and in all
the years of hacking HotSpot I've only ever seen 2 deadlocks (there
were lots before the aggressive lock-ranking asserts - and those 2
deadlocks were because some more junior engineer skirted the asserts
rather than fix the potential deadlock).


> What's even more insidious is the memory not accessed under a lock
> that should have been, and the ensuing corruption.

No argument there. #1 bug for Azul (and we train all our SE's to look
for it) is HashMap corruption leading to a closed-cycle linked list,
and threads stuck forever spinning down the infinite list.


> As far as livelock, if you get it to happen
> it is usually due to long-running transactions competing with short
> ones, and you can readily see/reproduce it under test loads.

Here I'll disagree. Clojure uses a particular STM implementation, but
does not dictate at the language level the implementation. Different
STM implementations have wildly different performance characteristics,
and there's a rich academic literature on how bad it gets. "In
practice" (for such little "practice" as there is; Clojure might well
rapidly have the most "practice") STM's indeed "get bad" unless they
are pampered well (i.e. expert tuning). The problem is: what "gets
bad" varies wildly by STM implementations - so fixing the "long
transactions are getting live-locked by short runs" via hacking the
STM will surely performance-break some other program.

Here's where I think the root of my anguish lies: the STM
runtime&performance is opaque, so changing either it OR my program to
get performance is a hit-or-miss affair. At least with locks I can
tell you something about what's going on, and make some kind of
engineering guess as to what hacks will help & why.

Random almost-unrelated correlation: making a long transaction into a
series of short ones for performance looks a whole lot like making
fine-grained locking from coarse-grained locking:
- You are breaking the atomic/locked region into parts
- You wouldn't bother expect performance sucks otherwise
- The danger is in missing a path that needs to be atomic
- Failure will be rare, unpredictable (untestable), unreproducible and
generally only under heavy load (i.e., production).
- Clojure's Immutibility won't save you here; you still must transact
around the correct set of Ref's.


> Rich

Don't get me wrong: I think Clojure is on to something good here -
Immutability IS the Big Hammer For Concurrency here, and the STM is
just one of several flavors of "Big Hammer isn't working so I need
another approach". Given the Immutability-By-Default, STM has a much
better chance of being performant than in other languages so it makes
sense to give it a go.

Cliff

Raoul Duke

unread,
Nov 5, 2008, 1:02:53 PM11/5/08
to clo...@googlegroups.com
> No argument there. #1 bug for Azul (and we train all our SE's to look
> for it) is HashMap corruption leading to a closed-cycle linked list,
> and threads stuck forever spinning down the infinite list.

if you can share, i'd find it very interesting to learn about the top
5 or 10 such bugs.

sincerely.

Reply all
Reply to author
Forward
0 new messages