STM criticism from Azul Systems

MikeM

May 23, 2008, 1:14:32 PM
to Clojure
I found the following comment on an Azul Systems blog
(http://blogs.azulsystems.com/cliff/2008/05/javaone.html):

> Performance in an STM is not transparent: it requires a runtime to do
> abort/retry, and as the runtimes get fancy & complex the behavior under
> load of STM's gets.... weird and unpredictable. Not something you can
> put into a production environment.

Any comments or counter-examples to this?

Randall R Schulz

May 23, 2008, 1:29:07 PM
to clo...@googlegroups.com

Are explicit locking designs free from such anomalies? I wouldn't think
so.


Randall Schulz

Raoul Duke

May 23, 2008, 2:07:27 PM
to clo...@googlegroups.com
> Are explicit locking designs free from such anomalies? I wouldn't think
> so.

paraphrase: the behaviour under load gets weird and tricky.

well, in some ways maybe that doesn't happen with locks: with locks you
can at least generally see pretty quickly that a given lock has 90% of
all threads waiting on it, whereas (depending on which STM system we're
looking at) you may not know which variable in the venn diagram of
overlapping transactions caused all the rollbacks/retries.

tho i believe Rich previously said that Clojure would actually have a
way of finding that out!
http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fdd91705a

cliffc

May 23, 2008, 2:13:05 PM
to Clojure
You got it - under load, performance is unpredictable.

With locks, you can see which locks are contended, generally figure out
why, and plan a strategy (stripe the lock, shorten the hold time, switch
to the j.u.concurrent data structures, etc.).

Not so (at least not yet) with STM. I've talked to a few people
who've tried STM out at a larger scale than the usual academic papers -
and the results are downright disappointing. Unless you become an
expert in the STM runtime you're using (where "expert" means "not just
grok it, but can build & tweak it"), anytime performance is not good
enough you get stuck. This is an open research problem at best,
with no clear indication that the problem can be solved at all.

Cliff

Rich Hickey

May 23, 2008, 6:00:35 PM
to Clojure
One could as generically argue that systems with manual locking are
usually broken, and therefore their behavior itself, never mind their
performance, is unpredictable. That's been my experience with manual
locking in the hands of mere mortal developers. It can be as difficult
to understand the behavior of any single manually-concurrent program
as to understand the performance characteristics of an STM. In the
latter case, at least the STM is handling correctness for you, and all
users of the same STM can share knowledge (and bug fixes and
performance improvements), vs each application being its own Gordian
knot. And in any case in which the STM is failing you performance-
wise, you can opt out and use manual locks outside of transactions. To
the extent you can use it, STM can drastically reduce the complexity
of a system.

I'm wary of unifying the discussion of concurrency with one about
performance, as the problems of concurrency start with correctness,
where the manual locking story, due to its complexity, is quite bad.
Scalability is not a universal problem - some systems need to handle
thousands of simultaneous connections - most don't, some need to
leverage hundreds of CPUs in a single application - most don't. But
every application that wants to leverage multiple cores needs to be
correct.

It would be no problem to instrument the Clojure STM references with
fault counters that would allow one to know the exact per-reference
contention level. Once known, the answers in both cases are similar -
share less mutable state, don't have both long and short-lived
transactions contend for the same data, check your success conditions
ASAP etc.
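
In the meantime, a rough approximation is possible from user code: a
transaction body re-runs on every retry, so a side-effecting counter
inside dosync counts attempts. A sketch, with illustrative names:

(import 'java.util.concurrent.atomic.AtomicLong)

(def attempts (AtomicLong.)) ; total body executions, including retries
(def commits  (AtomicLong.)) ; successful transactions

(def counter (ref 0))

(defn bump []
  (dosync
    (.incrementAndGet attempts) ; re-executed on every retry
    (alter counter inc))
  (.incrementAndGet commits))   ; runs once, after the transaction commits

;; retries so far = (- (.get attempts) (.get commits))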

STM designs differ from one another quite a bit, so any general
statements are difficult to assess. I think the level of granularity
of the STM comes into play. Most of the STM papers I've read
concentrate on creating lots of transactional cells with which they
envision constructing concurrent data structures like the ones in
java.util.concurrent. I don't think that's a good idea at all.

Clojure encourages the use of immutable persistent data structures,
which need no locking or coordination whatsoever in order to perform
correctly in a concurrent situation, and STM references to support the
appearance of identity-preserving change. It also encourages the use
of java.util.concurrent and the like for caches and work queues - the
best tools for the job.
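
A small example of that division of labor (names illustrative): the map
is an ordinary immutable value, and the Ref only manages the identity
that points at successive map values.

(def scores (ref {}))                ; the identity; the map itself is immutable

(defn record-score [player n]
  (dosync
    (alter scores assoc player n)))  ; assoc returns a new map; concurrent readers are unaffected

(defn score-of [player]
  (get @scores player))              ; a plain read, outside any transaction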

As far as I know, Clojure is the first STM that uses snapshot MVCC,
avoiding both read logging and transaction restarts due to read
invalidations. Clojure's STM also gives you the ability to 'ensure' a
reference you later intend to read or write, thus providing some
manual control over resource acquisition order (rather than just
relying on the side effect of access). It also supports explicit
commute, further reducing retries for commutative writes. I'll not
claim its performance characteristics are at all proven, but its
contention due to, and for, reading is lower than in programs that use
manual locking, and lower than in STMs in which write transactions can
cause read transactions to retry.
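
A sketch of those two facilities, with illustrative refs:

(def limit   (ref 1000)) ; read, but never written, in this transaction
(def balance (ref 0))
(def hits    (ref 0))

(defn deposit [m]
  (dosync
    (when (<= (+ @balance m) (ensure limit)) ; ensure: no other transaction may commit a change to limit before this one
      (alter balance + m))
    (commute hits inc)))                     ; commutative: re-applied at commit with the latest value, so it doesn't force a retry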

Once you get to record-level STM granularity, like Clojure's, it's
also a bit easier to draw parallels to the performance characteristics
of the database systems it mimics, which have been doing similar
things for decades.

I don't consider STM performance any more or less of a research
problem than determining the correctness of programs that do manual
locking - there's still work to do.

And of course, neither STM, nor any other mechanism, is a panacea.

Rich

Brett Morgan

May 23, 2008, 7:30:55 PM
to clo...@googlegroups.com
Given that I am a mere developer (with MT I'd actually consider myself a rank newbie), the most important thing to me is visibility. I want to be able to see what problems I am creating with my design, so that I can change my design and see how the problems change. In effect, I am looking for an environment that teaches me what I am doing wrong, if only by showing me which references are hotly contended.

In keeping with that, it would probably be helpful for both refs and transactions to be nameable, such that debugging output from the STM runtime can actually be easily inspected. The reason I say this is that the data I need to work over is non-uniform, so I need to be able to label the hot references to know which particular chunks of my data set are causing issues.

thoughts?
--

Brett Morgan http://brett.morgan.googlepages.com/

cliffc

May 24, 2008, 8:11:11 AM
to Clojure
The problem with manual locking - and it applies equally well to
transactions - is that there's no name-space control over what
needs to be locked/transacted. The failure people have with locks
(and limited experience with transactions) is that they can't properly
name their shared variables. They *think* they can, until we start
looking hard, and then we keep finding more... and more that need to
be locked/transacted atomically. In short, people don't know where to
put the lock/unlock or where to bound the transactional region.

Having said that, locks have some advantages over transactions:
transparent performance being one of them.
Programs are busted if the locks are wrong (or transaction regions
don't cover all the variables they need to; Clojure isn't immune here)
AND they are busted if performance sucks. For 2-4 way machines the
issue is probably moot; for 8-32 way machines (standard servers these
days) - it matters. In a year every el-cheapo linux blade server out
there will be looking at 16-64 cpus. And of course for Azul, 100+
active cpus is the norm. I claim that in a year, most people will
either care (get bit by lack of scaling), or see that they'll need to
care within another year.

Requiring shared variables to have a special syntax, ala Ref, is a big
step forward here for both locks & transactions. You're reducing the
complexity of the problem via name-space control (only Ref's are
shared) and promoting immutable shared state. The STM issue is
orthogonal.

As evidence of that, suppose you replace your STM implementation with
a trivial single global lock. Are the programs more or less complex
than their locking counterparts? The program is as-correct and as-
complex as before (before you swap the STM implementation), and indeed
is exactly equal to the locking solution using a single global lock.
Suppose I can get a correct program using STM - then I also have a
correct program using a single global lock. STM only varies from
locking as I try to make the implementation more performant: for the
STM I trust the language implementor to "stripe" the locks under the
hood. For locks, I start using more than one lock. In both cases I
can also try to shrink the lock-hold-time (or reduce the size of the
STM region), or switch algorithms.

Summary: STM doesn't solve the Big Problem (lack of name-space
control). Ref's might solve the Big Problem - and solving the Big
Problem will be a big help to both locks & STM. Locks have
transparent performance and are thus easy to optimize. Optimizing
without good name-space control is a buggy process for both locks &
STM. I don't know how to optimize a program built on an STM without
becoming an expert in the particulars of the STM runtime (and there
are so many to choose from!).

Cliff

Rich Hickey

May 24, 2008, 10:51:15 AM
to Clojure
I agree that identifying the things that can change is one key issue,
and Clojure's Refs, Agents, and Vars all do that. But I think a large
number of your arguments reflect a lack of understanding of how (at
least Clojure's) STM works.

Refs are not merely markers, they do enforcement. There is simply no
mutating a ref outside of a transaction - it's not an allowed mistake.
Any attempt to change a ref outside a transaction throws an exception.
So, there is no such thing as "transaction regions don't cover all the
variables they need to" in Clojure.
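
Concretely:

(def r (ref 0))

(alter r inc)           ; outside a transaction: throws IllegalStateException
                        ; ("No transaction running")

(dosync (alter r inc))  ; fine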

Your thought exercise with a single lock is a good one. Let's say
Clojure's implementation of STM used a single global lock (in fact it
has no global locks at all), and contrast it with a manual locking
program that used a single lock for everything.

Which is more correct? The Clojure one, because it is not subject to
undetected 'forgetting to obtain the lock'. Your assertion that
"Suppose I can get a correct program using STM - then I also have a
correct program using a single global lock" implies nothing about
programs written with a single manual lock where no STM system is
helping ensure correctness. All Clojure STM programs (that don't throw
Ref mutation out-of-transaction exceptions) are correct with a global
lock, but not all programs using a global lock (that don't throw
exceptions) are correct STM (or otherwise) programs.

Which one has better performance? Constant factors aside,
potentially the Clojure one, because readers do not wait for writers
in MVCC.

As we add locks, performance of both approaches improves, but a new
source of errors comes into play for the manual approach - lock
acquisition order and the resultant deadlock risk. At this point STM
becomes dramatically less complex than manual locking.

I'll grant that it is likely that, in the hands of experts, it is
possible to write a correct program with a custom crafted locking
strategy that will outperform any general-purpose STM strategy. But I
still contend that, for any moderately complex program, that strategy
will approach the complexity of an STM, and that most people will get
it wrong in subtle, intractable ways.

Analogies to databases are telling. DBMS's took over locking long ago
and most people were happy to have them do so. Ditto memory management
and GC. In each case, people claimed superior control and performance
in the manual solution, but the cost and complexity proved too high
for most applications.

So, mutable entity identification is a problem, but it is a subproblem
of correctness enforcement. It is likely that, given well-identified
sharable entities, some form of static correctness analysis might be
brought to bear on the manual locking case, and I'm sure people are
working on that. STMs can provide correctness enforcement today, and
therefore the two are not equivalent. Any opacity in STM operation or
performance is an area for improvement, not necessarily a reason for
avoidance.

Rich

cliffc

May 25, 2008, 11:04:09 AM
to Clojure
On May 24, 10:51 am, Rich Hickey <richhic...@gmail.com> wrote:
> I agree that identifying the things that can change is one key issue,
> and Clojure's Refs, Agents, and Vars all do that. But I think a large
> number of your arguments reflect a lack of understanding of how (at
> least Clojure's) STM works.
>
> Refs are not merely markers, they do enforcement. There is simply no
> mutating a ref outside of a transaction - it's not an allowed mistake.
> Any attempt to change a ref outside a transaction throws an exception.
> So, there is no such thing as "transaction regions don't cover all the
> variables they need to" in Clojure.

Alas - there still is. :-(
Example down below, and this is my primary complaint against both
locks & STM.
However, your other arguments are more quickly answered - mostly
because I agree with you.

>
> Your thought exercise with a single lock is a good one. Let's say
> Clojure's implementation of STM used a single global lock (in fact it
> has no global locks at all), and contrast it with a manual locking
> program that used a single lock for everything.
>
> Which is more correct? The Clojure one, because it is not subject to
> undetected 'forgetting to obtain the lock'. Your assertion that
> "Suppose I can get a correct program using STM - then I also have a
> correct program using a single global lock" implies nothing about
> programs written with a single manual lock where no STM system is
> helping ensure correctness. All Clojure STM programs (that don't throw
> Ref mutation out-of-transaction exceptions) are correct with a global
> lock, but not all programs using a global lock (that don't throw
> exceptions) are correct STM (or otherwise) programs.

Good one. I agree. Clojure wins this round.


> Which one has better performance? Constant factors aside,
> potentially the Clojure one, because readers do not wait for writers
> in MVCC.

Not really relevant; nobody writes the 'single global lock' version,
including Clojure.
It was just a thought exercise.


> As we add locks, performance of both approaches improves, but a new
> source of errors comes into play for the manual approach - lock
> acquisition order and the resultant deadlock risk. At this point STM
> becomes dramatically less complex than manual locking.

Not interesting *in practice*. Because in practice, deadlocks are
easy to find & fix.
You get a crash dump, crawl the stacks, discover the lock cycle &
reorganize.
Sure the situation could be better, but deadlocks are a 'crash once'
kind of bug.
Dev's don't like 'em, but they don't lose sleep over them either.


> I'll grant that it is likely that, in the hands of experts, it is
> possible to write a correct program with a custom crafted locking
> strategy that will outperform any general-purpose STM strategy. But I
> still contend that, for any moderately complex program, that strategy
> will approach the complexity of an STM, and that most people will get
> it wrong in subtle, intractable ways.
>
> Analogies to databases are telling. DBMS's took over locking long ago
> and most people were happy to have them do so. Ditto memory management
> and GC. In each case, people claimed superior control and performance
> in the manual solution, but the cost and complexity proved too high
> for most applications.

Actually, I was thinking of the compiler-vs-ASM analogy that took
place in the 60's & 70's. But point well taken: I agree that we need
*some* kind of support here. Maybe in the very long run STM runtimes
get better & STM does win. Meanwhile life sucks. If you can
solve the problem I'll point out below using STMs, then I'll agree with
you, and maybe even try my hand at an STM implementation.

>
> So, mutable entity identification is a problem, but it is a subproblem
> of correctness enforcement. It is likely that, given well-identified
> sharable entities, some form of static correctness analysis might be
> brought to bear on the manual locking case, and I'm sure people are
> working on that. STMs can provide correctness enforcement today, and
> therefore the two are not equivalent. Any opacity in STM operation or
> performance is an area for improvement, not necessarily a reason for
> avoidance.
>
> Rich

Ok, the long-sought-after counter-example: STM's do not compose w/
correctness.
Bad Java syntax follows, because I don't 'speak' Clojure. I'm using
the classic example.

class Account {
  long _money;
  add( long m ) { atomic { _money = _money + m; } }
}

transfer( Account a1, Account a2, long m ) {
  a1.add( m);
  a2.add(-m);
}

While 'add' alone is STM'd & safe, the two calls to 'add' that need to
be wrapped in an 'atomic' are not - so there is a race where the
accounts do not balance. Deciding at what level to add the next
'atomic' (should it be around 'transfer' or at a higher level or not
at all?) demands the programmer have domain knowledge not already
encapsulated in the Account class. THIS is the source of vast numbers
of bugs.

Less trivial examples quickly spiral out of control. Sure I know
Ref's A, B & C are all Ref's and thus only updated in 'dosync' blocks
- but can I split any collection of Ref updates into more than one
atomic region? The ad-absurdum answer is to wrap a dosync around
*all* Ref's - which means the whole program, and that means a single-
threaded program. Imagine the program unrolled in front of you, as an
endless sequence of actions (all control flow flattened out)...
interspersed with other actions are updates to Ref's. Now: dissect
this list into groups with 'atomic' or 'lock' as you see fit,
preserving the program meaning as another unrolled thread of execution
is interspersed. How do you decide where it's OK to split the stream
and where it's not OK?

Why, in the Grand Scheme of things is: "... { read _money ... write
_money } ... " a correct splitting of the action stream,
while "... { read _money ... write _money } ... { read _money ...
write _money } ..." is broken,
but "... {{ read _money ... write _money } ... { read _money ... write
_money }} ..." is correct again?


Cliff

Phil Jordan

May 25, 2008, 12:31:31 PM
to clo...@googlegroups.com
I'm new to STM, I've stumbled into it after doing some
explicit/low-level lock-free programming and systems that are
synchronised classically with mutex/semaphore-based locking. I
especially don't know what goes on under the hood in Clojure's STM
implementation, or how powerful the JVM is in terms of memory guarantees.

I'm chipping in out of interest and to improve my understanding,
apologies if I sound like an idiot:

cliffc wrote:
> On May 24, 10:51 am, Rich Hickey <richhic...@gmail.com> wrote:
>> As we add locks, performance of both approaches improves, but a new
>> source of errors comes into play for the manual approach - lock
>> acquisition order and the resultant deadlock risk. At this point STM
>> becomes dramatically less complex than manual locking.
>
> Not interesting *in practice*. Because in practice, deadlocks are
> easy to find & fix.

From personal experience, this is far from the truth in complex
systems. Deadlocks that happen only on "in the wild" systems, appearing
in the form of heisenbugs, etc. - not fun at all. There's too much in
the way of implicit contracts going on, which blows up in your face if
you're trying to extend undocumented software that was written by
someone who left the company before you arrived. (And we all know the
"well then document it" approach just doesn't happen in practice.)

Maybe it's just the implementations I've used (pthreads, Win32, OpenMP)
and others give you higher-level diagnostics, but at the level I was
working it could get very painful.

> You get a crash dump, crawl the stacks, discover the lock cycle &
> reorganize.
> Sure the situation could be better, but deadlocks are a 'crash once'
> kind of bug.

You need a reasonable amount of infrastructure in place for that, plus
you're relying on absence of evidence rather than proof that it can't
deadlock.

> Dev's don't like 'em, but they don't lose sleep over them either.

The people who lose sleep over software quality are probably the kind
who try to avoid complex locking schemes like the plague in the first
place. :)

> Ok, the long sought after counter example.: STM's do not compose w/
> correctness.
> Bad Java syntax follows, because I don't 'speak' Clojure. I'm using
> the classic example.
>
> class Account {
> long _money;
> add( long m ) { atomic { _money = _money + m; } }
> }
>
> transfer( Account a1, Account a2, long m ) {
> a1.add( m);
> a2.add(-m);
> }
>
> While 'add' alone is STM'd & safe, the two calls to 'add' that need to
> be wrapped in an 'atomic' are not - so there is a race where the
> accounts do not balance. Deciding at what level to add the next
> 'atomic' (should it be around 'transfer' or at a higher level or not
> at all?) demands the programmer have domain knowledge not already
> encapsulated in the Account class. THIS is the source of vast numbers
> of bugs.

My understanding is that this is exactly the kind of situation where STM
excels: you wrap the two add calls in a transaction rather than making
them individually atomic. The way Clojure handles this (I've been
spending 99.9% of my time in Clojure on non-threaded things, so I could
easily have missed something) is that your _money would be a ref, and
any attempt at modifying it will fail unless you're in a transaction.
Making each 'add' its own transaction would be the anti-pattern
here; you want to make the 'transfer' the transaction.
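
Something like this, if I've got the Clojure right (treat it as a sketch):

(def account-a (ref 100))
(def account-b (ref 0))

(defn add [account m]
  (alter account + m))   ; only legal inside a transaction

(defn transfer [from to m]
  (dosync                ; the transfer, not each add, is the atomic unit
    (add from (- m))
    (add to m)))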

> Less trivial examples quickly spiral out of control. Sure I know
> Ref's A, B & C are all Ref's and thus only updated in 'dosync' blocks
> - but can I split any collection of Ref updates into more than one
> atomic region? The ad-absurdum answer is to wrap a dosync around
> *all* Ref's - which means the whole program, and that means a single-
> threaded program. Imagine the program unrolled in front of you, as an
> endless sequence of actions (all control flow flattened out)...
> interspersed with other actions are updates to Ref's. Now: dissect
> this list into groups with 'atomic' or 'lock' as you see fit,
> preserving the program meaning as another unrolled thread of execution
> is interspersed. How do you decide where it's OK to split the stream
> and where it's not OK?
>
> Why, in the Grand Scheme of things is: "... { read _money ... write
> _money } ... " a correct splitting of the action stream,
> while "... { read _money ... write _money } ... { read _money ...
> write _money } ..." is broken,
> but "... {{ read _money ... write _money } ... { read _money ... write
> _money }} ..." is correct again?

Okay, you kind of lost me with what you're trying to say here.

I get the impression you're mixing up atomic operations on the low level
with a high-level STM implementation like Clojure's. The latter
presumably won't be efficient unless the implementation uses the former,
but my understanding is that the programmer using an STM implementation
is largely supposed to be isolated from such details, as long as he/she
is able to determine what operations should be wrapped in a single
transaction.

~phil

Rich Hickey

May 25, 2008, 8:47:21 PM
to Clojure


On May 25, 11:04 am, cliffc <cli...@acm.org> wrote:
> On May 24, 10:51 am, Rich Hickey <richhic...@gmail.com> wrote:

> > As we add locks, performance of both approaches improves, but a new
> > source of errors comes into play for the manual approach - lock
> > acquisition order and the resultant deadlock risk. At this point STM
> > becomes dramatically less complex than manual locking.
>
> Not interesting *in practice*. Because in practice, deadlocks are
> easy to find & fix.

I think you'll find plenty of dissent on that point.

> Actually, I was thinking of the compiler-vs-ASM analogy that took
> place in the 60's & 70's. But point well taken: I agree that we need
> *some* kind of support here. Maybe in the very long run STM runtimes
> get better & STM does win. Meanwhile life sucks. If you can only
> solve the problem with STM's I'll point out below then I'll agree with
> you, and maybe even try my hand at an STM implementation.
>
> Ok, the long sought after counter example.: STM's do not compose w/
> correctness.
> Bad Java syntax follows, because I don't 'speak' Clojure. I'm using
> the classic example.
>
> class Account {
> long _money;
> add( long m ) { atomic { _money = _money + m; } }
>
> }
>
> transfer( Account a1, Account a2, long m ) {
> a1.add( m);
> a2.add(-m);
>
> }
>
> While 'add' alone is STM'd & safe, the two calls to 'add' that need to
> be wrapped in an 'atomic' are not - so there is a race where the
> accounts do not balance. Deciding at what level to add the next
> 'atomic' (should it be around 'transfer' or at a higher level or not
> at all?) demands the programmer have domain knowledge not already
> encapsulated in the Account class. THIS is the source of vast numbers
> of bugs.
>

That's a problem with the Account class and OOP - classes are simply
the wrong granularity for so many program semantics. Yet, programmers
demonstrate they can handle this, and set appropriate transaction
boundaries, because they do so in their database work. There is a
pretty close analogy there, since many DBMSs will treat every
statement in a batch as an independent transaction unless collected in
an explicit transaction. In spite of the same risks, work precisely
like this is getting done correctly, and relatively easily, I think in
no small part due to the simplicity of wrapping a bunch of arbitrary
statements in BEGIN TRANSACTION.

> Less trivial examples quickly spiral out of control. Sure I know
> Ref's A, B & C are all Ref's and thus only updated in 'dosync' blocks
> - but can I split any collection of Ref updates into more than one
> atomic region? The ad-absurdum answer is to wrap a dosync around
> *all* Ref's - which means the whole program, and that means a single-
> threaded program. Imagine the program unrolled in front of you, as an
> endless sequence of actions (all control flow flattened out)...
> interspersed with other actions are updates to Ref's. Now: dissect
> this list into groups with 'atomic' or 'lock' as you see fit,
> preserving the program meaning as another unrolled thread of execution
> is interspersed. How do you decide where it's OK to split the stream
> and where it's not OK?
>
> Why, in the Grand Scheme of things is: "... { read _money ... write
> _money } ... " a correct splitting of the action stream,
> while "... { read _money ... write _money } ... { read _money ...
> write _money } ..." is broken,
> but "... {{ read _money ... write _money } ... { read _money ... write
> _money }} ..." is correct again?
>

What you are asking here is a philosophical question, which computers
may never answer, since the answer depends on the semantics of the
program, and such semantics will probably never be conveyed to the
system more easily than:

(defn transfer [a1 a2 m]
  (dosync
    (add a1 m)
    (sub a2 m)))

or the Java-esque:

transfer( Account a1, Account a2, long m ) {
  atomic {
    a1.add( m);
    a2.add(-m);
  }
}

Absent some similar declaration of relatedness, the program will have
to understand itself (or one program another), since while it may look
like debit/credit to us, it just looks like foo/bar to it.

I hope you don't wait for an answer to this (hard AI, potentially
impossible) problem before dipping your toes in STM - it's likely you
could make a significant contribution.

Rich

Raoul Duke

May 25, 2008, 8:53:26 PM
to clo...@googlegroups.com
> I hope you don't wait for an answer to this (hard AI, potentially
> impossible) problem before dipping your toes in STM - it's likely you
> could make a significant contribution.

it would be pretty nifty if you two had the chance to get together to
try out & blog about Clojure STM on an Azul box :-)

Rich Hickey

May 26, 2008, 8:51:15 AM
to Clojure
That could be fun.

Towards that end, and more generally, it would be nice if there were
metrics for success other than anecdotes from the field. There should
be some meaningful problem statement STM and other solutions could
take aim at, something more real-world than the typical STM paper's
correctness test or Erlang's 'pass messages around in a circle' and
'accept connections until you die' benchmarks, and more attainable for
a new technology than the million-line systems Dr. Click and Brian
Goetz mentioned in their final Java One talk, or Erlang's nine-9s
phone switches. Then everyone could run proposed solutions on their
own 2/16/multi-hundred-processor systems and say - nope, not quite,
this works, this doesn't.

Unfortunately, describing such a problem in an architecture-neutral
way is difficult. I think there can be a certain presumption that any
concurrency solution should allow people to architect systems much as
they do now, as a graph of directly-connected mutable stateful
objects. Dropping STM (and many of the other concurrency mechanisms)
into such an architecture is likely to forever disappoint. IMO, if
you're not serious about immutability, you're likely to struggle with
concurrency.

On my side, points were taken re: transparency and control for STMs.
The Clojure STM's current architecture is certainly capable of both
better reporting and control.
So I'll be working on adding the necessary diagnostic counters, and
exposing the many aspects of the Clojure STM that could become 'knobs'
for performance tuning - timeout windows, retry counts, history cache
sizes etc. So, I have some work to do before I'd like that kind of
scrutiny :)

Rich

cliffc

May 27, 2008, 10:57:14 AM
to Clojure
On May 26, 5:51 am, Rich Hickey <richhic...@gmail.com> wrote:
> On May 25, 8:53 pm, "Raoul Duke" <rao...@gmail.com> wrote:
>
> > it would be pretty nifty if you two had the chance to get together to try out & blog about Clojure STM on an Azul box :-)
>
> That could be fun.

Send me an email; Azul hands out 'academic' accounts all the time.
You get a remote login; you upload your code & run it.

Cliff

cliffc

May 27, 2008, 11:43:32 AM
to Clojure
On May 26, 5:51 am, Rich Hickey <richhic...@gmail.com> wrote:
>
> On my side, points were taken re: transparency and control for STMs.
> The Clojure STM's current architecture is certainly capable of both
> better reporting and control.
> So I'll be working on adding the necessary diagnostic counters, and
> exposing the many aspects of the Clojure STM that could become 'knobs'
> for performance tuning - timeout windows, retry counts, history cache
> sizes etc. So, I have some work to do before I'd like that kind of
> scrutiny :)
>
> Rich

I'm going to be a sourpuss again here. :-(
This is exactly the trap MPI fell into; and *you* have to do it
anyways. Double-unsmiley. :-( :-(

Here's the deal:
I write a Large Complex Program, one that Really Needs STM to get it
right.
But performance sucks.
So I do a deep dive into the STM runtime, and discover it has warts.
So I hack my code to work around the warts in the STM.
Crap like: at an average of 5 cpus in this atomic block the STM
'works', but at an average of 7 cpus in the same atomic block I get a
continuous fail/retry rate that's so bad I might as well not bother.
So I guard the STM area with a "5-at-a-time" lock and a queue of
threads waiting to enter (sketched below). Bleah (been there; done
that - for a DB, not an STM, but a same-in-principle situation). A
thousand thousand variations of the same crap happen, each requiring a
different hack to my code to make it performant.
Meanwhile the STM author (You: Rich) hacks out some warts & hands me a
beta version.
I hack my code to match the STM's new behavior, and discover some new
warts.
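
The kind of hack I mean, sketched in (my shaky) Clojure - a permit gate
bolted on outside the STM:

(import 'java.util.concurrent.Semaphore)

(def gate (Semaphore. 5)) ; at most 5 threads in the hot atomic block at once

(defn guarded-update [r f]
  (.acquire gate)
  (try
    (dosync (alter r f))
    (finally (.release gate))))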

Back & Forth we go - and suddenly: my app's "performance correctness"
is intimately tied to the exact STM implementation. Any change to the
STM's behavior kills my performance - and you, Rich, have learned a
lot about the building of a robust good STM. You (now) know the
mistakes you made and know it's time to restart the STM from scratch.
*My* options are limited:
- Scream at you to Not Change A Thing. Thus C/C++/Clojure Standards
Committees are born.
- Cling tenaciously to the abandoned version, and recognize my code is
now abandon'd ware. No new features or support for me. But maybe the
App is running fine and I can forget about it.
- Rewrite my app from scratch Yet Again, to match the new STM's warts/
features.

MPI is in this position. Every large parallel scientific App Out
There is intimately tied to the marriage of parallelizing_compiler +
MPI_runtime + computer_architecture. Changing any one of those pieces
requires the app to be rewritten from scratch in order to achieve a
fraction of the required performance.

The parallelizing compilers have 'stabilized' in a performance way.
There's a coding style which will auto-vectorize/stripe/etc, and
there are some codes "on the edge" (some compilers yes, some no), and
there's a well-known "don't go here, the compiler can't hack it". The
compiler support for STM is very much lacking and/or in flux. You're
heading into this terrain now, as you add Clojure compiler support to
your STM. I, as an STM user, can't know what's in store for me as
you go through the learning process. I *know* that I *don't know*
which STM coding styles will be performant and which will not.

The MPI_runtime has now stabilized (I believe, not sure) after 20+
years. Again there's the "this is good, this is marginal, this is
bad" folk wisdom that drives coding. Again, for STM's, this area is
very much in flux. Go read some of the bazillion academic papers Out
There. Everybody's got their own take, every STM is good in this
domain & bad in some other domain - and all the domains are different;
god forbid I write a large program dependent on STM 'A' and later try
to switch over to STM 'B'.

The computer_arch *has* been stable for message-passing for some
time. For STM, I believe it's trying to ramp up; i.e., the
computer_arch for STM is "all the world's an x86" *right now*, and all
hardware vendors are furiously studying what it would take to add some
kind of STM/HTM/hybrid support.

Thus, for STM to make it to the 'Promised Land' - the STM industry
needs to figure out:
- what belongs in the compiler
- what belongs in the runtime
- where there are Dragons and where there are Green Pastures
- teach the STM users which paths lead to Dragons or Green Pastures.

If we BOTH don't go through the exercise, then STM will never hit the
promised land.

MPI never really "made it"; the 'Green Pasture' area was too small,
and it was always surrounded by steep performance cliffs leading to
Dragons when you slipped off the edge. Upgrade your hardware: rewrite
your app. Upgrade your compiler: 2 months of performance debugging.
Upgrade your MPI: 2 months of performance debugging to discover you
need to rewrite your app.

GC "made it"; the Green Pasture gradually got bigger & bigger; Dragons
remain lurking - but only after you filled up an ever-growing Green
Pasture and needed to poke at the edges of stability.

And, if you don't mind, I'm going to edit this entire thread for
clarity and echo it on my blog.
This is good stuff (this conversation), and I definitely wish you the
best of luck.

Cliff

cliffc

May 27, 2008, 12:17:09 PM
to Clojure
On May 25, 9:31 am, Phil Jordan <li...@philjordan.eu> wrote:
> I'm new to STM, I've stumbled into it after doing some
> explicit/low-level lock-free programming and systems that are
> synchronised classically with mutex/semaphore-based locking. I
> especially don't know what goes on under the hood in Clojure's STM
> implementation, or how powerful the JVM is in terms of memory guarantees.
>
> I'm chipping in out of interest and to improve my understanding,

Let's see if I can do you some justice...


> From personal experience, this is far from the truth in complex
> systems. Deadlocks happening only on "in the wild" systems, appearing in
> the form of heisenbugs, etc. Not fun at all. There's too much in the way
> of implicit contracts going on, which blows up in your face if you're
> trying to extend undocumented software that was written by someone who
> left the company before you arrived.

Yup. So the deadlock happened. Ouch.


> (and we all know the "well then document it" approach just doesn't happen in practice)

Yup, but You could make a difference. HotSpot probably has the
highest level of 'asserts' per line of code (and a pretty darned high
level of comments per line of code) of anything Out There. Docs are
cheaper than debugging. But it's an aside.... back to deadlocks-in-
practice:


> Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was
> working it could get very painful.
>
> > You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.
> > Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.
>
> You need a reasonable amount of infrastructure in place for that,

Crash dump? Core file? 'ctrl-backslash' to a JVM to get a thread-
dump?
This stuff is commonly available.
If you don't have a proper tool chain, then Job One for you should be
to go get one - and I'm very serious here. Drop everything else until
you get a functional 'path' (protocol, implementation, business
process, etc) that allows you to debug a crash from a customer in the
field. You might need a willing beta-partner for this, hand him a
broken program & let it crash, figure out he needs disk-space to
capture the core & logs, needs training to hit 'ctrl-backslash' when
suspecting a deadlock, etc - plus FTP the files back to Home Base,
plus you need to be able to capture the files per-customer-per-bug
(ala Bugzilla), then decrypt the file (gdb for core, eyeballs or a
little perl for stack-trace logs), etc, etc, etc, .... No Rocket
Science, but annoying bits of Engineering along with some Customer
Management.


> plus you're relying on absence of evidence rather than proof that it can't deadlock.

Err.... No.

I'm not shooting for perfection here; I'm shooting for "good enough to
ship". Which means that if - In Practice - each possible deadlock
happens once, then gets fixed - that's almost always Good Enough.
Maybe not to run a nuclear reactor, but definitely good enough to run
large businesses. In practice, most possible deadlocks never happen -
but a few that do require several go-'rounds before getting fixed
properly. And then the deadlock is fixed, and remains fixed.

>
> > Dev's don't like 'em, but they don't lose sleep over them either.
>
> The people who lose sleep over software quality are probably the kind who try to avoid complex locking schemes like the plague in the first place. :)

Yup... and those folks are generally stuck anyways. But if they are
thinking about the problem ahead of time they are going to be way way
ahead in the long run.


> My understanding is that this is exactly the kind of situation where STM
> excels: you wrap the two add calls in a transaction rather than making
> them individually atomic. The way Clojure handles this (I've been
> spending 99.9% of my time in Clojure on non-threaded things, so I could
> easily have missed something) is that your _money would be a ref, and
> any attempt at modifying it will fail unless you're in a transaction.
> Wrapping the 'add' around the transaction would be the anti-pattern
> here, you want to make the 'transfer' a transaction.

> Okay, you kind of lost me with what you're trying to say here.

Sigh - we mentally missed here.

Trivial examples are.... trivial. They can be fixed in a thousand
obvious ways. We need to extrapolate to the non-trivial, because
that's the only place where the STM-vs-Locks argument becomes
interesting. So let's pretend that instead of a single 'Ref _money'
and two classes, I've got 500 Ref's and a million lines of code - ala
HotSpot (Sun's JVM). Easily >500 shared concurrent variables, about
750KLOC. About ~100 unique locks (some are classes of striped locks)
guarding them in very complex locking patterns. Now replace all those
locks with an STM & 'atomic'. Is my program any more correct? Not
really....

...I might have avoided some potential deadlocks (HotSpot uses lock
ranking asserts to avoid deadlock; deadlock rarely happens at the
engineer's desk and maybe once/year in the field across all HS users).
The set of deadlocks-in-the-field avoided was minuscule. I'll concede
that HotSpot is an unusually well-engineered piece of concurrent code,
and that deadlocks-in-the-field appear to happen more often to other
large programs. But still, fixing after the fact is reasonable when the
deadlock rate is so low and each fix 'sticks'.

Instead of deadlock, HS crashes far far more often because the locks
don't cover the right set of changes. Switching out the locks for an
STM didn't change what I was guarding; it only removed any fine-
grained-lock errors (admittedly useful... but only incrementally more
so than avoiding deadlocks). I'm still stuck with a program that's
too Big to see where the proper atomic/STM/locking boundaries need to
be. In a trivial example I can say "go up one call level and 'atomic'
there", but in the Real Program - I can't do that. Go up how many
layers and add 'atomic'? 1 layer? 10 layers? 100 layers? Yes, I
see unique call-stacks with >100 layers. I can't put 'atomic' around
'main' because that makes my program single-threaded.

Here's where I want some kind of compiler support. 'Ref' helps -
because it at least demands I have an 'atomic' around the 'Ref'. But
'Ref' isn't sufficient, because a syntactically correct program simply
wraps an 'atomic' around each Ref - exactly what my trivial example
did. I'd like to be able to specify classes & groupings of Refs that
have to be wrapped in a single 'atomic' - and have the compiler complain
when they aren't. Yes, I'm still responsible for the semantics - I have
to tell the compiler which groupings of Refs are interesting - but I'd
like some kind of automatic support, so that as my program goes from 10
lines to 10MLOC I can be told "you touched both Ref
Person._checking._money and Ref Person._savings._money without
wrapping both Ref accesses in a single atomic; that's a data-race
error".

> ~phil

Cliff

Raoul Duke

May 27, 2008, 12:46:56 PM
to clo...@googlegroups.com
> happens once, then gets fixed - that's almost always Good Enough.

http://dslab.epfl.ch/pubs/dimmunix-algo/

Phil Jordan

May 27, 2008, 1:33:20 PM
to clo...@googlegroups.com
cliffc wrote:
> On May 25, 9:31 am, Phil Jordan <li...@philjordan.eu> wrote:
>> I'm new to STM, I've stumbled into it after doing some
>> explicit/low-level lock-free programming and systems that are
>> synchronised classically with mutex/semaphore-based locking. I
>> especially don't know what goes on under the hood in Clojure's STM
>> implementation, or how powerful the JVM is in terms of memory guarantees.
>>
>> I'm chipping in out of interest and to improve my understanding,
>
> Let's see if I can do you some justice...

Thanks

>> (and we all know the "well then document it" approach just doesn't happen in practice)
>
> Yup, but You could make a difference. HotSpot probably has the
> highest level of 'asserts' per line of code (and a pretty darned high
> level of comments per line of code) of anything Out There. Docs are
> cheaper than debugging. But it's an aside.... back to deadlocks-in-
> practice:

Yeah, as far as I can tell, the concurrency infrastructure of the
JVM/Java is quite robust.

>> Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was
>> working it could get very painful.
>>
>>> You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.
>>> Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.
>> You need a reasonable amount of infrastructure in place for that,
>
> Crash dump? Core file? 'ctrl-backslash' to a JVM to get a thread-
> dump?
> This stuff is commonly available.
> If you don't have a proper tool chain, then Job One for you should be
> to go get one - and I'm very serious here. Drop everything else until
> you get a functional 'path' (protocol, implementation, business
> process, etc) that allows you do debug a crash from a customer in the
> field. You might need a willing beta-partner for this, hand him a
> broken program & let it crash, figure out he needs disk-space to
> capture the core & logs, needs training to hit 'ctrl-backslash' when
> suspecting a deadlock, etc - plus FTP the files back to Home Base,
> plus you need to be able to capture the files per-customer-per-bug
> (ala Bugzilla), then decrypt the file (gdb for core, eyeballs or a
> little perl for stack-trace logs), etc, etc, etc, .... No Rocket
> Science, but annoying bits of Engineering along with some Customer
> Management.

I absolutely agree with you there, and yeah, some environments make this
a lot easier than others. 'Environments' can be in the context of either
programming or work environment of course...

I guess most of my "I've been bitten by this too often" cases in
concurrency have been those where the environment has been unhelpful in
some way, and I've mostly been debugging other people's code.

The stuff I'm doing right now isn't actually using any concurrency
features yet, although it probably will at some point, so I'm interested
in learning how to do it right when I do get round to introducing it.

>> plus you're relying on absence of evidence rather than proof that it can't deadlock.
>
> Err.... No.
>
> I'm not shooting for perfection here; I'm shooting for "good enough to
> ship". Which means that if - In Practice - each possible deadlock
> happens once, then gets fixed - that's almost always Good Enough.
> Maybe not to run a nuclear reactor, but definitely good enough to run
> large businesses. In practice, most possible deadlocks never happen -
> but a few that do require several go-'rounds before getting fixed
> properly. And then the deadlock is fixed, and remains fixed.

This is true; there's a reason proper scientific program verification
isn't exactly mainstream. Although with an STM system you probably
ought to stand a better chance of getting it right than with explicit
locking. The question then becomes what you're giving up in
return for that safety. Which is what this thread is all about, right?

>> Okay, you kind of lost me with what you're trying to say here.
>
> Sigh - we mentally missed here.
>
> Trivial examples are.... trivial. They can be fixed in a thousand
> obvious ways. We need to extrapolate to the non-trivial, because
> that's the only place where the STM-vs-Locks argument becomes
> interesting.

Agreed.

Okay, that makes a lot more sense. So really, what you want is a way to
formalise, say, invariants, and detect violations as early as possible.
I think the Clojure STM has a couple of training wheels hinting at this
kind of thing already. We would probably also want some kinds of other
"ref relationship" primitives, because invariants are awful at
describing most things. Is there maybe something comparable in database
systems?

I wonder, though, how well this can work. When laying down the rules
comes close to being as much work as describing what to do within those
rules, the temptation will always be not to formalise any rules. Complex
systems will probably breed complex rules, so how do you make the
specification of the rules straight-forward?

If it's possible to pull this off effectively, maybe it could even aid
the performance issue. Given some kind of relationship graph of the
various parts of mutable data, the STM runtime could attempt to arrange
and represent things optimally in memory. You know, the "more
information, more scope for optimisation" argument that never really
worked when it was applied to high-level languages. ;)

~phil

Raoul Duke

May 27, 2008, 1:42:50 PM
to clo...@googlegroups.com
> Complex
> systems will probably breed complex rules, so how do you make the
> specification of the rules straight-forward?

all things have a range of options; there are some approaches (e.g.
'skeletons', i think) where you have a very limited way to do
concurrency, the supposed advantage being that it is much harder to do
something broken in the long run. they won't let you go 1000 mph, so to
speak, but maybe they help prevent you from going 1000 mph into a
concrete wall. http://www.youtube.com/watch?v=AB4IEa7jTJw

Rich Hickey

May 27, 2008, 1:56:06 PM
to Clojure


On May 27, 11:43 am, cliffc <cli...@acm.org> wrote:

> And, if you don't mind, I'm going to edit this entire thread for
> clarity and echo it on my blog.

Sure.

> This is good stuff (this conversation), and I definitely wish you the
> best of luck.
>

I agree, and thanks.

Rich