One could as generically argue that systems with manual locking are
usually broken, and therefore their behavior itself, never mind their
performance, is unpredictable. That's been my experience with manual
locking in the hands of mere mortal developers. It can be as difficult
to understand the behavior of any single manually-concurrent program
as to understand the performance characteristics of an STM. In the
latter case, at least the STM is handling correctness for you, and all
users of the same STM can share knowledge (and bug fixes and
performance improvements), vs each application being its own Gordian
knot. And in any case in which the STM is failing you performance-
wise, you can opt out and use manual locks outside of transactions. To
the extent you can use it, STM can drastically reduce the complexity
of a system.
I'm wary of unifying the discussion of concurrency with one about
performance, as the problems of concurrency start with correctness,
where the manual locking story, due to its complexity, is quite bad.
Scalability is not a universal problem - some systems need to handle
thousands of simultaneous connections - most don't, some need to
leverage hundreds of CPUs in a single application - most don't. But
every application that wants to leverage multiple cores needs to be
correct.
It would be no problem to instrument the Clojure STM references with
fault counters that would allow one to know the exact per-reference
contention level. Once known, the answers in both cases are similar -
share less mutable state, don't have both long and short-lived
transactions contend for the same data, check your success conditions
ASAP etc.
STM designs differ from one another quite a bit, so any general
statements are difficult to assess. I think the level of granularity
of the STM comes into play. Most of the STM papers I've read
concentrate on creating lots of transactional cells with which they
envision constructing concurrent data structures like the ones in
java.util.concurrent. I don't think that's a good idea at all.
Clojure encourages the use of immutable persistent data structures,
which need no locking or coordination whatsoever in order to perform
correctly in a concurrent situation, and STM references to support the
appearance of identity-preserving change. It also encourages the use
of java.util.concurrent and the like for caches and work queues - the
best tools for the job.
As far as I know, Clojure is the first STM that uses snapshot MVCC,
avoiding both read logging and transaction restarts due to read
invalidations. Clojure's STM also gives you the ability to 'ensure' a
reference you later intend to read or write, thus providing some
manual control over resource acquisition order (rather than just
relying on the side effect of access). It also supports explicit
commute, further reducing retries for commutative writes. I'll not
claim its performance characteristics are at all proven, but its
contention due to, and for, reading is lower than in programs that use
manual locking, and lower than in STMs in which write transactions can
cause read transactions to retry.
Once you get to record-level STM granularity, like Clojure's, it's
also a bit easier to draw parallels to the performance characteristics
of the database systems it mimics, which have been doing similar
things for decades.
I don't consider STM performance any more or less of a research
problem than determining the correctness of programs that do manual
locking - there's still work to do.
And of course, neither STM, nor any other mechanism, is a panacea.
Rich
On May 23, 2:13 pm, cliffc <cli...@acm.org> wrote:
> You got it - under load, performance is unpredictable.
> With locks, you can see which locks are loaded, generally figure out
> why, and plan a strategy (stripe the lock, shorter hold time, switch
> to the j.u.concurrent datastructures, etc).
> Not so (at least not yet) with STM. I've talked to a few people
> who've tried STM out in a larger scale than the usual academic papers
> - and the results are downright disappointing. Unless you become an
> expert in the STM runtime you're using (where "expert" means "not just
> grok it, but can build & tweak it") anytime performance is not good
> enough - you get stuck. This is an open research problem at best,
> with no clear indication that the problem can be solved at all.
> Cliff
> On May 23, 11:07 am, "Raoul Duke" <rao...@gmail.com> wrote:
> > > Are explicit locking designs free from such anomalies? I wouldn't think
> > > so.
> > paraphrase: the behaviour under load gets weird and tricky.
> > well, in some ways maybe that doesn't happen with locks: the question
> > of (depending on which STM system we're looking at) figuring out what
> > caused all the rollbacks vs. with locks, you can at least generally
> > quickly see that a given lock has 90% of all threads waiting on it
> > kind of thing. whereas you don't know what variable in the venn
> > diagram of overlapping transactions caused the retry?
> > tho i believe Rich previously said that Clojure would actually have a
> > way of finding that out!http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fd...