On Sat, May 24, 2008 at 8:00 AM, Rich Hickey <richhic
...@gmail.com> wrote:
> One could as generically argue that systems with manual locking are
> usually broken, and therefore their behavior itself, never mind their
> performance, is unpredictable. That's been my experience with manual
> locking in the hands of mere mortal developers. It can be as difficult
> to understand the behavior of any single manually-concurrent program
> as to understand the performance characteristics of an STM. In the
> latter case, at least the STM is handling correctness for you, and all
> users of the same STM can share knowledge (and bug fixes and
> performance improvements), vs each application being its own Gordian
> knot. And in any case in which the STM is failing you performance-
> wise, you can opt out and use manual locks outside of transactions. To
> the extent you can use it, STM can drastically reduce the complexity
> of a system.
> I'm wary of unifying the discussion of concurrency with one about
> performance, as the problems of concurrency start with correctness,
> where the manual locking story, due to its complexity, is quite bad.
> Scalability is not a universal problem - some systems need to handle
> thousands of simultaneous connections - most don't, some need to
> leverage hundreds of CPUs in a single application - most don't. But
> every application that wants to leverage multiple cores needs to be
> correct.
> It would be no problem to instrument the Clojure STM references with
> fault counters that would allow one to know the exact per-reference
> contention level. Once known, the answers in both cases are similar -
> share less mutable state, don't have both long and short-lived
> transactions contend for the same data, check your success conditions
> ASAP etc.
> STM designs differ from one another quite a bit, so any general
> statements are difficult to assess. I think the level of granularity
> of the STM comes into play. Most of the STM papers I've read
> concentrate on creating lots of transactional cells with which they
> envision constructing concurrent data structures like the ones in
> java.util.concurrent. I don't think that's a good idea at all.
> Clojure encourages the use of immutable persistent data structures,
> which need no locking or coordination whatsoever in order to perform
> correctly in a concurrent situation, and STM references to support the
> appearance of identity-preserving change. It also encourages the use
> of java.util.concurrent and the like for caches and work queues - the
> best tools for the job.
> As far as I know, Clojure is the first STM that uses snapshot MVCC,
> avoiding both read logging and transaction restarts due to read
> invalidations. Clojure's STM also gives you the ability to 'ensure' a
> reference you later intend to read or write, thus providing some
> manual control over resource acquisition order (rather than just
> relying on the side effect of access). It also supports explicit
> commute, further reducing retries for commutative writes. I'll not
> claim its performance characteristics are at all proven, but its
> contention due to, and for, reading is lower than in programs that use
> manual locking, and lower than in STMs in which write transactions can
> cause read transactions to retry.
> Once you get to record-level STM granularity, like Clojure's, it's
> also a bit easier to draw parallels to the performance characteristics
> of the database systems it mimics, which have been doing similar
> things for decades.
> I don't consider STM performance any more or less of a research
> problem than determining the correctness of programs that do manual
> locking - there's still work to do.
> And of course, neither STM, nor any other mechanism, is a panacea.
> Rich
> On May 23, 2:13 pm, cliffc <cli...@acm.org> wrote:
> > You got it - under load, performance is unpredictable.
> > With locks, you can see which locks are loaded, generally figure out
> > why, and plan a strategy (stripe the lock, shorter hold time, switch
> > to the j.u.concurrent datastructures, etc).
> > Not so (at least not yet) with STM. I've talked to a few people
> > who've tried STM out in a larger scale than the usual academic papers
> > - and the results are downright disappointing. Unless you become an
> > expert in the STM runtime you're using (where "expert" means "not just
> > grok it, but can build & tweak it") anytime performance is not good
> > enough - you get stuck. This is an open research problem at best,
> > with no clear indication that the problem can be solved at all.
> > Cliff
> > On May 23, 11:07 am, "Raoul Duke" <rao...@gmail.com> wrote:
> > > > Are explicit locking designs free from such anomalies? I wouldn't
> think
> > > > so.
> > > paraphrase: the behaviour under load gets weird and tricky.
> > > well, in some ways maybe that doesn't happen with locks: the question
> > > of (depending on which STM system we're looking at) figuring out what
> > > caused all the rollbacks vs. with locks, you can at least generally
> > > quickly see that a given lock has 90% of all threads waiting on it
> > > kind of thing. whereas you don't know what variable in the venn
> > > diagram of overlapping transactions caused the retry?
> > > tho i believe Rich previously said that Clojure would actually have a
> > > way of finding that out!
> http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fd...