Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion STM criticism from Azul Systems
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
cliffc  
View profile  
 More options May 24 2008, 8:11 am
From: cliffc <cli...@acm.org>
Date: Sat, 24 May 2008 05:11:11 -0700 (PDT)
Local: Sat, May 24 2008 8:11 am
Subject: Re: STM criticism from Azul Systems
The problem with manual locking - and it applies equally well to
transactions - is that the there's no name-space-control over what
needs to be locked/transacted.  The failure people have with locks
(and limited experience with transactions) is that they can't properly
name their shared variables.  They *think* they can, until we start
looking hard, and then we keep finding more... and more that need to
be locked/transacted atomically.  In short, people don't know where to
put the lock/unlock or where to bound the transactional region.

Having said that, locks have some advantages over transactions:
transparent performance being one of them.
Programs are busted if the locks are wrong (or transaction regions
don't cover all the variables they need to; Clojure isn't immune here)
AND they are busted if performance sucks.  For 2-4 way machines the
issue is probably moot; for 8-32 way machines (standard servers these
days) - it matters.  In a year every el-cheapo linux blade server out
there will be looking at 16-64 cpus.  And of course for Azul, 100+
active cpus is the norm.  I claim that in a year, most people will
either care (get bit by lack of scaling), or see that their need to
care within another year.

Requiring shared variables to have a special syntax, ala Ref, is a big
step forward here for both locks & transactions.  You're reducing the
complexity of the problem via name-space control (only Ref's are
shared) and promoting immutable shared state.  The STM issue is
orthogonal.

As evidence of that, suppose you replace your STM implementation with
a trivial single global lock.  Are the programs more or less complex
than their locking counterparts?   The program is as-correct and as-
complex as before (before you swap the STM implementation), and indeed
is exactly equal to the locking solution using a single global lock.
Suppose I can get a correct program using STM - then I also have a
correct program using a single global lock.  STM only varies from
locking as I try to make the implementation more performant: for the
STM I trust the language implementor to "stripe" the locks under the
hood.  For locks, I start using more than one lock.  In both cases I
can also try to shrink the lock-hold-time (or reduce the size of the
STM region), or switch algorithms.

Summary: STM doesn't solve the Big Problem (lack of name-space
control).  Ref's might solve the Big Problem - and solving the Big
Problem will be a big help to both locks & STM.  Locks have
transparent performance and are thus easy to optimize.  Optimizing
without good name-space control is a buggy process for both locks &
STM.  I don't know how to optimize an STM implementation without
becoming an expert in the particulars of the STM runtime (and there
are so many to choose from!)

Cliff

On May 23, 6:00 pm, Rich Hickey <richhic...@gmail.com> wrote:

> One could as generically argue that systems with manual locking are
> usually broken, and therefore their behavior itself, never mind their
> performance, is unpredictable. That's been my experience with manual
> locking in the hands of mere mortal developers. It can be as difficult
> to understand the behavior of any single manually-concurrent program
> as to understand the performance characteristics of an STM. In the
> latter case, at least the STM is handling correctness for you, and all
> users of the same STM can share knowledge (and bug fixes and
> performance improvements), vs each application being its own Gordian
> knot. And in any case in which the STM is failing you performance-
> wise, you can opt out and use manual locks outside of transactions. To
> the extent you can use it, STM can drastically reduce the complexity
> of a system.

> I'm wary of unifying the discussion of concurrency with one about
> performance, as the problems of concurrency start with correctness,
> where the manual locking story, due to its complexity, is quite bad.
> Scalability is not a universal problem - some systems need to handle
> thousands of simultaneous connections - most don't, some need to
> leverage hundreds of CPUs in a single application - most don't. But
> every application that wants to leverage multiple cores needs to be
> correct.

> It would be no problem to instrument the Clojure STM references with
> fault counters that would allow one to know the exact per-reference
> contention level. Once known, the answers in both cases are similar -
> share less mutable state, don't have both long and short-lived
> transactions contend for the same data, check your success conditions
> ASAP etc.

> STM designs differ from one another quite a bit, so any general
> statements are difficult to assess. I think the level of granularity
> of the STM comes into play. Most of the STM papers I've read
> concentrate on creating lots of transactional cells with which they
> envision constructing concurrent data structures like the ones in
> java.util.concurrent. I don't think that's a good idea at all.

> Clojure encourages the use of immutable persistent data structures,
> which need no locking or coordination whatsoever in order to perform
> correctly in a concurrent situation, and STM references to support the
> appearance of identity-preserving change. It also encourages the use
> of java.util.concurrent and the like for caches and work queues - the
> best tools for the job.

> As far as I know, Clojure is the first STM that uses snapshot MVCC,
> avoiding both read logging and transaction restarts due to read
> invalidations. Clojure's STM also gives you the ability to 'ensure' a
> reference you later intend to read or write, thus providing some
> manual control over resource acquisition order (rather than just
> relying on the side effect of access). It also supports explicit
> commute, further reducing retries for commutative writes. I'll not
> claim its performance characteristics are at all proven, but its
> contention due to, and for, reading is lower than in programs that use
> manual locking, and lower than in STMs in which write transactions can
> cause read transactions to retry.

> Once you get to record-level STM granularity, like Clojure's, it's
> also a bit easier to draw parallels to the performance characteristics
> of the database systems it mimics, which have been doing similar
> things for decades.

> I don't consider STM performance any more or less of a research
> problem than determining the correctness of programs that do manual
> locking - there's still work to do.

> And of course, neither STM, nor any other mechanism, is a panacea.

> Rich

> On May 23, 2:13 pm, cliffc <cli...@acm.org> wrote:

> > You got it - under load, performance is unpredictable.

> > With locks, you can see which locks are loaded, generally figure out
> > why, and plan a strategy (stripe the lock, shorter hold time, switch
> > to the j.u.concurrent datastructures, etc).

> > Not so (at least not yet) with STM.  I've talked to a few people
> > who've tried STM out in a larger scale than the usual academic papers
> > - and the results are downright disappointing.  Unless you become an
> > expert in the STM runtime you're using (where "expert" means "not just
> > grok it, but can build & tweak it") anytime performance is not good
> > enough - you get stuck.  This is an open research problem at best,
> > with no clear indication that the problem can be solved at all.

> > Cliff

> > On May 23, 11:07 am, "Raoul Duke" <rao...@gmail.com> wrote:

> > > > Are explicit locking designs free from such anomalies? I wouldn't think
> > > > so.

> > > paraphrase: the behaviour under load gets weird and tricky.

> > > well, in some ways maybe that doesn't happen with locks: the question
> > > of (depending on which STM system we're looking at) figuring out what
> > > caused all the rollbacks vs. with locks, you can at least generally
> > > quickly see that a given lock has 90% of all threads waiting on it
> > > kind of thing. whereas you don't know what variable in the venn
> > > diagram of overlapping transactions caused the retry?

> > > tho i believe Rich previously said that Clojure would actually have a
> > > way of finding that out!http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fd...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.