and Clojure's Refs, Agents, and Vars all do that. But I think a large
Refs are not merely markers, they do enforcement. There is simply no
mutating a ref outside of a transaction - it's not an allowed mistake.
Any attempt to change a ref outside a transaction throws an exception.
So, there is no such thing as "transaction regions don't cover all the
variables they need to" in Clojure.
Your thought exercise with a single lock is a good one. Let's say
Clojure's implementation of STM used a single global lock (in fact it
has no global locks at all), and contrast it with a manual locking
program that used a single lock for everything.
Analogies to databases are telling. DBMS's took over locking long ago
and most people were happy to have them do so. Ditto memory management
and GC. In each case, people claimed superior control and performance
in the manual solution, but the cost and complexity proved too high
for most applications.
> The problem with manual locking - and it applies equally well to
> transactions - is that the there's no name-space-control over what
> needs to be locked/transacted. The failure people have with locks
> (and limited experience with transactions) is that they can't properly
> name their shared variables. They *think* they can, until we start
> looking hard, and then we keep finding more... and more that need to
> be locked/transacted atomically. In short, people don't know where to
> put the lock/unlock or where to bound the transactional region.
> Having said that, locks have some advantages over transactions:
> transparent performance being one of them.
> Programs are busted if the locks are wrong (or transaction regions
> don't cover all the variables they need to; Clojure isn't immune here)
> AND they are busted if performance sucks. For 2-4 way machines the
> issue is probably moot; for 8-32 way machines (standard servers these
> days) - it matters. In a year every el-cheapo linux blade server out
> there will be looking at 16-64 cpus. And of course for Azul, 100+
> active cpus is the norm. I claim that in a year, most people will
> either care (get bit by lack of scaling), or see that their need to
> care within another year.
> Requiring shared variables to have a special syntax, ala Ref, is a big
> step forward here for both locks & transactions. You're reducing the
> complexity of the problem via name-space control (only Ref's are
> shared) and promoting immutable shared state. The STM issue is
> orthogonal.
> As evidence of that, suppose you replace your STM implementation with
> a trivial single global lock. Are the programs more or less complex
> than their locking counterparts? The program is as-correct and as-
> complex as before (before you swap the STM implementation), and indeed
> is exactly equal to the locking solution using a single global lock.
> Suppose I can get a correct program using STM - then I also have a
> correct program using a single global lock. STM only varies from
> locking as I try to make the implementation more performant: for the
> STM I trust the language implementor to "stripe" the locks under the
> hood. For locks, I start using more than one lock. In both cases I
> can also try to shrink the lock-hold-time (or reduce the size of the
> STM region), or switch algorithms.
> Summary: STM doesn't solve the Big Problem (lack of name-space
> control). Ref's might solve the Big Problem - and solving the Big
> Problem will be a big help to both locks & STM. Locks have
> transparent performance and are thus easy to optimize. Optimizing
> without good name-space control is a buggy process for both locks &
> STM. I don't know how to optimize an STM implementation without
> becoming an expert in the particulars of the STM runtime (and there
> are so many to choose from!)
> Cliff
> On May 23, 6:00 pm, Rich Hickey <richhic...@gmail.com> wrote:
> > One could as generically argue that systems with manual locking are
> > usually broken, and therefore their behavior itself, never mind their
> > performance, is unpredictable. That's been my experience with manual
> > locking in the hands of mere mortal developers. It can be as difficult
> > to understand the behavior of any single manually-concurrent program
> > as to understand the performance characteristics of an STM. In the
> > latter case, at least the STM is handling correctness for you, and all
> > users of the same STM can share knowledge (and bug fixes and
> > performance improvements), vs each application being its own Gordian
> > knot. And in any case in which the STM is failing you performance-
> > wise, you can opt out and use manual locks outside of transactions. To
> > the extent you can use it, STM can drastically reduce the complexity
> > of a system.
> > I'm wary of unifying the discussion of concurrency with one about
> > performance, as the problems of concurrency start with correctness,
> > where the manual locking story, due to its complexity, is quite bad.
> > Scalability is not a universal problem - some systems need to handle
> > thousands of simultaneous connections - most don't, some need to
> > leverage hundreds of CPUs in a single application - most don't. But
> > every application that wants to leverage multiple cores needs to be
> > correct.
> > It would be no problem to instrument the Clojure STM references with
> > fault counters that would allow one to know the exact per-reference
> > contention level. Once known, the answers in both cases are similar -
> > share less mutable state, don't have both long and short-lived
> > transactions contend for the same data, check your success conditions
> > ASAP etc.
> > STM designs differ from one another quite a bit, so any general
> > statements are difficult to assess. I think the level of granularity
> > of the STM comes into play. Most of the STM papers I've read
> > concentrate on creating lots of transactional cells with which they
> > envision constructing concurrent data structures like the ones in
> > java.util.concurrent. I don't think that's a good idea at all.
> > Clojure encourages the use of immutable persistent data structures,
> > which need no locking or coordination whatsoever in order to perform
> > correctly in a concurrent situation, and STM references to support the
> > appearance of identity-preserving change. It also encourages the use
> > of java.util.concurrent and the like for caches and work queues - the
> > best tools for the job.
> > As far as I know, Clojure is the first STM that uses snapshot MVCC,
> > avoiding both read logging and transaction restarts due to read
> > invalidations. Clojure's STM also gives you the ability to 'ensure' a
> > reference you later intend to read or write, thus providing some
> > manual control over resource acquisition order (rather than just
> > relying on the side effect of access). It also supports explicit
> > commute, further reducing retries for commutative writes. I'll not
> > claim its performance characteristics are at all proven, but its
> > contention due to, and for, reading is lower than in programs that use
> > manual locking, and lower than in STMs in which write transactions can
> > cause read transactions to retry.
> > Once you get to record-level STM granularity, like Clojure's, it's
> > also a bit easier to draw parallels to the performance characteristics
> > of the database systems it mimics, which have been doing similar
> > things for decades.
> > I don't consider STM performance any more or less of a research
> > problem than determining the correctness of programs that do manual
> > locking - there's still work to do.
> > And of course, neither STM, nor any other mechanism, is a panacea.
> > Rich
> > On May 23, 2:13 pm, cliffc <cli...@acm.org> wrote:
> > > You got it - under load, performance is unpredictable.
> > > With locks, you can see which locks are loaded, generally figure out
> > > why, and plan a strategy (stripe the lock, shorter hold time, switch
> > > to the j.u.concurrent datastructures, etc).
> > > Not so (at least not yet) with STM. I've talked to a few people
> > > who've tried STM out in a larger scale than the usual academic papers
> > > - and the results are downright disappointing. Unless you become an
> > > expert in the STM runtime you're using (where "expert" means "not just
> > > grok it, but can build & tweak it") anytime performance is not good
> > > enough - you get stuck. This is an open research problem at best,
> > > with no clear indication that the problem can be solved at all.
> > > Cliff
> > > On May 23, 11:07 am, "Raoul Duke" <rao...@gmail.com> wrote:
> > > > > Are explicit locking designs free from such anomalies? I wouldn't think
> > > > > so.
> > > > paraphrase: the behaviour under load gets weird and tricky.
> > > > well, in some ways maybe that doesn't happen with locks: the question
> > > > of (depending on which STM system we're looking at) figuring out what
> > > > caused all the rollbacks vs. with locks, you can at least generally
> > > > quickly see that a given lock has 90% of all threads waiting on it
> > > > kind of thing. whereas you don't know what variable in the venn
> > > > diagram of overlapping transactions caused the retry?
> > > > tho i believe Rich previously said that Clojure would actually have a
> > > > way of finding that out!http://groups.google.com/group/clojure/browse_thread/thread/2f73b80fd...