Message from discussion STM criticism from Azul Systems
From: cliffc <cli...@acm.org>
Date: Tue, 27 May 2008 09:17:09 -0700 (PDT)
Local: Tues, May 27 2008 12:17 pm
Subject: Re: STM criticism from Azul Systems
On May 25, 9:31 am, Phil Jordan <li...@philjordan.eu> wrote:

> I'm new to STM, I've stumbled into it after doing some
> explicit/low-level lock-free programming and systems that are
> synchronised classically with mutex/semaphore-based locking. I
> especially don't know what goes on under the hood in Clojure's STM
> implementation, or how powerful the JVM is in terms of memory guarantees.

> I'm chipping in out of interest and to improve my understanding,

Let's see if I can do you some justice...

>  From personal experience, this is far from the truth in complex
> systems. Deadlocks happening only on "in the wild" systems, appearing in
> the form of heisenbugs, etc. Not fun at all. There's too much in the way
> of implicit contracts going on, which blows up in your face if you're
> trying to extend undocumented software that was written by someone who
> left the company before you arrived.

Yup.  So the deadlock happened.  Ouch.

> (and we all know the "well then document it" approach just doesn't happen in practice)

Yup, but you could make a difference.  HotSpot probably has the
highest level of 'asserts' per line of code (and a pretty darned high
level of comments per line of code) of anything Out There.  Docs are
cheaper than debugging.  But it's an aside.... back to
deadlocks-in-practice:

> Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was
> working it could get very painful.

> > You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.
> > Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.

> You need a reasonable amount of infrastructure in place for that,

Crash dump?  Core file?  'ctrl-backslash' to a JVM to get a thread-
dump?
This stuff is commonly available.
If you don't have a proper tool chain, then Job One for you should be
to go get one - and I'm very serious here.  Drop everything else until
you get a functional 'path' (protocol, implementation, business
process, etc) that allows you to debug a crash from a customer in the
field.  You might need a willing beta-partner for this, hand him a
broken program & let it crash, figure out he needs disk-space to
capture the core & logs, needs training to hit 'ctrl-backslash' when
suspecting a deadlock, etc - plus FTP the files back to Home Base,
plus you need to be able to capture the files per-customer-per-bug
(ala Bugzilla), then decrypt the file (gdb for core, eyeballs or a
little perl for stack-trace logs), etc, etc, etc, ....  No Rocket
Science, but annoying bits of Engineering along with some Customer
Management.
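
As a rough illustration of the "thread dump, find the lock cycle" step: the JVM exposes the same cycle detection through the standard java.lang.management API. The sketch below is only an assumption about how one might wire it up; the class name DeadlockDump, the two lock objects and the worker names are made up for the example. It deliberately deadlocks two daemon threads and then asks the JVM who is stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDump {
    // Hypothetical locks, taken in opposite orders by the two workers.
    static final Object LOCK_A = new Object();
    static final Object LOCK_B = new Object();

    static void startWorker(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // wait until each worker holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other worker holds 'second'
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though the workers stay stuck
        t.start();
    }

    // Polls for up to about 5 seconds; returns how many threads sit in a monitor deadlock.
    static int countDeadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
                    System.out.print(info); // thread name, blocked-on lock, stack frames
                }
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    static int demo() {
        CountDownLatch latch = new CountDownLatch(2);
        startWorker("worker-1", LOCK_A, LOCK_B, latch);
        startWorker("worker-2", LOCK_B, LOCK_A, latch);
        return countDeadlockedThreads();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + demo());
    }
}
```

The printed ThreadInfo entries name the lock each thread is blocked on and its owner, which is the same information one would read out of a 'ctrl-backslash' thread dump by eye.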

> plus you're relying on absence of evidence rather than proof that it can't deadlock.

Err.... No.

I'm not shooting for perfection here; I'm shooting for "good enough to
ship".  Which means that if - In Practice - each possible deadlock
happens once, then gets fixed - that's almost always Good Enough.
Maybe not to run a nuclear reactor, but definitely good enough to run
large businesses.  In practice, most possible deadlocks never happen -
but a few that do require several go-'rounds before getting fixed
properly.  And then the deadlock is fixed, and remains fixed.

> > Dev's don't like 'em, but they don't lose sleep over them either.

> The people who lose sleep over software quality are probably the kind who try to avoid complex locking schemes like the plague in the first place. :)

Yup... and those folks are generally stuck anyways.  But if they are
thinking about the problem ahead of time they are going to be way way
ahead in the long run.

> My understanding is that this is exactly the kind of situation where STM
> excels: you wrap the two add calls in a transaction rather than making
> them individually atomic. The way Clojure handles this (I've been
> spending 99.9% of my time in Clojure on non-threaded things, so I could
> easily have missed something) is that your _money would be a ref, and
> any attempt at modifying it will fail unless you're in a transaction.
> Wrapping the 'add' around the transaction would be the anti-pattern
> here, you want to make the 'transfer' a transaction.
> Okay, you kind of lost me with what you're trying to say here.

Sigh - we mentally missed here.

Trivial examples are.... trivial.  They can be fixed in a thousand
obvious ways.  We need to extrapolate to the non-trivial, because
that's the only place where the STM-vs-Locks argument becomes
interesting.  So let's pretend that instead of a single 'Ref _money'
and two classes, I've got 500 Refs and a million lines of code - ala
HotSpot (Sun's JVM).  Easily >500 shared concurrent variables, about
750KLOC.  About 100 unique locks (some are classes of striped locks)
guarding them in very complex locking patterns.  Now replace all those
locks with an STM & 'atomic'.  Is my program any more correct?  Not
really....

...I might have avoided some potential deadlocks (HotSpot uses lock
ranking asserts to avoid deadlock; deadlock rarely happens at the
engineer's desk and maybe once/year in the field across all HS users).
The set of deadlocks-in-the-field avoided was minuscule.  I'll concede
that HotSpot is a rarely-well-engineered piece of concurrent code, and
that deadlocks-in-the-field appear to happen more often to other large
programs.  But still, fixing after the fact is reasonable when the
deadlock rate is so low and each fix 'sticks'.
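
For readers who haven't seen lock ranking: the idea is that every lock gets a fixed rank and a thread may only acquire locks in increasing rank order, so no cycle can form. The sketch below is a generic illustration and not HotSpot's actual code; RankedLock, the lock names and the ranks are invented for the example, it assumes locks are released in reverse order of acquisition, and it throws an exception instead of using 'assert' so it runs without -ea.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

public class RankedLockDemo {
    /** A lock with a fixed rank; a thread must acquire ranks in strictly increasing order. */
    static final class RankedLock {
        // Ranks of the locks the current thread holds, most recent on top.
        private static final ThreadLocal<Deque<Integer>> HELD =
                ThreadLocal.withInitial(ArrayDeque::new);

        private final String name;
        private final int rank;
        private final ReentrantLock lock = new ReentrantLock();

        RankedLock(String name, int rank) {
            this.name = name;
            this.rank = rank;
        }

        void lock() {
            Deque<Integer> held = HELD.get();
            Integer top = held.peek(); // highest rank held, since ranks only increase
            if (top != null && top >= rank) {
                throw new IllegalStateException("lock rank violation: acquiring " + name
                        + " (rank " + rank + ") while holding rank " + top);
            }
            lock.lock();
            held.push(rank);
        }

        // Assumes LIFO release: the lock being released is the most recently acquired.
        void unlock() {
            HELD.get().pop();
            lock.unlock();
        }
    }

    // Takes 'first' then 'second'; reports whether the rank check fired.
    static boolean violates(RankedLock first, RankedLock second) {
        first.lock();
        try {
            second.lock();
            second.unlock();
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        RankedLock accounts = new RankedLock("accounts", 1); // hypothetical locks
        RankedLock ledger = new RankedLock("ledger", 2);
        System.out.println(violates(accounts, ledger)); // false: ranks increase
        System.out.println(violates(ledger, accounts)); // true: ranks decrease
    }
}
```

The check fires on the first wrong-order acquisition by a single thread, at the engineer's desk, without needing a second thread to actually close the cycle.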

Instead of deadlock, HS crashes far far more often because the locks
don't cover the right set of changes.  Switching out the locks for an
STM didn't change what I was guarding; it only removed any fine-
grained-lock errors (admittedly useful... but only incrementally more
so than avoiding deadlocks).  I'm still stuck with a program that's
too Big to see where the proper atomic/STM/locking boundaries need to
be.  In a trivial example I can say "go up one call level and 'atomic'
there", but in the Real Program - I can't do that.  Go up how many
layers and add 'atomic'?  1 layer?  10 layers?  100 layers?  Yes, I
see unique call-stacks with >100 layers.  I can't put 'atomic' around
'main' because that makes my program single-threaded.
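
To make "the locks don't cover the right set of changes" concrete, here is the trivial version in plain Java; everything in it (Account, TRANSFER_LOCK, the amounts) is made up for illustration. Each 'add' is individually atomic, yet the invariant "total money is constant" spans both adds. The Runnable hook stands in for another thread that happens to run between the two steps, which keeps the demonstration deterministic.

```java
public class GuardScope {
    static final class Account {
        private long money;

        Account(long money) {
            this.money = money;
        }

        // Each operation is atomic on its own.
        synchronized void add(long delta) {
            money += delta;
        }

        synchronized long balance() {
            return money;
        }
    }

    // Hypothetical wider guard covering any change that spans two accounts.
    static final Object TRANSFER_LOCK = new Object();

    // Guard too narrow: both adds are atomic, the transfer as a whole is not.
    static void transferNarrow(Account from, Account to, long amount, Runnable betweenSteps) {
        from.add(-amount);
        betweenSteps.run(); // stands in for another thread scheduled right here
        to.add(amount);
    }

    // Guard at the right level: an observer using total() never sees the gap.
    static void transferWide(Account from, Account to, long amount) {
        synchronized (TRANSFER_LOCK) {
            from.add(-amount);
            to.add(amount);
        }
    }

    static long total(Account a, Account b) {
        synchronized (TRANSFER_LOCK) {
            return a.balance() + b.balance();
        }
    }

    static long observedTotalDuringNarrowTransfer() {
        Account checking = new Account(100);
        Account savings = new Account(0);
        long[] seen = new long[1];
        transferNarrow(checking, savings, 30, () -> seen[0] = total(checking, savings));
        return seen[0];
    }

    static long totalAfterWideTransfer() {
        Account checking = new Account(100);
        Account savings = new Account(0);
        transferWide(checking, savings, 30);
        return total(checking, savings);
    }

    public static void main(String[] args) {
        System.out.println(observedTotalDuringNarrowTransfer()); // 70: money is "missing" mid-transfer
        System.out.println(totalAfterWideTransfer());            // 100
    }
}
```

Swapping each 'synchronized' for an 'atomic' around the same single add would leave the narrow version exactly as broken. In the trivial case the fix is obviously one call level up; the open question is where that level sits in a call stack 100 frames deep.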

Here's where I want some kind of compiler support.  'Ref' helps -
because it at least demands I have an 'atomic' around the 'Ref'.  But
'Ref' isn't sufficient, because a syntactically correct program simply
wraps an 'atomic' around each Ref - exactly what my trivial example
did.  I'd like to be able to specify classes & groupings of Refs that
have to be wrapped in 'atomic' - that the compiler will complain
about.  Yes I'm still responsible for the semantics - I have to tell
the compiler which groupings of Refs are interesting - but I'd like
some kind of automatic support, so that as my program goes from 10
lines to 10MLOC I can be told "you touched both Ref
Person._checking._money and Ref Person._savings._money without
wrapping both Ref accesses in a single atomic, that's a datarace
error".

> ~phil

Cliff

 