cliffc wrote:
> On May 25, 9:31 am, Phil Jordan <li
...@philjordan.eu> wrote:
>> I'm new to STM, I've stumbled into it after doing some
>> explicit/low-level lock-free programming and systems that are
>> synchronised classically with mutex/semaphore-based locking. I
>> especially don't know what goes on under the hood in Clojure's STM
>> implementation, or how powerful the JVM is in terms of memory guarantees.
>> I'm chipping in out of interest and to improve my understanding,
> Let's see if I can do you some justice...
Thanks
>> (and we all know the "well then document it" approach just doesn't happen in practice)
> Yup, but You could make a difference. HotSpot probably has the
> highest level of 'asserts' per line of code (and a pretty darned high
> level of comments per line of code) of anything Out There. Docs are
> cheaper than debugging. But it's an aside.... back to deadlocks-in-
> practice:
Yeah, as far as I can tell, the concurrency infrastructure of the
JVM/Java is quite robust.
>> Maybe it's just the implementations I've used (pthreads, Win32, OpenMP) and others give you higher-level diagnostics, but at the level I was
>> working it could get very painful.
>>> You get a crash dump, crawl the stacks, discover the lock cycle & reorganize.
>>> Sure the situation could be better, but deadlocks are a 'crash once' kind of bug.
>> You need a reasonable amount of infrastructure in place for that,
> Crash dump? Core file? 'ctrl-backslash' to a JVM to get a thread-
> dump?
> This stuff is commonly available.
> If you don't have a proper tool chain, then Job One for you should be
> to go get one - and I'm very serious here. Drop everything else until
> you get a functional 'path' (protocol, implementation, business
> process, etc) that allows you do debug a crash from a customer in the
> field. You might need a willing beta-partner for this, hand him a
> broken program & let it crash, figure out he needs disk-space to
> capture the core & logs, needs training to hit 'ctrl-backslash' when
> suspecting a deadlock, etc - plus FTP the files back to Home Base,
> plus you need to be able to capture the files per-customer-per-bug
> (ala Bugzilla), then decrypt the file (gdb for core, eyeballs or a
> little perl for stack-trace logs), etc, etc, etc, .... No Rocket
> Science, but annoying bits of Engineering along with some Customer
> Management.
I absolutely agree with you there, and yeah, some environments make this
a lot easier than others. 'Environments' can be in the context of either
programming or work environment of course...
I guess most of my "I've been bitten by this too often" cases in
concurrency have been those where the environment has been unhelpful in
some way, and I've mostly been debugging other people's code.
The stuff I'm doing right now isn't actually using any concurrency
features yet, although it probably will at some point, so I'm interested
in learning how to do it right when I do get round to introduce it.
>> plus you're relying on absence of evidence rather than proof that it can't deadlock.
> Err.... No.
> I'm not shooting for perfection here; I'm shooting for "good enough to
> ship". Which means that if - In Practice - each possible deadlock
> happens once, then gets fixed - that's almost always Good Enough.
> Maybe not to run a nuclear reactor, but definitely good enough to run
> large businesses. In practice, most possible deadlocks never happen -
> but a few that do require several go-'rounds before getting fixed
> properly. And then the deadlock is fixed, and remains fixed.
This is true, there's a reason proper scientific program verification
isn't exactly mainstream. Although at with an STM system you probably
ought to stand a better chance of getting it right than with explicit
locking. The question then becomes what you're giving up for it in
return for that safety. Which is what this thread is all about, right.
>> Okay, you kind of lost me with what you're trying to say here.
> Sigh - we mentally missed here.
> Trivial examples are.... trivial. They can be fixed in a thousand
> obvious ways. We need to extrapolate to the non-trivial, because
> that's the only place where the STM-vs-Locks argument becomes
> interesting.
Agreed.
> So lets pretend that instead of a single 'Ref _money'
> and two classes, I've got 500 Ref's and a million lines of code - ala
> HotSpot (Sun's JVM). Easily >500 shared concurrent variables, about
> 750KLOC. About ~100 unique locks (some are classes of striped locks)
> guarding them in very complex locking patterns. Now replace all those
> locks with an STM & 'atomic'. Is my program any more correct? Not
> really....
> ...I might have avoided some potential deadlocks (HotSpot uses lock
> ranking asserts to avoid deadlock; deadlock rarely happens at the
> engineers desk and maybe once/year in the field across all HS users).
> The set of deadlocks-in-the-field avoided was miniscule. I'll concede
> that HotSpot is a rarely-well-engineered piece of concurrent code, and
> that deadlocks-in-the-field appear to happen more often to other large
> programs. But still, fixing after the fact is reasonable when the
> deadlock rate is so low and each fix 'sticks'.
> Instead of deadlock, HS crashes far far more often because the locks
> don't cover the right set of changes. Switching out the locks for an
> STM didn't change what I was guarding; it only removed any fine-
> grained-lock errors (admittedly useful... but only incrementally more
> so than avoiding deadlocks). I'm still stuck with a program that's
> too Big to see where the proper atomic/STM/locking boundaries need to
> be. In a trivial example I can say "go up one call level and 'atomic'
> there", but in the Real Program - I can't do that. Go up how many
> layers and add 'atomic'? 1 layer? 10 layers? 100 layers? Yes, I
> see unique call-stacks with >100 layers. I can't put 'atomic' around
> 'main' because that makes my program single-threaded.
> Here's where I want some kind of compiler support. 'Ref' helps -
> because it at least demands I have an 'atomic' around the 'Ref'. But
> 'Ref' is sufficient, because a syntactally correct program simply
> wraps an 'atomic' around each Ref - exact what my trivial example
> did. I'd like to be able to specify classes & groupings of Refs that
> have to be wrapped in 'atomic' - that the compiler will complain
> about. Yes I'm still responsible for the semantics - I have to tell
> the compiler which groupings of Refs are interesting - but I'd like
> some kind of automatic support, so that as my program goes from 10
> lines to 10MLOC I can be told "you touched both Ref
> Person._checking._money and Ref Person._savings._money without
> wrapping both Ref accesses in a single atomic, thats a datarace
> error".
Okay, that makes a lot more sense. So really, what you want is a way to
formalise, say, invariants, and detect violations as early as possible.
I think the Clojure STM has a couple of training wheels hinting at this
kind of thing already. We would probably also want some kinds of other
"ref relationship" primitives, because invariants are awful at
describing most things. Is there maybe something comparable in database
systems?
I wonder, though, how well this can work. When laying down the rules
comes close to being as much work as describing what to do within those
rules, the temptation will always be not to formalise any rules. Complex
systems will probably breed complex rules, so how do you make the
specification of the rules straight-forward?
If it's possible to pull this off effectively, maybe it could even aid
the performance issue. Given some kind of relationship graph of the
various parts of mutable data, the STM runtime could attempt to arrange
and represent things optimally in memory. You know, the "more
information, more scope for optimisation" argument that never really
worked when it was applied to high-level languages. ;)
~phil