On May 26, 5:51 am, Rich Hickey <richhic...@gmail.com> wrote:
> On my side, points were taken re: transparency and control for STMs.
> The Clojure STM's current architecture is certainly capable of both
> better reporting and control.
> So I'll be working on adding the necessary diagnostic counters, and
> exposing the many aspects of the Clojure STM that could become 'knobs'
> for performance tuning - timeout windows, retry counts, history cache
> sizes etc. So, I have some work to do before I'd like that kind of
> scrutiny :)
> Rich
I'm going to be a sourpuss again here. :-(
This is exactly the trap MPI fell into; and *you* have to do it
anyways. Double-unsmiley. :-( :-(
Here's the deal:
I write a Large Complex Program, one that Really Needs STM to get it
right.
But performance sucks.
So I do a deep dive into the STM runtime, and discover it has warts.
So I hack my code to work around the warts in the STM.
Crap like: at an average of 5 cpus in this atomic block the STM
'works', but at an average of 7 cpus in the same atomic block I get a
continuous fail/retry rate that's so bad I might as well not bother.
So I guard the STM area with a "5-at-a-time" lock and a queue of
threads waiting to enter. Bleah (been there; done that - for a DB not
an STM but same-in-principle situation). A thousand thousand
variations of the same crap happens, each requiring a different hack
to my code to make it performant.
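
For concreteness, the "5-at-a-time" guard described above might look roughly like the sketch below. It is Python purely for illustration, not Clojure and not anything from a real STM: `AdmissionGate`, `MAX_IN_FLIGHT`, `guarded_transaction` and `demo` are hypothetical names, and the `time.sleep` call is only a stand-in for the contended atomic block. Note that Python's semaphore makes waiters block but does not promise strict FIFO order, so this approximates the "queue of threads waiting to enter" rather than reproducing it exactly.

```python
import threading
import time

# Hypothetical tuning value, taken from the "5 cpus works, 7 doesn't" example.
MAX_IN_FLIGHT = 5


class AdmissionGate:
    """Let at most `limit` threads into the guarded region; the rest wait."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._inside = 0
        self.peak = 0  # highest concurrency actually observed (diagnostic only)

    def __enter__(self):
        self._slots.acquire()  # blocks while `limit` threads are already inside
        with self._lock:
            self._inside += 1
            self.peak = max(self.peak, self._inside)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Decrement before releasing the slot so `_inside` never exceeds `limit`.
        with self._lock:
            self._inside -= 1
        self._slots.release()
        return False  # do not swallow exceptions from the guarded body


def guarded_transaction(gate, body):
    """Run `body` (a stand-in for the atomic block) behind the gate."""
    with gate:
        return body()


def demo(n_threads=20):
    """Start `n_threads` workers and return the peak concurrency seen inside the gate."""
    gate = AdmissionGate(MAX_IN_FLIGHT)

    def body():
        time.sleep(0.01)  # placeholder for the contended transactional work

    threads = [
        threading.Thread(target=guarded_transaction, args=(gate, body))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak
```

The point of the sketch is the smell, not the technique: the magic number 5 is a property of one STM implementation on one machine, and it is now baked into application code.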
Meanwhile the STM author (You: Rich) hacks out some warts & hands me a
beta version.
I hack my code to match the STM's new behavior, and discover some new
warts.
Back & Forth we go - and suddenly: my app's "performance correctness"
is intimately tied to the exact STM implementation. Any change to the
STM's behavior kills my performance - and you, Rich, have learned a
lot about the building of a robust good STM. You (now) know the
mistakes you made and know it's time to restart the STM from scratch.
*My* options are limited:
- Scream at you to Not Change A Thing. Thus C/C++/Clojure Standards
Committees are born.
- Cling tenaciously to the abandoned version, and recognize my code is
now abandon'd ware. No new features or support for me. But maybe the
App is running fine and I can forget about it.
- Rewrite my app from scratch Yet Again, to match the new STM's
warts/features.
MPI is in this position. Every large parallel scientific App Out
There is intimately tied to the marriage of parallelizing_compiler +
MPI_runtime + computer_architecture. Changing any one of those pieces
requires the app to be rewritten from scratch in order to achieve a
fraction of the required performance.
The parallelizing compilers have 'stabilized' in a performance way.
There's a coding style which will auto-vectorize/stripe/etc and
there's some codes "on the edge" (some compilers yes, some no), and
there's a well known "don't go here, the compiler can't hack it". The
compiler support for STM is very much lacking and/or in flux. You're
heading into this terrain now, as you add Clojure compiler support to
your STM. I, as an STM user, can't know what's in store for me as
you go through the learning process. I *know* that I *don't know*
what STM coding styles will be performant and what will not.
The MPI_runtime has now stabilized (I believe, not sure) after 20+
years. Again there's the "this is good, this is marginal, this is
bad" folk wisdom that drives coding. Again, for STMs, this area is
very much in flux. Go read some of the bazillion academic papers Out
There. Everybody's got their own take, every STM is good in this
domain & bad in some other domain - and all the domains are different;
god forbid I write a large program dependent on STM 'A' and later try
to switch over to STM 'B'.
The computer_arch *has* been stable for message-passing for some
time. For STM, I believe it's trying to ramp-up. i.e., the
computer_arch for STM is "all the world's an X86" *right now*, and all
hardware vendors are furiously studying what it would take to add some
kind of STM/HTM/hybrid support.
Thus, for STM to make it to the 'Promised Land' - the STM industry
needs to figure out:
- what belongs in the compiler
- what belongs in the runtime
- where there are Dragons and where there are Green Pastures
- teach the STM users which paths lead to Dragons or Green Pastures.
If we BOTH don't go through the exercise, then STM will never hit the
promised land.
MPI never really "made it"; the 'Green Pasture' area was too small,
and it was always surrounded by steep performance cliffs leading to
Dragons when you slipped off the edge. Upgrade your hardware: rewrite
your app. Upgrade your compiler: 2 months of performance debugging.
Upgrade your MPI: 2 months of performance debugging to discover you
need to rewrite your app.
GC "made it"; the Green Pasture gradually got bigger & bigger; Dragons
remain lurking - but only after you filled up an ever-growing Green
Pasture and needed to poke at the edges of stability.
And, if you don't mind, I'm going to edit this entire thread for
clarity and echo it on my blog.
This is good stuff (this conversation), and I definitely wish you the
best of luck.
Cliff