Peyton-Jones and Harris discuss STM (1h video)

kimo

unread,

Nov 23, 2006, 8:19:39 PM11/23/06

to

http://channel9.msdn.com/Showpost.aspx?postid=231495

Programming in the Age of Concurrency: Software Transactional Memory

Recently, we visited MSR Cambridge(UK) to meet some of the great
minds working there. In this case, we were fortunate enough to get an
hour's time with Simon Peyton-Jones and Tim Harris, who are researchers
working on a very hard problem: making it easier (more predictable,
more reliable, more composable) to write concurrent applications in
this the age of Concurrency (multi-core is a reality, not a dream).

Chris Thomasson

unread,

Nov 23, 2006, 9:58:14 PM11/23/06

to

"kimo" <kimocr...@gmail.com> wrote in message
news:1164331179.8...@l39g2000cwd.googlegroups.com...
http://channel9.msdn.com/Showpost.aspx?postid=231495

> Programming in the Age of Concurrency: Software Transactional Memory

> ?Recently, we visited MSR Cambridge(UK) to meet some of the great

> minds working there. In this case, we were fortunate enough to get an
> hour's time with Simon Peyton-Jones and Tim Harris, who are researchers
> working on a very hard problem: making it easier (more predictable,
> more reliable, more composable) to write concurrent applications in
> this the age of Concurrency (multi-core is a reality, not a dream).

TM is a no go. Way too much overhead... Here are some of the reasons why:

http://groups.google.com/group/comp.arch/msg/e1e391055c7acecb

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40

http://groups.google.com/group/comp.arch/msg/1b9e405080e93149

http://groups.google.com/group/comp.arch/msg/bbbc035cf1a8502c

http://groups.google.com/group/comp.arch/msg/11b14c4bda2d5d82

http://groups.google.com/group/comp.arch/msg/9b00fda2752966f9

http://groups.google.com/group/comp.arch/msg/335aeb22fd6fe526

http://groups.google.com/group/comp.arch/msg/1ace9400b1b16cd4

http://groups.google.com/group/comp.arch/msg/995379a16beb3b69
(simply excellent Wheeler post)

...

Any thoughts?

;^)

Chris Thomasson

unread,

Nov 23, 2006, 10:06:13 PM11/23/06

to

They compare solutions' that are not TM, bananas'! This includes locks!!

Thanks Kimo! Very entertaining!

Chris Thomasson

unread,

Nov 23, 2006, 10:07:18 PM11/23/06

to

They depend on contention managers!

Chris Thomasson

unread,

Nov 23, 2006, 10:08:36 PM11/23/06

to

Joe... We got em good!

Chris Thomasson

unread,

Nov 23, 2006, 10:43:08 PM11/23/06

to

I posted a quick reply... Lets see if the Microsoft "hot-shots" can survive
the wraith of c.p.t and c.a....

Muhhahahahah!

;^)

kimo

unread,

Nov 24, 2006, 2:54:02 AM11/24/06

to

I get the sense that the promise of STM has faded and it probably needs
a hardware assist.

They did some interesting stuff with composibility and retry but you
could see them freeze when the interviewer asked them about multicore
stuff

I liked Tim Harris - looking at his site he seems to have continued
into other areas. Simon Peyton Jones struck me as a bit of a self
promoter, trying to get as much mileage out of STM as he can.

but those are just my limited impressions, i hope i'm wrong

Chris Thomasson

unread,

Nov 24, 2006, 4:44:13 AM11/24/06

to

If you watch the video you will see the part whether interviewer ask about
the overheads... They immediately shifts a comparison against
locks!!!~!!~!!!

Okay Kimo.... If Tim or Peyton-Jones venture into this realm, you will be in
for a good show...

Joe and I will saturate them with facts on TM... They will be in for the
ride of their careers...

Chris Thomasson

unread,

Nov 24, 2006, 4:55:43 AM11/24/06

to

ACK!

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:AuWdnZ_xG60lJvvY...@comcast.com...

> If you watch the video you will see the part whether interviewer ask about

^^^^^^^^^^^^^^^^^^^^^^^^^

see the part when the interviewer asks them about then ""overheads""

Joe Seigh

unread,

Nov 24, 2006, 9:00:15 PM11/24/06

to

I haven't had a chance to look at the video yet. It seemed like
one approach was going to turn it into a scheduling problem, sort of
like what the out of order pipelined processors to now with limited
success. Try to work out the dependencies in the transactions and
schedule them so as to not conflict. The other approach was to
conditionally try the transaction and make the retries preemptive,
i.e. don't wait until the commit if you know the transaction will
fail. This is if you look at transactions as a multi location load
reserved/store conditional.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

unread,

Nov 24, 2006, 11:22:49 PM11/24/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:wOOdnV1UBvCdPvrY...@comcast.com...

> kimo wrote:
>> I get the sense that the promise of STM has faded and it probably needs
>> a hardware assist.
>>
>> They did some interesting stuff with composibility and retry but you
>> could see them freeze when the interviewer asked them about multicore
>> stuff
>>
>> I liked Tim Harris - looking at his site he seems to have continued
>> into other areas. Simon Peyton Jones struck me as a bit of a self
>> promoter, trying to get as much mileage out of STM as he can.
>>
>> but those are just my limited impressions, i hope i'm wrong
>>
>
>
> I haven't had a chance to look at the video yet.

Well, they are doing exactly what I thought they would do:

http://groups.google.com/group/comp.arch/msg/1b9e405080e93149

And of course, they leave out any major alternatives'... Like PDR. They seem
to focus on comparing their work to a very naive 100% lock-based approach.
Unimpressed. I got a response to my post in the Channel9 fourm:

http://channel9.msdn.com/Showpost.aspx?postid=231495
(go to the end of the page...)

I am currently drafting my response... Humm...

[...]

> This is if you look at transactions as a multi location load
> reserved/store conditional.

IMO, it's an expensive and livelock prone approach... This includes
multi-word CAS emulation (MCAS)... If they put it in hardware, it will force
the cache coherency mechanism to be way to strict...

Joe Seigh

unread,

Nov 25, 2006, 10:05:52 AM11/25/06

to

Ok, I got a chance to view the video.

Firstly, it appears that the main motivation for STM is the competence
of your average programmer. They want something that is less likely to
get screwed up. Especially since Java combined threading and OO and
somehow let the impression that you could solve shared acess by using
synchronized methods on everything leading to a lot of deadlock. And
of course trying to impose a lock hierarchy after the fact is a pain.
I think the scalability claims are made more in that context (of bad
threaded OO programs) than other methods of acheiving scalability.

Is STM a silver bullet for multi-threaded programs written by
incompetent programmers? It might help with deadlock issues.
That may be important if your definition of scalability is not
deadlocking.

They seem to be using GC as a PDR mechanism.

And they seem to have some dependence on things that are language
features rather than properties of synchronization mechansims.
Heck, I could do some really interesting things in C/C++ if it
knew the difference between shared and non shared memory as a
start.

And I find it highly ironic that they're using banking transactions
as an example since banking transactions are anything but atomic.

Chris Thomasson

unread,

Nov 25, 2006, 10:39:25 AM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:BImdndpz7bmCxvXY...@comcast.com...

I completely agree. I think you nailed it Joe. Bravo!

:^)

Chris Thomasson

unread,

Nov 25, 2006, 11:59:57 AM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:BImdndpz7bmCxvXY...@comcast.com...

> kimo wrote:
>> http://channel9.msdn.com/Showpost.aspx?postid=231495
>>
>> Programming in the Age of Concurrency: Software Transactional Memory
>>
>> Recently, we visited MSR Cambridge(UK) to meet some of the great
>> minds working there. In this case, we were fortunate enough to get an
>> hour's time with Simon Peyton-Jones and Tim Harris, who are researchers
>> working on a very hard problem: making it easier (more predictable,
>> more reliable, more composable) to write concurrent applications in
>> this the age of Concurrency (multi-core is a reality, not a dream).
>>
>
> Ok, I got a chance to view the video.

Notice the part when Simon says the overheads in STM is about 5%. Then Tim
goes humm, that's optimistic thinking Simon... Then Simon goes, well if you
want your program to work at all, you will take the performance hit...

First of all, if Simon thinks that %5 is the overhead then he is living a
pipedream. They must be testing against very crappy lock-based
implementations... The overheads involved are an orders of magnitude. Then
he says 'well if you want your program to work at all, you will take the
performance hit...', which is utter nonsense. They seem to be misleading
themselves.

So I agree with Kimo that Tim might have an open mind...

I think I am going to try to get Tim and Simon to post here... Humm...

Joe Seigh

unread,

Nov 25, 2006, 1:21:51 PM11/25/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:BImdndpz7bmCxvXY...@comcast.com...
>
>>kimo wrote:
>>
>>>http://channel9.msdn.com/Showpost.aspx?postid=231495
>>>
>>>Programming in the Age of Concurrency: Software Transactional Memory
>>>
>>>
>>

>>Ok, I got a chance to view the video.
>
>
> Notice the part when Simon says the overheads in STM is about 5%. Then Tim
> goes humm, that's optimistic thinking Simon... Then Simon goes, well if you
> want your program to work at all, you will take the performance hit...
>
> First of all, if Simon thinks that %5 is the overhead then he is living a
> pipedream. They must be testing against very crappy lock-based
> implementations... The overheads involved are an orders of magnitude. Then
> he says 'well if you want your program to work at all, you will take the
> performance hit...', which is utter nonsense. They seem to be misleading
> themselves.
>
> So I agree with Kimo that Tim might have an open mind...
>
> I think I am going to try to get Tim and Simon to post here... Humm...
>
>

They are testing against crappy implementations actually. That's
the whole point. They're trying to do for threaded programming
what GC did for memory management. GC didn't get rid of memory
errors. It just made them silent. For STM, overly long transactions
that don't complete might be considered the equivalent of dangling
pointers in GC.

Simon and Tim don't claim STM is a silver bullet. They're just
trying for a market segment that wants to dumb down programming
so employers don't have to hire skilled programmers. They can
hire less skilled ones and pay them even less. That's the subtext
I'm picking up anyway.

Chris Thomasson

unread,

Nov 25, 2006, 1:48:13 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:tamdnV5uF_KSFPXY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:BImdndpz7bmCxvXY...@comcast.com...
>

> Simon and Tim don't claim STM is a silver bullet.

Simon contradicts himself... He says it not a silver bullet, then says this:

'well if you want your program to work at all, you will take the
performance hit...'

Humm...

Chris Thomasson

unread,

Nov 25, 2006, 3:59:45 PM11/25/06

to

"kimo" <kimocr...@gmail.com> wrote in message
news:1164331179.8...@l39g2000cwd.googlegroups.com...

http://channel9.msdn.com/Showpost.aspx?postid=231495

Programming in the Age of Concurrency: Software Transactional Memory

Check out the The Accelerator Project... Pretty cool. I haven't done a lot
of GPU work, but I know a couple of experts in this area... Interesting. It
has its tradeoffs. It mostly for number crunching applications. This is
different than PDR. They can exist side by side...

Chris Thomasson

unread,

Nov 25, 2006, 4:02:01 PM11/25/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:26ydnVWG0YUbNvXY...@comcast.com...

I know I can use the GPU in the playstation for number crunching... Okay.
The accelerator project is same as this:

http://groups.google.com/group/comp.sys.super/msg/c9503373d8d1c90b

Chris Thomasson

unread,

Nov 25, 2006, 4:14:26 PM11/25/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:QbqdnaSDmdaTMfXY...@comcast.com...

Actually, its more like this:

Chris Thomasson

unread,

Nov 25, 2006, 4:14:57 PM11/25/06

to

ARGH! Forgot the link!:

http://groups.google.com/group/comp.arch/browse_frm/thread/1f22f81585424fef

Joe Seigh

unread,

Nov 25, 2006, 6:50:24 PM11/25/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:tamdnV5uF_KSFPXY...@comcast.com...
>
>>Chris Thomasson wrote:
>>
>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>news:BImdndpz7bmCxvXY...@comcast.com...
>>
>>Simon and Tim don't claim STM is a silver bullet.
>
>
> Simon contradicts himself... He says it not a silver bullet, then says this:
>
> 'well if you want your program to work at all, you will take the
> performance hit...'
>
> Humm...
>
>

Deadlock is probably what he is referring to.

Herlihy coined the term obstruction-free to describe some of the
STM stuff he worked on. But how obstruction-free will depend on
the individual programmer using STM. Things might get interesting
depending on who is doing what.

A key part of STM is the PDR part of it. GC is pretty efficient
for some circumstances but high transaction rate systems is not
one of them. Otherwise Java would not have needed all of what
came in with JSR-166. STM is going to need a good PDR scheme
to make it work efficiently. Also, STM is going to need people
who know how to implement lock-free data structures. They can't
simply do copy on write on everything. They'd need to know how
to do PCOW for some stuff at least. Somebody should ask
Simon and Tim how they'd do balanced binary trees with STM.

So maybe STM is a good opportunity. :)

Chris Thomasson

unread,

Nov 25, 2006, 9:11:00 PM11/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:m62dnRmqD64kS_XY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:tamdnV5uF_KSFPXY...@comcast.com...
>>
>>>Chris Thomasson wrote:
>>>
>>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>>news:BImdndpz7bmCxvXY...@comcast.com...
>>>
>>>Simon and Tim don't claim STM is a silver bullet.
>>
>>
>> Simon contradicts himself... He says it not a silver bullet, then says
>> this:
>>
>> 'well if you want your program to work at all, you will take the
>> performance hit...'
>>
>> Humm...
> Deadlock is probably what he is referring to.

If you deadlock, you need to learn the fundamentals of locking. That's why I
referred somebody over on Channel 9 to David Butenhof's excellent book.

[...]

> A key part of STM is the PDR part of it. GC is pretty efficient
> for some circumstances but high transaction rate systems is not
> one of them. Otherwise Java would not have needed all of what
> came in with JSR-166. STM is going to need a good PDR scheme
> to make it work efficiently. Also, STM is going to need people
> who know how to implement lock-free data structures. They can't
> simply do copy on write on everything. They'd need to know how
> to do PCOW for some stuff at least. Somebody should ask
> Simon and Tim how they'd do balanced binary trees with STM.

I could rig the writer part, but I would have way too much overhead for the
readers. A transaction announcement/commitment usually implies memory
barriers for the readers... Yeah. The TM has to track many readers, it craps
out under contention manager security blanket. Have you been following my
posts at msdn Channel 9?

http://channel9.msdn.com/Showpost.aspx?postid=231495

Kind of entertaining.

> So maybe STM is a good opportunity. :)

Perhaps. I have implemented several in-house lock-free transactional
memories, however they were "minimalist". Some of the features from Haskell
force you to use memory barriers; I did lean STM instead.

Humm. I know I can use the Virtually Zero-Overhead Memory Allocator I
invented for atomic reference counted transaction descriptors. The allocator
even works on the Cell. Its 100% private memory; it 100% per-thread. On the
Cell, the atomic operations in the algorithm have to be emulated behind a
DMA abstraction. A bit tedious, but it works... I think I can get my
allocator to have "restricted writer" lock-free STM. The STM would be used
SOLELY for the writer side of the vZOOM PDR logic. Nothing else. My reader
threads would never have to use the STM, yet they can stay in sync via PDR
logic. I use a 32-bit transactional anchor. The ABA problem is handled by
the memory allocators reference counting logic.

A commit can be handled a couple of ways... I experimented with deferred
commit... The threads call PDR defer, and enqueue their commit. The PDR
become the transaction coordinator then. You can do it a number of other
ways as well...

Any thoughts?

Joe Seigh

unread,

Nov 25, 2006, 9:58:21 PM11/25/06

to

Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message

> news:m62dnRmqD64kS_XY...@comcast.com...

>
>
>>A key part of STM is the PDR part of it. GC is pretty efficient
>>for some circumstances but high transaction rate systems is not
>>one of them. Otherwise Java would not have needed all of what
>>came in with JSR-166. STM is going to need a good PDR scheme
>>to make it work efficiently. Also, STM is going to need people
>>who know how to implement lock-free data structures. They can't
>>simply do copy on write on everything. They'd need to know how
>>to do PCOW for some stuff at least. Somebody should ask
>>Simon and Tim how they'd do balanced binary trees with STM.
>
>
> I could rig the writer part, but I would have way too much overhead for the
> readers. A transaction announcement/commitment usually implies memory
> barriers for the readers... Yeah. The TM has to track many readers, it craps
> out under contention manager security blanket. Have you been following my
> posts at msdn Channel 9?
>
> http://channel9.msdn.com/Showpost.aspx?postid=231495
>
> Kind of entertaining.
>
>
>

Ah, never mind. I figured out how they're doing lock-free
collections with STM. All the pointers have a extra level
of indirection. I think generational GC has that.

Chris Thomasson

unread,

Nov 25, 2006, 10:24:25 PM11/25/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:l9SdnXZZjOaEMvXY...@comcast.com...

> ARGH! Forgot the link!:
>
> http://groups.google.com/group/comp.arch/browse_frm/thread/1f22f81585424fef

Well, alls they do is turn a vector into a texture. You can give the texture
height and width. Then they force the GPU to process every pixel in the
texture. GPU can sort/advanced arithmetic the vector, or a group of vectors.

Even if you don't know a lot of OpenGL programming. Well you can read some
tutorials' on the Web, and learn everything up to the texture level. You
learn how to render a texture on a 3d-surface. I even know how to do that,
simple DirectX. Everybody in this group has skills to learn basic texture
stuff with Direct3D for sure.

You create the texture, (x,y) plot on the texture can map to a matrix,
vector of database indexes, or something...

You program the GPU, basically you tell it exactly how to render the texture
(e.g., your array). You have to resort to telling the GPU to process every
pixel in the texture. This is not efficient use of GPU, its necessary for
tricking the GPU to work on the entire array. You can use pixel shaders to
drill down on the array for more advanced arithmetic in the index sort, or
whatever.

Very interesting indeed.

Think of a lock-free database server, that had to have a high-end graphics
card because it used the data-parallel GPU to sort its indexes' for the
shared memory *fast-path of the database structures themselves.

*--

Fast-pathing a database means keeping a extremely large shared memory block
alive with PDR and read-only shared memory accesses, interprocess proxy
smart pointers. Joe Seigh and I have had very brief conversation on this.

Any thoughts?

Please... Somebody comment. Comp.programming.threads seems dead lately.
Where Mr Hopwood? I need you to test drive the message passing
implementation I am thinking about providing under strict non-commercial
academic style license????

Seriously. If functional programming is being rammed down programmers
thoughts in college, well if you can't beat em' join em? I can give them
great performing message passing, for all of the functional programming
addictions'?

Any thoughts?

Chris Thomasson

unread,

Nov 25, 2006, 10:27:13 PM11/25/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message

news:EKidnVWCDZQwmPTY...@comcast.com...

>
> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:l9SdnXZZjOaEMvXY...@comcast.com...
>> ARGH! Forgot the link!:
>>
>> http://groups.google.com/group/comp.arch/browse_frm/thread/1f22f81585424fef
>
> Well, alls they do is turn a vector into a texture. You can give the
> texture height and width. Then they force the GPU to process every pixel
> in the texture. GPU can sort/advanced arithmetic the vector, or a group of
> vectors.
>
> Even if you don't know a lot of OpenGL programming. Well you can read
> some tutorials' on the Web, and learn everything up to the texture level.
> You learn how to render a texture on a 3d-surface. I even know how to do
> that, simple DirectX. Everybody in this group has skills to learn basic
> texture stuff with Direct3D for sure.
>
>
> You create the texture, (x,y) plot on the texture can map to a matrix,
> vector of database indexes, or something...
>
> You program the GPU, basically you tell it exactly how to render the
> texture (e.g., your array). You have to resort to telling the GPU to
> process every pixel in the texture. This is not efficient use of GPU, its
> necessary for tricking the GPU to work on the entire array. You can use
> pixel shaders to drill down on the array for more advanced arithmetic in
> the index sort, or whatever.
>
>

Forgot to mention how to force the GPU to do its thing... You setup the
rendering process, and actually tell OpenGL to render that sucker. Then you
copy the results right out of the framebuffer.

Believe me, this is worth learning about basic texture stuff in OpenGL or
DirectX. You can use it to sort database indexes.

Heck, basic texture stuff with OpenGL is simple. Tons and tons of example
code on the net.

Chris Thomasson

unread,

Nov 26, 2006, 1:52:35 AM11/26/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:ZJudnc--ff1an_TY...@comcast.com...

That indirection has overhead. Humm...

Chris Thomasson

unread,

Nov 26, 2006, 3:38:33 AM11/26/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:m62dnRmqD64kS_XY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:tamdnV5uF_KSFPXY...@comcast.com...
>>
>>>Chris Thomasson wrote:
>>>
>>>>"Joe Seigh" <jsei...@xemaps.com> wrote in message
>>>>news:BImdndpz7bmCxvXY...@comcast.com...
>>>
>>>Simon and Tim don't claim STM is a silver bullet.
>>
>>
>> Simon contradicts himself... He says it not a silver bullet, then says
>> this:
>>
>> 'well if you want your program to work at all, you will take the
>> performance hit...'
>>
>> Humm...
> Deadlock is probably what he is referring to.
>
> Herlihy coined the term obstruction-free to describe some of the
> STM stuff he worked on. But how obstruction-free will depend on
> the individual programmer using STM. Things might get interesting
> depending on who is doing what.
>
> A key part of STM is the PDR part of it. GC is pretty efficient
> for some circumstances but high transaction rate systems is not
> one of them. Otherwise Java would not have needed all of what
> came in with JSR-166. STM is going to need a good PDR scheme
> to make it work efficiently.

Good PDR scheme indeed!.

Well... Humm... I think we can help them Joe? Na... Let them figure things
out for themselvs... Well, STM with a kick ass PDR might be interesting...

...