Lock vs Lock-free programming

Michel

unread,

Oct 23, 2005, 9:30:03 AM10/23/05

to

I just have a question what a reasonable approach to lock-free
programming would be. Do you need to make any specific high to mid level
design considerations if using locks as opposed to lock-free
programming? Is a reasonable approach to use lock based programming as a
default and then for hot spots or contended locks detected by measuremnt
on typical loads see if a lock free algo would fit better? Does it make
sense in some cases to make the choice on lock/lock free strategy
dynamic based on hardware, configuration or some other external
dependency? How do you approach this problem as it seems not to be black
and white but rather a sliding range depending on a lot of factors? A
rather open ended question which I hope could spur some interesting
insights and discussions.

Regards
/Michel

Joe Seigh

unread,

Oct 23, 2005, 10:19:13 AM10/23/05

to

The problem with lock-free is that the api's and design/usage patterns
for them are not well established enough yet to really compare them
to lock based solutions, unless you've been using lock-free for a while.

So for the time being, this puts lock-free into the expert/special usage/
experimental category.

But lock-free does scale better than lock based solutions, so if you have
a scalability problem, lock-free is a consideration.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Michel

unread,

Oct 23, 2005, 12:22:46 PM10/23/05

to

Joe Seigh wrote:
> The problem with lock-free is that the api's and design/usage patterns
> for them are not well established enough yet to really compare them
> to lock based solutions, unless you've been using lock-free for a while.
>
> So for the time being, this puts lock-free into the expert/special usage/
> experimental category.

Yes I understand that. And thats roughly how I understand it, having a
reasonably long MT programming background (around 7 years server side MT
programming) it isn't until the last year it actually has gotten my
attention. And I'm mostly at the application level but can still see the
benefit for some of the simpler ones that been a round for a while eg
lock free queues and such primitives.

> But lock-free does scale better than lock based solutions, so if you have
> a scalability problem, lock-free is a consideration.

Well contention seems to come up reasonably often in both systems iI
design and that I have been involved in troubleshooting, and usually it
warrants a redesign to reduce contention and the scalability concerns I
run into more often comes to other resources such as disk and network
I/O bandwidth. Then you could argue that I'm not serious enough with my
designs from the beginning but often shortcuts is taken with known
issues for sake of simplicity, schedule demands and initial scalability
requirements (which quite often is passed after the system has been in
production for a while). And I guess the redesign to reduce contention
has it's merits even when and if switching to lock free but it usually
leads to harder to understand and more complex code. But I'm just
surveying lock free to have another to add to my arsenal when and if the
need rises. I also understand there is a discussion to wether lock free
only should be used in very special and expert places such as inside the
OS kernel and such for very critical structures. But I think as time
comes with multicore cpus on the desktop and maybe even more cores at
the server even application level server programmers will have to look
into to it to scale better and cram even more performance out of the
hardware without resorting to scale-out which isn't always feasible.

/Michel

David Hopwood

unread,

Oct 23, 2005, 9:35:48 PM10/23/05

to

Joe Seigh wrote:
> Michel wrote:
>
>> I just have a question what a reasonable approach to lock-free
>> programming would be. Do you need to make any specific high to mid
>> level design considerations if using locks as opposed to lock-free
>> programming? Is a reasonable approach to use lock based programming as
>> a default and then for hot spots or contended locks detected by
>> measuremnt on typical loads see if a lock free algo would fit better?
>> Does it make sense in some cases to make the choice on lock/lock free
>> strategy dynamic based on hardware, configuration or some other
>> external dependency? How do you approach this problem as it seems not
>> to be black and white but rather a sliding range depending on a lot of
>> factors? A rather open ended question which I hope could spur some
>> interesting insights and discussions.
>
> The problem with lock-free is that the api's and design/usage patterns
> for them are not well established enough yet to really compare them
> to lock based solutions, unless you've been using lock-free for a while.
>
> So for the time being, this puts lock-free into the expert/special usage/
> experimental category.

And when it does become less experimental, it is more likely to be used
effectively in implementations of operating systems, languages, and database
engines than in general application programming. These are cases where the
complexity can be justified by benefits to many applications at once.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

RichT

unread,

Oct 24, 2005, 1:01:31 AM10/24/05

to

One pattern for lock free programming is "add-only". That is, you have
a data structure like a list that you only add objects to, and adding
objects is serialized through one thread. The code for the thread looks
like: wait for signal to add new object; create and initialize object;
atomically update list with object. There are other patterns, however I
think that this is the most common one.

However, there is a problem. The clever hardware designers have have
introduced out-of-order write, so it is not sufficient to just do:
create and initialize object; atomically update list with object. You
have to put in some synchronization instructions both in the thread
that is doing the atomic update and any other thread that may be
following the list.

Out of order write is a feature of the P4 and P5 processor (think Apple
OSX, AIX). Anyone who is writing general purpose threaded code should
write their lockless code to defend against out-of-order write. Here is
a pointer to what you need to do:
http://www-128.ibm.com/developerworks/eserver/articles/power4_mem.html

We saw problems with multi-process synchronization code that assumed
in-order writes when we ran in on the P4 processor.

Joe Seigh

unread,

Oct 24, 2005, 8:28:58 AM10/24/05

to

RichT wrote:
> We saw problems with multi-process synchronization code that assumed
> in-order writes when we ran in on the P4 processor.
>

The Intel arch states stores are in order except for string and streaming
instructions. Some of the systems builders may have broken the memory
model (i.e. implemented a different one) however.

Note also that in order writes don't imply in order reads. You still
need membars or other logic to order reads. Otherwise the writes may
appear to be out of order.

Joe Seigh

unread,

Oct 24, 2005, 8:49:14 AM10/24/05

to

Michel wrote:

> Joe Seigh wrote:
>
> lock free queues and such primitives.
>
>> But lock-free does scale better than lock based solutions, so if you have
>> a scalability problem, lock-free is a consideration.
>
>
> Well contention seems to come up reasonably often in both systems iI
> design and that I have been involved in troubleshooting, and usually it
> warrants a redesign to reduce contention and the scalability concerns I
> run into more often comes to other resources such as disk and network
> I/O bandwidth. Then you could argue that I'm not serious enough with my
> designs from the beginning but often shortcuts is taken with known
> issues for sake of simplicity, schedule demands and initial scalability
> requirements (which quite often is passed after the system has been in
> production for a while). And I guess the redesign to reduce contention
> has it's merits even when and if switching to lock free but it usually
> leads to harder to understand and more complex code. But I'm just
> surveying lock free to have another to add to my arsenal when and if the
> need rises. I also understand there is a discussion to wether lock free
> only should be used in very special and expert places such as inside the
> OS kernel and such for very critical structures. But I think as time
> comes with multicore cpus on the desktop and maybe even more cores at
> the server even application level server programmers will have to look
> into to it to scale better and cram even more performance out of the
> hardware without resorting to scale-out which isn't always feasible.
>

Well, the opinion that should only ever be used in the os and such is
mainly just one person. Lock-free is already in general usage in Java,
mostly things like lock-free traversal of queues. If you have event
handlers in Java, you're using lock-free.

The problem of exploiting lock-free in major apps is a lack of expertise
in this area combined with a resistance to new ideas. So adoption will
be slow and be mainly through replacement by new apps. Bad for Intel but
too bad. They waited too long. Plus it doesn't help that from an external
perspective, Intel appears to be profoundly clueless about the problem.
You'd be hard pressed to detect that Intel is doing anything except make
more marketing noise.

Michel

unread,

Oct 24, 2005, 12:43:55 PM10/24/05

to

Joe Seigh wrote:
> Well, the opinion that should only ever be used in the os and such is
> mainly just one person.

The community of people contributing here is quite small and vocal so
it's quite easy to get a skewed picture of things. But I'll guess this
will change over time as more people become interested and experts come
along.

> Lock-free is already in general usage in Java,
> mostly things like lock-free traversal of queues. If you have event
> handlers in Java, you're using lock-free.

I'm currently not that deep into Java, using it just occasionally. But
my prime candidates for research and experimentation for lock free would
be in the different observer implementation that I have since the
observer pattern is troublesome for several reasons in mt and even in
single threaded programming. And this sounds akin to the event handlers
which is an instance of the observer pattern. Currently I usually employ
asynchrounous changes, copy on notify or copy on change if busy
depending on size of observer list and other properties. But this I
think would be a prime candidate for lock-free.

> The problem of exploiting lock-free in major apps is a lack of expertise
> in this area combined with a resistance to new ideas. So adoption will
> be slow and be mainly through replacement by new apps. Bad for Intel but
> too bad. They waited too long. Plus it doesn't help that from an external
> perspective, Intel appears to be profoundly clueless about the problem.
> You'd be hard pressed to detect that Intel is doing anything except make
> more marketing noise.

For good reasons I think most developers should be kept from mt
programming with traditional primitives since there is a severe lack of
people understanding even that at the basic level, leaving code that is
deadlock prone, full of races, overlocked creating unessecary contention
or even plain wrong. I've managed to raise abit over basic understanding
but have a long way to wander. But I hope this will also change slowly
over time and hopefully we that have started early can share and will
become event more tractable to employ over tim. A lot of people shy away
from this type of programming because of the unexplainable errors, non
determinism and they lack the skill or motivation to understand whats
happening and design these kinds of systems.

/Michel

David Hopwood

unread,

Oct 24, 2005, 2:57:49 PM10/24/05

to

RichT wrote:
> Out of order write is a feature of the P4 and P5 processor (think Apple
> OSX, AIX). Anyone who is writing general purpose threaded code should
> write their lockless code to defend against out-of-order write. Here is
> a pointer to what you need to do:
> http://www-128.ibm.com/developerworks/eserver/articles/power4_mem.html

The memory barrier instructions are described in a rather implementation-
oriented way in that article, that is not actually very helpful.

So I looked for something a bit more definitive:
"Book II: PowerPC Virtual Environment Architecture", downloaded from
<http://www-128.ibm.com/developerworks/eserver/articles/archguide.html>
(Version 2.01, dated 10 Dec 2003). And what does it say but this (on
page 7):

Because stores cannot be performed "out-of-order" (see Book III,
PowerPC Operating Environment Architecture), if a Store instruction
depends on the value returned by a preceding Load instruction (because
the value returned by the Load is used to compute either the effective
address specified by the Store or the value to be stored), the
corresponding storage accesses are performed in program order.

Also, on page 24 of Book III:

Stores are not performed out-of-order (even if the Store instructions
that caused them were executed out-of-order).

"Performed" is defined on page 2 of Book II:

A load or instruction fetch by a processor or mechanism (P1) is performed
with respect to any processor or mechanism (P2) when the value to be
returned by the load or instruction fetch can no longer be changed by a
store by P2. A store by P1 is performed with respect to P2 when a load
by P2 from the location accessed by the store will return the value
stored (or a value stored subsequently).
[snip irrelevant stuff about cache block invalidations]
The preceding definitions apply regardless of whether P1 and P2 are the
same entity.

It is presumably not a coincidence that these are the same definitions used
in <http://research.compaq.com/wrl/people/kourosh/papers/1993-tr-68.pdf>.

Unfortunately, Book II only defines "performed with respect to", not "performed".
However, my reading of the above is that for each pair of processors P1 and
P2, P2 cannot observe stores made by P1 out-of-order. Also note that Book II
page 5 says that "In most systems the default is that all storage is Memory
Coherence Required", where

Memory coherence refers to the ordering of stores to a single location.
Atomic stores to a given location are coherent if they are serialized in
some order, and no processor or mechanism is able to observe any subset of
those stores as occurring in a conflicting order.

and all aligned stores are "atomic stores" (Book II page 3).

This would imply that aligned stores to locations of the default memory type
occur in a global total order, which is the order in which they are observed
by all processors.

So,

- if an incompatible change has been made to the PowerPC architecture, why
is it not documented? (I don't think I missed a more up-to-date version of
the arch manual; the web page certainly doesn't give any indication that
there might be one.)

- if an incompatible change has not been made, then what have I missed, and
what is <http://www-128.ibm.com/developerworks/eserver/articles/power4_mem.html>
going on about?

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Alexander Terekhov

unread,

Oct 24, 2005, 3:51:38 PM10/24/05

to

Yes.

David Hopwood wrote:
>
> RichT wrote:
> > Out of order write is a feature of the P4 and P5 processor (think Apple
> > OSX, AIX). Anyone who is writing general purpose threaded code should
> > write their lockless code to defend against out-of-order write. Here is
> > a pointer to what you need to do:
> > http://www-128.ibm.com/developerworks/eserver/articles/power4_mem.html
>
> The memory barrier instructions are described in a rather implementation-
> oriented way in that article, that is not actually very helpful.
>
> So I looked for something a bit more definitive:
> "Book II: PowerPC Virtual Environment Architecture", downloaded from
> <http://www-128.ibm.com/developerworks/eserver/articles/archguide.html>
> (Version 2.01, dated 10 Dec 2003). And what does it say but this (on
> page 7):

Let's stick to Version 2.02.

http://www-128.ibm.com/developerworks/eserver/library/es-archguide-v2.html

>
> Because stores cannot be performed "out-of-order" (see Book III,
> PowerPC Operating Environment Architecture), if a Store instruction
> depends on the value returned by a preceding Load instruction (because
> the value returned by the Load is used to compute either the effective
> address specified by the Store or the value to be stored), the
> corresponding storage accesses are performed in program order.

This is about implicit cc_hsb and dd_hsb.

>
> Also, on page 24 of Book III:

Uhmm, that must be page 28 in Version 2.02

>
> Stores are not performed out-of-order (even if the Store instructions
> that caused them were executed out-of-order).

See the context (1st and 2nd paragraphs in 4.2.4). It's about
*speculative* out-of-order stuff.

>
> "Performed" is defined on page 2 of Book II:

[... remote write atomicity ...]

To me, the really curious thing is programming note in 1.7.1. ;-)

regards,
alexander.

David Hopwood

unread,

Oct 24, 2005, 4:13:57 PM10/24/05

to

Michel wrote:
> Joe Seigh wrote:
>
>> Well, the opinion that should only ever be used in the os and such is
>> mainly just one person.
>
> The community of people contributing here is quite small and vocal so
> it's quite easy to get a skewed picture of things.

That's why it helps to address the arguments, rather than who or how
many people are making them.

>> Lock-free is already in general usage in Java,
>> mostly things like lock-free traversal of queues. If you have event
>> handlers in Java, you're using lock-free.
>
> I'm currently not that deep into Java, using it just occasionally. But
> my prime candidates for research and experimentation for lock free would
> be in the different observer implementation that I have since the
> observer pattern is troublesome for several reasons in mt and even in
> single threaded programming.

Indeed it is, but it's not clear that lock-free helps here. For another
approach that does, using the Observer/Listener pattern as an example,
see <http://www.erights.org/talks/promises/paper/tgc05.pdf>.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

David Hopwood

unread,

Oct 24, 2005, 4:14:57 PM10/24/05

to

Joe Seigh wrote:
> RichT wrote:
>
>> We saw problems with multi-process synchronization code that assumed
>> in-order writes when we ran in on the P4 processor.
>>
> The Intel arch states stores are in order except for string and streaming
> instructions. Some of the systems builders may have broken the memory
> model (i.e. implemented a different one) however.

He is talking about POWER4.

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Joe Seigh

unread,

Oct 24, 2005, 10:28:09 PM10/24/05

to

Extra points if you can figure out how this works or doesn't.
http://www.ussg.iu.edu/hypermail/linux/kernel/0510.3/0160.html

David Hopwood

unread,

Oct 25, 2005, 12:06:47 AM10/25/05

to

It isn't documented or explained by its author, so why should anyone
believe that it works?

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Message has been deleted

Chris Thomasson

unread,

Oct 31, 2005, 2:47:54 PM10/31/05

to

> Do you need to make any specific high to mid level design considerations
> if using locks as opposed to lock-free programming? Is a reasonable
> approach to use lock based programming as a default and then for hot spots
> or contended locks detected by measuremnt on typical loads see if a lock
> free algo would fit better?

This usually works out okay, however, the "simple strategy" may, and
probably will, "re-focus" the contention to a new area of your application
that did not "seem" to have any previous "problems" wrt contention. A
lock-free solution that addresses a single point of contention may require
you to upgrade others. You have to really stop and think about how the
throughput of your application flows through the possible points of
contention. Think how a 200+% increase in throughput from contention point A
will affect contention points B-(whatever)...

In general, upgrading your application with lock-free algorithms can be a
rather tedious and time consuming task; although, once your done, you
usually end with a far superior design...

> Does it make sense in some cases to make the choice on lock/lock free
> strategy dynamic based on hardware, configuration or some other external
> dependency?

IMO, I don't think a reason to use a lock-free algorithm should be based on
the hardware your application might be running on. Besides, a CAS
instruction is available, or can be readily constructed, on nearly all
modern hardware. One exception may be for algorithms based on a double-width
cas (DWCAS) instruction because its availability is scarce on 64-bit
systems. I have created separate versions of some applications just to take
advantage of the wider range of advanced algorithms DWCAS can be used to
implement...

> How do you approach this problem as it seems not to be black and white but
> rather a sliding range depending on a lot of factors? A rather open ended
> question which I hope could spur some interesting insights and
> discussions.

Usually, "normal" applications do not really need to tap into the potential
power of lock-free algorithms. IMO, scalability is not really a concern for
such applications. Although, The hardware that is going to be available to
"normal" customers in the next couple of years may require some "normal"
applications to "scale-up"...

IMHO, the applications that currently reap major benefits from a well
thought-out lock-free design are high-user servers, operating systems,
applications that frequently access a lot of shared "read-mostly"
data-structures, or any other application that "expects to scale well wrt
adding more processors"...

Any thoughts?

P.S.

If you are seriously thinking about upgrading any of your applications with
lock-free algorithms, try to make sure that everybody on the design team has
a good understanding of "memory barriers and memory models in general"...
Then try to make sure they have a decent understanding of the "memory model
and basic instruction set of your applications target platforms"...

Unfortunately, this will probably be your biggest problem...

:O

Michel

unread,

Nov 1, 2005, 7:23:39 AM11/1/05

to

Thanks Chris for your excellent reply and thoughts.

Chris Thomasson wrote:
> IMHO, the applications that currently reap major benefits from a well
> thought-out lock-free design are high-user servers, operating systems,
> applications that frequently access a lot of shared "read-mostly"
> data-structures, or any other application that "expects to scale well wrt
> adding more processors"...

I'm am definitely in that space since I work on software with
disseminating financial information in forms of prices, trades and the
like. And handling financial transactions in large volumes in short time
periods. Read exhcange, trading, riskmanagement software with very high
demands on scalability and performance on all levels and the need for
handling very high peak flows both in and out of the system in a short
time without hickups.

So I have been battling quite a long time with "traditional" MT
programming to come sufficiently good at designing and developing these
kinds of applications. And of course I have done many of the mistakes
that can be done in that area, but it's been a revarding experience so
far. And now venturing into and researching lock free is exciting to say
the least, and I thank you for sharing and helping me out on the journey.

> If you are seriously thinking about upgrading any of your applications with
> lock-free algorithms, try to make sure that everybody on the design team has
> a good understanding of "memory barriers and memory models in general"...
> Then try to make sure they have a decent understanding of the "memory model
> and basic instruction set of your applications target platforms"...
>
> Unfortunately, this will probably be your biggest problem...

This is probably true, and even when just doing "traditonal" MT with
locks it still is the case that people just add some locks and thinks
that is enough to go MT and get performance. And it's probably a larger
uphill battle to get the team up on par with memory models, and
barriers. But also a part of the fun helping people to develop and gain
new skills ;)

Regards
/Michel

Joe Seigh

unread,

Nov 1, 2005, 9:05:04 AM11/1/05

to

Michel wrote:
> Thanks Chris for your excellent reply and thoughts.
>
> Chris Thomasson wrote:
>
>> IMHO, the applications that currently reap major benefits from a well
>> thought-out lock-free design are high-user servers, operating systems,
>> applications that frequently access a lot of shared "read-mostly"
>> data-structures, or any other application that "expects to scale well
>> wrt adding more processors"...
>
>
> I'm am definitely in that space since I work on software with
> disseminating financial information in forms of prices, trades and the
> like. And handling financial transactions in large volumes in short time
> periods. Read exhcange, trading, riskmanagement software with very high
> demands on scalability and performance on all levels and the need for
> handling very high peak flows both in and out of the system in a short
> time without hickups.
>
> So I have been battling quite a long time with "traditional" MT
> programming to come sufficiently good at designing and developing these
> kinds of applications. And of course I have done many of the mistakes
> that can be done in that area, but it's been a revarding experience so
> far. And now venturing into and researching lock free is exciting to say
> the least, and I thank you for sharing and helping me out on the journey.

Well, you pretty much have that specific niche all to yourself as I
don't think they hire in from the outside (people without a financials
background). You'll have to write your own api's from scratch though.
The one's I have aren't production ready since I haven't had a production
environment to test them against. I'll probably have to create a
production environement from scratch to do this (the outsider problem).
I was thinking of a database but that seemed to be a bit on the large
side. nscd (name services cache daemon) seemed like a possibility
but that turns out to be a database plus LIFO queue plus priority
queue so I'm not too sure of that either.

>
>> If you are seriously thinking about upgrading any of your applications
>> with lock-free algorithms, try to make sure that everybody on the
>> design team has a good understanding of "memory barriers and memory
>> models in general"... Then try to make sure they have a decent
>> understanding of the "memory model and basic instruction set of your
>> applications target platforms"...
>>
>> Unfortunately, this will probably be your biggest problem...
>
>
> This is probably true, and even when just doing "traditonal" MT with
> locks it still is the case that people just add some locks and thinks
> that is enough to go MT and get performance. And it's probably a larger
> uphill battle to get the team up on par with memory models, and
> barriers. But also a part of the fun helping people to develop and gain
> new skills ;)
>

Things are fairly problematic at that level. What you need or somebody needs
to do is develop an api with well defined semantics and work with that.

Michel

unread,

Nov 1, 2005, 1:59:21 PM11/1/05

to

Joe Seigh wrote:
> Well, you pretty much have that specific niche all to yourself as I
> don't think they hire in from the outside (people without a financials
> background).

Its a small niche, but a special one. Actually we hire from outside,
more than the outside hires us. There seems to be a sceptiscism about
having worked seriously with scalability , perfrormance, threading, data
comms, protocols and systems programming. This isn't traditionally seen
as things you work on when working on financial systems were outsiders
(from my POW) seems to think youre a VBA, VB, Cobol, Oracle Forms, angle
bracket loving, name your favorite three level acronym of the week,
zealot (shiver). The bulk of systems are at that level but some of the
core systems for front office type trading, real time risk managment and
information dissemination are far from that. I also don't think a lot of
technical and really people are attracted to the financial industry
because of that view and that is a pity. Since a lot of the problems are
really interesting to work on when working in front-office trading type
environment. If we can we hire telecoms people, industry automation,
real time people that seems technically apt and fit and the reverse is
seldom true due to prejudice and sceptisicm.

> You'll have to write your own api's from scratch though.

Yes, there doesn't seem that much you can get in canned apis at the
moment, but in due time there will be. And at that time it would be good
to have experimented a bit with lock free and maybe done some test shots
at an own api, maybe having some simple algos in production, this to be
able to judge the quality and implementation of the apis tools that will
surface.

> The one's I have aren't production ready since I haven't had a production
> environment to test them against. I'll probably have to create a
> production environement from scratch to do this (the outsider problem).
> I was thinking of a database but that seemed to be a bit on the large
> side. nscd (name services cache daemon) seemed like a possibility
> but that turns out to be a database plus LIFO queue plus priority
> queue so I'm not too sure of that either.

nscd sounds like a fairly good candidate to me. It's hard to get away
from some kind of db and its usually needed in one way at another but it
should be secondary. A simple superfast im process lightweight db
like/persistent store library supporting really simple querying,
indexing (hashed and sorted) on simple flatfile type records seems like
a good candidate to me. This could be used to store and manage
persistent state for processes to be able to to fast startup and
recovery from a real backing store.

>> This is probably true, and even when just doing "traditonal" MT with
>> locks it still is the case that people just add some locks and thinks
>> that is enough to go MT and get performance. And it's probably a
>> larger uphill battle to get the team up on par with memory models, and
>> barriers. But also a part of the fun helping people to develop and
>> gain new skills ;)
>
> Things are fairly problematic at that level. What you need or somebody
> needs
> to do is develop an api with well defined semantics and work with that.

But even that is problematic since to make the apis effective, usable
and not deadlock prone. Its quite hard to hide all the complexities of
MT even if we can go a long way, it's kind of like with DCOM or Corba we
try to hide the boundary behind a simple function call but it doesn't
work for a host of reasons. I guess it's a leaky abstraction at least at
the level the languages and tools are right now.

/Michel

Sean Kelly

unread,

Nov 1, 2005, 2:35:59 PM11/1/05

to

Michel wrote:
> Joe Seigh wrote:
> > Well, you pretty much have that specific niche all to yourself as I
> > don't think they hire in from the outside (people without a financials
> > background).
>
> Its a small niche, but a special one. Actually we hire from outside,
> more than the outside hires us. There seems to be a sceptiscism about
> having worked seriously with scalability , perfrormance, threading, data
> comms, protocols and systems programming.

Really? I've never had a problem in this regard. Perhaps it's just
the negative perception of IT folk versus those in research or at a
software company? Still, it would take a fairly shallow interviewer to
judge your technical skills based on such details and I'm not sure I'd
want to work in a firm who employed such people anyway :-)

> This isn't traditionally seen
> as things you work on when working on financial systems were outsiders
> (from my POW) seems to think youre a VBA, VB, Cobol, Oracle Forms, angle
> bracket loving, name your favorite three level acronym of the week,
> zealot (shiver).

I'm sure Goldman Sachs would be overjoyed to hear that all their
expensive analysts can be replaced by VB code monkeys ;-) Not to
mention the various exchanges, high-volume broker/dealers, investment
banks, etc. In my experience, large financial firms deal with greater
throughput, storage, and uptime requirements than most other business
sectors. And they tend to write the bulk of the software they use
in-house.

> The bulk of systems are at that level but some of the
> core systems for front office type trading, real time risk managment and
> information dissemination are far from that.

Not to mention order reconciliation, static and realtime data
dissemination, etc.

> > You'll have to write your own api's from scratch though.

I'm not sure I agree. Standard lock-free containers and such can be
reused as-is. Not that there are a lot of them available, but still...
It's not like the finance industry has a problem set that's completely
disconnected from the rest of the computing world.

> But even that is problematic since to make the apis effective, usable
> and not deadlock prone. Its quite hard to hide all the complexities of
> MT even if we can go a long way, it's kind of like with DCOM or Corba we
> try to hide the boundary behind a simple function call but it doesn't
> work for a host of reasons. I guess it's a leaky abstraction at least at
> the level the languages and tools are right now.

I'm beginning to lean towards the inclusion of special language
features to make MT programming more natural. I like the Cilk syntax,
though I haven't seen many other specialized languages or language
extensions to compare to.

Sean

Michel

unread,

Nov 1, 2005, 3:25:06 PM11/1/05

to

Sean Kelly wrote:

> Michel wrote:
> Really? I've never had a problem in this regard. Perhaps it's just
> the negative perception of IT folk versus those in research or at a
> software company? Still, it would take a fairly shallow interviewer to
> judge your technical skills based on such details and I'm not sure I'd
> want to work in a firm who employed such people anyway :-)

Usually the problem is coming to interview in a niche outside the
financial business were there seems to be a credibility poblem. But
thats maybe more of the recent downturn, that now seems have come to an
end.

> I'm sure Goldman Sachs would be overjoyed to hear that all their
> expensive analysts can be replaced by VB code monkeys ;-) Not to
> mention the various exchanges, high-volume broker/dealers, investment
> banks, etc. In my experience, large financial firms deal with greater
> throughput, storage, and uptime requirements than most other business
> sectors. And they tend to write the bulk of the software they use
> in-house.

Yes thats my point! It's rather complicated software not enough with
throughput, storage, uptime add to that low latency, large amounts of
simoltaneously connected users that need to be connected full time. And
large demand to disseminate distribute large amounts of real time
marketinformation than the traditional transaction/response style
interaction. But I've seen both comapanies struggle to find the right
competencies and people wanting to leave the business (usually cause of
moving) struggling to get adequate jobs. But this might be different in
some places, in general some industries seems rather traditional and
just recruiting from within.

>>The bulk of systems are at that level but some of the
>>core systems for front office type trading, real time risk managment and
>>information dissemination are far from that.
>
> Not to mention order reconciliation, static and realtime data
> dissemination, etc.

Yes thats true, with information dissemination I meant all kinds of real
time dissemination of marketinfo, prices, trades, static data and the
such. And then we also have the "simple" and very "mundane" task of
identifying an instrument, contract, asset or whatever universally ;).

>> > You'll have to write your own api's from scratch though.
> I'm not sure I agree. Standard lock-free containers and such can be
> reused as-is. Not that there are a lot of them available, but still...
> It's not like the finance industry has a problem set that's completely
> disconnected from the rest of the computing world.

Certainly not. The problem set and solutions are common. But I think
Joes comment and my gut reaction was the lack of lock free apis would
warrant that you invent your own if you want to try lock free now or
hold your horses and wait for the apis to crop up and even by then you
should have some experience and understanding to evaluate the apis and
make informed decisions which apis to use.

> I'm beginning to lean towards the inclusion of special language
> features to make MT programming more natural. I like the Cilk syntax,
> though I haven't seen many other specialized languages or language
> extensions to compare to.

I think thats the way to go also. But at a higher level than threading
has been integrated through monitors in eg Java, .NET and the like.

/Michel

Markus Elfring

unread,

Nov 1, 2005, 4:22:19 PM11/1/05

to

> But even that is problematic since to make the apis effective, usable
> and not deadlock prone.

Can you benefit from the existing lock-free implementations for fundamental data
structures like lists and queues already?
How much "polish" do you expect for the interfaces until they can be reused in higher
lever (class) libraries and models?

> Its quite hard to hide all the complexities of
> MT even if we can go a long way, it's kind of like with DCOM or Corba we
> try to hide the boundary behind a simple function call but it doesn't
> work for a host of reasons. I guess it's a leaky abstraction at least at
> the level the languages and tools are right now.

Do you see chances to integrate the technology into DSMLs?
Examples:
- http://ptolemy.eecs.berkeley.edu/
- http://en.wikipedia.org/wiki/Domain-specific_modelling

Regards,
Markus

Michel

unread,

Nov 1, 2005, 5:20:08 PM11/1/05

to

Markus Elfring wrote:
> Can you benefit from the existing lock-free implementations for fundamental data
> structures like lists and queues already?
> How much "polish" do you expect for the interfaces until they can be reused in higher
> lever (class) libraries and models?

Depending..But I prefer standardized interfaces, boost level quality and
interface style or soon to be standardized interfaces. Since a lot of
review and scrutiny has been put to these kinds of interfaces. Otherwise
I roll my own or wrap some adequate implementation if it can be found
and licensing arent prohibitining in my own interface.

>>Its quite hard to hide all the complexities of
>>MT even if we can go a long way, it's kind of like with DCOM or Corba we
>>try to hide the boundary behind a simple function call but it doesn't
>>work for a host of reasons. I guess it's a leaky abstraction at least at
>>the level the languages and tools are right now.
>
>
> Do you see chances to integrate the technology into DSMLs?
> Examples:
> - http://ptolemy.eecs.berkeley.edu/
> - http://en.wikipedia.org/wiki/Domain-specific_modelling

Not to dissapoint you, but the phrase "Think of a DSM system as an
application to make a CASE tool (framework) or an application for making
applications to make applications" really rings a lot of warning bells
in my ear. Sounds like at least one level of abstraction to many and
that usually takes its toll in performance and hacks needed to
circumvent things.

/Michel

chris noonan

unread,

Nov 1, 2005, 7:09:37 PM11/1/05

to

Here are some principles of lock-free design (all IMHO of course):

1. Lock-free algorithms are low-level. Learn your
processor thoroughly, it's memory model, atomic and
synchronising instructions. And the operating system
too; for example if it zeros out new memory, more
algorithms become feasible. Code in assembly language,
use every trick available, forget about portability
and forward compatibility.

2. Look at the problem you want to solve as a whole;
do not subdivide it. Find out exactly what the constraints
are. Do not try to make the solution more general-purpose
than it need be - you may be ruling out the most efficient
method. If a proposed solution has drawbacks such as
starvation, perform analysis to try and bound the effects;
it is surprising how often a frightening feature boils down
to little or nothing.

3. Be prepared to pack your data. If you can squeeze
a key field or a whole record into a size that is
atomically readable or writeable by the processor,
your task will be that much easier.

4. Patterns of memory use are different. You are
likely to have memory pools (treated as arrays of
fixed-length records) rather than making small
allocations on the heap. Memory within the pools
will be consumed progressively up to a high water
mark and then can't be released - it is difficult
to hand memory back to the system, lock-free.
High water mark allocation is pretty harmless on
virtual memory platforms (though you may have
trouble convincing your colleagues!).

5. It is often necessary to keep control structures
in a memory area distinct from the data they control.
This is because if records are deleted, the location
of an embedded control field (such as a 'next' pointer)
may be reused for data, which is problematic.
Particularly applies to variable-length records in
lock-free lists.

Chris

Markus Elfring

unread,

Nov 2, 2005, 4:22:11 PM11/2/05

to

> Not to dissapoint you, but the phrase "Think of a DSM system as an
> application to make a CASE tool (framework) or an application for making
> applications to make applications" really rings a lot of warning bells
> in my ear. Sounds like at least one level of abstraction to many and
> that usually takes its toll in performance and hacks needed to
> circumvent things.

I hope that automatic transformation and generation from higher level abstractions with
domain-specific modelling and model-driven development techniques will result in code with
quality and speed that will be as fast or even quicker than some hand-crafted approaches.
Would you like to avoid any manual efforts if desired design requirements like real-time
limits can be checked by appropriate tool chains?
How many options from "solution space" do you keep into account for the goals?

Do you try to avoid to repeat the iron man challenge?
http://en.wikipedia.org/wiki/John_Henry_%28folklore%29

Regards,
Markus

Michel

unread,

Nov 2, 2005, 5:22:26 PM11/2/05

to

Markus Elfring wrote:
> I hope that automatic transformation and generation from higher level abstractions with
> domain-specific modelling and model-driven development techniques will result in code with
> quality and speed that will be as fast or even quicker than some hand-crafted approaches.
> Would you like to avoid any manual efforts if desired design requirements like real-time
> limits can be checked by appropriate tool chains?
> How many options from "solution space" do you keep into account for the goals?

The phrase "an application for making applications to make applications"
just rings to many warning bells with failed promises in the past for
CASE, 4GL and businessmen writing cobol and the such. I have a hard time
believing there is a silver bullet. Writing software is inherently
complex and hard and throwing in concurrency to the mix at least squares
the inherent complexity. But if such tools become commerciably viable
and accepted by the community producing real software I'm all ears. Just
saying right now there are a lot of alarm bells singing in my head.

> Do you try to avoid to repeat the iron man challenge?
> http://en.wikipedia.org/wiki/John_Henry_%28folklore%29

Well I haven't seen a salesman coming with a steammachine yet, but there
might be some research in academia about steammachines. But I certainly
keep my eyes open and watch for new ways of doing things, and try to
keep up front, but still with my eyes on what can be delivered and used
today and meet my requirements not what can come the next steammachine
hammer.

/Michel

Imre Palik

unread,

Nov 7, 2005, 5:55:54 AM11/7/05

to

"Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> writes:
> ...

> If you are seriously thinking about upgrading any of your applications with
> lock-free algorithms, try to make sure that everybody on the design team has
> a good understanding of "memory barriers and memory models in general"...
> Then try to make sure they have a decent understanding of the "memory model
> and basic instruction set of your applications target platforms"...

How can one gain suc an understanding? Are there any good books around?

Thx

ImRe

Joe Seigh

unread,

Nov 7, 2005, 12:54:09 PM11/7/05

to

Understanding or what? The memory model? For that you need to consult
the hardware architecture definition manuals. You can also try
http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html for
a start.

For lock-free try http://www.rdrop.com/users/paulmck/RCU/ with the
annotated bibliography as a good start. Also google for "lock-free".

The memory model is important if you intend to implement any lock-free
algorithms. If you're using a lock-free api then the semantics of the
api should be documented. RCU has a supported api for the Linux kernel
environment. In user application space there are lock-free api's but
with the exception of some low level atomic api's, they're not well
supported in the sense of resources needed for support. So you might
consider them more experimental at this point. At least until the
multi-core vendors start getting desperate enough to throw resources
at the problem.