
Lock-free Guidelines? was: Posix Mutex


MCD

Feb 27, 2007, 1:07:32 PM
On Feb 26, 4:43 am, Dave Butenhof <david.buten...@hp.com> wrote:
> Everyone with interest in threads or parallelism ought to be paying
> attention to lock-free, because it's important and is going to get more
> important as the techniques get generalized and encapsulated and as
> multi-core systems become essentially universal. We're reaching the end
> of the days when anyone using lock-free will be researching and
> implementing it themselves (which is good, because it's awfully easy to
> get catastrophically wrong); but to use them effectively you should
> still understand the principles.

I started a thread last week requesting some design recommendations,
and the replies I've received so far have mostly suggested using
lock-free. One of David's comments (snipped above) in the Posix Mutex
thread really got me thinking.

As stated in that thread, I'm quite a newbie to MT, and I didn't
realize before coming to this group that lock-free was even an option.
I'm excited by its potential, but a bit overwhelmed by its
implementation. I completely agree that I should better understand the
principles. So, to gain that understanding, this leads me to ask: when
is lock-free the right choice? Are there some basic guidelines for
knowing when to use these techniques and when to stay away?

Thanks
Marcus

Chris Thomasson

Feb 27, 2007, 5:00:05 PM
"MCD" <mcde...@walla.com> wrote in message
news:1172599652.8...@k78g2000cwa.googlegroups.com...

> On Feb 26, 4:43 am, Dave Butenhof <david.buten...@hp.com> wrote:
[...]

>> We're reaching the end
>> of the days when anyone using lock-free will be researching and
>> implementing it themselves (which is good, because it's awfully easy to
>> get catastrophically wrong); but to use them effectively you should
>> still understand the principles.
>
> I started a thread last week requesting some design recommendations
> and the replies I've got so far have mostly suggested using lock-free
> and one of David's comments (snipped above) in the Posix Mutex thread
> really got me thinking.
>
> As stated in that thread, I'm quite a newbie to MT, I didn't realize
> before coming to this group that lock-free was even an option. I'm
> excited by its potential, but a bit overwhelmed by its implementation.

> I completely agree that I should better understand the principles.

Before you implement anything, think about how you could use locks to
get the job done. I would recommend creating a full-blown working
prototype of your synchronization scheme using C/C++ with POSIX Threads.
Get that out of the way first; then you can start to experiment with
lock-free...
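
Something along these lines is all I mean by a lock-based prototype.
The names here are made up, it's just the kind of plain mutex-protected
structure you would start from (and only later consider making
lock-free):

#include <pthread.h>
#include <stddef.h>

/* a node in the shared stack */
struct node {
    struct node *next;
    void *data;
};

/* the whole synchronization scheme: one mutex around the structure */
struct stack {
    pthread_mutex_t lock;
    struct node *head;
};

void stack_init(struct stack *s) {
    pthread_mutex_init(&s->lock, NULL);
    s->head = NULL;
}

void stack_push(struct stack *s, struct node *n) {
    pthread_mutex_lock(&s->lock);
    n->next = s->head;
    s->head = n;
    pthread_mutex_unlock(&s->lock);
}

/* returns NULL when the stack is empty */
struct node *stack_pop(struct stack *s) {
    pthread_mutex_lock(&s->lock);
    struct node *n = s->head;
    if (n)
        s->head = n->next;
    pthread_mutex_unlock(&s->lock);
    return n;
}

Get something like that correct and measured first; only then is it
worth asking whether the lock is actually the bottleneck.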

> So,
> as to gain a better understanding, this begs me to ask the question:
> when is lock-free the right choice? Are there some basic guidelines
> for knowing when to use them and when to stay away?

It's good to keep the following in mind:

http://groups.google.com/group/comp.programming.threads/msg/9a5500d831dd2ec7

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8329a6ddcb95b8ac

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/576ad37c34dd673a

Some of the lock-free algorithms that are worth taking a look at, reader
patterns in particular, are more about achieving excellent scalability
and performance characteristics than anything else... So, if you feel
that you have a problem in that area, and you have already tested
different types of advanced locking techniques using adaptive mutexes,
well, then lock-free may turn out to be fairly important after all... ;^)


Randy Howard

Feb 28, 2007, 2:08:15 PM
On Tue, 27 Feb 2007 16:00:05 -0600, Chris Thomasson wrote
(in article <UpOdnfTAttBNNHnY...@comcast.com>):

> "MCD" <mcde...@walla.com> wrote in message
> news:1172599652.8...@k78g2000cwa.googlegroups.com...
>> On Feb 26, 4:43 am, Dave Butenhof <david.buten...@hp.com> wrote:
> [...]
>>> We're reaching the end
>>> of the days when anyone using lock-free will be researching and
>>> implementing it themselves (which is good, because it's awfully easy to
>>> get catastrophically wrong); but to use them effectively you should
>>> still understand the principles.
>>
>> I started a thread last week requesting some design recommendations
>> and the replies I've got so far have mostly suggested using lock-free
>> and one of David's comments (snipped above) in the Posix Mutex thread
>> really got me thinking.
>>
>> As stated in that thread, I'm quite a newbie to MT, I didn't realize
>> before coming to this group that lock-free was even an option. I'm
>> excited by its potential, but a bit overwhelmed by its implementation.
>
>> I completely agree that I should better understand the principles.
>
> Before you implement anything, think about how you could use locks to get
> the job done first. I would recommend creating a full-blown working
> prototype of your synchronization scheme using C/C++ w/ POSIX Threads. Get
> that out of the way first. Then you can start to experiment with
> lock-free...

And if, after you get your prototype up and running with pthreads, you
discover that performance is as good as or better than expected (as most
do), you don't have to dive into the "bleeding edge" end of the pool at
all. :-)


--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

David Schwartz

Feb 28, 2007, 10:31:08 PM
On Feb 27, 10:07 am, "MCD" <mcdesi...@walla.com> wrote:

> As stated in that thread, I'm quite a newbie to MT, I didn't realize
> before coming to this group that lock-free was even an option. I'm
> excited by its potential, but a bit overwhelmed by its implementation.
> I completely agree that I should better understand the principles. So,
> as to gain a better understanding, this begs me to ask the question:
> when is lock-free the right choice? Are there some basic guidelines
> for knowing when to use them and when to stay away?

Bluntly, unless you are implementing an operating system or threading
library, lock-free approaches are almost never the right choice.

The belief that they are generally superior to locks is based on the
fundamental misconception that contention is *good* and that
techniques that maximize contention are better than techniques that
avoid it.

Lock-free is a contention-maximizing technique because it allows
threads that contend to continue to contend. Locks are contention-
minimizing techniques because they cause threads that contend to be
de-scheduled and threads that do not contend to be scheduled
concurrently.

Contention is what is bad. So why choose a technique whose explicit
design goal is to maximize contention?

DS

Chris Thomasson

Feb 28, 2007, 11:29:34 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1172719868.0...@k78g2000cwa.googlegroups.com...

> On Feb 27, 10:07 am, "MCD" <mcdesi...@walla.com> wrote:
>
>> As stated in that thread, I'm quite a newbie to MT, I didn't realize
>> before coming to this group that lock-free was even an option. I'm
>> excited by its potential, but a bit overwhelmed by its implementation.
>> I completely agree that I should better understand the principles. So,
>> as to gain a better understanding, this begs me to ask the question:
>> when is lock-free the right choice? Are there some basic guidelines
>> for knowing when to use them and when to stay away?
>
> Bluntly, unless you are implementing an operating system or threading
> library, lock-free approaches are almost never the right choice.
>
> The belief that they are generally superior to locks is based on the
> fundamental misconception that contention is *good* and that
> techniques that maximize contention are better than techniques that
> avoid it.
>
> Lock-free is a contention-maximizing technique

That's a bit misleading. If you were "totally" correct, well, what would
adaptive mutexes be all about then? Don't they try to implement a
lock-free fast-path on the mutex? Mutexes try to do this because a
lock-free fast-path on the mutex is usually an "ideal" path to take. We
have been through this before. Shall we drill down on a particular type
of lock-free algorithm? Hmm, for instance, lock-free reader patterns are
virtually second to none wrt scalability and performance in general.
Well, try to create a 100%* lock-based reader pattern that has
memory-barrier-free reads. Can you do it? If you can, PLEASE post the
algorithm! I want to read it ASAP.


* Well, that's unfair... It's the all-or-nothing argument... To redeem
myself: I know for sure that you can "marry" locking schemes and
lock-free reader patterns together in such a way as to render optimal
performance and portability. So, I would not simply write off using any
lock-free techniques... BTW, DS, why don't you join in the conversation
on STM? I could end up using an ally like you against some of the
lock-free techniques in STM! ;^) Well, truth be told, Joe Seigh's STM
implementation has some fairly good forward-progress capabilities...
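
To make the adaptive-mutex point a bit more concrete, here is a rough
sketch of the spin-then-block idea layered on top of a plain pthread
mutex. The spin budget and the yield are made-up tuning knobs, not taken
from any real implementation; real adaptive mutexes do this inside the
lock itself:

#include <pthread.h>
#include <sched.h>

/* Spin briefly on the user-space fast path: an uncontended trylock is
   essentially just a compare-and-swap, no kernel involvement.  Only
   fall back to the blocking slow path if the lock stays held. */
void adaptive_lock(pthread_mutex_t *m) {
    for (int i = 0; i < 64; ++i) {      /* arbitrary spin budget */
        if (pthread_mutex_trylock(m) == 0)
            return;                     /* got it without blocking */
        sched_yield();                  /* crude stand-in for a pause */
    }
    pthread_mutex_lock(m);              /* slow path: block for real */
}

The uncontended case never leaves user space; the argument is really
about what to do when that first CAS fails.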


Chris Thomasson

Mar 1, 2007, 1:11:37 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:UpOdnfTAttBNNHnY...@comcast.com...

> "MCD" <mcde...@walla.com> wrote in message
> news:1172599652.8...@k78g2000cwa.googlegroups.com...
>> On Feb 26, 4:43 am, Dave Butenhof <david.buten...@hp.com> wrote:
> [...]
[...]

>> I completely agree that I should better understand the principles.
>
> Before you implement anything, think about how you could use locks to get
> the job done first.

You've got to know how to make efficient use of locks BEFORE you even
think of using a "lock-free" anything! Agreed?


Chris Thomasson

Mar 1, 2007, 1:27:12 AM
[...]

> BTW, DS, why don't you join in the conversation on
> STM? I can end up using an ally like you against some of the lock-free
> techniques in STM! ;^) Well, true be told, Joe Seighs STM implementation
> has some fairly good forward progress capabilities...

The part where he makes a transaction execute the compare-and-swap list even
though it's "not exactly committed" caught my eye.


Chris Thomasson

Mar 1, 2007, 1:35:07 AM

"Randy Howard" <randy...@FOOverizonBAR.net> wrote in message
news:0001HW.C20B2F40...@news.verizon.net...

> On Tue, 27 Feb 2007 16:00:05 -0600, Chris Thomasson wrote
> (in article <UpOdnfTAttBNNHnY...@comcast.com>):
[...]

> And if after you get your prototype up and running with pthreads, you
> discover that performance is as good or better than expected (as most
> do), you don't have to dive into the "bleeding edge" end of the pool at
> all. :-)

If you feel the need for increased scalability and throughput, it might
be a good idea to create an experimental prototype of your application
with all of its rw-locks replaced with lock-free reader patterns... Some
lock-free algorithms, the "reader" class especially, are all about
scalability and sheer performance, which WILL become more and more
important in the near future. You should also take David's point into
account... If you're developing an Operating System, Memory Allocator or
Enterprise Anything, well, lock-free is fairly important!
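
To give a flavor of what a lock-free reader pattern looks like, here is
a heavily simplified sketch using C11-style atomics as shorthand for
whatever atomic load/store primitives your platform gives you. Readers
do one atomic load and no stores; the writer serializes on an ordinary
mutex and publishes an immutable snapshot. The genuinely hard part,
reclaiming old snapshots while readers may still hold them, is
deliberately left out (that's where RCU, hazard pointers, proxy
reference counting and the like come in); this sketch just leaks them:

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* an immutable snapshot of whatever the readers need to see */
struct config {
    int max_conns;
    int timeout_ms;
};

static _Atomic(struct config *) g_config;   /* currently published snapshot */
static pthread_mutex_t g_writer_lock = PTHREAD_MUTEX_INITIALIZER;

/* reader: no lock, no store; one load-acquire (a plain load on most CPUs) */
const struct config *config_read(void) {
    return atomic_load_explicit(&g_config, memory_order_acquire);
}

/* writer: serialized by a plain mutex, publishes a fresh snapshot */
void config_update(int max_conns, int timeout_ms) {
    struct config *fresh = malloc(sizeof *fresh);
    fresh->max_conns = max_conns;
    fresh->timeout_ms = timeout_ms;

    pthread_mutex_lock(&g_writer_lock);
    struct config *old = atomic_exchange_explicit(&g_config, fresh,
                                                  memory_order_release);
    pthread_mutex_unlock(&g_writer_lock);

    /* Cannot free(old) here: a reader may still be using it.  A real
       version needs a reclamation scheme; this sketch just leaks it. */
    (void)old;
}

Note that the writers still take a lock; this is exactly the kind of
"marriage" of locking and lock-free reader patterns mentioned above.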


Joe Seigh

Mar 1, 2007, 6:34:58 AM

Your argument applies to threading as well, so basically you're
arguing against multi-threading unless you don't see the need for
consistency here.

Really, the issue is, after you've parallelized your program to the
maximum extent possible or practical, how you deal with the
unparallelized bits. On a 100-way multiprocessor, lock-based
synchronization is not going to handle those bits very gracefully. The
only thing your buggy-whip argument is going to be good for is beating
a dead horse to death.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Dave Butenhof

Mar 1, 2007, 8:09:45 AM

Contention is inevitable in any form of parallel programming. There are
all sorts of ways to MANAGE and CONTROL that contention, from coarse
grained message-passing techniques to very fine grained cache line
collision (including false sharing) in an array calculation.

What you're really saying, then, is that contention for a mutex is
better than contention for a memory location. OK... but what IS a mutex
but a memory location with contention characteristics that are carefully
managed? And more than that, a mutex is a "contention proxy" for the
data you really want to affect. You may limit contention for the
application data by blocking on the mutex... but this doesn't apply to
the thread library (and/or kernel) contention FOR the mutex, which has
to occur before and around the blocking. And the contention management
policies for a mutex are GENERIC (however tunable some aspects might be
on some systems). Any statement that such a situation is ALWAYS or even
"nearly always" better than directly managing the contention you really
care about seems obviously flawed.

Coarse grained contention is often best managed by making one partner
get out of the way while the other completes. Message passing is a
trivial example here, where you don't even usually need to be aware that
there was contention. (In a sense there isn't.) Mutexes also allow you,
at a lower level, to segregate the contexts and keep them out of each
others' way. But a mutex is still serializing a fairly large set of
operations; the block and unlock are expensive, and using this
"serialization token" to manage a straightforward write collision on a
single word often doesn't make much sense.

Lock-free doesn't remove contention, any more than anything else does;
but it allows you to manage that contention at a lower level and finer
granularity, closer to the actual problem. A mutex always contains
essentially a compare-and-swap. If the real problem to which you're
trying to apply the mutex can also be reduced to a compare-and-swap,
then the only advantage to the mutex is that it provides a portable API.
With lock-free (in this case a simple compare-and-swap), you're beating
on the DATA's cache line instead of on the MUTEX cache line. If the
mutex is usually free then you haven't reduced contention, but all the
contention is now focused on the actual problem rather than in managing
a proxy (the mutex) for the actual problem. And if the mutex is
frequently locked, you've got a host of benefits because the mutex
scheduling incurs additional indirect contention, priority inversion
concerns, deadlocks, and so forth that are completely foreign to the
compare-and-swap you really wanted in the first place.
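
(To put that in code: for a trivial single-word update, the two
alternatives look roughly like this, with C11-style atomics standing in
for whatever compare-and-swap primitive the platform provides. The only
real difference is where the contended cache line lives: the data
itself, or the mutex acting as its proxy.)

#include <pthread.h>
#include <stdatomic.h>

/* the "real problem": add a delta to one shared word */

/* mutex version: the contention (and the CAS) happens on the mutex's
   cache line, as a proxy for the data */
static long total_locked;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

void add_locked(long delta) {
    pthread_mutex_lock(&total_lock);
    total_locked += delta;
    pthread_mutex_unlock(&total_lock);
}

/* lock-free version: essentially the same compare-and-swap the mutex
   does internally, applied directly to the data's cache line */
static _Atomic long total_lockfree;

void add_lockfree(long delta) {
    long old = atomic_load_explicit(&total_lockfree, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &total_lockfree, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed))
        ;   /* on failure 'old' is refreshed; retry until the CAS lands */
}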

The argument for substantially higher level lock-free becomes fuzzier
because the tradeoffs become polluted by so many factors that the system
is chaotic. Like many other performance/scalability techniques you need
to carefully consider (and often MEASURE) the effect on the APPLICATION
behavior. (Compiling sequential code with "optimization level 5" doesn't
necessarily produce "better" code than "optimization level 4", because
the compiler's optimizer doesn't know enough about what the APPLICATION
is doing.) Shortening a non-critical path may just make the rest of the
application look worse; shortening a critical path from 2 seconds to
half a second may just expose another critical path at 1 second. And,
yes, if your application has bad contention characteristics on other
areas of code, any CHANGE (whether locally good or bad) may hurt you.

But, finally: changing from a lock-based package to a lock-free package
is often NOT externally transparent, and in the extreme may require
substantial redesign of the surrounding code to accommodate the new
contention and scheduling patterns. Lock-free is often much more
profitably exploited in careful initial design than in later retrofits.
Much, as it happens, like threading. This can make "experimentation" a
bitter pill to swallow, because once you've (re)designed and (re)coded
enough of your application infrastructure to give the alternate design a
fair shake, it's difficult to throw all that away even if it's still a
net loss.

David Schwartz

Mar 1, 2007, 8:40:54 AM
On Feb 28, 10:35 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

> Also, you should also take David's point into account... If your
> developing an Operating System, Memory Allocator or Enterprise Anything,
> well, lock-free is fairly important!

I agreed with you up to "Enterprise Anything". With any real-world
enterprise application, there are so many things to do that won't
contend with each other that it's self-destructive to try hard to keep
contending jobs contending.

DS

David Schwartz

Mar 1, 2007, 8:43:45 AM
On Mar 1, 3:34 am, Joe Seigh <jseigh...@xemaps.com> wrote:

> > Contention is what is bad. So why choose a technique whose explicit
> > design goal is to maximize contention?

> Your argument applies to threading as well, so basically you're
> arguing against multi-threading unless you don't see the need for
> consistency here.

I don't follow you at all. There currently is no alternative to multi-
threading. Process-pool architectures aren't sufficiently mature yet
and multi-process approaches don't provide sane ways to manage shared
state.

> Really, the issue is with after you've parallelized your program
> to the maximum extent possible or practical, how you deal with the
> unparallelized bits. On a 100 way multi-processors, lock based
> synchronization is not going to handle those bits very gracefully.
> The only thing your buggy whip argument is going to be good for is
> beating a dead horse to death.

This has nothing to do with parallel versus non-parallel. It's about
contending versus non-contending. The goal is to schedule as many non-
contending tasks concurrently as possible and, wherever possible, not
to schedule contending tasks concurrently.

DS

Dave Butenhof

Mar 1, 2007, 9:42:17 AM
David Schwartz wrote:

>> Really, the issue is with after you've parallelized your program
>> to the maximum extent possible or practical, how you deal with the
>> unparallelized bits. On a 100 way multi-processors, lock based
>> synchronization is not going to handle those bits very gracefully.
>> The only thing your buggy whip argument is going to be good for is
>> beating a dead horse to death.
>
> This has nothing to do with parallel versus non-parallel. It's about
> contending versus non-contending. The goal is to schedule as many non-
> contending tasks concurrently as possible and, wherever possible, not
> to schedule contending tasks concurrently.

If you truly never scheduled contending tasks concurrently, then you
wouldn't need synchronization at all (except within the scheduler, which
frequently has many algorithms that can substantially benefit from
lock-free) and contention wouldn't be an issue (for the main application
code). For any case where you can practically accomplish this, fine;
you're done. There "might as well be" no contention, and your (perfect)
scheduling essentially becomes your synchronization.

That's not realistic for most applications; contention is part of the
game, and the "fun" is in how you manage that contention. SOME
contention is indeed best managed by getting all other potential
contenders out of the way for a long time; but there's plenty of
contention patterns where that's inefficient. Mutexes can provide a sort
of "reactive" gateway from contention to scheduling, to approximate the
perfect "get out of the way" model of coarse grain contention control
through scheduling (but "on demand" rather than predictive). But for
finer-grained contention, the mutex often becomes just a proxy
contention point that's mostly getting in the way... you may be doubling
(tripling, or worse) the amount of contention actually required for the
job at hand. (And sometimes it matters WHERE the contention occurs;
sometimes it doesn't.)

Again, it's not a matter of whether you have contention. You DO. It's a
matter of whether it's best for your algorithm to focus the contention
"proactively" into the scheduler, which contends on its own private
cache lines; or "reactively" into either a mutex (which contends on a
proxy cache line and funnels into the scheduler) or into the specific
cache line(s) you really want to touch (lock-free).

(And often that's a good place to start the analysis; if your contention
point crosses cache lines, particularly when it's a potentially large
set as in a general memory management package, you're often best off
focusing the worst of the contention through a single SCHEDULED
contention point, like a mutex, rather than distributing the contention
management across a set of cache lines independently. You STILL may be
better off with a lock-free front end, such as a size-keyed malloc
lookaside list before going into a generalized locked heap.)
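
(A very stripped-down sketch of one ABA-free shape such a front end can
take, again with C11-style atomics as shorthand and a single size class
standing in for the size-keyed lists: frees push onto a global lock-free
list, allocation refills a per-thread stash by detaching that whole list
with one atomic exchange, and only falls through to the ordinary locked
heap when both are empty. It's an illustration of the pattern, not
anyone's actual allocator.)

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct block { struct block *next; };

/* global lock-free lookaside list of freed blocks (one size class) */
static _Atomic(struct block *) g_lookaside;

/* per-thread stash, refilled from the global list in one shot */
static _Thread_local struct block *t_stash;

/* slow path: the generalized locked heap (malloc stands in for it here) */
static pthread_mutex_t g_heap_lock = PTHREAD_MUTEX_INITIALIZER;
static void *locked_heap_alloc(size_t sz) {
    pthread_mutex_lock(&g_heap_lock);
    void *p = malloc(sz);
    pthread_mutex_unlock(&g_heap_lock);
    return p;
}

/* free path: CAS-push onto the global list; push-only, so no ABA issue */
void fast_free(void *p) {
    struct block *b = p;
    b->next = atomic_load_explicit(&g_lookaside, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &g_lookaside, &b->next, b,
               memory_order_release, memory_order_relaxed))
        ;   /* on failure b->next is refreshed to the current head */
}

/* alloc path: thread stash first, then detach the whole global list with
   one atomic exchange (also ABA-free), then the locked heap */
void *fast_alloc(size_t sz) {
    if (!t_stash)
        t_stash = atomic_exchange_explicit(&g_lookaside, NULL,
                                           memory_order_acquire);
    if (t_stash) {
        struct block *b = t_stash;
        t_stash = b->next;
        return b;
    }
    return locked_heap_alloc(sz);
}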
