Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

pthread_mutex_trylock vs. pthread_mutex_lock

3,117 views
Skip to first unread message

Serge...@gmail.com

unread,
Oct 14, 2008, 11:29:42 AM10/14/08
to
Hi guys,
Could somebody with in-depth knowledge about pthread internal
elaborate on how pthread_mutex_lock works.

A fairytale tales:

pthread_mutex_lock blocks until lock acquired while
pthread_mutex_trylock tries to acquire and return if other thread
holds the lock. Nice-n-simple...

Question.
It seams that pthread_mutex_lock has competitive advantage compare to
pthread_mutex_trylock(). Here are two scenarios:
1. pthread_mutex_lock(&mutex);

2. while(true) {
if (0 == pthread_mutex_trylock(&mutex)) break;
sleep(ThreadTimeSlice);
}

When pthread_mutex_lock acquiring lock...
1.
It may try to acquire the lock and if it fails at first attempt it
puts thread to sleep for remaining of Thread TimeSlice. Than try it
again (Opppsss... the same as scenario 2). In this case scheduler
don't have any priories when it choose the next thread to wake up. So
both scenarios are the same.

2.
Another case when once pthread_mutex_lock fails to acuire the lock it
puts thread on lock's 'waiting list' before going into suspend mode.
The thread that owns the lock will wake up thread from 'waiting list'
before release the lock. In this case pthread_mutex_lock get
competitive advantage since thread is awaken at exact moment when lock
released.
So it has much higher probability to acquire the lock compare to
pthread_mutex_trylock.

3.
Another case when once pthread_mutex_lock fails to acuire the lock it
puts thread on lock's 'waiting list' before going into suspend mode.
The thread that owns the lock will boost priority of the thread from
'waiting list' before release the lock. In this case
pthread_mutex_lock still gets some competitive advantage but it can be
payed down by boosting thread priority in scenario 2.

Hallvard B Furuseth

unread,
Oct 14, 2008, 12:45:37 PM10/14/08
to
Serge...@gmail.com writes:
> Question.
> It seams that pthread_mutex_lock has competitive advantage compare to
> pthread_mutex_trylock(). Here are two scenarios:
> 1. pthread_mutex_lock(&mutex);
>
> 2. while(true) {
> if (0 == pthread_mutex_trylock(&mutex)) break;
> sleep(ThreadTimeSlice);
> }

(2) rarely makes sense. If you want a lock before proceeding, use a
plain pthread_mutex_lock().

OTOH you might use trylock if your locking is complicated and might
deadlock. If the trylock fails, undo whatever you were doing, release
your locks, and try again later.

Or a function may need to do a mutex-protected task, but it doesn't
matter exactly when. E.g. it may want to read and update a counter. So
while the funcion is working on other stuff it trylocks once in a while
if that hasn't succeeded yet, and when it does succeed it updates the
counter. By the time the function needs that task done, if no trylocks
succeeded it does a plain lock and does the task before proceeding. If
several threads are doing that, this reduces the number of times one of
the threads must block and wait for the mutex.

As for waiting lists and scheduling: pthread_mutex_unlock() may give the
mutex to some other thread and schedule that thread to be run, but that
doesn't mean that other thread will run immediately. Likely that thread
must still wait for the scheduler to give it some CPU time. That wait
doesn't happen after a trylock, which doesn't put the thread to sleep.

--
Hallvard

Serge...@gmail.com

unread,
Oct 14, 2008, 1:42:33 PM10/14/08
to
On Oct 14, 12:45 pm, Hallvard B Furuseth <h.b.furus...@usit.uio.no>
wrote:

Thank you for reply,
My point is that in highly concurrent environment
pthread_mutex_trylock() seams to be less efficient than
pthread_mutex_lock().
Or let me put another way. In highly concurrent environment the actual
implementation details of pthread_mutex_trylock() and
pthread_mutex_lock() may render pthread_mutex_trylock() useless.

My actual problem is that I have big app with multiple mutexes. It
hungs somewhere. I would like to put in instrumentation that will
replace some/all pthread_mutex_lock() with pthread_mutex_trylock() so
I can collect ownership info and put in timeouts to expose delays. But
I afraid that replacing pthread_mutex_lock() with
pthread_mutex_trylock() may have side effects including that they less
likely to own the lock.

Chris Friesen

unread,
Oct 14, 2008, 2:16:57 PM10/14/08
to
Serge...@gmail.com wrote:

> It seams that pthread_mutex_lock has competitive advantage compare to
> pthread_mutex_trylock(). Here are two scenarios:
> 1. pthread_mutex_lock(&mutex);
>
> 2. while(true) {
> if (0 == pthread_mutex_trylock(&mutex)) break;
> sleep(ThreadTimeSlice);

Normally you wouldn't ever do 2 with a sleep. A more common scenario is
to simply abort the whole operation if the lock is already taken, since
some other thread is already doing whatever we wanted to do.
Alternately, it can be useful to busy-loop while trying the mutex. This
is a simple way to allow us to break out early if we've been waiting for
too long, which can be useful in debugging--of course you'd want to be
careful implementing this on uniprocessor systems.

> 2.
> Another case when once pthread_mutex_lock fails to acuire the lock it
> puts thread on lock's 'waiting list' before going into suspend mode.
> The thread that owns the lock will wake up thread from 'waiting list'
> before release the lock. In this case pthread_mutex_lock get
> competitive advantage since thread is awaken at exact moment when lock
> released.
> So it has much higher probability to acquire the lock compare to
> pthread_mutex_trylock.

This is how most implementations do it.

> 3.
> Another case when once pthread_mutex_lock fails to acuire the lock it
> puts thread on lock's 'waiting list' before going into suspend mode.
> The thread that owns the lock will boost priority of the thread from
> 'waiting list' before release the lock. In this case
> pthread_mutex_lock still gets some competitive advantage but it can be
> payed down by boosting thread priority in scenario 2.

In my experience it is unusual to have a combination of trylock and
regular locking where you're actually worried about providing fairness
to the two types of waiters.

Chris

Hallvard B Furuseth

unread,
Oct 14, 2008, 3:21:15 PM10/14/08
to
Serge...@gmail.com writes:

> Thank you for reply,
> My point is that in highly concurrent environment
> pthread_mutex_trylock() seams to be less efficient than
> pthread_mutex_lock().

If it's used the way you suggested, yes:-)

> Or let me put another way. In highly concurrent environment the actual
> implementation details of pthread_mutex_trylock() and
> pthread_mutex_lock() may render pthread_mutex_trylock() useless.

No, it provides useful functionality which pthread_mutex_lock() does
not. Also it can be faster when used right. However:

> My actual problem is that I have big app with multiple mutexes. It
> hungs somewhere.

That doesn't sound related to how efficient they are. Unless by
hung you mean a thread is looping (and thus consuming CPU time).

> I would like to put in instrumentation that will
> replace some/all pthread_mutex_lock() with pthread_mutex_trylock() so
> I can collect ownership info and put in timeouts to expose delays.

First:

Have you compiled it with debugging symbols and inspected the hung
process it in a debugger, to see where it failed?

Do you check the return values from your mutex calls? If e.g.
pthread_mutex_lock() fails due to a deadlock and the caller proceeds as
if it had the lock, you get into trouble. And do you use error checking
mutexes? (Initialize with PTHREAD_MUTEX_ERRORCHECK or something like
it, I don't quite remember.)

Of course it breaks because you have free()d the memory which contains
the mutex, you get into worse trouble...

Anyway, you can also try to run your program under Valgrind
--tool=helgrind, Purify (I think) or something similar. Maybe that will
find the problem.

> But I afraid that replacing pthread_mutex_lock() with
> pthread_mutex_trylock() may have side effects including that they less
> likely to own the lock.

Indeed. But if you want timing, you could do something like

mutex_lock_wrapper(wrapped_mutex_t *wmutex) {
int rc = pthread_mutex_trylock(&wmutex->mutex);
if (rc == 0) {
wmutex->immediate++;
} else if (rc == EBUSY) {
<start timer>;
rc = pthread_mutex_lock(&wmutex->mutex);
<check timer and save difference to wmutex>;
}
return rc;
}

However a wrapper cannot get at the time it takes to reacquire the mutex
in pthread_cond_wait(). You'd have to implement your own equivalent of
the pthread_cond_t type.

--
Hallvard

David Schwartz

unread,
Oct 14, 2008, 6:29:16 PM10/14/08
to
On Oct 14, 10:42 am, Sergey....@gmail.com wrote:

> My point is that in highly concurrent environment

In a highly concurrent environment, most work is done with no locks
held. So contention is the exception.

> pthread_mutex_trylock() seams to be less efficient than
> pthread_mutex_lock().

It depends what you're trying to do. If your intention is to block
until the lock can be acquired, pthread_mutex_lock is your best
choice, since it's designed to do precisely that. The
'pthread_mutex_trylock' function is only rarely useful.

> Or let me put another way. In highly concurrent environment the actual
> implementation details of pthread_mutex_trylock() and
> pthread_mutex_lock() may render pthread_mutex_trylock() useless.

What if you are in a thread that has very important work to do, and
can do that work two ways. One way is very efficient, but requires
acquiring a mutex that might be held for longer than you can afford to
wait. The other is not as efficient, but doesn't require that mutex.
In this case, you can use pthread_mutex_trylock to see if you can get
the mutex and do the more efficient path. If that fails, you use the
less efficient way.

This is a common pattern when a very important thread has work it can
either do itself or dispatch to another thread.

> My actual problem is that I have big app with multiple mutexes.  It
> hungs somewhere.  I would like to put in instrumentation that will
> replace some/all pthread_mutex_lock() with pthread_mutex_trylock() so
> I can collect ownership info and put in timeouts to expose delays. But
> I afraid that replacing pthread_mutex_lock() with
> pthread_mutex_trylock() may have side effects including that they less
> likely to own the lock.

That's not how you instrument code to solve a problem like that. You
instrument the code so that it that it registers what locks it holds
in a global list of held locks.

For example, pthread_mutex_lock becomes:

1) Add thread/mutex pair to global list. Include current time. (You
can include file/line if you want.)
2) Acquire lock.

And pthread_mutex_unlock becomes:

1) Remove thread/mutex pair from global list.

The pthread_mutex_trylock function only needs to add itself to the
list if it succeeds.

Then, an extra instrumentation thread scans the global list. It can
report threads that have held a lock for "too long", including the
file/line that acquired the lock.

Another choice is to replace your mutexes with smarter locks that can
wait for a limited amount of time. If they fail, they can dump the
conflicting locks from the global list. You can emulate a simple POSIX
mutex with a mutex and condition variable, and add all the capability
you need.

However, changing a lock to a sleep loop is a very invasive change.
That would be my last resort.

DS

Serge...@gmail.com

unread,
Oct 15, 2008, 12:19:40 PM10/15/08
to
On Oct 14, 6:29 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 14, 10:42 am, Sergey....@gmail.com wrote:
>
> > My point is that in highly concurrent environment
>
> In a highly concurrent environment, most work is done with no locks
> held. So contention is the exception.
>
> >pthread_mutex_trylock() seams to be less efficient than
> > pthread_mutex_lock().
>
> It depends what you're trying to do. If your intention is to block
> until the lock can be acquired, pthread_mutex_lock is your best
> choice, since it's designed to do precisely that. The
> 'pthread_mutex_trylock' function is only rarely useful.
>
> > Or let me put another way. In highly concurrent environment the actual
> > implementation details ofpthread_mutex_trylock() and
> > pthread_mutex_lock() may renderpthread_mutex_trylock() useless.

>
> What if you are in a thread that has very important work to do, and
> can do that work two ways. One way is very efficient, but requires
> acquiring a mutex that might be held for longer than you can afford to
> wait. The other is not as efficient, but doesn't require that mutex.
> In this case, you can usepthread_mutex_trylockto see if you can get

> the mutex and do the more efficient path. If that fails, you use the
> less efficient way.
>
> This is a common pattern when a very important thread has work it can
> either do itself or dispatch to another thread.
>
> > My actual problem is that I have big app with multiple mutexes.  It
> > hungs somewhere.  I would like to put in instrumentation that will
> > replace some/all pthread_mutex_lock() withpthread_mutex_trylock() so

> > I can collect ownership info and put in timeouts to expose delays. But
> > I afraid that replacing pthread_mutex_lock() with
> >pthread_mutex_trylock() may have side effects including that they less
> > likely to own the lock.
>
> That's not how you instrument code to solve a problem like that. You
> instrument the code so that it that it registers what locks it holds
> in a global list of held locks.
>
> For example, pthread_mutex_lock becomes:
>
> 1) Add thread/mutex pair to global list. Include current time. (You
> can include file/line if you want.)
> 2) Acquire lock.
>
> And pthread_mutex_unlock becomes:
>
> 1) Remove thread/mutex pair from global list.
>
> Thepthread_mutex_trylockfunction only needs to add itself to the

> list if it succeeds.
>
> Then, an extra instrumentation thread scans the global list. It can
> report threads that have held a lock for "too long", including the
> file/line that acquired the lock.
>
> Another choice is to replace your mutexes with smarter locks that can
> wait for a limited amount of time. If they fail, they can dump the
> conflicting locks from the global list. You can emulate a simple POSIX
> mutex with a mutex and condition variable, and add all the capability
> you need.
>
> However, changing a lock to a sleep loop is a very invasive change.
> That would be my last resort.
>
> DS

On Oct 14, 6:29 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 14, 10:42 am, Sergey....@gmail.com wrote:
>
> > My point is that in highly concurrent environment
>
> In a highly concurrent environment, most work is done with no locks
> held. So contention is the exception.
>
> >pthread_mutex_trylock() seams to be less efficient than
> > pthread_mutex_lock().
>
> It depends what you're trying to do. If your intention is to block
> until the lock can be acquired, pthread_mutex_lock is your best
> choice, since it's designed to do precisely that. The
> 'pthread_mutex_trylock' function is only rarely useful.
>
> > Or let me put another way. In highly concurrent environment the actual
> > implementation details ofpthread_mutex_trylock() and

> > pthread_mutex_lock() may renderpthread_mutex_trylock() useless.


>
> What if you are in a thread that has very important work to do, and
> can do that work two ways. One way is very efficient, but requires
> acquiring a mutex that might be held for longer than you can afford to
> wait. The other is not as efficient, but doesn't require that mutex.

> In this case, you can usepthread_mutex_trylockto see if you can get


> the mutex and do the more efficient path. If that fails, you use the
> less efficient way.
>
> This is a common pattern when a very important thread has work it can
> either do itself or dispatch to another thread.
>
> > My actual problem is that I have big app with multiple mutexes. It
> > hungs somewhere. I would like to put in instrumentation that will

> > replace some/all pthread_mutex_lock() withpthread_mutex_trylock() so


> > I can collect ownership info and put in timeouts to expose delays. But
> > I afraid that replacing pthread_mutex_lock() with
> >pthread_mutex_trylock() may have side effects including that they less
> > likely to own the lock.
>
> That's not how you instrument code to solve a problem like that. You
> instrument the code so that it that it registers what locks it holds
> in a global list of held locks.
>
> For example, pthread_mutex_lock becomes:
>
> 1) Add thread/mutex pair to global list. Include current time. (You
> can include file/line if you want.)
> 2) Acquire lock.
>
> And pthread_mutex_unlock becomes:
>
> 1) Remove thread/mutex pair from global list.
>

> Thepthread_mutex_trylockfunction only needs to add itself to the


> list if it succeeds.
>
> Then, an extra instrumentation thread scans the global list. It can
> report threads that have held a lock for "too long", including the
> file/line that acquired the lock.
>
> Another choice is to replace your mutexes with smarter locks that can
> wait for a limited amount of time. If they fail, they can dump the
> conflicting locks from the global list. You can emulate a simple POSIX
> mutex with a mutex and condition variable, and add all the capability
> you need.
>
> However, changing a lock to a sleep loop is a very invasive change.
> That would be my last resort.
>
> DS

Hi guys,
Thank you a lot for comments.
1.
Application is in production. It has > 40 threads and was written by
many, many people over 8 years. Given that it does fairly well
work :). ‘Hangs’ is our slang means that msg takes about 100-500 ms to
propagate through the app compare to usual 5-10 ms.

2.
We have hard time to simulate production env in development and
interested in getting performance warning in production as well as
dev.

3.


> Then, an extra instrumentation thread scans the global list. It can
> report threads that have held a lock for "too long", including the
> file/line that acquired the lock.

It looks like few posix call takes timeout as a parameter so you
sagest to extra thread to monitor.
4.
Here is example how boost 1_36 handles lack of timeout interface in
posix. Looks similar :((
boost::posix_time::ptime now;
while((now = microsec_clock::universal_time()) < abs_time){
if(semaphore_try_wait(handle))
return true;
thread_yield();
}

5.
So here my options to implement instrumentations:
a. extra thread that monitors pthread_mutexes that log lock/unlock
into list (nice easy)
b. Simulate mutex using condition variable. Seams the way to go. But
I just wondering how conditional variable to timeouts? I bet they use
busy loop with pthread_mutex_trylock!!!
c. Follow boost steps – busy loop with pthread_mutex_trylock. Sound
expensive but even best of the best from boost community use it.

So guys sound that extra thread is the only safe way to go?

BTW pity Win32 API has *interlock* set of f-ns that provide very low
level user-mode sync api. It essentially it is a wrapper around asm
instructions with ix86 LOCK prefix. It provide nice framework to
build own light weight sync objects. Does posix provide such heretic
thing? In this case

Martin Vuille

unread,
Oct 15, 2008, 12:33:19 PM10/15/08
to
Serge...@gmail.com wrote in
news:d8292191-2c5a-4c24...@s20g2000prd.googlegroups.
com:

>> > My actual problem is that I have big app with multiple
>> > mutexes.  It hungs somewhere.  I would like to put in
>> > instrumentation that will replace some/all
>> > pthread_mutex_lock() withpthread_mutex_trylock() so I can
>> > collect ownership info and put in timeouts to expose delays.
>> > But I afraid that replacing pthread_mutex_lock() with
>> >pthread_mutex_trylock() may have side effects including that
>> >they less
>> > likely to own the lock.
>>
>

> Hi guys,
> Thank you a lot for comments.
> 1.
> Application is in production. It has > 40 threads and was
> written by many, many people over 8 years. Given that it does
> fairly well work :). ‘Hangs’ is our slang means that msg takes
> about 100-500 ms to propagate through the app compare to usual
> 5-10 ms.
>

Maybe I'm missing something, but as a first step I would be inclined
to simply get the time before and after the pthread_mutex_lock()
calls and log instances where the difference is "too long".

That would confirm that pthread_mutex_lock() is really where the
"hang" occurs and identify whether the issue is located at a specific
call or occurs randomly at different calls.

MV

--
I do not want replies; please follow-up to the group.

Serge...@gmail.com

unread,
Oct 15, 2008, 1:46:56 PM10/15/08
to
On Oct 15, 12:33 pm, Martin Vuille <jpm...@yahoo.com> wrote:
> Sergey....@gmail.com wrote innews:d8292191-2c5a-4c24...@s20g2000prd.googlegroups.

We already did more or less what you suggest. Now it is a question to
put in instrumentation that will routinely report delays and actual
place that cause delay with little or no overhead. Usually it is not
problem with thread that tries to acquire lock but rather problem with
thread that already owns the lock. Reporting each lock/unlock will
flood the log files and slow down system to the point that it will
become useless. I done the simulator thing on windows. But Win32 API
has very rich synchronization api where almost each call has timeout
and ready to go interlock f-ns to make own enchantments.

Martin Vuille

unread,
Oct 15, 2008, 2:11:11 PM10/15/08
to
Serge...@gmail.com wrote in
news:a209d449-5345-4ecd...@c60g2000hsf.googlegroups.
com:

> On Oct 15, 12:33 pm, Martin Vuille <jpm...@yahoo.com> wrote:
>> Sergey....@gmail.com wrote

>> innews:d8292191-2c5a-4c24-b064-49bbcd6943df@s2

Sorry, I don't follow. I never suggested that you log all calls to
pthread_mutex_lock(). I was suggesting that you only log those
instances where the pthread_mutex_lock() takes "too long", by
whatever definition of "too long" you are using. That should not
flood the log files. It also seems to me that two calls to check
the current time is less overhead (or at least no more overhead)
that the while loop around the pthread_mutex_trylock().

David Schwartz

unread,
Oct 15, 2008, 3:56:09 PM10/15/08
to
On Oct 15, 9:33 am, Martin Vuille <jpm...@yahoo.com> wrote:

> Maybe I'm missing something, but as a first step I would be inclined
> to simply get the time before and after the pthread_mutex_lock()
> calls and log instances where the difference is "too long".

That won't help. The problem thread will never release the mutex, so
the victim threads will never acquire the mutex. You won't get any log
entries.

DS

David Schwartz

unread,
Oct 15, 2008, 3:56:55 PM10/15/08
to
Sergey....@gmail.com wrote:

> b. Simulate mutex using condition variable. Seams the way to go. But
> I just wondering how conditional variable to timeouts? I bet they use
> busy loop with pthread_mutex_trylock!!!

No, they don't. Why would they implement such an important function so
badly?

DS

David Schwartz

unread,
Oct 15, 2008, 3:58:50 PM10/15/08
to
On Oct 15, 11:11 am, Martin Vuille <jpm...@yahoo.com> wrote:

> Sorry, I don't follow. I never suggested that you log all calls to
> pthread_mutex_lock(). I was suggesting that you only log those
> instances where the pthread_mutex_lock() takes "too long", by
> whatever definition of "too long" you are using. That should not
> flood the log files. It also seems to me that two calls to check
> the current time is less overhead (or at least no more overhead)
> that the while loop around the pthread_mutex_trylock().

If the problem is a "grinds to a halt" type, this may identify which
log first slows things down, which may lead you right to the problem.
If the problem is a "suddenly stops" type, this will not help. No
thread will ever successfully acquire the problem mutex, so there will
be no logs of returns from pthread_mutex_lock.

DS

Martin Vuille

unread,
Oct 15, 2008, 5:04:05 PM10/15/08
to
David Schwartz <dav...@webmaster.com> wrote in
news:aabc2537-923e-4e89...@b31g2000prb.googlegroups.
com:

I interpreted the following statement from the OP

> Application is in production. It has > 40 threads and was
> written by many, many people over 8 years. Given that it does
> fairly well work :). ‘Hangs’ is our slang means that msg takes
> about 100-500 ms to propagate through the app compare to usual
> 5-10 ms.
>

to mean that the mutex _was_ being released, but only after an
amount of time that was deemed "too long", not that it was never
released.

David Schwartz

unread,
Oct 15, 2008, 7:15:19 PM10/15/08
to
On Oct 15, 2:04 pm, Martin Vuille <jpm...@yahoo.com> wrote:

> > Application is in production. It has > 40 threads and was
> > written by many, many people over 8 years.  Given that it does
> > fairly well work :). ‘Hangs’ is our slang means that msg takes
> > about 100-500 ms to propagate through the app compare to usual
> > 5-10 ms.

> to mean that the mutex _was_ being released, but only after an
> amount of time that was deemed "too long", not that it was never
> released.

You're most likely right. Your approach is definitely worth trying,
especially since it's so easy to implement.

DS

Chris M. Thomasson

unread,
Oct 15, 2008, 8:07:46 PM10/15/08
to

<Serge...@gmail.com> wrote in message
news:1baab1b9-99e3-4258...@b38g2000prf.googlegroups.com...

> Hi guys,
> Could somebody with in-depth knowledge about pthread internal
> elaborate on how pthread_mutex_lock works.
>
> A fairytale tales:
>
> pthread_mutex_lock blocks until lock acquired while
> pthread_mutex_trylock tries to acquire and return if other thread
> holds the lock. Nice-n-simple...
>
> Question.
> It seams that pthread_mutex_lock has competitive advantage compare to
> pthread_mutex_trylock(). Here are two scenarios:
> 1. pthread_mutex_lock(&mutex);
>
> 2. while(true) {
> if (0 == pthread_mutex_trylock(&mutex)) break;
> sleep(ThreadTimeSlice);
> }


The pattern is usually like:


if (! pthread_mutex_trylock(...)) {
// do something else
} else {
// cool!
}


Also, you can use `pthread_mutex_trylock()' to implement complicated locking
stratgeies. All in all, tricky locking schemes aside, well, its usually only
a good idea to use try-lock when you know that there is always something to
do in case the attempt fails...


> When pthread_mutex_lock acquiring lock...
[...]

The usual impl is that the thread will enter a kernel waitset on contention.
However, there are so-called adaptive mutexs which behave like a spin-lock
with a bounded number of spins before they actually call into the kernel to
block; High-level pseudo-code would look like:


void lock() {
for(unsigned spins = spin_max; spins > 0; --spin) {
if (try_to_lock()) {
return;
}
yield();
}
blocking_lock();
}


adaptive mutexs can yield better performance on multi-processor boxes by
attempting to skip the call into the kernel on every case of contention. I
also believe that some PThread impls support the `PTHREAD_MUTEX_ADAPTIVE_NP'
mutex type.

Hallvard B Furuseth

unread,
Oct 16, 2008, 2:54:39 AM10/16/08
to
Serge...@gmail.com writes:
> It looks like few posix call takes timeout as a parameter so you
> sagest to extra thread to monitor.
> (...)

> 4.
> Here is example how boost 1_36 handles lack of timeout interface
> in posix.

There are separate functions that take timeout parameters.
pthread_mutex_timedlock(), sem_timedwait(). Or semtimedop()
with SysV semaphores.

--
Hallvard

Hallvard B Furuseth

unread,
Oct 16, 2008, 5:03:49 AM10/16/08
to
Serge...@gmail.com writes:

> Now it is a question to
> put in instrumentation that will routinely report delays and actual
> place that cause delay with little or no overhead. Usually it is not
> problem with thread that tries to acquire lock but rather problem with
> thread that already owns the lock.

So - take the time from after locking the mutex to you unlock it and log
that if it's too long, rather than timing how long it takes to lock?

--
Hallvard

Chris M. Thomasson

unread,
Oct 16, 2008, 5:12:33 AM10/16/08
to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:nuvJk.209$514...@newsfe13.iad...

>
> <Serge...@gmail.com> wrote in message
> news:1baab1b9-99e3-4258...@b38g2000prf.googlegroups.com...
>> Hi guys,
>> Could somebody with in-depth knowledge about pthread internal
>> elaborate on how pthread_mutex_lock works.
>>
>> A fairytale tales:
>>
>> pthread_mutex_lock blocks until lock acquired while
>> pthread_mutex_trylock tries to acquire and return if other thread
>> holds the lock. Nice-n-simple...
>>
>> Question.
>> It seams that pthread_mutex_lock has competitive advantage compare to
>> pthread_mutex_trylock(). Here are two scenarios:
>> 1. pthread_mutex_lock(&mutex);
>>
>> 2. while(true) {
>> if (0 == pthread_mutex_trylock(&mutex)) break;
>> sleep(ThreadTimeSlice);
>> }
>
>
> The pattern is usually like:
>
>
> if (! pthread_mutex_trylock(...)) {
> // do something else
> } else {
> // cool!
> }
[...]

well, since I use POSIX interface I might as well get it right!!!!!!!!


if (pthread_mutex_trylock(...)) {
/* do something else */
} else {
/* cool! */
}


OF COURSE!!!!

;^(.......


Blah!


Szabolcs Ferenczi

unread,
Oct 16, 2008, 6:34:07 AM10/16/08
to
On Oct 16, 2:07 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> The pattern is usually like:
>
> if (! pthread_mutex_trylock(...)) {
>   // do something else
>
> } else {
>   // cool!
> }

You mean anti-pattern, don't you? In concurrent programming it is a
smell if you check the momentary status of some shared variables (in
this case the mutex) and you make any decision on that status. You
know why, don't you?

This anti-pattern should be used only in exceptional cases and most
probably it is never used in a well designed concurrent system.

Remember, you tried to use it and you ended up with a potentially live-
locked system and later on, as it is at this moment, with a guaranteed
to live-locked system:
http://groups.google.com/group/comp.programming.threads/msg/eece54e46795d84f

Best Regards,
Szabolcs

David Schwartz

unread,
Oct 16, 2008, 9:25:41 AM10/16/08
to
On Oct 16, 3:34 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> On Oct 16, 2:07 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > The pattern is usually like:
>
> > if (! pthread_mutex_trylock(...)) {
> >   // do something else
>
> > } else {
> >   // cool!
> > }
>
> You mean anti-pattern, don't you?

No, this is a common and sensible pattern.

> In concurrent programming it is a
> smell if you check the momentary status of some shared variables (in
> this case the mutex) and you make any decision on that status. You
> know why, don't you?

Yes, but that's not what he does in this case. What he does it tests
the momentary status of a shared variable and acts *atomically* based
on that status.

> This anti-pattern should be used only in exceptional cases and most
> probably it is never used in a well designed concurrent system.

It's a common and sensible pattern.

> Remember, you tried to use it and you ended up with a potentially live-
> locked system and later on, as it is at this moment, with a guaranteed

> to live-locked system:http://groups.google.com/group/comp.programming.threads/msg/eece54e46...

It would be an anti-pattern if you did it in a loop. But if you do it
this way:

if(trylock(lock))
{
yay_do_it_fast_way(stuff);
unlock(lock);
}
else
too_bad_do_it_slow_way(stuff);

That's a perfectly sensible pattern. Note that 'trylock' isn't a
'test' operation, it's an atomic 'test and set' operation.

DS

Szabolcs Ferenczi

unread,
Oct 16, 2008, 10:02:46 AM10/16/08
to
On Oct 16, 2:07 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> [...]
>
> The usual impl is that the thread will enter a kernel waitset on contention.
> However, there are so-called adaptive mutexs which behave like a spin-lock
> with a bounded number of spins before they actually call into the kernel to
> block; High-level pseudo-code would look like:
>
> void lock() {
>   for(unsigned spins = spin_max; spins > 0; --spin) {
>     if (try_to_lock()) {
>       return;
>     }
>     yield();
>   }
>   blocking_lock();
>
> }

There are adaptive mutexes, however, this pseudo code does not
describe one.

Because of the yield, you do not gain anything here. It is not
spinning at all but de-schedules itself a couple of times before it
would be de-scheduled awaiting for the lock to be available.

It makes some sense only if you omit the yield, then, it is something
like a pseudo code for adaptive mutexes.

Best Regards,
Szabolcs

Serge...@gmail.com

unread,
Oct 16, 2008, 10:15:03 AM10/16/08
to
On Oct 16, 2:54 am, Hallvard B Furuseth <h.b.furus...@usit.uio.no>
wrote:

Yep!
You are right, they are part of POSIX draft 2004
http://www.opengroup.org/onlinepubs/009695399/basedefs/semaphore.h.html
While I still have to compile with SUN C++ 5.5 among other compilers :
(. I finally understood why some people really like Java :).


Chris M. Thomasson

unread,
Oct 16, 2008, 12:45:27 PM10/16/08
to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:ca490e53-04b3-4eeb...@v28g2000hsv.googlegroups.com...

As usual you don't know what the heck your writing about. One simply NEEDS
the pseudo-code `yield()' in there! You obviously don't know how to build
spin-locks, well, it sure seems that way. Pick an arch... Humm, okay, I go
with simple one... x86... Here is a test... What would `yield()' resolve to
on x86?

David Schwartz

unread,
Oct 16, 2008, 9:12:59 PM10/16/08
to
On Oct 16, 7:02 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> There are adaptive mutexes, however, this pseudo code does not


> describe one.
>
> Because of the yield, you do not gain anything here. It is not
> spinning at all but de-schedules itself a couple of times before it
> would be de-scheduled awaiting for the lock to be available.
>
> It makes some sense only if you omit the yield, then, it is something
> like a pseudo code for adaptive mutexes.

Wtf?!?!

DS

Szabolcs Ferenczi

unread,
Oct 17, 2008, 3:55:55 AM10/17/08
to

Quite simple, my jumping friend. If you de-schedule your thread while
awaiting, it is still busy wait but not spinning at all.

Your little pet notoriously inserts yields and delays here and there
since he is not educated in concurrent programming and he thinks he
solves scheduling problems with delays and de-schedules.

Not mentioning that he has hacked together an infinite loop again so
he not only has problems with concurrent programming but ...

<qoute>


void lock() {
for(unsigned spins = spin_max; spins > 0; --spin) {
if (try_to_lock()) {
return;
}
yield();
}
blocking_lock();
}

</quote>

You just keep petting him.

I hope I could help.

Best Regards,
Szabolcs

Serge...@gmail.com

unread,
Oct 17, 2008, 9:34:49 AM10/17/08
to
Hi Guys,
Thank you s lot.
The bottom line.
By pooling mutex using try_lock we have by far less probability to
acquire mutex compare to lock. Proper waits with timeout for mutex
(or any other sync object) can be implemented only as part of OS.
Example Win32 WaitForSingleObject().
Any attempt to do this in user mode has good chances to succeed but
1. Should be very complex.
2. Very much depends on how particular flavor of Unix/Linux does
signals and wait etc.

In my case It looks like I will go with unpleasant solution. Each
mutex has map of timestamps for each thread that acquires this mutex.
When thread calls pthread_lock_mutex () its record 2 timestamps :
before and after call. After thread calls pthread_unlock_mutex () it
goes through map and check if it delayed any other threads and reports
delays. Mutex will also get recursion counter to and map may be actual
something else since I don't want to be involved in extensive memory
allocation/deallocation associated with maps.

Chris M. Thomasson

unread,
Oct 17, 2008, 3:27:53 PM10/17/08
to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:6cb79053-098c-4475...@64g2000hsm.googlegroups.com...

On Oct 17, 3:12 am, David Schwartz <dav...@webmaster.com> wrote:
> > On Oct 16, 7:02 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > wrote:
> >
> > > There are adaptive mutexes, however, this pseudo code does not
> > > describe one.
> >
> > > Because of the yield, you do not gain anything here. It is not
> > > spinning at all but de-schedules itself a couple of times before it
> > > would be de-scheduled awaiting for the lock to be available.
> >
> > > It makes some sense only if you omit the yield, then, it is something
> > > like a pseudo code for adaptive mutexes.
> >
> > Wtf?!?!

> Quite simple, my jumping friend. If you de-schedule your thread while


> awaiting, it is still busy wait but not spinning at all.

ROFL! :^D


> Your little pet notoriously inserts yields and delays here and there
> since he is not educated in concurrent programming and he thinks he
> solves scheduling problems with delays and de-schedules.

lol.


> Not mentioning that he has hacked together an infinite loop again so
> he not only has problems with concurrent programming but ...

too funny.


> <qoute>
> void lock() {
> for(unsigned spins = spin_max; spins > 0; --spin) {
> if (try_to_lock()) {
> return;
> }
> yield();
> }
> blocking_lock();
> }
> </quote>

> You just keep petting him.

> I hope I could help.

Wow... You are the village idiot indeed. I know now for a fact that you have
never created any synchronization primitives from scratch. Your totally
ignorant. Let me ask you again:


what would the pseudo-code procedure `yield()' resolve to on an x86?


I know you don't have a clue and I am not going to do your homework; do it
yourself! When you have an answer, please let us know. I have a feeling hell
will freeze over before anything happens.

Chris M. Thomasson

unread,
Oct 17, 2008, 3:32:04 PM10/17/08
to
"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:6cb79053-098c-4475...@64g2000hsm.googlegroups.com...
On Oct 17, 3:12 am, David Schwartz <dav...@webmaster.com> wrote:
> > On Oct 16, 7:02 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > wrote:
> >
> > > There are adaptive mutexes, however, this pseudo code does not
> > > describe one.
> >
> > > Because of the yield, you do not gain anything here. It is not
> > > spinning at all but de-schedules itself a couple of times before it
> > > would be de-scheduled awaiting for the lock to be available.
> >
> > > It makes some sense only if you omit the yield, then, it is something
> > > like a pseudo code for adaptive mutexes.
> >
> > Wtf?!?!

> Quite simple, my jumping friend. If you de-schedule your thread while


> awaiting, it is still busy wait but not spinning at all.

[...]

That has got to be the most retarded statement I have seen in the last
couple of years. Keep up the good work Szabolcs!

:^D

David Schwartz

unread,
Oct 17, 2008, 3:52:22 PM10/17/08
to
On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> Quite simple, my jumping friend. If you de-schedule your thread while


> awaiting, it is still busy wait but not spinning at all.

De-schedule? Do you have any idea what we're talking about?

> Your little pet notoriously inserts yields and delays here and there
> since he is not educated in concurrent programming and he thinks he
> solves scheduling problems with delays and de-schedules.

Tell me something, in pseudo-code, what do you think the 'lock'
function should like like for an adaptive (spin/sleep) mutex?

> Not mentioning that he has hacked together an infinite loop again so
> he not only has problems with concurrent programming but ...
>
> <qoute>
> void lock() {
>   for(unsigned spins = spin_max; spins > 0; --spin) {
>     if (try_to_lock()) {
>       return;
>     }
>     yield();
>   }
>   blocking_lock();}
>
> </quote>

Note that the code excerpt above would be a perfect answer.

Here's some real code (simplified):
{
if(MaxNumCPU>1)
for(int i=200; i!=0; i--)
{
cpu_pause(); cpu_pause(); cpu_pause(); cpu_pause(); cpu_pause();
cpu_pause(); cpu_pause(); cpu_pause(); cpu_pause(); cpu_pause();
if( ((*(volatile int *)spinlock)==0) && iTryLock() ) return;
}

yield();
if(iTryLock()) return;
yield();
if(iTryLock()) return;

AtomicInc(&waiter_count);
do sys_futex(spinlock, FUTEX_WAIT, 1, NULL); while(!iTryLock());
AtomicDec(&waiter_count);
}

DS

Szabolcs Ferenczi

unread,
Oct 17, 2008, 4:09:09 PM10/17/08
to
On Oct 17, 9:52 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> wrote:
>
> > Quite simple, my jumping friend. If you de-schedule your thread while
> > awaiting, it is still busy wait but not spinning at all.
>
> De-schedule? Do you have any idea what we're talking about?

Sure. With the yield you de-schedule the current process. When you are
going to implement spinning, you do not want that because it will not
be spinning any more. So simple is it.

> > Your little pet notoriously inserts yields and delays here and there
> > since he is not educated in concurrent programming and he thinks he
> > solves scheduling problems with delays and de-schedules.
>
> Tell me something, in pseudo-code, what do you think the 'lock'
> function should like like for an adaptive (spin/sleep) mutex?

Something your furry pet has put together here but without the yield
and without the infinite loop.

> > Not mentioning that he has hacked together an infinite loop again so
> > he not only has problems with concurrent programming but ...
>
> > <qoute>
> > void lock() {
> >   for(unsigned spins = spin_max; spins > 0; --spin) {
> >     if (try_to_lock()) {
> >       return;
> >     }
> >     yield();
> >   }
> >   blocking_lock();}
>
> > </quote>
>
> Note that the code excerpt above would be a perfect answer.

Well, I would not like to use any device that you and your little pet
codes. That will end up in inefficient busy loops and infinite loops
like the pseudo-code above.

Best Regards,
Szabolcs

Chris M. Thomasson

unread,
Oct 17, 2008, 4:40:28 PM10/17/08
to
"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:8a116062-8e06-48d3...@17g2000hsk.googlegroups.com...

> On Oct 17, 9:52 pm, David Schwartz <dav...@webmaster.com> wrote:
> > On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > wrote:
> >
> > > Quite simple, my jumping friend. If you de-schedule your thread while
> > > awaiting, it is still busy wait but not spinning at all.
> >
> > De-schedule? Do you have any idea what we're talking about?

> Sure. With the yield you de-schedule the current process. When you are
> going to implement spinning, you do not want that because it will not
> be spinning any more. So simple is it.

Wow! :^|

I suppose you want the contenders to retry as fast as possible and catch on
fire in the process. I suppose you would build the lock portion of a
spin-lock like:


void lock(atomicword* this) {
while (XCHG(this, 1));
}


That is HORRIBLE! Ouch.


> > > Your little pet notoriously inserts yields and delays here and there
> > > since he is not educated in concurrent programming and he thinks he
> > > solves scheduling problems with delays and de-schedules.
> >
> > Tell me something, in pseudo-code, what do you think the 'lock'
> > function should like like for an adaptive (spin/sleep) mutex?

> Something your furry pet has put together here but without the yield
> and without the infinite loop.

You NEED the yield in there or else the damn loop will catch on fire! Also,
why do you think `yield()' would de-schedule the process? There are more
than one form of `yield()'. I ask you specifically what it would resolve to
on an x86. This is really easy question.


> > > Not mentioning that he has hacked together an infinite loop again so
> > > he not only has problems with concurrent programming but ...
> >
> > > <qoute>
> > > void lock() {
> > > for(unsigned spins = spin_max; spins > 0; --spin) {
> > > if (try_to_lock()) {
> > > return;
> > > }
> > > yield();
> > > }
> > > blocking_lock();}
> >
> > > </quote>
> >
> > Note that the code excerpt above would be a perfect answer.

> Well, I would not like to use any device that you and your little pet
> codes. That will end up in inefficient busy loops and infinite loops
> like the pseudo-code above.

Give me a break; you must be referring to the simple typo: '--spin' should
be '--spins'; fine:


void lock() {
for(unsigned spins = spin_max; spins > 0; --spins) {


if (try_to_lock()) {
return;
}
yield();
}
blocking_lock();
}


Where is the infinite loop?

Szabolcs Ferenczi

unread,
Oct 17, 2008, 4:58:11 PM10/17/08
to
On Oct 17, 10:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message

>
> news:8a116062-8e06-48d3...@17g2000hsk.googlegroups.com...
>
> > On Oct 17, 9:52 pm, David Schwartz <dav...@webmaster.com> wrote:
> > > On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > > wrote:
>
> > > > Quite simple, my jumping friend. If you de-schedule your thread while
> > > > awaiting, it is still busy wait but not spinning at all.
>
> > > De-schedule? Do you have any idea what we're talking about?
> > Sure. With the yield you de-schedule the current process. When you are
> > going to implement spinning, you do not want that because it will not
> > be spinning any more. So simple is it.
>
> Wow!  :^|

Surprised? You have a lot to learn.

> I suppose you want the contenders to retry as fast as possible and catch on
> fire in the process. I suppose you would build the lock portion of a
> spin-lock like:
>
> void lock(atomicword* this) {
>   while (XCHG(this, 1));
>
> }
>
> That is HORRIBLE! Ouch.

Yes, it is horrible. Agreed. You hacked it.

Best Regards,
Szabolcs

Chris M. Thomasson

unread,
Oct 17, 2008, 5:36:05 PM10/17/08
to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:ba5b8a5d-d016-4d63...@u57g2000hsf.googlegroups.com...

On Oct 17, 10:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message
> >
> > news:8a116062-8e06-48d3...@17g2000hsk.googlegroups.com...
> >
> > > On Oct 17, 9:52 pm, David Schwartz <dav...@webmaster.com> wrote:
> > > > On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > > > wrote:
> >
> > > > > Quite simple, my jumping friend. If you de-schedule your thread
> > > > > while
> > > > > awaiting, it is still busy wait but not spinning at all.
> >
> > > > De-schedule? Do you have any idea what we're talking about?
> > > Sure. With the yield you de-schedule the current process. When you are
> > > going to implement spinning, you do not want that because it will not
> > > be spinning any more. So simple is it.
> >
> > Wow! :^|

> Surprised? You have a lot to learn.

LOL!


> > I suppose you want the contenders to retry as fast as possible and catch
> > on
> > fire in the process. I suppose you would build the lock portion of a
> > spin-lock like:
> >
> > void lock(atomicword* this) {
> > while (XCHG(this, 1));
> >
> > }
> >
> > That is HORRIBLE! Ouch.

> Yes, it is horrible.

Your the idiot who did not want any yield mechanism on the slow-path. I show
you what it looks like when your advise if followed and you say its
horrible. Wow. Anyway, nice to see you totally agree with me on this point!
A yield mechanism is needed or else its horrible.

:^D


> Agreed. You hacked it.

too funny! =^D

Chris M. Thomasson

unread,
Oct 17, 2008, 5:40:11 PM10/17/08
to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:a2030aa6-e695-406c...@k7g2000hsd.googlegroups.com...

As usual you snipped relevant portion of my response; sigh... Let me add it
in here for proper context:


> > Also, you can use `pthread_mutex_trylock()' to implement complicated
> > locking
> > stratgeies. All in all, tricky locking schemes aside, well, its usually
> > only
> > a good idea to use try-lock when you know that there is always something
> > to
> > do in case the attempt fails...


This pattern can be very useful indeed. Why should a thread block for a
mutex when it knows that it can be doing something else _if_ the acquisition
failed? Instead of queuing on a kernel waitset in response to contention on
the mutex, the thread will be doing actual work; IMHO, this is a good
thing...

David Schwartz

unread,
Oct 17, 2008, 6:10:32 PM10/17/08
to
On Oct 17, 1:09 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> Sure. With the yield you de-schedule the current process. When you are
> going to implement spinning, you do not want that because it will not
> be spinning any more. So simple is it.

How do you know that these are semantics of his pseudo-code "yield"
operation? He pretty clearly stated that this was high-level pseudo
code meant to illustrate a completely different point that has nothing
whatsoever to do with what belongs inside the loop of a spin/sleep
lock. In any event, that operation is commonly referred to as a
"yield" as its intention is to yield FSB and CPU execution resources.

DS

Chris M. Thomasson

unread,
Oct 17, 2008, 7:24:07 PM10/17/08
to
"David Schwartz" <dav...@webmaster.com> wrote in message
news:e5c6541d-c490-4e82...@v13g2000pro.googlegroups.com...

On Oct 17, 1:09 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> > Sure. With the yield you de-schedule the current process. When you are
> > going to implement spinning, you do not want that because it will not
> > be spinning any more. So simple is it.

> How do you know that these are semantics of his pseudo-code "yield"
> operation? He pretty clearly stated that this was high-level pseudo
> code meant to illustrate a completely different point that has nothing
> whatsoever to do with what belongs inside the loop of a spin/sleep
> lock.

Exactly.


> In any event, that operation is commonly referred to as a
> "yield" as its intention is to yield FSB and CPU execution resources.

Yeah. Here is some fully compliable example code (MSVC) of a extremely
simple, perhaps naive, spinlock and a little test application for Szabolcs
to take a look at:
_______________________________________________________________________
#include <limits.h>


/* Simple IA-32 Spinlock Implementation
_____________________________________________________________*/
typedef __int32 atomic_word_ia32;


typedef char sassert_ia32[
(sizeof(void*) == 32 / CHAR_BIT) &&
(sizeof(void*) == sizeof(atomic_word_ia32))
? 1 : -1
];


typedef atomic_word_ia32 spinlock_ia32;
#define SPINLOCK_IA32_INITIALIZER 0


__declspec(naked) void
spinlock_ia32_acquire(
spinlock_ia32* const self
) {
_asm {
MOV ECX, [ESP + 4]
MOV EAX, 1

spinlock_ia32_acquire_retry:
XCHG [ECX], EAX
TEST EAX, EAX
JNZ spinlock_ia32_acquire_failed
RET

spinlock_ia32_acquire_failed:
PAUSE
JMP spinlock_ia32_acquire_retry
}
}


__declspec(naked) void
spinlock_ia32_release(
spinlock_ia32* const self
) {
_asm {
MOV ECX, [ESP + 4]
MOV DWORD PTR [ECX], 0
RET
}
}


/* Simple IA-32 Spinlock API Abstraction
_____________________________________________________________*/
typedef spinlock_ia32 spinlock;
#define SPINLOCK_INITIALIZER SPINLOCK_IA32_INITIALIZER
#define spinlock_acquire spinlock_ia32_acquire
#define spinlock_release spinlock_ia32_release


/* Simple PThread Test Application
_____________________________________________________________*/
#include <pthread.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>


#define THREADS 32
#define ITERS 1000
#define YIELD 8
#define RUNS 100


static spinlock g_slock = SPINLOCK_INITIALIZER;
static unsigned g_count = 0;


void* thread_entry(void* state) {
unsigned i;
for (i = 0; i < ITERS; ++i) {
spinlock_acquire(&g_slock);
++g_count;
spinlock_release(&g_slock);
if (! (i % YIELD)) {
sched_yield();
}
}
return 0;
}


int main() {
unsigned r, i;
unsigned const expected = (THREADS * ITERS) * RUNS;
pthread_t tid[THREADS];

for (r = 0; r < RUNS; ++r) {
for (i = 0; i < THREADS; ++i) {
pthread_create(&tid[i], NULL, thread_entry, NULL);
}

for (i = 0; i < THREADS; ++i) {
pthread_join(tid[i], NULL);
}
}

printf("g_count == %u / expected == %d\n", g_count, expected);

if (g_count != expected) {
assert(g_count == expected);
abort();
}

return 0;
}
_______________________________________________________________________


I wonder if he even knows what the `PAUSE' instruction does... I wonder if
he understands that the pseudo-code procedure `yield()' resolves to the
PAUSE instruction on an IA-32. Let me preempt his response... He will say
something stupid like:

`Hey, this was about adaptive locks, not spin-locks. Also you have a call to
`sched_yield()' in the test application which means that you need to go to
school. Now kneel before your master.'

Then I will have to inform him that a spin-lock is basically analogous to
the spin portion of an adaptive lock. And the 'sched_yield()' is in there
only to attempt to induce some more fine-grain interleaving of threads.

Chris M. Thomasson

unread,
Oct 17, 2008, 11:46:03 PM10/17/08
to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:q19Kk.3132$TX6....@newsfe06.iad...
[...]

> printf("g_count == %u / expected == %d\n", g_count, expected);

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The line above should read as:

printf("g_count == %u / expected == %u\n", g_count, expected);


of course!


ARGH!

[...]

Serge...@gmail.com

unread,
Oct 20, 2008, 9:47:49 AM10/20/08
to
On Oct 17, 4:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message

This code is somewhat right for single CPU: (Logic: Try lock if it is
already taken gave all time to current lock holder to finish its job)

On multi CPU it is wrong/bad/HORRIBLE.
try_to_lock() - means you pooling lock.
yield() - means you give away remaining part of your time slice.
Effectively your thread is scheduled to the end of your priority
class. Next try_to_lock() happen only after all other threads spend
their time slices including some that also competing for the same
lock. Looks like your thread runs at the lower priority class. On
multi CPU you MUST get rid of yield().


void lock() {
for(unsigned spins = spin_max; spins > 0; --spins) {
if (try_to_lock()) return;
}

blocking_lock();
}

But in this case we waste a lot of time by going in/out kernel mode in
try_to_lock().
So we better of with introduction user-mode spin-lock using interlock
f-n or (like Chris M. Thomasson's XCHG)
instead of try_to_lock(). Opppssss. We reinvent Microsoft
CRITICAL_SECTION. Thank's.

BTW. Wizards of POSIX & unix!!! What is posix equivalent for Microsoft
interlock f-n? What is sparc equivalent for XCHG?

Serge...@gmail.com

unread,
Oct 20, 2008, 9:51:56 AM10/20/08
to

Wow!
Looks great.

B.T.W. Do you know SPARC assembly?
From the top of your head do you know how to do need atomic increment
and exchange?

David Schwartz

unread,
Oct 20, 2008, 2:37:26 PM10/20/08
to
On Oct 20, 6:47 am, Sergey....@gmail.com wrote:

> This code is somewhat right for single CPU: (Logic: Try lock if it is
> already taken gave all time to current lock holder to finish its job)

Actually, you have it backwards. This is horrible for single CPU and
only suitable for multiple CPU.

> On multi CPU it is wrong/bad/HORRIBLE.
> try_to_lock() - means you pooling lock.

You are assuming his 'try_to_lock' function has a deficiency, then
blaming him for the deficiency you assumed. Most likely, his
try_to_lock function uses the following pseudo-code:

1) Read the lock state variable to see if it's locked. If so, return
failed.
2) Atomically test-and-set the lock state. If we set it, return
succeeded.
3) Return failed.

Notice that in the vast majority of cases, it doesn't modify anything.

> yield() - means you give away remaining part of your time slice.

Umm, no it doesn't. It means you give away FSB and CPU execution
resources. Again, you are assuming he used the wrong yield function,
then blaming him for it.

> Effectively your thread is scheduled to the end of your priority
> class. Next try_to_lock() happen only after all other threads spend
> their time slices including some that also competing for the same
> lock. Looks like your thread runs at the lower priority class. On
> multi CPU you MUST get rid of yield().
> void lock() {
>   for(unsigned spins = spin_max; spins > 0; --spins) {
>     if (try_to_lock())   return;
>   }
>   blocking_lock();
>
> }
>
> But in this case we waste a lot of time by going in/out kernel mode in
> try_to_lock().

Umm, you just broke his code. Now it spins too fast. On a hyper-
threaded CPU, the rapid spinning will steal execution resources from
other virtual CPUs, preventing them from releasing the lock. When you
do finally get the lock, you'll take the worst possible branch
misprediction penalty.

You just took elegant, clever code and butchered it.

DS

Szabolcs Ferenczi

unread,
Oct 20, 2008, 2:58:11 PM10/20/08
to
On Oct 20, 8:37 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 20, 6:47 am, Sergey....@gmail.com wrote:
>
> > This code is somewhat right for single CPU: (Logic: Try lock if it is
> > already taken gave all time to current lock holder to finish its job)
>
> Actually, you have it backwards. This is horrible for single CPU and
> only suitable for multiple CPU.

It is nonsense on a single CPU and, we are talking about the yielding
version, on multiple CPU it is not spinning but it is unnecessarily
complicating the blocking waiting (see yielding and pooling) .

Best Regards,
Szabolcs

Serge...@gmail.com

unread,
Oct 20, 2008, 4:26:36 PM10/20/08
to

Hi,


> You just took elegant, clever code and butchered it.

It is all about what you try to achieve.
My question was about to simulate lock_mutex(...) using
try_lock_mutex(...).
Your proposal.


void lock() {
for(unsigned spins = spin_max; spins > 0; --spins) {
if (try_to_lock()) return;

yield();
}
blocking_lock();
}

My understanding of internal of try_lock_mutex(...) is same with
yours.

It would barely work. Lets call your lock DSLock Consider scenario
when
Scenario I
Thread A: for(;;) { lock_mutex(); Task; unlock_mutex(); }
Thread B: for(;;) { lock_mutex(); Task; unlock_mutex(); }

Scenario II
Thread A: for(;;) { DSLock(); Task; unlock_mutex(); }
Thread B: for(;;) { lock_mutex(); Task; unlock_mutex(); }

I actually learn from mistake. I wrote implementation close to yours
several yrs ago.
When Task is next to nothing ThreadB has 2-6 times more chances to own
the lock.
When Task getting bigger ThreadB has over 20 times more chances to own
the lock.

Here is my understanding of try_lock_mutex (same with yours):
Try to take over mutex, if it taken move on. While lock_mutex seams to
queue request. DSLock only succeed if mutex is available at the moment
it try_lock_mutex(). So it has to pool it as frequently as possible to
catch the moment then Thread B just unlock the mutex before and it
claims back. In contrast thread B just has to lay its claim and wait
for notification that mutex become available. Here is number of CPUs
comes to play. On single CPU ThreadA must yield CPU so if thread B
owns mutex it has some CPU to finish Task. On multi CPU box Thread A
better taking as much CPU as possible to increase pooling frequency
and probability to gain the mutex. ThreadB still has other CPUs to do
the Task. That why I would remove yield() from multy CPU
implementation and even complain about cost of getting in/out kernel
mode.
P.S.
I am not sure why you call this code "elegant", it is matte of taste.
You can see it is pure algorithmic thing and has little to do with
FSB, "branch prediction" etc.

David Schwartz

unread,
Oct 20, 2008, 5:38:15 PM10/20/08
to
On Oct 20, 11:58 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> On Oct 20, 8:37 pm, David Schwartz <dav...@webmaster.com> wrote:

> > On Oct 20, 6:47 am, Sergey....@gmail.com wrote:

> > > This code is somewhat right for single CPU: (Logic: Try lock if it is
> > > already taken gave all time to current lock holder to finish its job)

> > Actually, you have it backwards. This is horrible for single CPU and
> > only suitable for multiple CPU.

> It is nonsense on a single CPU

I agree. It is nonsense on a single CPU. It was intended to show an
adaptive lock for a multi-CPU setup. The only way you can argue it's
appropriate on both a single CPU and an SMP system is to assume that
'yield' adjusts its behavior based on the number of CPUs.

> and, we are talking about the yielding
> version, on multiple CPU it is not spinning but it is unnecessarily
> complicating the blocking waiting (see yielding and pooling) .

The yield is needed to prevent several problems:

1) Without it, the spinning thread could monopolize the FSB, slowing
the rest of the system to a crawl.

2) Without it, the spinning thread could monopolize CPU execution
resources. If the thread that held the lock were running on another
virtual CPU that shares this physical core, that could result in poor
performance.

3) Without it, the spinning thread would suffer a maximally painful
mis-predicted branch when it exited the spin loop.

The "yield" pseudo-code in the example is just enough yielding to
avoid these three problems. (Or whatever is their equivalent on your
CPU.) The idea is just to keep the spin from being too tight, if your
CPU requires such a thing.

DS

Szabolcs Ferenczi

unread,
Oct 20, 2008, 5:49:41 PM10/20/08
to
On Oct 20, 3:47 pm, Sergey....@gmail.com wrote:
> On Oct 17, 4:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
>
> > "Szabolcs Ferenczi" <szabolcs.feren...@gmail.com> wrote in message
> >news:8a116062-8e06-48d3...@17g2000hsk.googlegroups.com...
>
> > > On Oct 17, 9:52 pm, David Schwartz <dav...@webmaster.com> wrote:
> > > > On Oct 17, 12:55 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>

> > > > wrote:
>
> > > > > Quite simple, my jumping friend. If you de-schedule your thread while
> > > > > awaiting, it is still busy wait but not spinning at all.
>
> > > > De-schedule? Do you have any idea what we're talking about?
> > > Sure. With the yield you de-schedule the current process. When you are
> > > going to implement spinning, you do not want that because it will not
> > > be spinning any more. So simple is it.
>
> > Wow!  :^|
>
> > I suppose you want the contenders to retry as fast as possible and catch on
> > fire in the process. I suppose you would build the lock portion of a
> > spin-lock like:
>
> > void lock(atomicword* this) {
> >   while (XCHG(this, 1));
>
> > }
>
> > That is HORRIBLE! Ouch.
>
> > > > > Your little pet notoriously inserts yields and delays here and there
> > > > > since he is not educated in concurrent programming and he thinks he
> > > > > solves scheduling problems with delays and de-schedules.
>
> > > > Tell me something, in pseudo-code, what do you think the 'lock'
> > > > function should like like for an adaptive (spin/sleep) mutex?
> > > Something your furry pet has put together here but without the yield
> > > and without the infinite loop.
>
> > You NEED the yield in there or else the damn loop will catch on fire! Also,
> > why do you think `yield()' would de-schedule the process? There are more
> > than one form of `yield()'. I ask you specifically what it would resolve to
> > on an x86. This is really easy question.
>
> > > > > Not mentioning that he has hacked together an infinite loop again so
> > > > > he not only has problems with concurrent programming but ...
>
> > > > > <qoute>
> > > > > void lock() {
> > > > > for(unsigned spins = spin_max; spins > 0; --spin) {

> > > > > if (try_to_lock()) {
> > > > > return;
> > > > > }
> > > > > yield();
> > > > > }
> > > > > blocking_lock();}
>
> > > > > </quote>
>
> > > > Note that the code excerpt above would be a perfect answer.
> > > Well, I would not like to use any device that you and your little pet
> > > codes. That will end up in inefficient busy loops and infinite loops
> > > like the pseudo-code above.
>
> > Give me a break; you must be referring to the simple typo: '--spin' should
> > be '--spins'; fine:
>
> > void lock() {
> >   for(unsigned spins = spin_max; spins > 0; --spins) {
> >     if (try_to_lock()) {
> >       return;
> >     }
> >     yield();
> >   }
> >   blocking_lock();
>
> > }
>
> > Where is the infinite loop?

Hmm...

> void lock() {
>   for(unsigned spins = spin_max; spins > 0; --spins) {
>     if (try_to_lock())   return;
>     yield();
>   }
>   blocking_lock();}
>

> This code is somewhat right for single CPU: (Logic: Try lock if it is
> already taken gave all time to current lock holder to finish its job)

On a single CPU you do not need any spin lock and furthermore you do
not need any adaptive lock either. It would not make any sense. If the
lock is locked by another thread, any spinning is useless since that
one must progress so this one must release the CPU, i.e. it must hang
on the lock.

> On multi CPU it is wrong/bad/HORRIBLE.

Yes, that is what I am telling for a while too. On multiple CPUs
yielding does not make sense either. Yielding just destroys spinning
but it is not as good as hanging on the lock immediately and
atomically if it is locked.

> try_to_lock() - means you pooling lock.

> yield() - means you give away remaining part of your time slice.

> Effectively your thread is scheduled to the end of your priority
> class. Next try_to_lock() happen only after all other threads spend
> their time slices including some that also competing for the same
> lock. Looks like your thread runs at the lower priority class. On
> multi CPU you MUST get rid of yield().
> void lock() {
>   for(unsigned spins = spin_max; spins > 0; --spins) {
>     if (try_to_lock())   return;
>   }
>   blocking_lock();
>
> }

As pseudo code, yes, that is it and it is applicable on a multi-core
system but only among threads running on different CPUs. But it is
only a pseudo code, the adaptive lock is not implemented like that.

> But in this case we waste a lot of time by going in/out kernel mode in
> try_to_lock().

> So we better of with introduction user-mode spin-lock using interlock
> f-n or (like Chris M. Thomasson's  XCHG)
> instead of try_to_lock(). Opppssss. We reinvent Microsoft
> CRITICAL_SECTION. Thank's.

Well, any disciplined person uses the programming tools, which are
available on the target system and does not try to re-invent them.
There are some hackers, however, who think they spend the effort
reinventing mutexes and condition variables.

Best Regards,
Szabolcs

Chris M. Thomasson

unread,
Oct 20, 2008, 7:39:26 PM10/20/08
to

"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:83acdf26-c852-4a49...@b2g2000prf.googlegroups.com...

On Oct 20, 3:47 pm, Sergey....@gmail.com wrote:
> > On Oct 17, 4:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]

> > > void lock() {
> > > for(unsigned spins = spin_max; spins > 0; --spins) {
> > > if (try_to_lock()) {
> > > return;
> > > }
> > > yield();
> > > }
> > > blocking_lock();
> >
> > > }
> >
> > > Where is the infinite loop?

> Hmm...

Hmmm What!? Where is the infinite loop?

:^|


> > void lock() {
> > for(unsigned spins = spin_max; spins > 0; --spins) {
> > if (try_to_lock()) return;
> > yield();
> > }
> > blocking_lock();}
> >
> > This code is somewhat right for single CPU: (Logic: Try lock if it is
> > already taken gave all time to current lock holder to finish its job)

> On a single CPU you do not need any spin lock and furthermore you do
> not need any adaptive lock either. It would not make any sense. If the
> lock is locked by another thread, any spinning is useless since that
> one must progress so this one must release the CPU, i.e. it must hang
> on the lock.

Who is talking about uni-processor systems? Your brain-damage automatically
thinks single processor; well, okay. live in the past.


> > On multi CPU it is wrong/bad/HORRIBLE.

> Yes, that is what I am telling for a while too. On multiple CPUs
> yielding does not make sense either. Yielding just destroys spinning
> but it is not as good as hanging on the lock immediately and
> atomically if it is locked.

You are so dense that a rock has more sense than you! You don't get it; and
probably never will...

Wow.


> try_to_lock() - means you pooling lock.
> yield() - means you give away remaining part of your time slice.

NO, NO, NO; NO! Do you have a MENTAL BLOCK!?!?

yield() is resolves to the PAUSE instruction on x86. Your brain is numb to
the fact. What does the PAUSE instruction do Szabolcs? You have no idea. I
have given you so many tries, I even gave you CODE! You can't figure it out.
So be it. What's new in the life of our village idiot. Answer: NOTHING!


> > Effectively your thread is scheduled to the end of your priority
> > class. Next try_to_lock() happen only after all other threads spend
> > their time slices including some that also competing for the same
> > lock. Looks like your thread runs at the lower priority class. On
> > multi CPU you MUST get rid of yield().
> > void lock() {
> > for(unsigned spins = spin_max; spins > 0; --spins) {
> > if (try_to_lock()) return;
> > }
> > blocking_lock();
> >
> > }

> As pseudo code, yes, that is it and it is applicable on a multi-core
> system but only among threads running on different CPUs.

ROFL!!!!!


> But it is
> only a pseudo code, the adaptive lock is not implemented like that.

You have NO idea how to implement an adaptive lock on ANY system. This is
another totally ignorant statement from you. Wow.


Your FIRED!


> > But in this case we waste a lot of time by going in/out kernel mode in
> > try_to_lock().
> > So we better of with introduction user-mode spin-lock using interlock
> > f-n or (like Chris M. Thomasson's XCHG)
> > instead of try_to_lock(). Opppssss. We reinvent Microsoft
> > CRITICAL_SECTION. Thank's.

> Well, any disciplined person uses the programming tools, which are
> available on the target system and does not try to re-invent them.
> There are some hackers, however, who think they spend the effort
> reinventing mutexes and condition variables.

Too funny. I created that spinlock code to try and teach you what the
pseudo-code `yield()' would resolve to on an x86. Its the PAUSE instruction
of course. What does that instruction do? Does it de-schedule threads? You
think it does, but your DEAD wrong an all fronts Szabolcs. Well, nothing new
here; move along, move along...

:^/

Chris M. Thomasson

unread,
Oct 20, 2008, 7:49:09 PM10/20/08
to
<Serge...@gmail.com> wrote in message
news:f659b9e3-153a-462f...@k36g2000pri.googlegroups.com...

On Oct 17, 4:40 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]

> > void lock() {
> > for(unsigned spins = spin_max; spins > 0; --spins) {
> > if (try_to_lock()) {
> > return;
> > }
> > yield();
> > }
> > blocking_lock();
> >
> > }
> >
> > Where is the infinite loop?

> void lock() {
> for(unsigned spins = spin_max; spins > 0; --spins) {
> if (try_to_lock()) return;
> yield();
> }
> blocking_lock();
> }
> This code is somewhat right for single CPU: (Logic: Try lock if it is
> already taken gave all time to current lock holder to finish its job)

> On multi CPU it is wrong/bad/HORRIBLE.
> try_to_lock() - means you pooling lock.
> yield() - means you give away remaining part of your time slice.

100% totally backwards. Your listening to the village idiot. He is adept in
giving as$ backwards advise, and misleading people across the board.


> Effectively your thread is scheduled to the end of your priority
> class.
> Next try_to_lock() happen only after all other threads spend
> their time slices including some that also competing for the same
> lock. Looks like your thread runs at the lower priority class. On
> multi CPU you MUST get rid of yield().

NO, NO, NO; NO!!!!! What makes you think the pseudo-code `yield()'
de-schedules threads!?


> void lock() {
> for(unsigned spins = spin_max; spins > 0; --spins) {
> if (try_to_lock()) return;
> }
> blocking_lock();
> }

> But in this case we waste a lot of time by going in/out kernel mode in
> try_to_lock().

WTF?!?!?! What makes you think the pseudo-code `try_to_lock()' has ANYTHING
to do with the Kernel!?

Give me a break.


> So we better of with introduction user-mode spin-lock using interlock
> f-n or (like Chris M. Thomasson's XCHG)
> instead of try_to_lock(). Opppssss. We reinvent Microsoft
> CRITICAL_SECTION. Thank's.

Sorry. You __totally__ misunderstand the point of my post. See, Szabolcs
(the village idiot) think that `yield()' has something to do with thread
scheduling. I challenged him to resolve the pseudo-code `yield()' procedure
on an x86. He totally failed. I post code. Obviously, PAUSE == yield().


> BTW. Wizards of POSIX & unix!!! What is posix equivalent for Microsoft
> interlock f-n? What is sparc equivalent for XCHG?

The deprecated `SWAP' instruction of course. If you want your SPARC asm to
be portable, you need to use `CAS'.


BTW, listening to Szabolcs will only result in brain-damage. This thread
proves it. Period.

P.S.


Sorry for being so harsh, but Szabolcs pissed me off.

;^|

Chris M. Thomasson

unread,
Oct 20, 2008, 7:55:40 PM10/20/08
to

<Serge...@gmail.com> wrote in message
news:5c55bc32-710b-4edf...@q26g2000prq.googlegroups.com...

WRONG! Can you understand what David is writing about? Don't make me cite
Intel arch manual! You will see what the pseudo-code `yield()' resolves to.


[...]


> I am not sure why you call this code "elegant", it is matte of taste.
> You can see it is pure algorithmic thing and has little to do with
> FSB, "branch prediction" etc.

lol.

Chris M. Thomasson

unread,
Oct 20, 2008, 8:04:57 PM10/20/08
to
<Serge...@gmail.com> wrote in message
news:01f85f46-f36b-4830...@g17g2000prg.googlegroups.com...

On Oct 17, 7:24 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > "David Schwartz" <dav...@webmaster.com> wrote in message
> >
> > news:e5c6541d-c490-4e82...@v13g2000pro.googlegroups.com...
> > On Oct 17, 1:09 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> [...]

> Wow!
> Looks great.

> B.T.W. Do you know SPARC assembly?

Yes. BTW, I sold my T2000 server which I won from the following contest:

https://coolthreads.dev.java.net
(I am the vZOOM project)


> From the top of your head do you know how to do need atomic increment
> and exchange?

http://groups.google.com/group/comp.arch/msg/04cb5e2ca2a7e19a

Atomic INC/DEC, well, use a CAS loop. BTW, why don't you use OpenSparc
database?

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/common/atomic/sparcv9/atomic.s

No big deal.

David Schwartz

unread,
Oct 21, 2008, 12:00:38 AM10/21/08
to
On Oct 20, 1:26 pm, Sergey....@gmail.com wrote:

> I actually learn from mistake. I wrote implementation close to yours
> several yrs ago.
> When Task is next to nothing ThreadB has 2-6 times more chances to own
> the lock.
> When Task getting bigger ThreadB has over 20 times more chances to own
> the lock.

What a surprise, his adaptive spin/sleep lock algorithm doesn't work
well under the conditions in which you would never, ever even consider
using an adaptive spin/sleep lock. Who would have thought?!

DS

Serge...@gmail.com

unread,
Oct 21, 2008, 11:17:39 AM10/21/08
to
On Oct 20, 7:49 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> <Sergey....@gmail.com> wrote in message

Hi,

>WTF?!?!?! What makes you think the pseudo-code `try_to_lock()' has ANYTHING
>to do with the Kernel!?

My understanding that try_to_lock() is system call ( it must result in
go in and out of kernel). Time spend for checking lock availability is
nothing compare to time for going in/out kernel mode.
I may be wrong about how mutex calls are implemented.

Description for sched_yield() :http://www.opengroup.org/onlinepubs/
007908799/xsh/sched_yield.html
The sched_yield() function forces the running thread to relinquish
the processor until it again becomes the head of its thread list.

It sound that control goes back to scheduler. But I see your point why
we need PAUSE here. I was under impression that it was just NOP (I was
until P4).
Thank you for PAUSE.


Thank you also for help on SPARC instructions.
I am new to unix/linux and don't want to reinvent bicycle or re-write
own locks for unix :).

Szabolcs Ferenczi

unread,
Oct 21, 2008, 12:18:32 PM10/21/08
to
On Oct 17, 3:12 am, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 16, 7:02 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>

> wrote:
>
> > There are adaptive mutexes, however, this pseudo code does not
> > describe one.
>
> > Because of the yield, you do not gain anything here. It is not
> > spinning at all but de-schedules itself a couple of times before it
> > would be de-scheduled awaiting for the lock to be available.
>
> > It makes some sense only if you omit the yield, then, it is something
> > like a pseudo code for adaptive mutexes.
>
> Wtf?!?!

Since you are so intelligently react to the adaptive mutex subject as
above, I can help you with some very basic stuff. Please read it very
carefully:

"... On a multiprocessor system, an adaptive mutex starts as a
standard semaphore implemented as a spinlock. If the data are locked
and therefore already in use, the adaptive mutex does one of two
things. If the lock is held by a thread that is currently running on
another CPU, the thread spins while waiting for the lock to become
available, because the thread holding the lock is likely to finish
soon. If the thread holding the lock is not currently in run state,
the thread blocks, going to sleep until it is awakened by the release
of the lock. It is put to sleep so that it will not spin while
waiting, since the lock will not be feed very soon. A lock held by a
sleeping thread is likely to be in this category. On a single-
processor system, the thread holding the lock is never running if the
lock is being tested by another thread, because only one thread can
run at a time. Therefore, on this type of system, threads always sleep
rather than spin if they encounter a lock."

pp. 218-219 in Silberschatz et al, Operating System Concepts, 7th ed,
2005, John Wiley & Sons, NJ

Furthermore, I have looked up one version of some pseudo code (Solaris
version) for you and for your little pet with the hope that finally
you can learn from it. By the way can you to tell me where is any
yield in it?

Multithreading in the Solaris Operating Environment
A Technical White Paper
http://www.sun.com/software/whitepapers/solaris9/multithread.pdf
p.11
"Solaris 2.6 software onwards implements adaptive locking for mutexes
initialized as:
- PTHREAD_MUTEX_PRIVATE (Pthreads)
- USYNC_THREAD (UI threads)
The following pseudocode illustrates the adaptive mutex lock
implementation:
if (test-and-set succeeds)
return /* lock acquired */
spin count = 0
while (lock holder on cpu AND spin count < max spins) {
if (lock not still held) /* ordinary load */
if (test-and-set succeeds)
return /* lock acquired */
increment spin count
}
mutex lock system call /* always acquired the lock */
return /* lock acquired */
"

I do hope I was of help.

Best Regards,
Szabolcs

David Schwartz

unread,
Oct 21, 2008, 1:26:15 PM10/21/08
to
On Oct 21, 8:17 am, Sergey....@gmail.com wrote:

> My understanding that try_to_lock() is system call ( it must result in
> go in and out of kernel).

Then you're not on the same page as everyone else. This was meant to
demonstrate a spin/sleep lock, so 'try_to_lock' would have to be an
atomic compare-and-swap instruction or similar.

Here's a real implementaion:

bool try_to_lock(void)
{
if(*(volatile int *)spinlock==SPIN_LOCKED) return false;
return CompareExchange(spinlock, SPIN_UNLOCKED,
SPIN_LOCKED)==SPIN_UNLOCKED;
}

Where 'CompareExchange' atomically compares the contents spinlock to
SPIN_UNLOCK and if it is SPIN_UNLOCKED sets it to SPIN_LOCKED,
returning the old value. On x86, it could be 'lock cmpxchgl'.

The first (apparently redundant) test is needed to avoid dirtying the
cache line containing 'spinlock' if the 'spinlock' has been held for a
long time. Otherwise, two threads calling try_to_lock in rapid
alternation could saturate the FSB, preventing the thread that holds
the spinlock from making forward progress at a normal rate.

> Time spend for checking lock availability is
> nothing compare to  time  for going in/out kernel mode.

That's why only an idiot would do that.

> I may be wrong about how mutex calls are implemented.

Obviously. "Here's my broken implementation of your algorithm that
shows it's broken."

> Description for sched_yield() :http://www.opengroup.org/onlinepubs/
> 007908799/xsh/sched_yield.html
>     The sched_yield() function forces the running thread to relinquish
> the processor until it again becomes the head of its thread list.
>
> It sound that control goes back to scheduler. But I see your point why
> we need PAUSE here. I was under impression that it was just NOP (I was
> until P4).
> Thank you for PAUSE.

You need PAUSE because without it, a thread that spins on the lock may
starve another thread running in the same physical core by consuming
CPU execution resources. It also mitigates the mispredicted branch
penalty. You also need it to prevent the spinning core from heating
the CPU and wasting power or even triggering thermal throttling.

In essence, you need it because the architecture documentation says
you need it. It does whatever architecture magic is needed to make it
work.

DS

David Schwartz

unread,
Oct 21, 2008, 1:29:46 PM10/21/08
to
On Oct 21, 9:18 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> you can learn from it. By the way can you to tell me where is any
> yield in it?

>    if (test-and-set succeeds)


>       return /* lock acquired */
>    spin count = 0
>    while (lock holder on cpu AND spin count < max spins) {
>       if (lock not still held) /* ordinary load */
>          if (test-and-set succeeds)
>             return /* lock acquired */
>       increment spin count
>    }
>    mutex lock system call /* always acquired the lock */
>    return /* lock acquired */

On this platform-specific implementation, the only 'yield' needed is
the '}'. The same is *NOT* true for a generic implementation and it is
specifically not true for x86. On this platform, 'yield' would be a no-
op.

As I've explained to you at least three times, the 'yield' function is
pseudo-code for whatever yielding is needed on your platform.

This code would not be suitable on any hardware that supports multiple
virtual CPUs per physical CPU. The spinning thread does not yield the
CPU execution resources and may spin so fast that the thread that
holds the lock, if running in the same physical CPU, is unable to make


forward progress at a normal rate.

Can you stop being an idiot now?

DS

Szabolcs Ferenczi

unread,
Oct 21, 2008, 1:41:15 PM10/21/08
to
On Oct 21, 7:29 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 21, 9:18 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> wrote:
>
> > you can learn from it. By the way can you to tell me where is any
> > yield in it?
> >    if (test-and-set succeeds)
> >       return /* lock acquired */
> >    spin count = 0
> >    while (lock holder on cpu AND spin count < max spins) {
> >       if (lock not still held) /* ordinary load */
> >          if (test-and-set succeeds)
> >             return /* lock acquired */
> >       increment spin count
> >    }
> >    mutex lock system call /* always acquired the lock */
> >    return /* lock acquired */
>
> On this platform-specific implementation, the only 'yield' needed is
> the '}'.

So you call '}' a yield. Congratulations.

: The same is *NOT* true for a generic implementation and it is


> specifically not true for x86. On this platform, 'yield' would be a no-
> op.

The generic pseudo code is not specifying any yield since yield has a
well defined meaning and no matter whether you and your little pet is
trying to re-define it, you fail.

Here you find a generic pseudo code:

Code Example 7-4 A Simple Spin Lock (thread_extensions.c)
/* Don’t use this code! */
spin_lock(mutex_t *m)
{ int i;
for (i=0; i < SPIN_COUNT; i++)
{if (pthread_mutex_trylock(m) != EBUSY)
return; } /* got the lock! */
pthread_mutex_lock(m); /* give up and block. */
return; } /* got the lock after blocking! */

p. 136 in
http://www.cs.umu.se/kurser/TDBC64/VT03/pthreads/pthread-primer.pdf

Where can you see any yield? Well, if a yield is `}' for you, then you
can see two of them, don't you?

Best Regards,
Szabolcs

Chris M. Thomasson

unread,
Oct 21, 2008, 3:24:11 PM10/21/08
to
>
"Szabolcs Ferenczi" <szabolcs...@gmail.com> wrote in message
news:81ea95bc-bebb-41bb...@e17g2000hsg.googlegroups.com...

On Oct 21, 7:29 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 21, 9:18 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> > wrote:
> >
> > > you can learn from it. By the way can you to tell me where is any
> > > yield in it?
> > > if (test-and-set succeeds)
> > > return /* lock acquired */
> > > spin count = 0
> > > while (lock holder on cpu AND spin count < max spins) {
> > > if (lock not still held) /* ordinary load */
> > > if (test-and-set succeeds)
> > > return /* lock acquired */
> > > increment spin count
> > > }
> > > mutex lock system call /* always acquired the lock */
> > > return /* lock acquired */
> >
> > On this platform-specific implementation, the only 'yield' needed is
> > the '}'.

> So you call '}' a yield. Congratulations.

A NOP.

> : The same is *NOT* true for a generic implementation and it is
> > specifically not true for x86. On this platform, 'yield' would be a no-
> > op.

> The generic pseudo code is not specifying any yield since yield has a
> well defined meaning and no matter whether you and your little pet is
> trying to re-define it, you fail.

You brain is made up of a bag of hammers. The pseudo-code yield resolves to
whats need on your architecture. On x86 its the PAUSE instruction.


> Here you find a generic pseudo code:

> Code Example 7-4 A Simple Spin Lock (thread_extensions.c)
> /* Don’t use this code! */
> spin_lock(mutex_t *m)
> { int i;
> for (i=0; i < SPIN_COUNT; i++)
> {if (pthread_mutex_trylock(m) != EBUSY)
> return; } /* got the lock! */
> pthread_mutex_lock(m); /* give up and block. */
> return; } /* got the lock after blocking! */

> Where can you see any yield? Well, if a yield is `}' for you, then you
> can see two of them, don't you?

This generic pseudo-code is totally busted because it lacks the yield. Here
is REAL code for a spin-lock on a x86, notice how the `yield()' procedure
resolves to the PAUSE instruction:
___________________________________________________________________
#include <limits.h>

spinlock_ia32_acquire_failed:
PAUSE
JMP spinlock_ia32_acquire_retry
}
}

___________________________________________________________________


The pseudo-code you posted for the spin-lock portion of an adaptive mutex is
busted. Your hopeless. Beyond repair. If I followed your advise, well, then
the lock potion of the code would look like this:


__declspec(naked) void
spinlock_ia32_acquire_szabolcs_brain_damage_version(


spinlock_ia32* const self
) {
_asm {
MOV ECX, [ESP + 4]
MOV EAX, 1

spinlock_ia32_acquire_retry:
XCHG [ECX], EAX
TEST EAX, EAX

JNZ spinlock_ia32_acquire_retry
RET
}
}


That is busted on a x86. Sorry.

David Schwartz

unread,
Oct 21, 2008, 4:12:53 PM10/21/08
to
On Oct 21, 10:41 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> The generic pseudo code is not specifying any yield since yield has a


> well defined meaning and no matter whether you and your little pet is
> trying to re-define it, you fail.

I've answered this point at least three times now. Again, here's the
answer:

"This code would not be suitable on any hardware that supports
multiple
virtual CPUs per physical CPU. The spinning thread does not yield the
CPU execution resources and may spin so fast that the thread that
holds the lock, if running in the same physical CPU, is unable to make
forward progress at a normal rate."

Some type of yield operation is needed to allow the spinning thread to
yield CPU execution resources.

If you would like to address this argument, that would be great. But
I've made it at least three times and each time you've ignored it.

DS

Szabolcs Ferenczi

unread,
Oct 21, 2008, 4:46:15 PM10/21/08
to
On Oct 21, 10:12 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 21, 10:41 am, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> wrote:
>
> > The generic pseudo code is not specifying any yield since yield has a
> > well defined meaning and no matter whether you and your little pet is
> > trying to re-define it, you fail.
>
> I've answered this point at least three times now. Again, here's the
> answer:
>
> "This code would not be suitable on any hardware that supports
> multiple
> virtual CPUs per physical CPU.

You might check it with the author of the book Pthread Primer.

Anyway, it is suitable for illustrating the principles and it is not
intended to be used blindly on any hardware. Please refrain from your
hacker mentality if possible. It is a pseudo code illustrating the
principle only and not going into the small details. See the warning
of the author. Do not blindly try to use it. No. You should use
standard programming tools and do not hack your own control
primitives. I warned you, my friend.

<quote>


Code Example 7-4 A Simple Spin Lock (thread_extensions.c)
/* Don’t use this code! */
spin_lock(mutex_t *m)
{ int i;
for (i=0; i < SPIN_COUNT; i++)
{if (pthread_mutex_trylock(m) != EBUSY)
return; } /* got the lock! */
pthread_mutex_lock(m); /* give up and block. */
return; } /* got the lock after blocking! */

</quote>

> The spinning thread does not yield the
> CPU execution resources and may spin so fast that the thread that
> holds the lock, if running in the same physical CPU, is unable to make
> forward progress at a normal rate."

That is the issue with the concrete implementation but not an issue
with the pseudo code. The pseudo code gives an illustration about the
principle.

Besides, if you want to think about something more sophistical, think
about the pseudo code of the Solaris version. That makes sense as the
solution on multiple-CPUs rather than any simplistic yielding.

You might learn it finally.

> Some type of yield operation is needed to allow the spinning thread to
> yield CPU execution resources.

Yield is not a solution. Well, you should not consume CPU cycles when
it is obvious that it is useless. The Solaris solution addresses this
issue but your simple yield does not. However, any yielding is
something amateur stuff since no matter how busy you are in yielding,
you do consume the resource unnecessarily. Only you make some major
overhead with it which is not necessary at all.

Relax my friend and try to learn something professional, finally.

Best Regards,
Szabolcs

David Schwartz

unread,
Oct 21, 2008, 7:12:13 PM10/21/08
to
On Oct 21, 1:46 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
wrote:

> Relax my friend and try to learn something professional, finally.

Anyone reading your "reply" should be able to see that you are unable
to learn anything and unable to admit when you are wrong.

Your big criticism has now reduced to Chris showing a detail you don't
think is important. However, the fact that you didn't understand it or
appreciate how important it is shows that it was important.

The future trend is definitely going to be more than one thread
running on a physical CPU at a time. Threads sharing execution
resources with other threads is going to soon be the rule rather than
the exception. Pointing out that algorithms need to take this into
account, in the specific cases where they really do need to, is not
only valid but commendable.

DS

Zeljko Vrba

unread,
Oct 23, 2008, 3:52:52 AM10/23/08
to
On 2008-10-21, David Schwartz <dav...@webmaster.com> wrote:
>
> Here's a real implementaion:
>
> bool try_to_lock(void)
> {
> if(*(volatile int *)spinlock==SPIN_LOCKED) return false;
> return CompareExchange(spinlock, SPIN_UNLOCKED,
> SPIN_LOCKED)==SPIN_UNLOCKED;
> }
>
Why did you use volatile here?

Chris M. Thomasson

unread,
Oct 23, 2008, 4:24:00 AM10/23/08
to

<Serge...@gmail.com> wrote in message
news:9ac82556-0acb-4556...@m44g2000hsc.googlegroups.com...

> On Oct 20, 7:49 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]

> Thank you also for help on SPARC instructions.
> I am new to unix/linux and don't want to reinvent bicycle or re-write
> own locks for unix :).

Well, then don't!!! =^)

A POSIX Thread impl should have a fairly/very nice production of
synchronization on your version of UNIX/Linux; use POSIX and rest well my
friend. Us hardcore "diehard", perhaps, """nutcases""", who actually
___ENJOY___ creating/inventing new, and perhaps state-of-the-art
synchronization algorithms are __VERY__ rare at best... However, if you keep
following my advise and use POSIX impl, well, who knows. Perhaps you will be
using one of our algorihtms and not even know it... Shi% happens!

;^D


I need to go to work now. I hack at night.

David Schwartz

unread,
Oct 23, 2008, 6:06:50 AM10/23/08
to
On Oct 23, 12:52 am, Zeljko Vrba <zvrba.nos...@ieee-sb1.cc.fer.hr>
wrote:

Just for simplicity. It should actually be:

if(InterlockedGet(spinlock)==SPIN_LOCKED) return false;

DS

Szabolcs Ferenczi

unread,
Oct 23, 2008, 6:43:06 AM10/23/08
to
On Oct 22, 1:12 am, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 21, 1:46 pm, Szabolcs Ferenczi <szabolcs.feren...@gmail.com>
> wrote:
>
> [...]

> Your big criticism has now reduced to Chris showing a detail you don't
> think is important. However, the fact that you didn't understand it or
> appreciate how important it is shows that it was important.

Negative. It is not only not important but a definite mistake. I have
proven it by counter examples.

> The future trend is definitely going to be more than one thread
> running on a physical CPU at a time.

That is not any future trend since that is included into the meaning
of the thread. A thread is a light-weight process that shares the
resources of its hosting heavy-weight process. Any process is a
virtual processor, which can be mapped to a physical processor alias
CPU. So, by definition the current situation is: "more than one thread
running on a physical CPU at a time." It is not "the future trend."

> Threads sharing execution
> resources with other threads is going to soon be the rule rather than
> the exception.

Again, it is not any exception right now since it is included into the
concept of the thread. Not all processes share "execution resources"
but threads do by definition.

> Pointing out that algorithms need to take this into
> account, in the specific cases where they really do need to, is not
> only valid but commendable.

No, algorithms should not take the small details into account. That is
a big mistake at the architecture level. At the user level algorithms
should be at a certain abstract level too. Any system level SW is a
bridge between the specific details of the platform where you should
take it into account but only there. (The spinlock belongs to the
system level software. It must be transparent to the user level
algorithms since it plays role on certain platforms. In certain
platforms spinning helps, on others it makes response time worth. It
is not a user level concern.)

The example which is under discussion here is a spinlock which belongs
to the system level. At the user level you are not even know it
directly whether a lock is implemented with this technique or not. At
the user level it does not belong to the problem set.

Nevertheless, trying to implement a spinlock at the user level
indicates a kind of incompetence. That does not mean that we should
not talk about the principles of it or the prototype of it, however,
any abstract level should obviously neglect the small details which
are to be solved differently on different platforms. What is not
platform specific belongs to the abstract description.

I hope I was of help.

Best Regards,
Szabolcs

David Schwartz

unread,
Oct 23, 2008, 7:29:35 AM10/23/08
to
[snip]

Nothing snipped is of any merit whatsoever. It's all complete nonsense
and a further demonstration that Szabolcs Ferenczi is incapable of
learning.

One serious error he makes is in concluding that user-space code never
needs to change due to hyper-threading. It is actually a fundamental
change that cannot be hidden from user space. Hyper-threading makes
CPU execution resources a separate resource from CPU time. Prior to
hyper-threading, there was no particular reason to care how much CPU
execution resources you were consuming for a given amount of CPU time
-- it made no difference.

Previously, if you were scheduled on that CPU, all the CPU execution
resources were yours and yours alone. There was downside to keeping
all the execution units as busy as you could for the same amount of
run time. With hyper-threading, even though you are scheduled on the
CPU, you still want to minimize your use of execution resources as
wasting them will slow down any other threads running in that core.

As a simple example, assume that there are two things we could
compute, A and B. If A is greater than zero, we only need A, which it
will be almost all of the time (fast path). In the rare case that A is
less than or equal to zero, we need B (slow path).

Without HT, this is optimal:
1) Compute A and B at the same time in different execution units.
2) If A>0, output A.
3) Else, output B.

This is, in the fast path where we use A, 2 cycles but 3 execution
units. We only compute B because it's free and makes the slow path
faster. However, this will always compute B which we will almost never
need. It's free, and we might need it, so why not.

With HT, this is optimal:
1) Compute A.
2) If A>0, output A.
3) Compute B.
4) Output B.

This is, in the fast path where we use A, 2 cycles and 2 execution
units. This is a user-space optimization for a CPU with HT. (Since it
deprives the thread in the other slot of one fewer execution unit in
the fast path.)

I am not saying this to educate SF, as he has proven to be incapable
of learning anything. However, it's an interesting point that is not
very well known, so it's worth the time and effort to explain it.

How much do you want to bet SF will neither admit he is wrong nor
explain why I am wrong?

DS

Szabolcs Ferenczi

unread,
Oct 23, 2008, 9:44:18 AM10/23/08
to
On Oct 23, 1:29 pm, David Schwartz <dav...@webmaster.com> wrote:
> [snip]
>
> Nothing snipped is of any merit whatsoever. It's all complete nonsense
[snip]

Right. "Nothing snipped is of any merit whatsoever." Well, if you
cannot understand something, it does not follow that it would be
nonsense.

Rather, it follows that you deserve some basic course in concurrent
programming if you cannot follow such basic issues.

I advise you to try to learn instead of hacking around and telling
nonsense.

Relax, my friend.

Best Regards,
Szabolcs

Ian Collins

unread,
Oct 23, 2008, 2:54:00 PM10/23/08
to
David Schwartz wrote:

> As a simple example, assume that there are two things we could
> compute, A and B. If A is greater than zero, we only need A, which it
> will be almost all of the time (fast path). In the rare case that A is
> less than or equal to zero, we need B (slow path).
>
> Without HT, this is optimal:
> 1) Compute A and B at the same time in different execution units.
> 2) If A>0, output A.
> 3) Else, output B.
>
> This is, in the fast path where we use A, 2 cycles but 3 execution
> units. We only compute B because it's free and makes the slow path
> faster. However, this will always compute B which we will almost never
> need. It's free, and we might need it, so why not.
>
> With HT, this is optimal:
> 1) Compute A.
> 2) If A>0, output A.
> 3) Compute B.
> 4) Output B.
>
> This is, in the fast path where we use A, 2 cycles and 2 execution
> units. This is a user-space optimization for a CPU with HT. (Since it
> deprives the thread in the other slot of one fewer execution unit in
> the fast path.)
>

Is this an optimisation issue for the programmer, or the compiler writer?

--
Ian Collins

David Schwartz

unread,
Oct 23, 2008, 6:04:09 PM10/23/08
to
On Oct 23, 11:54 am, Ian Collins <ian-n...@hotmail.com> wrote:

> Is this an optimisation issue for the programmer, or the compiler writer?

It's actually both. For one thing, performance-critical inner loops
are sometimes hand-optimized in assembly. But the main issue is in
performance testing. If you are comparing two implementations to see
which has a performance advantage, you cannot just look at the one
with the best CPU time when it has exclusive use of the core.

In an ideal world, it would only be an issue for the compiler writer.
In the real world, it's both.

DS

0 new messages