pthread_once and double checked locking

u.int...@gmail.com

Jan 1, 2007, 1:18:57 AM
Hi,

regarding:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/threads/pthread.c#134

This pthread_once implementation here seems to be using double checked
locking. I am told this is broken. I even mostly understand why.

So why is the above using double checked locking? Or is it some
variation that works? From all I understand on the topic, the only true
and correct solution is a solution that involves hardware specific
behaviour.
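
For readers who have not seen the pattern, here is a minimal sketch of the double checked locking shape being asked about. It is not the OpenSolaris source; the names (naive_once_t, naive_once, ONCE_NOTDONE, ONCE_DONE) are hypothetical and only illustrate where the concern lies.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names: this is the general pattern, NOT the Solaris code. */
#define ONCE_NOTDONE 0
#define ONCE_DONE    1

typedef struct {
    pthread_mutex_t lock;
    int             flag;   /* plain variable: no ordering guarantees attached */
} naive_once_t;

static int naive_once(naive_once_t *once, void (*init_routine)(void))
{
    if (once->flag == ONCE_NOTDONE) {          /* first check, without the lock */
        pthread_mutex_lock(&once->lock);
        if (once->flag == ONCE_NOTDONE) {      /* second check, under the lock */
            init_routine();
            /*
             * Hazard: nothing here orders the stores made by init_routine()
             * before the store to flag. On a weakly ordered machine another
             * thread could pass the unlocked check and still read stale data.
             */
            once->flag = ONCE_DONE;
        }
        pthread_mutex_unlock(&once->lock);
    }
    return 0;
}

static int init_calls;
static void count_init(void) { init_calls++; }

/* Single-threaded demo only: calls naive_once() three times and
 * returns how many times the init routine actually ran. */
static int naive_once_demo(void)
{
    static naive_once_t once = { PTHREAD_MUTEX_INITIALIZER, ONCE_NOTDONE };
    int rc, i;

    for (i = 0; i < 3; i++) {
        rc = naive_once(&once, count_init);
        assert(rc == 0);
    }
    return init_calls;
}
```

The demo is deliberately single-threaded; it shows the control flow only. Whether the unlocked fast path is safe with several threads is exactly the open question.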

Thanks for your response,

Sohail

Joe Seigh

Jan 1, 2007, 8:07:45 AM

It's hardware specific. I believe Solaris only runs in TSO memory mode
on SPARC, so the only required membar would be #StoreLoad, which DCL does
not need. Also, the once-variable logic works even if the access is not
atomic. It may or may not be correct on x86, depending on what the memory
model actually is.
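
A minimal message-passing sketch may make the ordering argument concrete. The names (payload, ready, writer, reader) are hypothetical and C11 atomics are an assumption of this sketch, not something the Solaris code uses. DCL relies on two orderings only: the data store must stay ahead of the flag store, and the flag load must stay ahead of the data load. TSO hardware preserves both; the one reordering it permits, a store followed by a later load of a different location, never matters here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary data, like what init_routine() sets up */
static atomic_int ready;   /* plays the role of once_flag */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload store cannot move after this flag store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;

    /* Acquire: the payload load below cannot move before this flag load. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* spin until the writer publishes */
    *seen = payload;
    return NULL;
}

/* Runs one writer and one reader; returns the payload value the reader saw. */
static int message_passing_demo(void)
{
    pthread_t w, r;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

On TSO targets a compiler would typically emit plain loads and stores for the release and acquire operations, which is consistent with the code working there without explicit barriers.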


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Joe Seigh

Jan 1, 2007, 8:55:58 AM

> It may or may not be correct on x86 depending on what the memory model
> actually is.
>
There could be a control dependency that may make it work on x86.
Somebody who's played with control dependencies could answer this
better. Nothing like not documenting anything in the code.

loic...@gmx.net

Jan 1, 2007, 9:09:10 AM
Hi guys,

> > regarding:
> > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/threads/pthread.c#134
> >
> > This pthread_once implementation here seems to be using double checked
> > locking. I am told this is broken. I even mostly understand why.
> >
> > So why is the above using double checked locking? Or is it some
> > variation that works? From all I understand on the topic, the only true
> > and correct solution is a solution that involves hardware specific
> > behaviour.
>
> It's hardware specific. I believe Solaris only runs in TSO memory mode
> on SPARC, so the only required membar would be #StoreLoad, which DCL does
> not need. Also, the once-variable logic works even if the access is not atomic.
> It may or may not be correct on x86 depending on what the memory model
> actually is.

I am not a hardware expert, so please forgive me if my answer is
naive...

The potential problems with this code are:

1) The variable once->once_flag is not read atomically. However, I
think that the code is correct; i.e. once->once_flag will have the
value PTHREAD_ONCE_DONE only if the /init_routine()/ has been
called.

2) Multi-processor cache coherency. Some architectures, like Alpha or
Itanium, perform aggressive memory caching operations which reorder
reads and writes, and hence require memory barriers. The risk in such
an environment would be that the shared data initialized by the
/init_routine()/ is not yet stable when the once->once_flag variable
indicates DONE. However, note that there is a /_private_mutex_lock()/
operation between the call to /init_routine()/ and the assignment to
once->once_flag. My guess is that this /_private_mutex_lock()/
introduces the requisite memory barrier, ensuring that the initialized
data is stable before the once->once_flag modification appears on any
other processor.

Happy new year 2007!
Loic.

u.int...@gmail.com

Jan 2, 2007, 11:10:29 AM
Thanks for your posts Joe and Loic. Your responses indicated that I was
on the right track.

With Solaris running on x86 now, you'd wonder why there isn't specific
code for this. Perhaps an email to their mailing lists is in order.

Thanks again!

roger.f...@sun.com

Jan 17, 2007, 8:55:26 AM

loic...@gmx.net wrote:
> Hi guys,
>
> > > regarding:
> > > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/threads/pthread.c#134
> > >
> > > This pthread_once implementation here seems to be using double checked
> > > locking. I am told this is broken. I even mostly understand why.
> > >
> > > So why is the above using double checked locking? Or is it some
> > > variation that works? From all I understand on the topic, the only true
> > > and correct solution is a solution that involves hardware specific
> > > behaviour.
> >
> > It's hardware specific. I believe Solaris only runs in TSO memory mode
> > on SPARC, so the only required membar would be #StoreLoad, which DCL does
> > not need. Also, the once-variable logic works even if the access is not atomic.
> > It may or may not be correct on x86 depending on what the memory model
> > actually is.
>
> I am not a hardware expert, so please forgive me if my answer is
> naive...
>
> The potential problems with this code are:
>
> 1) The variable once->once_flag is not read atomically. However, I
> think that the code is correct; i.e. once->once_flag will have the
> value PTHREAD_ONCE_DONE only if the /init_routine()/ has been
> called.

once->once_flag is a uint32_t and is properly aligned.
So it is read and written atomically, no problem there.

> 2) Multi-processor cache coherency. Some architectures, like Alpha or
> Itanium, perform aggressive memory caching operations which reorder
> reads and writes, and hence require memory barriers. The risk in such
> an environment would be that the shared data initialized by the
> /init_routine()/ is not yet stable when the once->once_flag variable
> indicates DONE. However, note that there is a /_private_mutex_lock()/
> operation between the call to /init_routine()/ and the assignment to
> once->once_flag. My guess is that this /_private_mutex_lock()/
> introduces the requisite memory barrier, ensuring that the initialized
> data is stable before the once->once_flag modification appears on any
> other processor.

No, _private_mutex_lock is just an alias for pthread_mutex_lock.
It has no magical properties.

As Joe said, the Solaris code for pthread_once( ) works only because
Solaris runs in TSO (Total Store Ordering) mode. This will probably
change in the future, so the code should be fixed by inserting a
memory barrier before assigning PTHREAD_ONCE_DONE to
once->once_flag.
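
As a rough illustration of that fix, here is a sketch using C11 atomics. This is an assumption-laden example, not the actual Solaris change: the names (fenced_once_t, fenced_once, ONCE_NOTDONE, ONCE_DONE) are hypothetical, the acquire load on the fast path is what the C11 model needs to pair with the writer's barrier, and cancellation and error handling are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; a sketch of the idea, not the real libc code. */
#define ONCE_NOTDONE 0
#define ONCE_DONE    1

typedef struct {
    pthread_mutex_t lock;
    atomic_int      flag;
} fenced_once_t;

static int fenced_once(fenced_once_t *once, void (*init_routine)(void))
{
    /* Fast path: the acquire load pairs with the release fence below. */
    if (atomic_load_explicit(&once->flag, memory_order_acquire) == ONCE_NOTDONE) {
        pthread_mutex_lock(&once->lock);
        if (atomic_load_explicit(&once->flag, memory_order_relaxed) == ONCE_NOTDONE) {
            init_routine();
            /* The barrier: stores made by init_routine() become visible
             * before the flag can be observed as ONCE_DONE. */
            atomic_thread_fence(memory_order_release);
            atomic_store_explicit(&once->flag, ONCE_DONE, memory_order_relaxed);
        }
        pthread_mutex_unlock(&once->lock);
    }
    return 0;
}

static fenced_once_t demo_once = { PTHREAD_MUTEX_INITIALIZER, ONCE_NOTDONE };
static int shared_value;          /* data published by the init routine */
static atomic_int demo_init_calls;

static void demo_init(void)
{
    shared_value = 42;
    atomic_fetch_add(&demo_init_calls, 1);
}

static void *demo_worker(void *arg)
{
    int *seen = arg;

    fenced_once(&demo_once, demo_init);
    *seen = shared_value;         /* safe to read after fenced_once() returns */
    return NULL;
}

/* Starts several threads that race on fenced_once(). Returns the number of
 * init calls if every thread saw the initialized value, or -1 otherwise. */
static int fenced_once_demo(void)
{
    enum { NTHREADS = 8 };
    pthread_t tid[NTHREADS];
    int seen[NTHREADS];
    int i, rc, ok = 1;

    for (i = 0; i < NTHREADS; i++) {
        seen[i] = 0;
        rc = pthread_create(&tid[i], NULL, demo_worker, &seen[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        if (seen[i] != 42)
            ok = 0;
    }
    return ok ? atomic_load(&demo_init_calls) : -1;
}
```

On a TSO machine the fence and the acquire load should cost nothing extra at run time; they matter once the code runs under a weaker memory model.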

See the bug report I filed (and am fixing):

6513516 double checked locking code needs a memory barrier

There are other instances of double checked locking in Solaris
that need to be fixed, besides pthread_once().

Roger Faulkner
Sun Microsystems

loic...@gmx.net

Jan 17, 2007, 9:55:27 AM
Hello Roger,

> No, _private_mutex_lock is just an alias for pthread_mutex_lock.
> It has no magical properties.
>
> As Joe said, the Solaris code for pthread_once( ) works only because
> Solaris runs in TSO (Total Store Ordering) mode. This will probably
> change in the future, so the code should be fixed by inserting a
> memory barrier before assigning PTHREAD_ONCE_DONE to
> once->once_flag.
>
> See the bug report I filed (and am fixing):
>
> 6513516 double checked locking code needs a memory barrier
>
> There are other instances of double checked locking in Solaris
> that need to be fixed, besides pthread_once().

Thanks for your detailed answer. Actually I am starting to study
'advanced topics' like DCL, lock-free data structures, etc. Needless to
say, the world of memory subsystems is new to me (so far,
Pthreads took care of that for me).

Cheers,
Loic.
--
"The road to wisdom? Well it's plain and simple to express. To err and
err and err again, but less and less and less." Piet Hein.

u.int...@gmail.com

Jan 22, 2007, 10:27:24 PM
roger.f...@sun.com wrote:

> See the bug report I filed (and am fixing):
>
> 6513516 double checked locking code needs a memory barrier

Did you file this because of the post?

roger.f...@sun.com

Jan 22, 2007, 11:29:37 PM

Yes, I did. Thanks for the post.

Roger Faulkner
Sun Microsystems

u.int...@gmail.com

Jan 23, 2007, 1:45:46 AM
roger.f...@sun.com wrote:
> u.int...@gmail.com wrote:
> > roger.f...@sun.com wrote:
> >
> > > See the bug report I filed (and am fixing):
> > >
> > > 6513516 double checked locking code needs a memory barrier
> >
> > Did you file this because of the post?
>
> Yes, I did. Thanks for the post.

Cool :-)
