T& dcl_singleton()
{
    if (!instance) // first check
    {
        lock_t lock (mutex);
        if (!instance) // second check
        {
            T* local = 0;
            {
                // main feature:
                // instance is created under second mutex
                mutex_t mutex2;
                lock_t lock2 (mutex2);
                local = new T();
            }
            instance = local;
        }
    }
    return *instance;
}
1. Suppresses compiler reordering w/o need to cope with 'volatile' magic
2. Suppresses hardware reordering w/o need to cope with 'membar' magic
Almost POSIX compatible. As portable as you can get. Works on all
platforms that respect data dependency.
Dmitriy V'jukov
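For comparison: in later C++ (C++11 and onward, which postdates this thread) the same double-checked pattern can be written portably with std::atomic. This is a hedged sketch, not the poster's code; `Widget` is a placeholder type:

```cpp
#include <atomic>
#include <mutex>

struct Widget {
    int value;
    Widget() : value(42) {}
};

std::atomic<Widget*> g_instance(nullptr);
std::mutex g_mutex;

Widget& dcl_singleton() {
    // first check: acquire load pairs with the release store below
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(g_mutex);
        // second check, under the lock
        p = g_instance.load(std::memory_order_relaxed);
        if (!p) {
            p = new Widget();
            // release store: the constructor's writes happen-before
            // any acquire load that observes this pointer
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

With this formulation neither the mutex tricks nor the data-dependency argument discussed below are needed; the acquire/release pair carries the ordering.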
Lock implementations will likely have a release membar as part of
their unlock logic. However, strictly speaking there is no requirement
by the apocryphal formal lock semantics for such a membar.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
> Lock implementations will likely have a release membar as part of
> their unlock logic. However, strictly speaking there is no requirement
> by the apocryphal formal lock semantics for such a membar.
Hmmm...
But if a mutex doesn't have a release membar as part of its unlock logic,
then, I think, all algorithms that use mutexes will be broken...
Because this would mean code "leakage" out of the critical section...
Dmitriy V'jukov
> Almost POSIX compatible. As portable as you can get. Works on all
> platforms that respect data dependency.
Why so complicated? When you want POSIX compliant code, you can just use
POSIX:
static pthread_once_t once = PTHREAD_ONCE_INIT;
static T* instance = 0;

extern "C" void init_instance() {
    instance = new T();
}

T& dcl_singleton() {
    pthread_once(&once, init_instance);
    return *instance;
}
best regards,
Torsten
Good point.
First, I don't know whether widespread pthread_once implementations
have a fast-path w/o membars.
Second, the Win32 API has a pthread_once analog only in its latest versions.
Third, many people still make their own implementations... and those
implementations are usually wrong :)
Dmitriy V'jukov
It's leakage into the critical section that's the problem. The store
into the global pointer of the object address could be moved into the
critical region.
> It's leakage into the critical section that's the problem. The store
> into the global pointer of the object address could be moved into the
> critical region.
I bow to Joe Seigh ;)
So third mutex solves the problem :)
T& dcl_singleton()
{
    if (!instance) // first check
    {
        lock_t lock (mutex);
        if (!instance) // second check
        {
            T* local = 0;
            {
                // main feature:
                // instance is created under second mutex
                mutex_t mutex2;
                lock_t lock2 (mutex2);
                local = new T();
            }
            // main feature:
            // instance pointer is setup under third mutex
            mutex_t mutex3;
            lock_t lock3 (mutex3);
            instance = local;
        }
    }
    return *instance;
}
All this stuff is still on the slow-path and executed only by one thread.
Dmitriy V'jukov
> T& dcl_singleton()
> {
>     if (!instance) // first check
>     {
>         lock_t lock (mutex);
>         if (!instance) // second check
>         {
>             T* local = 0;
>             {
>                 // main feature:
>                 // instance is created under second mutex
>                 mutex_t mutex2;
>                 lock_t lock2 (mutex2);
>                 local = new T();
>             }
>             // main feature:
>             // instance pointer is setup under third mutex
>             mutex_t mutex3;
>             lock_t lock3 (mutex3);
>             instance = local;
>         }
>     }
>     return *instance;
> }
What keeps another thread from *seeing* the 'instance=local' before it
sees all the data written by 'new T()'? All of this lock stuff is
great for the thread that acquires the locks, but it has no effect on
a second thread that doesn't acquire the locks.
DS
This is like the method I use in vZOOM:
http://groups.google.com/group/comp.programming.threads/msg/b7814ee86cd4b54d
This works for publishing a data object into a lock-free reader pattern. You
are not depending on a subsequent read to another location in the producer
thread being executed after the release barrier in the prior unlock of the
critical section.
> What keeps another thread from *seeing* the 'instance=local' before it
> sees all the data written by 'new T()'? All of this lock stuff is
> great for the thread that acquires the locks, but it has no effect on
> a second thread that doesn't acquire the locks.
I rely on the data dependency between the instance pointer and the
singleton itself.
So it must work on all widespread platforms except Alpha.
Dmitriy V'jukov
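That data-dependency argument on the reader side can be made explicit with C++11's consume ordering (an anachronism here, since the thread predates C++11, and most compilers implement consume as acquire anyway). A sketch with illustrative names:

```cpp
#include <atomic>

struct Node {
    int payload;
    explicit Node(int v) : payload(v) {}
};

std::atomic<Node*> g_published(nullptr);

// Writer side: construct fully, then publish with a release store.
void publish(int v) {
    Node* n = new Node(v);
    g_published.store(n, std::memory_order_release);
}

// Reader side: the dereference p->payload is data-dependent on the
// loaded pointer; memory_order_consume names exactly the ordering the
// post relies on (and which Alpha does not give for free).
int read_or(int fallback) {
    Node* p = g_published.load(std::memory_order_consume);
    return p ? p->payload : fallback;
}
```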
> > This is like the method I use in vZOOM:
Hmm. Yes. Nothing new.
But I haven't seen anybody apply this trick to DCL.
And I still think that two mutexes are very expensive in the vZOOM
context. In the DCL context those two mutexes are on the slow-path.
Dmitriy V'jukov
Hmmm....
A more reasonable implementation (only 1 additional mutex):
> T& dcl_singleton()
> {
>     if (!instance) // first check
>     {
>         lock_t lock (mutex);
>         if (!instance) // second check
>         {
>             T* local = 0;
>             {
>                 local = new T();
>                 // main feature:
>                 mutex_t mutex2;
>                 lock_t lock2 (mutex2);
>             }
>             instance = local;
>         }
>     }
>     return *instance;
> }
Dmitriy V'jukov
In other words, it's platform-specific code. Such code should *always*
contain a portable equivalent and appropriate ways to decide when and
whether the platform-specific code is appropriate. It should default
off on platforms where it is not known safe.
DS
> Hmmm...
> But if mutex don't have release membar as part of his unlock logic,
> then, I think, all algorithms that use mutexes will be broken...
> Because this means code "leakage" from critical section...
Nope. All that the mutex standard requires is that code that precisely
adheres to that standard work as expected. It says nothing about code
that "sort of" follows the rules or that the release must have release
membars. For example, there could be a platform that only requires a
barrier on lock acquisition (for example, consider a platform where
each CPU can flush the other CPU's write posting buffers).
The problem with this code is that there are cases where no lock is
acquired. If synchronization is performed on lock acquisition (rather
than being sufficiently performed on lock release) then this code is
broken.
DS
The problem is Posix pthreads never formally defined their memory model,
and by extension, the semantics for mutexes and other synchronization
primitives. They expected that the implementers would just do the right
thing, whatever that was. Your DCL implementation depends on the mutex
semantics and consequent implementations being exactly how you imagine
that the Posix implementers imagined they would be. And that anything
you can't imagine can't happen.
You're not using mutexes in a conventional manner. What you need to do
is formally prove that if a particular mutex implementation breaks your
DCL implementation, it will break a conventional mutex usage. This will
be rather difficult because of the lack of formal semantics.
> You're not using mutexes in a conventional manner. What you need to do
> is formally prove that if a particular mutex implementation breaks your
> DCL implementation, it will break a conventional mutex usage. This will
> be rather difficult because of the lack of formal semantics.
It is impossible to write fully portable lock-free (not ordinary lock-
based) code in C/C++ now. So probably it is better to say not "Almost
POSIX compatible" but "It will work on most modern OSes and hardware
platforms and with high probability on your particular platform, but
you still need to manually check its operation" :)
Does this formulation provoke objections? :)
Dmitriy V'jukov
> Nope. All that the mutex standard requires is that code that precisely
> adheres to that standard work as expected. It says nothing about code
> that "sort of" follows the rules or that the release must have release
> membars. For example, there could be a platform that only requires a
> barrier on lock acquisition (for example, consider a platform where
> each CPU can flush the other CPU's write posting buffers).
>
> The problem with this code is that there are cases where no lock is
> acquired. If synchronization is performed on lock acquisition (rather
> than being sufficiently performed on lock release) then this code is
> broken.
Please see my answer to Joe Seigh here:
http://groups.google.ru/group/comp.programming.threads/msg/af3b151cbe8d0138
Dmitriy V'jukov
Well, you can use the vZOOM method on the slow-path as well:
<pseudo-code>
template<typename T>
class vzOnce {
    T* volatile m_Obj; // = 0
    pthread_mutex_t m_Lock;
public:
    T* LoadPtr() {
        T* Ptr = vzAtomic_Load_Depends(&m_Obj);
        if (! Ptr) {
            vzoom_t* const vzthis = pthread_getspecific(...);
            pthread_mutex_lock(&m_Lock);
            Ptr = m_Obj;
            if (! Ptr) {
                Ptr = new T;
                pthread_mutex_unlock(&vzthis->memory_lock);
                pthread_mutex_lock(&vzthis->memory_lock);
                vzAtomic_Store_Naked(&m_Obj, Ptr);
            }
            pthread_mutex_unlock(&m_Lock);
        }
        return Ptr;
    }
};
> > Hmm. Yes. Nothing new.
> > But I don't see anybody apply this trick to DCL.
>
> > And I am still thinking that two mutexes is very expensive in vZOOM
> > context. In DCL context those two mutexes on slow-path.
>
> Well, you can use the vZOOM method on the slow-path as well:
You forgot to say how much I must pay for this :)))
Dmitriy V'jukov
Sun has a mutex implementation that avoids having a store/load membar.
It's part of the waiter notification logic, not the "release" semantics
logic, but it's an indication that vendors will eliminate membar overhead
if they can think of a way to do it.
You'd have a hard time preventing lock critical regions from being moved
into another lock's critical region. Deadlock rules don't apply since
deadlock is an artifact of lock implementation. If programs w/ locks
can be shown to be semantically and lexically equivalent to programs using
STM, you'd have a hard time arguing otherwise.
Plus, I can think of at least one formal definition of mutex semantics that
does not require memory barriers.
Is it something like this:
?
[...]
I can see how to skip the #StoreLoad ordering constraint in lock
acquisition, but I think #StoreStore would need to be in there somewhere...
Have you seen a mutex impl that does not have any #StoreStore barriers?
Is the implementation within the scope of a JVM?
Any implementation that cares about performance of correctly-written
libraries will probably have a good fast-pathed pthread_once()
implementation.
More importantly, the implementation of pthread_once() is,
practically by definition, less portable than the application that
would use it. Pthreads implementors are free to use the basic DCL
pattern when their implementation won't ever run on a platform that
could break the basic pattern.
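A fast-pathed once, as described above, is itself just DCL done by an implementor who knows the platform. A hedged sketch in C++11 terms; the names and layout are illustrative, not any real libc's internals:

```cpp
#include <atomic>
#include <mutex>

// Hypothetical fast-pathed once-control; not a real pthread_once.
struct fast_once {
    std::atomic<bool> done;
    std::mutex m;
    fast_once() : done(false) {}

    void call(void (*init)()) {
        // fast path: a single acquire load, no lock taken
        if (done.load(std::memory_order_acquire))
            return;
        std::lock_guard<std::mutex> lock(m);
        if (!done.load(std::memory_order_relaxed)) {
            init();
            // release pairs with the fast-path acquire load
            done.store(true, std::memory_order_release);
        }
    }
};

fast_once g_once;
int g_counter = 0;
void bump() { ++g_counter; }
```

Repeated calls run the initializer exactly once; only the first caller (and any racers) ever touch the mutex.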
>Second, Win32 API has pthread_once analog only in latest versions.
Different problem. Microsoft has repeatedly demonstrated their
abilities[1] for making adequate thread APIs.
>Third, many people still make own implementations... and those
>implementations are usually wrong :)
Many people also attempt to make their own mutex implementations,
often with spinlocks, usually with unpleasant results.
Spending one's time on reimplementing pthread_once() is almost
as valuable as reimplementing printf(). While there's always
some learning value in there, it's just not a place that talented
folks should be spending time without some cause.
[1] I state no preference whether their abilities are good or
bad, simply that they've been demonstrated.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...
> Please see my answer to Joe Seigh
> here:
> http://groups.google.ru/group/comp.programming.threads/msg/af3b151cbe...
You have not held up your end of that bargain. Even though, in that
post, you made it clear that you understand this type of code is not
portable, you *still* misrepresent its portability.
You said, "Almost POSIX compatible. As portable as you can get. Work on
all platforms that respect data-dependency." You agreed that it was not
good to say things like that, but still you do.
I don't know what you mean by "data-dependency" here. In actual fact,
your code will work only on platforms where all the necessary
synchronization is done by a mutex release and no synchronization is
needed during mutex acquisition.
For example, a platform where on mutex acquisition, a CPU flushes all
other CPU's posted write buffers, this code will not work. The
constructor's side effects could still be in the posted write buffer
of the CPU that constructed it, even though it released a mutex.
I don't see what that has to do with data dependency.
If you would in fact make it clear that this code is not portable and
relies on complex and not-well-understood details of the platform,
then I would not complain. (Did it ever occur to you that this would
break on platforms where the synchronization is done on mutex
acquisition rather than release?)
You need to put big huge warnings on this code every time you post it.
And when you actually use it, you need to interrogate the environment
(assuming it's code that's not embedded) and if it's not an
environment the code is known to work on, you need to fall back to
portable code.
Your failure to do this makes you part of the plague of low quality
code that breaks on the next OS, the next CPU, or the next threading
library. You know better.
DS
http://groups.google.com/group/comp.programming.threads/msg/b7814ee86cd4b54d
I would have to say that is about as portable as you can get without
resorting to the direct use of atomic operations or explicit membar calls...
> I don't know what you mean by "data-dependency" here. In actual fact,
> your code will work only on platforms where all the necessary
> synchronization is done by a mutex release and no synchronization is
> needed during mutex acquisition.
Last version here:
http://groups.google.com/group/comp.programming.threads/msg/b00713e73e0322ff?hl=en&
Looks like this:
T* local = 0;
{
    local = new T();
    mutex_t mutex2;
    mutex2.lock();
    mutex2.unlock();
}
instance = local;
Here there is a release *and* an acquire between construction and global
pointer installation.
> I don't see what that has to do with data dependency.
Data dependency is needed to omit any synchronization on the "reader" side.
> If you would in fact make it clear that this code is not portable and
> relies on complex and not-well-understood details of the platform,
> then I would not complain. (Did it ever occur to you that this would
> break on platforms where the synchronization is done on mutex
> acquisition rather than release?)
>
> You need to put big huge warnings on this code every time you post it.
> And when you actually use it, you need to interrogate the environment
> (assuming it's code that's not embedded) and if it's not an
> environment the code is known to work on, you need to fall back to
> portable code.
>
> Your failure to do this makes you part of the plague of low quality
> code that breaks on the next OS, the next CPU, or the next threading
> library. You know better.
Well. I'll make one more try :)
You can use this implementation on a platform that lacks pthread_once
(or an analog), still not resorting to platform-specific asm. But it's
your responsibility to ensure its operation. So it's just a comfortable
form of writing it down (because it doesn't contain asm and can actually
be used on several platforms without modification and ifdefs).
Dmitriy V'jukov
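A modern reading of the trick: the throwaway lock/unlock pair is acting purely as a memory fence between construction and publication. A hedged C++11 sketch of that intent (names are mine, not the poster's):

```cpp
#include <atomic>

struct Thing {
    int ready;
    Thing() : ready(1) {}
};

Thing* g_thing = 0;

// The "mutex2.lock(); mutex2.unlock();" pair, rewritten as the fence
// it stands in for: every write made by the constructor is ordered
// before the store that publishes the pointer.
void slow_path_publish() {
    Thing* local = new Thing();
    std::atomic_thread_fence(std::memory_order_seq_cst);
    g_thing = local; // plain store; readers still need their own ordering
}
```

This only covers the producer side; as the rest of the thread argues, the reader side is where the data-dependency assumption (or a proper acquire) comes in.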
What is the point of avoiding platform-specific ASM? Can you give me an
example of a platform for which 1) there exist C,C++ compilers, 2) which has
*secret* specification of the underlying instruction set, *and* 3) for which
the vendor does not supply platform-specific libraries to access the
needed functionality? This is basically the only case which comes to my
mind where this hack can be useful.
Plus, if the underlying instruction set is kept secret, there's no reason
to believe that the memory model is specified somewhere. Can you write a
test program which proves or disproves the assumptions which must be met
so that your code functions as intended?
I consider C++ the default choice and asm the last resort. Not vice versa.
And you are free to choose.
Dmitriy V'jukov
No. It's inexpedient. It's like "Can you write a test program which
proves or disproves that some mutex implementation functions as
intended?".
Nevertheless I can provide a list of assumptions:
- acquire-release must be a two-sided store compiler barrier
- acquire-release must be a two-sided store hardware barrier
- the hardware must respect data dependency
By "acquire-release" I mean:
mutex_t mutex;
mutex.lock();
mutex.unlock();
Those assumptions are correct for:
- all versions of Win32 on x86, x86-64, IA64
- NPTL on x86, x86-64, IA64, PPC
- FreeBSD on x86, IA64
and definitely on many other platforms
Dmitriy V'jukov
I think the call to mutex2.lock/unlock can be optimized away...
>> Looks like this:
>>
>> T* local = 0;
>> {
>>     local = new T();
>>     mutex_t mutex2;
>>     mutex2.lock();
>>     mutex2.unlock();
>> }
>> instance = local;
> [...]
>
> I think the call to mutex2.lock/unlock can be optimized away...
By what? Are there C++ compilers that optimize away calls to
external (that is, outside the compilation unit) functions on the
assumption that the functions will have no side effects? If so,
I don't want to use one of those compilers.
- Logan
> By what? Are there C++ compilers that optimize away calls to
> external (that is, outside the compilation unit) functions on the
> assumption that the functions will have no side effects? If so,
> I don't want to use one of those compilers.
Why would you care, assuming that:
1) The compiler is careful not to break compliant code.
2) Your code is compliant.
Oh, you want to write code that's *not* compliant and have it "just
happen to" work? If so, I don't want to run your code or hire you to
write any for me.
I have seen too much code like that break when compilers get smarter.
They will, I promise you. And it's dumb to insist compilers generate
poor output on compliant code just so that broken code will continue
to happen to work.
DS
I was testing the public waters with the following post:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0b3166db3c5518d
The post is correct, just contact me via e-mail. Alas, I have not received
any public/private flames and/or inquiries. So, imvho, just remember: the
following posts should hold true for quite a long time:
http://groups.google.com/group/comp.programming.threads/msg/5a098fe50679c296
http://groups.google.com/group/comp.arch/msg/1b9e405080e93149
(last paragraph...)
http://groups.google.com/group/comp.programming.threads/msg/03da0468e1472b44
http://groups.google.com/group/comp.programming.threads/msg/da0648e38c8194ec
http://groups.google.com/group/comp.programming.threads/msg/b0efcf84079aa780
Oh well... At least I have managed to get a handful of interested entities
to use it... It's hard times out there in the world of expert threading...
;^(...
[...]
A compiler that can notice that the special synchronization object was
created on the stack and actually protects nothing that it can determine as
critical.
>>>> T* local = 0;
>>>> {
>>>>     local = new T();
>>>>     mutex_t mutex2;
>>>>     mutex2.lock();
>>>>     mutex2.unlock();
>>>> }
>>>> instance = local;
>>> I think the call to mutex2.lock/unlock can be optimized away...
>> By what?
> A compiler that can notice that the special synchronization object was
> created on the stack and actually protects nothing that it can determine
> as critical.
You're talking about a compiler that is allowed to make assumptions about
the semantics of methods on an object of type mutex_t. I guess whether it
can do that would depend on how smart the compiler is and on whether it
has access to the definitions of those methods at the time it is compiling
this particular compilation unit. So I guess it depends on where the
implementation of mutex_t comes from. I was assuming it's external, i.e.
part of some library. If that's the case, I don't see how the compiler
knows anything about lock() and unlock() other than how to call the methods.
- Logan
> I think the call to mutex2.lock/unlock can be optimized away...
Hmmm....
I think mutex2 can be moved to global scope - near the mutex variable.
Anyway, I consider this implementation a "back-off", and the user must
definitely investigate whether this implementation is correct in a
particular "unusual" environment :)
Dmitriy V'jukov
I think its this one:
http://portal.acm.org/citation.cfm?doid=1167515.1167496
Humm...
> You're talking about a compiler that is allowed to make assumptions about
> the semantics of methods on an object of type mutex_t. I guess whether it
> can do that would depend on how smart the compiler is and on whether it
> has access to the definitions of those methods at the time it is compiling
> this particular compilation unit. So I guess it depends on where the
> implementation of mutex_t comes from. I was assuming it's external, i.e.
> part of some library. If that's the case, I don't see how the compiler
> knows anything about lock() and unlock() other than how to call the methods.
This is precisely the type of argument I maintain should *always* be
rejected. The "I can't think of a way it can break" argument is not
the same as "the relevant standards guarantee that it will work". This
type of argument has no place in a thread whose subject has the word
"portable" in it.
In other words, all you are saying is it might work on some platforms.
The problem is, there is no known way to test whether a given platform
is one of those platforms.
DS
I guess I just expect compilers not to second-guess the semantics of
functions. Maybe the compiler can optimize away a printf() here or
there as well?
For what it's worth, I am operating under the assumption that the
mutex_t referred to in this thread is a hypothetical type, not
something that's really described by standards. I didn't see any
#include or anything to indicate where it's coming from. So if we
want to get really picky, there are no relevant standards and there
are no rules, and no possible solution is guaranteed to work.
Either that, or mutex_t is intended to be a concrete thing of some
sort that I'm not familiar with.
- Logan
[...]
You have to make sure that your compiler adheres to some sort of recognized
standard in order to get any definitive guarantees. The C standard _alone_
is not going to cut it here... This is why POSIX puts some fairly special
requirements on any compiler that "expects" to be able to call itself
standard-conforming within the context of a compliant PThread library...
When you use threading and a standard C compiler that does not follow the
POSIX standard, you are by definition off in undefined-behavior land...
However, some guarantees might be defined by a particular compiler's
documentation, but if it doesn't follow a "well known" and "well respected"
standard such as POSIX, then they can "do whatever they want". Microsoft
can be a case in point... They went in a different direction than POSIX wrt
their threading APIs; one went left, and the standard went right. If you
use a Microsoft compiler, well, you're "portable" within the realm of
Microsoft and nowhere else...
> The C standard _alone_ is not going to cut it here...
An ISO C++ compliant compiler can't remove any accesses to volatile
variables. This can sometimes be enough for synchronization. I think
that the C standard has similar requirements.
Dmitriy V'jukov
> I guess whether it
> can do that would depend on how smart the compiler is and on whether it
> has access to the definitions of those methods at the time it is compiling
> this particular compilation unit.
Google for 'Whole Program Optimization / Global Optimizations / Link-
Time Code Generation'. At least since 2003, C++ compilers can look into
other translation units and inline functions whose definitions are not
visible in the current translation unit.
Dmitriy V'jukov
Humm. The keyword is 'sometimes'... ;^)
> I think
> that C standard has similar requirements.
Yup.
> ISO C++ compliant compiler can't remove any accesses to volatile
> variables.
The as-if rule allows the C++ compiler to remove anything it wants to
provided compliant code can't tell the difference. And don't tell me
that volatile accesses are defined to be observable behavior because
that's completely meaningless.
> This can be enough sometimes for synchronization. I think
> that C standard has similar requirements.
The requirements are incomprehensible because they don't define what
an 'access' is nor where one is 'observed'. For example, consider:
volatile int i, j;
i++;
j++;
Now, the standard says these two accesses must occur in the specified
order. But *where*? On the front side bus? Between the CPU and the
cache? Between the CPU and physical memory? Does this mean no
speculative fetching is allowed? What if the hardware has no way to
avoid speculative fetching?
In the context of the C/C++ abstract machine, it is not clear what an
'access' is nor what it means to 'observe' one. As a result, the
semantics of 'volatile' are implementation defined. (With the
exception of things like signals and longjmp, of course.)
DS
> volatile int i, j;
> i++;
> j++;
>
> Now, the standard says these two accesses must occur in the specified
> order. But *where*? On the front side bus? Between the CPU and the
> cache? Between the CPU and physical memory? Does this mean no
> speculative fetching is allowed? What if the hardware has no way to
> avoid speculative fetching?
>
> In the context of the C/C++ abstract machine, it is not clear what an
> 'access' is nor what it means to 'observe' one. As a result, the
> semantics of 'volatile' are implementation defined. (With the
> exception of things like signals and longjmp, of course.)
I don't care where accesses will be since I don't write code like
this:
volatile int* p = (volatile int*)0x123;
set_memory_as_uncachable(0x123);
(this is definitely not portable C++ code)
But I think that requirements is still sufficient to write code like
this:
volatile int f = 0;

void f1()
{
    f = 1;
}

void f2()
{
    while (!f);
    //...
}
So, as I understand, the standard requires that accesses to volatile
variables must appear as "memory accesses" in the generated target code
(in whatever sense "memory accesses" has in the terms of the target
code). Note that it is unimportant what exactly a "memory access" is
in the terms of the target code.
Dmitriy V'jukov
> So, as I understand, the standard requires that accesses to volatile
> variables must appear as "memory accesses" in the generated target code
> (in whatever sense "memory accesses" has in the terms of the target
> code). Note that it is unimportant what exactly a "memory access" is
> in the terms of the target code.
No, you are mistaken. The problem is precisely that what exactly
"memory accesses" is is undefined, so you can't know that you're
getting what you need. That is, you need something specific, but you
are guaranteed something unspecific. So you can't be sure you're
actually getting what you need.
The C and C++ standards do not say anything about what generated
target code must contain. They say what that code must make the system
do.
Think about this for a moment because it's a very important
distinction. If the C standard says X has to come before Y, that
doesn't mean the assembly instructions for X must occur before Y in
memory or even in the instruction stream. It means X must actually
take place before Y when the code is actually running. What assembly
code does this is not anything the standard cares about. The standard
says what the generated code must *do* when it runs, not how it must
be structured.
So if the C standard were truly able to say that accesses to
'volatile' variables had to occur in order, that doesn't mean the
compiler generates assembly code with the accesses in order. That
means the compiler generates assembly code to ensure the accesses
actually occur in order. (In other words, memory barriers would be
required.)
Note that compilers *DO* *NOT* do this. This is not because the
standards say things about the generated code. This is because the
requirement you think exists does not exist. It is destroyed by three
things:
1) The as-if rule.
2) The inability to define 'visible' in terms of an abstract machine.
3) The inability to define 'access' in terms of an abstract machine.
DS
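The portable resolution of this volatile debate arrived later, with C++11's std::atomic. The f1/f2 flag example from above, rewritten so the ordering guarantees actually exist in the language (a sketch, not anything from the thread itself):

```cpp
#include <atomic>

std::atomic<int> g_flag(0);

// Writer: the release store makes everything written before it
// visible to a reader whose acquire load observes g_flag == 1.
void f1() {
    g_flag.store(1, std::memory_order_release);
}

// Reader: spins until the flag is observed; acquire pairs with release.
void f2() {
    while (!g_flag.load(std::memory_order_acquire)) { /* spin */ }
    // ... data published before f1()'s store is now visible here
}
```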
Probably you are right.
So this code:

volatile int f = 0;

void f1()
{
    f = 1;
}

void f2()
{
    while (!f);
    //...
}

is not working either. Do I understand you correctly?
And it is impossible to write even device drivers in C++ (single-
threaded). Do I understand you correctly?
But I think that the creators of ISO C/C++ wanted it to be possible to
create at least device drivers in C/C++... Bjarne Stroustrup wrote some
papers about device drivers in C++...
Dmitriy V'jukov
> Probably you are right.
>
> So this code:
>
> volatile int f = 0;
>
> void f1()
> {
>     f = 1;
> }
>
> void f2()
> {
>     while (!f);
>     //...
> }
>
> is not working either. Do I understand you correctly?
It relies on implementation-defined behavior. It will work on an
implementation where the behavior is defined the way you want it and
not where it isn't.
> And it is impossible to write even device drivers in C++ (single-
> threaded). Do I understand you correctly?
No. I could imagine a platform or set of platforms where all you need
in addition to C++ is provided by the platform. But I don't think you
can write a device driver in pure C++ because the standard doesn't
provide the necessary APIs. It may be that all you need are the APIs.
> But I think that the creators of ISO C/C++ wanted it to be possible to
> create at least device drivers in C/C++... Bjarne Stroustrup wrote some
> papers about device drivers in C++...
And he succeeded. It is in fact possible to write device drivers in
both C and C++. Of course, you do need at least some things that are
not in the standard to do so.
I think you are confusing a whole bunch of completely separate issues.
The C/C++ languages are perfectly suitable for writing multithreaded
device drivers, but they're not going to be portable across C/C++
implementations unless those platforms all provide the same "stuff" to
make it work.
DS