membars and DCLP yet again

frege

unread,

Feb 17, 2007, 11:55:54 PM2/17/07

to

So, to recap, given:

initItem - the object being initted
initFlag - the flag saying it has been initted

starting with the write/release barrier:

initItem = foo;
initFlag = true;

So I understand why we need a release barrier here - otherwise someone
could read initFlag and assume initItem was initted, when it wasn't
(because the order of these two lines - or more likely the order of
the memory writes - was changed by the compiler/cpu/architecture/etc).

So, where *exactly* does the barrier go? I would suspect it should be
*between* the 2 lines:

initItem = foo;
release_barrier;
initFlag = true;

but typically, it is done 'during' the initFlag write. What does
during mean? ie:

InterlockedIncrementRelease(&initFlag);

or
LOCK cmpxchg initFlag, ... ; something like that in pseudo-asm
etc

Do we just assume the barrier is in the right place? Or should the
'InterlockedIncrement' actually be on the initItem? I suspect (mostly
because of common usage) that the Interlock goes on the flag. But it
would be good to know for sure.

Anyhow, here's the bigger question: the read barrier: typically the
code checks if (initFlag == false) and then inits, but the important
part is actually the initFlag == true case, so:

if (initFlag)
{
// hey initItem is initted, use it:
initItem.bar(); // boom?!
}

So assuming we used the proper write barrier, how can the reads get
reordered?:
- given that there is an 'if' in there, I don't think the compiler is
going to reorder the instructions
- are we concerned that the CPU does speculative execution are reads
initItem before reading initFlag? Or that there might be speculative
execution coupled with memory queue reordering? I think so.

So what about making initItem DEPEND on initFlag somehow. ie what if
initFlag = [ 0 or ptr to initItem ]. So:

if (initFlagAndPointer)
{
// hey, initItem is initted and we have a pointer to it, use it:
initFlagAndPointer->bar();
}

Am I correct to assume that the CPU can't 'speculate' on what
initFlagAndPointer points to before reading it for the 'if'? And
similarly, can't get past the 'if' until the memory queue has actually
retrieved the pointer, so no memory reordering can happen?

So is a read barrier still needed in this case, or is this a case of
the 'dependent load' stuff that Chris 'SenderX' :-) keeps talking
about?

Thanks for reading to the end :-)
- Tony

Chris Thomasson

unread,

Feb 19, 2007, 6:53:33 PM2/19/07

to

"frege" <gottlo...@gmail.com> wrote in message
news:1171774554.5...@m58g2000cwm.googlegroups.com...
> So, to recap, given:
[...]

> So I understand why we need a release barrier here -

[...]

> So, where *exactly* does the barrier go? I would suspect it should be
> *between* the 2 lines:
>
> initItem = foo;
> release_barrier;
> initFlag = true;
>
> but typically, it is done 'during' the initFlag write.

[...]

> Or should the
> 'InterlockedIncrement' actually be on the initItem? I suspect (mostly
> because of common usage) that the Interlock goes on the flag. But it
> would be good to know for sure.

This is architecture dependant I'm afraid. The older x86's did not have
actual memory barrier instructions; you just rely on the LOCK prefix. Now
they have mfence, so, you could do it like this:

initItem = foo;
mfence
initFlag = true;

instead of this:

initItem = foo;
InterlockedExchange(&initFlag, true);

> Anyhow, here's the bigger question: the read barrier: typically the

[...]

> So assuming we used the proper write barrier, how can the reads get
> reordered?:

They will be reordered in your example simply because you have "introduced a
level of indirection with the 'init flag' ". You don't have to use a flag!
:^) Instead, you can simply use a pointer to the object as the flag itself
and use a NULL value to differentiate the state of the DCL.

[...]

> So what about making initItem DEPEND on initFlag somehow. ie what if
> initFlag = [ 0 or ptr to initItem ]. So:

[...]

You can use the dependant load ordering "optimization" on DCL when you
"remove all forms of indirection." After you do that, the dependant load
semantics work on every "current" architecture out there EXCEPT the Alpha.
The Alpha requires you to use a barrier after the load. Okay... Just to sum
up... Here is correct pseudo-code for a DCL pattern that does not use that
level of indirection (e.g., extra flag var):

/* First things first... We need to declare/define a load with dependant
membar semantics: */
-----------------------------------
/* declaration */
extern "C" void* atomic_loadptr_depends(void* volatile*);

/* definition */
#if ! defined(ArchAlpha)
void* atomic_loadptr_depends(void * volatile *_this) {
return *_this;
}
#else
void* atomic_loadptr_depends(void * volatile *_this) {
void *ptr = *_this;
membar #LoadLoad;
return ptr;
}
#endif
-----------------------------------

/* Now we need to declare/define a store with release membar semantics: */
-----------------------------------
/* declare */
extern "C" void* atomic_storeptr_release(void * volatile*, void*);

/* define */
void* atomic_storeptr_release(void * volatile *_this, void *ptr) {
membar #LoadStore | #StoreStore;
*_this = ptr;
return ptr;
}
-----------------------------------

/* Okay, lets build the algorithm: */
-----------------------------------
typedef struct dcl_s dcl_t;

struct dcl_s {
void * volatile ptr;
};

#define DCL_STATICINIT() {0}
extern "C" void* dcl_load(dcl_t*, void* (*fp_ctor) (void));

void* dcl_load(dcl_t *_this, void* (*fp_ctor) (void)) {
void *ptr = atomic_loadptr_depends(&_this->ptr);
if (! ptr) {
static_mutex_lock(...);
ptr = _this->ptr;
if (! ptr) {
ptr = atomic_storeptr_release(&_this->ptr, fp_ctor());
}
static_mutex_unlock(...);
}
return ptr;
}
-----------------------------------

/* Okay, lets use the algorithm: */
-----------------------------------
static dcl_t mydcl = DCL_STATICINIT();

/* my ctor function */
void* myctor(void) {
void *_this = /* construct object */
return _this;
}

/* whatever! ;^) */
void whatever(...) {
void *_this = dcl_load(&mydcl, myctor);
if (_this) {
/* use _this all you want! ;^) */
}
}
-----------------------------------

Well, barring any typos in my pseudo-code... Any thoughts anybody?

;^)

Chris Thomasson

unread,

Feb 19, 2007, 7:10:59 PM2/19/07

to

[...]

> *exactly* does the barrier go?

[...]

You don't want a release barrier on the ptr because the subsequent setting
of the flag can be hoisted above it:

1: initPtr = new obj;
2: membar #LoadStore | #StoreStore;
3: initFlag = true;

Its 100% legit for step 3 to be executed before steps 1 or 2. When you use
the indirection via the initflag you need to use a full memory barrier
instead of a classic release.

frege

unread,

Feb 20, 2007, 12:19:40 AM2/20/07

to

On Feb 19, 6:53 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> > So assuming we used the proper write barrier, how can the reads get
> > reordered?:
>
> They will be reordered in your example simply because you have "introduced a
> level of indirection with the 'init flag' ".

I don't want just 'rules' to follow. I'm hoping to actually
understand what's going on. But maybe each architecture is different
enough that all you can do is walk around with a set of rules in your
head. :-(. Anyhow, that's why I listed some reasons why things could
get reordered - ie 'speculative execution'.

> You don't have to use a flag!

yes, that's what I was leading up to.

> :^) Instead, you can simply use a pointer to the object as the flag itself
> and use a NULL value to differentiate the state of the DCL.
>
> [...]
>
> > So what about making initItem DEPEND on initFlag somehow. ie what if
> > initFlag = [ 0 or ptr to initItem ]. So:
>
> [...]
>
> You can use the dependant load ordering "optimization" on DCL when you
> "remove all forms of indirection." After you do that, the dependant load
> semantics work on every "current" architecture out there EXCEPT the Alpha.

So here, again, I'm hoping to actually understand. To me it makes
sense that, assuming we got the write handled correctly, the read has
a dependency:

if (ptr)
{
read data at ptr
}

ie the *understanding* that I'm guessing at is that it is *impossible*
for the architecture to read data at ptr BEFORE checking that ptr is
not null.

> The Alpha requires you to use a barrier after the load.

So that's the real interesting case - how and why can the Alpha
require that? We already got things written in the correct order.
We've setup the read to check the pointer first. Is it that the Alpha
doesn't actually re-read the memory, and might return old data?
That's the kind of understanding I'm looking for. ie, for a Singleton
object, do we need to worry about 're-read' since it was never read
(on the CPU) the first time, and once read, maybe doesn't need to be
read again?

ie I need understanding to be able to apply the rules in new
situations.

> Okay... Just to sum
> up... Here is correct pseudo-code for a DCL pattern that does not use that
> level of indirection (e.g., extra flag var):
>

[code...]

> Well, barring any typos in my pseudo-code... Any thoughts anybody?
>

The code looks good, makes sense, should work. But it is, to me, just
rules, not understanding.

Tony

Chris Thomasson

unread,

Feb 20, 2007, 1:12:17 AM2/20/07

to

"frege" <gottlo...@gmail.com> wrote in message

news:1171948780.6...@p10g2000cwp.googlegroups.com...

> On Feb 19, 6:53 pm, "Chris Thomasson" <cris...@comcast.net> wrote:

[...]

>> The Alpha requires you to use a barrier after the load.
>

> So that's the real iteresting case - how and why can the Alpha
> require that?

I would advise you to ask this question over in comp.arch. They can tell you
how the Alphas hardware works it all out... If you want that kind of detail,
well, read the architecture manual. Seriously though... If your going to do
lock-free programming, well, reading he architectures ISA manual is a MUST
anyway.

FWIW, the Alpha's way of doing things can allow you to scale real good...
Its an optimization to leave out the barrier. The architectures that do the
barrier automatically are basically sacrificing performance for simplicity.
Without the implicit/automatic barrier you have the ability to amortize many
difference algorithms by the means of reading batches of pointer values
before you actually execute the load barrier and actually start
dereferencing them... Can't do that with the x86 because each load is an
implied membar #LoadStore | #LoadLoad... Each load on the Alpha is truly
naked!

David Hopwood

unread,

Feb 21, 2007, 8:56:07 PM2/21/07

to

Chris Thomasson wrote:
> [...]
>
>>*exactly* does the barrier go?
>
> [...]
>
> You don't want a release barrier on the ptr because the subsequent setting
> of the flag can be hoisted above it:
>
> 1: initPtr = new obj;
> 2: membar #LoadStore | #StoreStore;
> 3: initFlag = true;
>
> Its 100% legit for step 3 to be executed before steps 1 or 2.

You seem confused. By definition the #StoreStore barrier prevents that (as far
as is visible).

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Chris Thomasson

unread,

Feb 22, 2007, 5:32:50 PM2/22/07

to

"David Hopwood" <david.nosp...@blueyonder.co.uk> wrote in message
news:XK6Dh.356638$MO2.1...@fe3.news.blueyonder.co.uk...

> Chris Thomasson wrote:
>> [...]
>>
>>>*exactly* does the barrier go?
>>
>> [...]
>>
>> You don't want a release barrier on the ptr because the subsequent
>> setting
>> of the flag can be hoisted above it:
>>
>> 1: initPtr = new obj;
>> 2: membar #LoadStore | #StoreStore;
>> 3: initFlag = true;
>>
>> Its 100% legit for step 3 to be executed before steps 1 or 2.
>
> You seem confused.

Okay. Step 3 was a store. Yikes! To clear things up, if Step 3 was a load
from initFlag, then it could be hoisted above Steps 1 or 2:

http://groups.google.com/group/comp.programming.threads/msg/2f2ec4c60b6a5d7c