Race in TBB?


Dmitriy V'jukov

May 8, 2008, 7:16:53 PM
I am looking at the sources of Intel TBB (20080408oss version), and there
is one thing I can't understand. It looks like a data race which can lead
to a worker thread blocking for an arbitrary amount of time even if there
is work to steal. I haven't fully examined all the code, so maybe I am
wrong... I hope so.

It's about class Gate (actually eventcount) and function
GenericScheduler::spawn().

My understanding is that spawn() must be implemented this way:

// pseudo-code
void spawn(task* t)
{
    make_task_available_for_other_threads(t);
    store_load_memory_fence(); // crucial!
    gate_state_t state = gate.get_state();
    if (need_signalling(state))
        gate.signal();
}
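
To be concrete about what I mean by store_load_memory_fence(): a minimal
sketch, assuming IA-32/Intel 64 and the SSE2 intrinsic (this helper is my
own illustration, not something TBB defines):

#include <emmintrin.h> // _mm_mfence()

// Hypothetical helper, not part of TBB. On IA-32/Intel 64 the only reordering
// the hardware performs on ordinary loads/stores is letting a later load pass
// an earlier store, and an MFENCE (or any LOCK-prefixed RMW instruction) is
// what forbids exactly that.
inline void store_load_memory_fence()
{
    _mm_mfence();
}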

In the TBB implementation I don't see a store_load_memory_fence().
Here are excerpts from the TBB implementation with my comments:

void GenericScheduler::spawn( task& first, task*& next ) {

    // a lot of irrelevant code skipped

    if( d>deepest )
        deepest = d;
    if( d<tp->prefix().steal_begin )
        tp->prefix().steal_begin = d;

    // at this point task is available to other threads

    // no store-load fence, only store-release
    release_task_pool();

    mark_pool_full();
}

inline void GenericScheduler::mark_pool_full() {
    // naked load
    Gate::state_t snapshot = arena->prefix().gate.get_state();
    // check whether we need to signal gate
    if( snapshot!=SNAPSHOT_FULL
        && snapshot!=SNAPSHOT_PERMANENTLY_OPEN )
        // signalling
        arena->prefix().gate.try_update(
            SNAPSHOT_FULL, SNAPSHOT_PERMANENTLY_OPEN,
            true );
}

state_t Gate::get_state() const {
    // naked load
    return state;
}


So I don't see a store_load_memory_fence() between making the task available
to other threads and loading the gate (eventcount) state. This can lead to a
situation where the thread executing GenericScheduler::spawn() will
incorrectly decide *NOT* to signal the gate when it actually has to.
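
To make the failure mode concrete, here is a minimal model of the two sides,
written with C++11-style atomics purely for illustration (the names and the
two flags are mine, not TBB's):

#include <atomic>

// Crude stand-ins for the real task pool and Gate state:
// 1 ~ SNAPSHOT_FULL, 0 ~ busy/SNAPSHOT_EMPTY.
std::atomic<int> task_available(0);
std::atomic<int> gate_state(1);

// Models spawn() + mark_pool_full(): publish the task, then decide whether
// the gate needs to be signalled.
bool spawner_skips_signal()
{
    task_available.store(1, std::memory_order_release); // release_task_pool()
    // Missing here: std::atomic_thread_fence(std::memory_order_seq_cst),
    // i.e. the #StoreLoad fence. Without it the load below may be satisfied
    // before the store above becomes visible to the waiter.
    return gate_state.load(std::memory_order_relaxed) == 1; // stale SNAPSHOT_FULL
}

// Models the waiter: it flips the gate away from FULL (modelled as seq_cst
// because try_update() runs under a critical section), then scans the task
// pool before deciding to sleep.
bool waiter_goes_to_sleep()
{
    gate_state.store(0, std::memory_order_seq_cst);             // busy/SNAPSHOT_EMPTY
    return task_available.load(std::memory_order_seq_cst) == 0; // pool looks empty
}

// Outcome allowed without the fence: spawner_skips_signal() and
// waiter_goes_to_sleep() both return true -- a task has been published, the
// gate is never signalled, and the worker sleeps although there is work to
// steal.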


Dmitriy V'jukov

Chris Thomasson

May 8, 2008, 7:54:51 PM

"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:4c2b16ac-0b68-4581...@d77g2000hsb.googlegroups.com...

>I am looking at the sources of Intel TBB (20080408oss version), and there
> is one thing I can't understand. It looks like a data race which can lead
> to a worker thread blocking for an arbitrary amount of time even if there
> is work to steal. I haven't fully examined all the code, so maybe I am
> wrong... I hope so.
>
> It's about class Gate (actually eventcount) and function
> GenericScheduler::spawn().
[...]

Humm, it seems like the developers have studied up on eventcounts; I did not
know that Intel TBB uses them. How far off is their implementation from my
original algorithm?

Chris Thomasson

May 8, 2008, 7:59:17 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:4c2b16ac-0b68-4581...@d77g2000hsb.googlegroups.com...
>I am looking at the sources of Intel TBB (20080408oss version), and there
> is one thing I can't understand. It looks like a data race which can lead
> to a worker thread blocking for an arbitrary amount of time even if there
> is work to steal. I haven't fully examined all the code, so maybe I am
> wrong... I hope so.
[...]

> So I don't see a store_load_memory_fence() between making the task available
> to other threads and loading the gate (eventcount) state. This can lead to a
> situation where the thread executing GenericScheduler::spawn() will
> incorrectly decide *NOT* to signal the gate when it actually has to.

It kind of looks like they reinvented the AppCore version _before_ I put the
following fix in:


http://groups.google.com/group/comp.programming.threads/msg/b1a8659cd71456b2


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/cf4a952ff5af7d


What do you think? It's missing a #StoreLoad barrier in the exact same place
where the AppCore eventcount bug was... I think you did indeed find a
race-condition in TBB.

:^o

Chris Thomasson

May 8, 2008, 8:04:33 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:4c2b16ac-0b68-4581...@d77g2000hsb.googlegroups.com...
>I am looking at the sources of Intel TBB (20080408oss version), and there
> is one thing I can't understand. It looks like a data race which can lead
> to a worker thread blocking for an arbitrary amount of time even if there
> is work to steal. I haven't fully examined all the code, so maybe I am
> wrong... I hope so.
>
> It's about class Gate (actually eventcount) and function
> GenericScheduler::spawn().
>
> My understanding is that spawn() must be implemented this way:
>
> // pseudo-code
> void spawn(task* t)
> {
>     make_task_available_for_other_threads(t);
>     store_load_memory_fence(); // crucial!
>     gate_state_t state = gate.get_state();
>     if (need_signalling(state))
>         gate.signal();
> }

I would say that spawn should look like:

void spawn(task* t)
{
    store_release_memory_fence();
    make_visible_to_other_threads(t);

    store_load_memory_fence(); // crucial!
    gate_state_t state = gate.get_state();
    if (need_signalling(state))
        gate.signal();
}


[...]

?

Dmitriy V'jukov

May 9, 2008, 4:56:01 AM
On 9 May, 03:54, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy V'jukov" <dvyu...@gmail.com> wrote in message


It's very similar. Here is code for Gate (eventcount):

class Gate {
public:
    typedef intptr state_t;
private:
    //! If state==0, then thread executing wait() suspend until state
    //! becomes non-zero.
    state_t state;
    CRITICAL_SECTION critical_section;
    HANDLE event;
public:
    //! Initialize with count=0
    Gate() :
        state(0)
    {
        event = CreateEvent( NULL, true, false, NULL );
        InitializeCriticalSection( &critical_section );
    }
    ~Gate() {
        CloseHandle( event );
        DeleteCriticalSection( &critical_section );
    }
    //! Get current state of gate
    state_t get_state() const {
        return state;
    }
    //! Update state=value if state==comparand (flip==false)
    //! or state!=comparand (flip==true)
    void try_update( intptr value, intptr comparand, bool flip=false )
    {
        __TBB_ASSERT( comparand!=0 || value!=0,
                      "either value or comparand must be non-zero" );
        EnterCriticalSection( &critical_section );
        state_t old = state;
        if( flip ? old!=comparand : old==comparand ) {
            state = value;
            if( !old )
                SetEvent( event );
            else if( !value )
                ResetEvent( event );
        }
        LeaveCriticalSection( &critical_section );
    }
    //! Wait for state!=0.
    void wait() {
        if( state==0 ) {
            WaitForSingleObject( event, INFINITE );
        }
    }
};


Here is the signalling:

inline void GenericScheduler::mark_pool_full() {
    // Double-check idiom
    Gate::state_t snapshot = arena->prefix().gate.get_state();
    if( snapshot!=SNAPSHOT_FULL && snapshot!=SNAPSHOT_PERMANENTLY_OPEN )
        arena->prefix().gate.try_update( SNAPSHOT_FULL,
                                         SNAPSHOT_PERMANENTLY_OPEN, true );
}

Waiting is a bit complicated:

bool GenericScheduler::wait_while_pool_is_empty() {
    for(;;) {
        Gate::state_t snapshot = arena->prefix().gate.get_state();
        switch( snapshot ) {
            case SNAPSHOT_EMPTY:
                arena->prefix().gate.wait();
                return true;
            case SNAPSHOT_FULL: {
                // Use unique id for "busy" in order to avoid ABA problems.
                const Gate::state_t busy = Gate::state_t(this);
                // Request permission to take snapshot
                arena->prefix().gate.try_update( busy, SNAPSHOT_FULL );
                if( arena->prefix().gate.get_state()==busy ) {
                    // Got permission. Take the snapshot.
                    size_t n = arena->prefix().limit;
                    size_t k;
                    for( k=0; k<n; ++k )
                        if( arena->slot[k].steal_end>=0 )
                            break;
                    // Test and test-and-set.
                    if( arena->prefix().gate.get_state()==busy ) {
                        if( k>=n ) {
                            arena->prefix().gate.try_update( SNAPSHOT_EMPTY, busy );
                            continue;
                        } else {
                            arena->prefix().gate.try_update( SNAPSHOT_FULL, busy );
                        }
                    }
                }
                return false;
            }
            default:
                // Another thread is taking a snapshot or gate is permanently open.
                return false;
        }
    }
}


Dmitriy V'jukov

Dmitriy V'jukov

May 9, 2008, 4:56:42 AM
On 9 May, 03:59, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy V'jukov" <dvyu...@gmail.com> wrote in message
> news:4c2b16ac-0b68-4581...@d77g2000hsb.googlegroups.com...
>
> > I am looking at the sources of Intel TBB (20080408oss version), and there
> > is one thing I can't understand. It looks like a data race which can lead
> > to a worker thread blocking for an arbitrary amount of time even if there
> > is work to steal. I haven't fully examined all the code, so maybe I am
> > wrong... I hope so.
> [...]
> > So I don't see a store_load_memory_fence() between making the task
> > available to other threads and loading the gate (eventcount) state. This
> > can lead to a situation where the thread executing GenericScheduler::spawn()
> > will incorrectly decide *NOT* to signal the gate when it actually has to.
>
> It kind of looks like they reinvented the AppCore version _before_ I put the
> following fix in:
>
> http://groups.google.com/group/comp.programming.threads/msg/b1a8659cd...
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/th...
>
> What do you think? It's missing a #StoreLoad barrier in the exact same place
> where the AppCore eventcount bug was... I think you did indeed find a
> race-condition in TBB.
>
> :^o

Exactly! :)

Dmitriy V'jukov

Dmitriy V'jukov

May 9, 2008, 5:02:00 AM
On 9 May, 04:04, "Chris Thomasson" <cris...@comcast.net> wrote:

> I would say that spawn should look like:
>
> void spawn(task* t)
> {
>     store_release_memory_fence();
>     make_visible_to_other_threads(t);
>
>     store_load_memory_fence(); // crucial!
>     gate_state_t state = gate.get_state();
>     if (need_signalling(state))
>         gate.signal();
> }
>
> [...]
>
> ?


Yes, you are right. There is a mutex acquire (with a full fence) or
something like that before make_visible_to_other_threads(t) (the
implementation is a bit complicated). So I don't think that the
store_release_memory_fence() is missing.

Dmitriy V'jukov

Chris Thomasson

May 12, 2008, 6:32:37 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:d3f59e02-376b-4de7...@s50g2000hsb.googlegroups.com...

On 9 May, 03:54, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Dmitriy V'jukov" <dvyu...@gmail.com> wrote in message
> > news:4c2b16ac-0b68-4581...@d77g2000hsb.googlegroups.com...
> > > I am looking at the sources of Intel TBB (20080408oss version), and there
> > > is one thing I can't understand. It looks like a data race which can lead
> > > to a worker thread blocking for an arbitrary amount of time even if there
> > > is work to steal. I haven't fully examined all the code, so maybe I am
> > > wrong... I hope so.
> >
> > > It's about class Gate (actually eventcount) and function
> > > GenericScheduler::spawn().
> >
> > [...]
> >
> > Humm, it seems like the developers have studied up on eventcounts; I did
> > not know that Intel TBB uses them. How far off is their implementation
> > from my original algorithm?

> It's very similar. Here is code for Gate (eventcount):

It looks more futex-like in that the state variable is able to be set by the
calling thread in 'try_update()'. My eventcount only maintains a count and a
waiters-bit, and this information is not mutated with values determined by
any client code. I am not sure I like the ability to flip the slow-path gate
on and off. There might be a race-condition with the way they are using the
Windows event object. Humm, interesting.
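
For contrast, the whole state of my eventcount is basically just this (a
rough sketch of the layout only, not the actual AppCore code):

#include <atomic>
#include <cstdint>

// One word holding a monotonically increasing count plus a waiters-bit; no
// client-supplied values are ever stored into it, unlike Gate::try_update().
struct eventcount_word {
    static const std::uint32_t WAITERS_BIT = 0x1u; // low bit: "somebody may be blocked"
    static const std::uint32_t COUNT_STEP  = 0x2u; // the count lives in the remaining bits

    std::atomic<std::uint32_t> word;

    eventcount_word() : word(0) {}

    std::uint32_t count() const { return word.load() & ~WAITERS_BIT; }
    bool has_waiters() const    { return (word.load() & WAITERS_BIT) != 0; }
};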

> class Gate {
> public:
>     typedef intptr state_t;
> private:
>     //! If state==0, then thread executing wait() suspend until state
>     //! becomes non-zero.
>     state_t state;
>     CRITICAL_SECTION critical_section;
>     HANDLE event;
> public:
>     //! Initialize with count=0
>     Gate() :
>         state(0)
>     {
>         event = CreateEvent( NULL, true, false, NULL );
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I wonder what TBB does when event just happens to be NULL!?

They need something like:

Gate() :
    state(0), event(CreateEvent( NULL, TRUE, FALSE, NULL ))
{
    if (! event) {
        throw tbb::error::out_of_sync_resources();
    }


> InitializeCriticalSection( &critical_section );
> }

[...]

Also, why don't they ever check return values from any WinAPI calls? What
happens if shi%t hits the fan and one of these functions fails... There is no
means for any attempt at a shutdown and possible recovery procedure... Which
version are you looking at?


Anyway, IMVHO, their signaling and waiting look overly complex when compared
to the simplicity of something like my algorithm. Their wait-while-pool-is-empty
can be reduced to the following pseudo-code:


bool try_to_process_arena(
    arena* _this,
    snapshot* s,
    void (*fp_process) (snapshot*)
) {
    s->take(_this);
    if (! s->is_empty()) {
        fp_process(s);
        return true;
    }
    return false;
}


void wait_for_arena_processing(
    arena* _this,
    snapshot* s,
    void (*fp_process) (snapshot*)
) {
    while (! try_to_process_arena(_this, s, fp_process)) {
        eventcount::count_type const ekey = _this->ecount->get();
        if (try_to_process_arena(_this, s, fp_process)) { break; }
        _this->ecount->wait(ekey);
    }
}


void spawn(arena* _this, task* t) {
    _this->push(t);
    _this->ecount->signal(membar::store_load);
}

Dmitriy V'jukov

May 13, 2008, 1:51:16 AM
On 13 May, 02:32, "Chris Thomasson" <cris...@comcast.net> wrote:


> > > Humm, it seems like the developers have studied up on eventcounts; I did
> > > not know that Intel TBB uses them. How far off is their implementation
> > > from my original algorithm?
> > It's very similar. Here is code for Gate (eventcount):
>
> It looks more futex-like in that the state variable is able to be set by the
> calling thread in 'try_update()'. My eventcount only maintains a count and a
> waiters-bit, and this information is not mutated with values determined by
> any client code. I am not sure I like the ability to flip the slow-path gate
> on and off. There might be a race-condition with the way they are using the
> Windows event object. Humm, interesting.


What race?


> > event = CreateEvent( NULL, true, false, NULL );
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I wonder what TBB does when event just happens to be NULL!?

If CreateEvent() fails, then, I think, TBB will just keep working in
'busy-waiting' mode :) (presumably because WaitForSingleObject() on a NULL
handle simply fails and returns, so wait() degenerates into a spin).

> Also, why don't they ever check return values from any WinAPI calls? What
> happens if shi%t hits the fan and one of these functions fail... There is no
> means for any attempt at a shutdown and possible recover procedure... Which
> version are you looking at?


Latest stable version: 20_20080408oss:
http://www.threadingbuildingblocks.org/file.php?fid=77


> Anyway, IMVHO, their signaling and waiting look overly complex when compared
> to the simplicity of something like my algorithm. Their wait while pool is
> empty can be reduced to the following pseudo-code:


Sometimes (when library shutdown begins) they set the Gate's state to
SNAPSHOT_PERMANENTLY_OPEN. It's the 'final' state, i.e. the state can't
switch back to SNAPSHOT_EMPTY/SNAPSHOT_FULL anymore.
Can this be modelled with an eventcount?

Dmitriy V'jukov

Chris Thomasson

May 13, 2008, 8:17:38 PM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:d46c1362-6cb3-48e1...@a1g2000hsb.googlegroups.com...

> On 13 May, 02:32, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> > > > Humm, it seems like the developers have studied up on eventcounts; I
> > > > did not know that Intel TBB uses them. How far off is their
> > > > implementation from my original algorithm?
> > > It's very similar. Here is code for Gate (eventcount):
> >
> > It looks more futex-like in that the state variable is able to be set by
> > the calling thread in 'try_update()'. My eventcount only maintains a count
> > and a waiters-bit, and this information is not mutated with values
> > determined by any client code. I am not sure I like the ability to flip
> > the slow-path gate on and off. There might be a race-condition with the
> > way they are using the Windows event object. Humm, interesting.


> What race?

Not sure there is one... Well, there is one point... It seems like they are
assuming that a call to SetEvent on a manual reset event object will wake
all currently waiting threads. This is not the case. If there are 10 waiting
threads on a manual event, a call to SetEvent might only wake 1, and when
the state swings back to a point where it can be flipped, well, there could
still be those 9 threads in there. Microsoft documentation explicitly
mentions this. This is an inherent race-condition associated with "certain
uses" of manual reset events...


> > > event = CreateEvent( NULL, true, false, NULL );
> >
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > I wonder what TBB does when event just happens to be NULL!?

> If CreateEvent() will fail, then, I think, TBB will be just working in
> 'busy-waiting' mode :)

Ouch.


> > Also, why don't they ever check return values from any WinAPI calls?
> > What
> > happens if shi%t hits the fan and one of these functions fail... There
> > is no
> > means for any attempt at a shutdown and possible recover procedure...
> > Which
> > version are you looking at?

> Latest stable version: 20_20080408oss:
> http://www.threadingbuildingblocks.org/file.php?fid=77

I wonder why they fail to check the return values from WinAPI calls. IMVHO,
that's just bad practice for any commercial-grade software package to follow.


> > Anyway, IMVHO, their signaling and waiting look overly complex when
> > compared
> > to the simplicity of something like my algorithm. Their wait while pool
> > is
> > empty can be reduced to the following pseudo-code:


> Sometimes (when library shutdown begins) they set Gate's state to
> SNAPSHOT_PERMANENTLY_OPEN. It's the 'final' state, i.e. state can't
> switch to
> SNAPSHOT_EMPTY/SNAPSHOT_FULL anymore.
> Can this be modelled with eventcount?

Well, my eventcount's state cannot be set to any user-defined values.
However, you should be able to add this into a user-defined predicate
without problems.

Dmitriy V'jukov

May 14, 2008, 5:01:59 AM
Chris Thomasson wrote:

>> What race?
>
> Not sure there is one... Well, there is one point... It seems like they
> are assuming that a call to SetEvent on a manual reset event object will
> wake all currently waiting threads. This is not the case. If there are
> 10 waiting threads on a manual event, a call to SetEvent might only wake
> 1, and when the state swings back to a point where it can be flipped,
> well, there could still be those 9 threads in there. Microsoft
> documentation explicitly mentions this. This is an inherent
> race-condition associated with "certain uses" of manual reset events...


Hmmm... Can you point me to the documentation? I can't find that part.

I am trying to figure out whether the race on the Event is a problem or not.
If the Event's state has flipped back to non-signaled, then it means that the
work-stealing deque is empty, so there is no reason to wake any threads...
I.e. if a thread doesn't wake up before the Event's state flips back to
non-signaled, just let it go on sleeping :)

>> > version are you looking at?
>
>> Latest stable version: 20_20080408oss:
>> http://www.threadingbuildingblocks.org/file.php?fid=77
>
> I wonder why they fail to check the return value from winapi calls.
> IMVHO, that just bad practice for any commercial-grade software package
> to follow.


Totally agree.

>> > Anyway, IMVHO, their signaling and waiting look overly complex when
>> > compared to the simplicity of something like my algorithm. Their
>> > wait-while-pool-is-empty can be reduced to the following pseudo-code:
>
>
>> Sometimes (when library shutdown begins) they set Gate's state to
>> SNAPSHOT_PERMANENTLY_OPEN. It's the 'final' state, i.e. state can't
>> switch to
>> SNAPSHOT_EMPTY/SNAPSHOT_FULL anymore.
>> Can this be modelled with eventcount?
>
> Well, my eventcounts state cannot be set to any user-defined values.
> However, you should be able to add this into a user-defined predicate
> without problems.


Just associate an additional bool variable SNAPSHOT_PERMANENTLY_OPEN with the
eventcount, and check it before signaling and waiting. Right?
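
Something along these lines, perhaps (just a sketch; the eventcount below is
a trivial blocking stand-in for your algorithm, only there to make the example
self-contained):

#include <atomic>
#include <condition_variable>
#include <mutex>

// Trivial mutex+condvar stand-in for the eventcount interface (get / wait /
// signal); NOT the real lock-free algorithm.
struct eventcount {
    typedef unsigned count_type;
    std::mutex m;
    std::condition_variable cv;
    count_type c;
    eventcount() : c(0) {}
    count_type get() { std::lock_guard<std::mutex> l(m); return c; }
    void signal() { { std::lock_guard<std::mutex> l(m); ++c; } cv.notify_all(); }
    void wait(count_type key) {
        std::unique_lock<std::mutex> l(m);
        while (c == key) cv.wait(l);
    }
};

// SNAPSHOT_PERMANENTLY_OPEN kept as an external flag that the predicate
// checks, exactly as discussed above.
struct arena_gate {
    eventcount ec;
    std::atomic<bool> permanently_open;

    arena_gate() : permanently_open(false) {}

    // Waiter side: re-check the flag (and the real "is there work?" predicate)
    // around every block, in the usual double-checked eventcount style.
    template<class Pred>
    void wait_until(Pred has_work) {
        while (!permanently_open.load() && !has_work()) {
            eventcount::count_type key = ec.get();
            if (permanently_open.load() || has_work()) break;
            ec.wait(key);
        }
    }

    // Shutdown side: set the flag first, then signal so nobody sleeps through it.
    void open_permanently() {
        permanently_open.store(true);
        ec.signal();
    }
};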


Dmitriy V'jukov

Chris Thomasson

May 14, 2008, 5:38:13 AM
"Dmitriy V'jukov" <dvy...@gmail.com> wrote in message
news:2018886a-2950-49e2...@k1g2000prb.googlegroups.com...

> Chris Thomasson wrote:
>
>>> What race?
>>
>> Not sure there is one... Well, there is one point... It seems like they
>> are assuming that a call to SetEvent on a manual reset event object will
>> wake all currently waiting threads. This is not the case. If there are
>> 10 waiting threads on a manual event, a call to SetEvent might only wake
>> 1, and when the state swings back to a point where it can be flipped,
>> well, there could still be those 9 threads in there. Microsoft
>> documentation explicitly mentions this. This is an inherent
>> race-condition associated with "certain uses" of manual reset events...
>
>
> Hmmm.. Can you point to documentation. I can't find this moment.

http://msdn.microsoft.com/en-us/library/ms686211(VS.85).aspx


"The state of a manual-reset event object remains signaled until it is set
explicitly to the nonsignaled state by the ResetEvent function. Any number
of waiting threads, or threads that subsequently begin wait operations for
the specified event object by calling one of the wait functions, can be
released while the object's state is signaled."

also

"Setting an event that is already set has no effect."


Reading all of that, AFAICT, any number of waiting threads, which includes
but is not limited to zero, __can__ get released when a SetEvent is called
on a manual reset event object.


Perhaps I am misunderstanding something. However, the wording of the rules
seems to "suggest" that calling SetEvent on a manual reset event object does
not necessarily guarantee that "all" of the fully established waiters will be
atomically released from the influence of the event object _before_ the
function returns...


Perhaps MS doc is CRAP!

:^/


Any thoughts on this?


> I am trying to figure out, whether race on Event is problem or not.
> If Event's state flipped back to non-signaled, then it means that work-
> stealing deque is empty, so there is no reason to wake any threads...
> I.e if thread doesn't wake-up until Event's state flipped back to non-
> signaled, so let it go on sleeping :)

There is probably nothing wrong, however, using manual reset events can be
tricky to say the least... Especially when you take the documentation into
account...


>>> > version are you looking at?
>>
>>> Latest stable version: 20_20080408oss:
>>> http://www.threadingbuildingblocks.org/file.php?fid=77
>>
>> I wonder why they fail to check the return value from winapi calls.
>> IMVHO, that just bad practice for any commercial-grade software package
>> to follow.
>
>
> Totally agree.

I don't understand why they don't do that. Weird. I personally tend to have
a lot of respect for Intel's hard work. This makes me worry a bit...

;^(...


>>> > Anyway, IMVHO, their signaling and waiting look overly complex when
>>> > compared
>>> > to the simplicity of something like my algorithm. Their wait while
>>> pool > is
>>> > empty can be reduced to the following pseudo-code:
>>
>>
>>> Sometimes (when library shutdown begins) they set Gate's state to
>>> SNAPSHOT_PERMANENTLY_OPEN. It's the 'final' state, i.e. state can't
>>> switch to
>>> SNAPSHOT_EMPTY/SNAPSHOT_FULL anymore.
>>> Can this be modelled with eventcount?
>>
>> Well, my eventcounts state cannot be set to any user-defined values.
>> However, you should be able to add this into a user-defined predicate
>> without problems.
>
>
> Just associate additional bool variable SNAPSHOT_PERMANENTLY_OPEN with
> eventcount. And check it before signaling and waiting. Right?

Well, yeah. I don't see why it could not be made to work within the realm of
a user-defined predicate:


while (! try_to_act_upon_user_predicate()) {
    eventcount::count_type const ekey = ecount.get();
    if (try_to_act_upon_user_predicate()) { break; }
    ecount.wait(ekey);
}


If the user predicate contains SNAPSHOT_PERMANENTLY_OPEN, well, the
'try_to_act_upon_user_predicate()' function should always return true...
Humm, does that seem kosher to you, Dmitriy?
