Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
[thread] boost::call_once() optimization idea on Win32
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Thomas Jarosch  
View profile  
 More options Nov 17 2010, 11:15 am
From: Thomas Jarosch <thomas.jaro...@intra2net.com>
Date: Wed, 17 Nov 2010 17:15:38 +0100
Local: Wed, Nov 17 2010 11:15 am
Subject: [Boost-users] [thread] boost::call_once() optimization idea on Win32

Hello,

here's a small optimization idea for boost::call_once() on Win32.
The code currently looks like this:
----------------------------------------------
template<typename Function>
void call_once(once_flag& flag,Function f)
{
    // Try for a quick win: if the procedure has already been called
    // just skip through:
    long const function_complete_flag_value=0xc15730e2;

    if(::boost::detail::interlocked_read_acquire(&flag)!=function_complete_flag _value)
    {
        void* const mutex_handle(::boost::detail::create_once_mutex(&flag));
        BOOST_ASSERT(mutex_handle);
        detail::win32::handle_manager const closer(mutex_handle);
        detail::win32_mutex_scoped_lock const lock(mutex_handle);

        if(flag!=function_complete_flag_value)
        {
            f();
            BOOST_INTERLOCKED_EXCHANGE(&flag,function_complete_flag_value);
        }
    }

}

----------------------------------------------

The initial flag value of zero (=BOOST_ONCE_INIT)
is set at load time via static initialization.

It should be safe to do a non-interlocked read of the flag like this:
----------------------------------------------
if(flag!=function_complete_flag_value &&
  ::boost::detail::interlocked_read_acquire(&flag)!=function_complete_flag_va lue)
----------------------------------------------

If our non-interlocked read would see something different than
"function_complete_flag_value", we still do the interlocked read, too.
For normal operation inside a thread safe singleton,
this saves us from always issuing a memory barrier.

Any flaws with this approach?

Also one more (silly?) question: "flag" is not a volatile variable.
Does boost::detail::interlocked_read_acquire() make sure the value
doesn't get cached inside a register? IMHO we lock the mutex
and we might still hold a cached, old value of "flag" in a register.
-> Do we need an interlocked read here, too?
   Or mark the flag type "volatile"?

(http://msdn.microsoft.com/en-us/magazine/cc163405.aspx#S3
 look at figure 7 and the text below)

Best regards,
Thomas Jarosch

  boost-call-once-optimize-read.patch
< 1K Download

_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Gottlob Frege  
View profile  
 More options Nov 17 2010, 11:13 pm
From: Gottlob Frege <gottlobfr...@gmail.com>
Date: Wed, 17 Nov 2010 23:13:03 -0500
Local: Wed, Nov 17 2010 11:13 pm
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
On Wed, Nov 17, 2010 at 11:15 AM, Thomas Jarosch

Yes.

Your thread will see flag correctly, but the "important data" being
protected/initted by the once, might not bee seen correctly.

Typically the code essentially boils down to (if you imagine the
call_once being "inlned", since to the CPU, it is all "inline"):

First Thread in ("wins", thus does the init)
   important_data = new ...;  // the init call, protected by call_once()
   flag = function_complete_flag_value;

Second Thread:
   if (flag == function_complete_flag_value)  // inside call_once()
       use_important_data(important_data);    // outside call_once()

The CPU (not just the compiler, but the CPU, memory subsystem, etc)
can reorder that as:

First Thread:
   flag = function_complete_flag_value;
   important_data = new ...;

Second Thread:
   register temp = important_data;  // start memory read early, so
that we don't wait for reads to complete
   if (flag == function_complete_flag_value)
       use_important_data(temp);

See the problem?  important_data may be read before it is ready.

You need a memory barrier *in each thread* to ensure neither
reordering happens.  In particular, the memory barrier in the
interlocked_read_acquire(&flag) prevents the reordering in the Second
Thread.

> Also one more (silly?) question: "flag" is not a volatile variable.
> Does boost::detail::interlocked_read_acquire() make sure the value
> doesn't get cached inside a register? IMHO we lock the mutex
> and we might still hold a cached, old value of "flag" in a register.
> -> Do we need an interlocked read here, too?
>   Or mark the flag type "volatile"?

volatile is almost useless is threaded programming.  It is typically
both insufficient for threads (ie no memory barrier) and at the same
time superfluous - when inside a mutex - as the mutex handles the
barrier for you.

volatile only helps with the compiler - it doesn't control what the
CPU might then do to your instructions (like reorder them, do
speculative execution, etc), so it doesn't help much with threads
running on separate CPUs.  And MS's version of volatile (which _does_
enforce memory barriers) is non-standard.  So don't use it in portable
code.  ie don't use it at all.

> (http://msdn.microsoft.com/en-us/magazine/cc163405.aspx#S3
>  look at figure 7 and the text below)

> Best regards,
> Thomas Jarosch

Tony

P.S. I will hopefully be doing another talk on this stuff at BoostCon
in May - you should go!
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Jarosch  
View profile  
 More options Nov 18 2010, 5:50 am
From: Thomas Jarosch <thomas.jaro...@intra2net.com>
Date: Thu, 18 Nov 2010 11:50:55 +0100
Local: Thurs, Nov 18 2010 5:50 am
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
Hello Tony,

On Thursday, 18. November 2010 05:13:03 Gottlob Frege wrote:

> The CPU (not just the compiler, but the CPU, memory subsystem, etc)
> can reorder that as:

> First Thread:
>    flag = function_complete_flag_value;
>    important_data = new ...;

> Second Thread:
>    register temp = important_data;  // start memory read early, so
> that we don't wait for reads to complete
>    if (flag == function_complete_flag_value)
>        use_important_data(temp);

> See the problem?  important_data may be read before it is ready.

Thanks for the detailed explanation! As you pointed out, the second thread
will mess up without the barrier. Time to fix my code :)

One more small thing: IMHO the write reordering in the first
thread can't happen because of the memory barrier created
by interlockedExchange(). The first thread translates to this:

    important_data = new ...;  // the init call

    lock                       // create memory barrier:
                               // - Don't allow reordering
                               //   from below the barrier
                                                           // - Finish all outstanding writes

    flag = function_complete_flag_value;

> > Also one more (silly?) question: "flag" is not a volatile variable.
> > Does boost::detail::interlocked_read_acquire() make sure the value
> > doesn't get cached inside a register? IMHO we lock the mutex
> > and we might still hold a cached, old value of "flag" in a register.
> > -> Do we need an interlocked read here, too?
> >   Or mark the flag type "volatile"?

> volatile is almost useless is threaded programming.  It is typically
> both insufficient for threads (ie no memory barrier) and at the same
> time superfluous - when inside a mutex - as the mutex handles the
> barrier for you.

True that.

> volatile only helps with the compiler - it doesn't control what the
> CPU might then do to your instructions (like reorder them, do
> speculative execution, etc), so it doesn't help much with threads
> running on separate CPUs.  And MS's version of volatile (which _does_
> enforce memory barriers) is non-standard.  So don't use it in portable
> code.  ie don't use it at all.

Let me illustrate it a bit more:

- register temp = interlocked_read(&flag)     // fetch from mem location xyz
  -> Init flag not set, so execute init code:

  - create mutex (also creates memory barrier)

  - Another thread 2 already entered the mutex,
    executes the init code and does an interlocked
    write of "flag". Then it leaves the mutex
    so thread 1 can continue.

  - Thread 1 re-reads the flag without interlocked read or volatile:
    The compiler recognizes it's the same memory location xyz
    and uses the cached value from "register temp".
    So we would redo the initialization.

    Don't we need an interlocked read here, too?

    Or does the mutex/memory barrier ensure the compiler
    isn't allowed to do register caching?

> P.S. I will hopefully be doing another talk on this stuff at BoostCon
> in May - you should go!

Nice! Too bad "www.boostcon.com" currently issues
a "500 - Internal Server Error" :o)

Best regards,
Thomas Jarosch
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Thomas Jarosch  
View profile  
 More options Nov 18 2010, 6:34 am
From: Thomas Jarosch <thomas.jaro...@intra2net.com>
Date: Thu, 18 Nov 2010 12:34:20 +0100
Local: Thurs, Nov 18 2010 6:34 am
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
On Thursday, 18. November 2010 11:50:55 you wrote:

>     Or does the mutex/memory barrier ensure the compiler
>     isn't allowed to do register caching?

Ok, quickly checked this with gcc 4.4.4 on linux:
-test.c------------------
int *foobar = 0x1234;
int a;

int main(void)
{
  __sync_synchronize();
  a = *foobar;
  __sync_synchronize();
  return *foobar;

}

-------------------------

gcc -O4 -S test.c:
main:
.LFB0:
        .cfi_startproc
        mfence
        movq    foobar(%rip), %rax
        movl    (%rax), %eax
        movl    %eax, a(%rip)
        mfence
        movq    foobar(%rip), %rax
        movl    (%rax), %eax
        ret
        .cfi_endproc

-> The memory barrier invalidates the register cache.

Without the second memory barrier in the C source,
the compiler uses the register cached value of *foobar.

Thanks for your help on this again, Tony.

Cheers,
Thomas
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dave Abrahams  
View profile  
 More options Nov 18 2010, 12:40 pm
From: Dave Abrahams <d...@boostpro.com>
Date: Thu, 18 Nov 2010 12:40:38 -0500
Local: Thurs, Nov 18 2010 12:40 pm
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32

On Wed, Nov 17, 2010 at 11:13 PM, Gottlob Frege <gottlobfr...@gmail.com> wrote:
>> Any flaws with this approach?

> Yes.

> Your thread will see flag correctly, but the "important data" being
> protected/initted by the once, might not bee seen correctly.

<snip nice long explanation>

Another way to say it is that you are essentially trying to do the
classic "double-checked locking" (DCL) pattern, which is well-known
not to work and you can find plenty of explanations for.   In fact,
the BOOST_THREAD_ONCE* stuff is there to solve exactly that problem.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Gottlob Frege  
View profile  
 More options Nov 19 2010, 12:33 am
From: Gottlob Frege <gottlobfr...@gmail.com>
Date: Fri, 19 Nov 2010 00:33:32 -0500
Local: Fri, Nov 19 2010 12:33 am
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
On Thu, Nov 18, 2010 at 5:50 AM, Thomas Jarosch

<thomas.jaro...@intra2net.com> wrote:
> Hello Tony,

> Thanks for the detailed explanation! As you pointed out, the second thread
> will mess up without the barrier. Time to fix my code :)

> One more small thing: IMHO the write reordering in the first
> thread can't happen because of the memory barrier created
> by interlockedExchange(). The first thread translates to this:

Yes, of course.  Sorry, I removed that interlock as well in my
example, since that question also often comes up, so I was basically
just giving my standard explanation.  I should have been more clear.

> Let me illustrate it a bit more:

..
>    Don't we need an interlocked read here, too?

>    Or does the mutex/memory barrier ensure the compiler
>    isn't allowed to do register caching?

Technically, since C++ still doesn't know anything about threads,
there could be some register caching.  But there isn't, because the
compiler writers and thread packages work this all out for us, using
whatever non-standard calls or extensions that are required.

There are some papers out there such as "Threads cannot be done in a
library" by Boehm
http://www.stanford.edu/class/cs240/readings/p261-boehm.pdf going into
more detail about this. In particular, exotic cases such as:

if (some_condition)
    lock(mutex);

for (i = 0; i < N; i++)
{
   do_stuff();
   if (some_condition)
       x++;

}

if (some_condition)
    unlock(mutex);

So the programmer's intent is that x is only updated sometimes, and
that happens to be the only time that the mutex is needed.
This is a case that, without being careful, compilers can optimize
incorrectly.  C++0x will take care of these cases.

But beyond some exotic cases with aggresive optimizations, normal
mutexes with non-volatiles just works.  And I don't think volatiles
fixes the cases listed in the paper.

> Nice! Too bad "www.boostcon.com" currently issues
> a "500 - Internal Server Error" :o)

!
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dave Abrahams  
View profile  
 More options Nov 19 2010, 12:45 pm
From: Dave Abrahams <d...@boostpro.com>
Date: Fri, 19 Nov 2010 12:45:14 -0500
Local: Fri, Nov 19 2010 12:45 pm
Subject: Re: [Boost-users] [thread] boost::call_once() optimization idea on Win32
On Thu, Nov 18, 2010 at 5:50 AM, Thomas Jarosch

<thomas.jaro...@intra2net.com> wrote:
> Nice! Too bad "www.boostcon.com" currently issues
> a "500 - Internal Server Error" :o)

Fixed, thanks.  Next time please make a louder stink on this list,
though (like in a subject line); I could have easily missed this.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »