There is a compiler intrinsic, _InterlockedCompareExchange64 (see http://msdn2.microsoft.com/en-us/library/ttk2z1ws(VS.80).aspx ), but it seems to be available only in more recent versions of VC++ (such as VC++ 2005), and not as an intrinsic in VC++ 6.0.
So, I think that I need an in-line assembly function which, like the 2005 compiler intrinsic, uses the cmpxchg8b instruction. In-line assembly, however, is beyond my capability.
I found user-mode code that uses in-line assembly and the cmpxchg8b instruction in this thread in microsoft.public.win32.programmer.kernel (see the next-to-last post): http://groups.google.com/group/microsoft.public.win32.programmer.kern... . The code doesn't work for me. Here's what I mean by "doesn't work": I called the function in a single-threaded environment, simply to test whether it stores the correct double value at the destination location. As expected (since there is only one thread), the function's return value indicates that the current value of the destination matches the comparand, and that the exchange was successful. However, the value that actually gets stored at the destination address is bizarrely corrupted, and does not match the exchange value.
Can anyone offer any insights (other than a suggestion to upgrade to VC++ 2005)? Maybe someone knows of the assembly code used by the current compiler's intrinsic?
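The single-threaded check described above can be sketched in portable C++. This is not the code from the newsgroup post and not what the VC++ 2005 intrinsic expands to: it assumes a C++11 compiler with `std::atomic` (which VC++ 6.0 does not have), and the names `to_bits`, `from_bits`, `cas_double` and `single_threaded_check` are hypothetical. On 32-bit x86, compilers typically implement the 64-bit compare-exchange with a locked `cmpxchg8b`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy a double's 64-bit pattern into an integer.
static std::uint64_t to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Hypothetical helper: the inverse of to_bits.
static double from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Microsoft-style compare-and-exchange on a double held as its bit pattern:
// returns the value found at the destination before the operation.
double cas_double(std::atomic<std::uint64_t>& dest, double exchange, double comparand) {
    std::uint64_t expected = to_bits(comparand);
    dest.compare_exchange_strong(expected, to_bits(exchange));
    // On success `expected` is untouched and equals the old contents;
    // on failure it has been overwritten with the current contents.
    return from_bits(expected);
}

// Single-threaded check: the return value must equal the comparand AND the
// destination must hold exactly the exchange value afterwards.
bool single_threaded_check() {
    std::atomic<std::uint64_t> dest(to_bits(1.5));

    double old1 = cas_double(dest, 2.75, 1.5);   // comparand matches
    bool swapped = (old1 == 1.5) && (from_bits(dest.load()) == 2.75);

    double old2 = cas_double(dest, 9.0, 1.5);    // stale comparand
    bool rejected = (old2 == 2.75) && (from_bits(dest.load()) == 2.75);

    return swapped && rejected;
}
```

Note that the comparison is on bit patterns, so 0.0 and -0.0 count as different values here. For hand-written assembly, `cmpxchg8b` expects the comparand in EDX:EAX and the exchange value in ECX:EBX (high half first in each pair), so a swapped register pair is one plausible cause of a stored value that looks scrambled; that is a guess, not a diagnosis of the posted code.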
Michael K. O'Neill wrote:
> I am trying to implement an atomic compare-and-exchange of a "double" type,
> under VC++ 6.0 for WinXP.
>
> The Windows API function of InterlockedCompareExchange64 is not available
> for WinXP (it's only available for Vista, see
> http://msdn.microsoft.com/library/en-us/dllproc/base/interlockedcompa... ).
> ...
> Can anyone offer any insights (other than a suggestion to upgrade to VC++
> 2005)? Maybe someone knows of the assembly code used by the current
> compiler's intrinsic?
Thank you for your response. Using it, I was able to find the following code, which works perfectly for me. I'm using it instead of your code, for the perhaps unprincipled reason that it seems to involve fewer instructions, and I need the code in an inner-most loop. The code is slightly adapted, but the original was obtained from http://www.audiomulch.com/~rossb/code/lockfree/ATOMIC.H
> Thank you for your response. Using it, I was able to find the following
> code, which works perfectly for me. I'm using it instead of your code, for
> the perhaps unprincipled reason that it seems to involve fewer instructions,
> and I need the code in an inner-most loop. The code is slightly adapted,
> but the original was obtained from
> http://www.audiomulch.com/~rossb/code/lockfree/ATOMIC.H
I use the IBM-style CAS because it's easier to program with. You should verify that you are actually executing fewer instructions. x86 doesn't exactly have a lot of registers for the compiler to work with. With the Microsoft-style CAS, you need to make a copy of the returned value, compare it to the comparand value, and copy it to the comparand value if a retry is needed. So possibly an extra compare and extra memory store/load instructions. Plus extra overhead for the parameter list, since you're passing 64-bit values rather than 32-bit references.
-- Joe Seigh
When you get lemons, you make lemonade. When you get hardware, you make software.
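To make the comparison above concrete, here is a minimal sketch of the two calling conventions and the retry loop each one leads to. It is an illustration only, assuming C++11 `std::atomic`; the names `cas_ms`, `cas_ibm`, `add_ms` and `add_ibm` are hypothetical and do not come from the thread or from ATOMIC.H. Whether one style really costs more instructions depends on the compiler, so the comments only mark where the extra work would appear in source form.

```cpp
#include <atomic>
#include <cstdint>

// Microsoft-style: takes 64-bit values, returns the previous contents.
// The caller must compare the result against its comparand.
std::uint64_t cas_ms(std::atomic<std::uint64_t>& dest,
                     std::uint64_t exchange, std::uint64_t comparand) {
    dest.compare_exchange_strong(comparand, exchange);
    return comparand;  // local copy now holds the previous contents
}

// IBM-style: takes a reference to the comparand, returns a success flag,
// and refreshes *comparand with the current contents on failure.
bool cas_ibm(std::atomic<std::uint64_t>& dest,
             std::uint64_t* comparand, std::uint64_t exchange) {
    return dest.compare_exchange_strong(*comparand, exchange);
}

// Retry loop built on the Microsoft-style call.
std::uint64_t add_ms(std::atomic<std::uint64_t>& dest, std::uint64_t delta) {
    std::uint64_t expected = dest.load();
    for (;;) {
        std::uint64_t seen = cas_ms(dest, expected + delta, expected); // copy of result
        if (seen == expected)          // extra compare in the caller
            return expected + delta;
        expected = seen;               // extra copy before retrying
    }
}

// Retry loop built on the IBM-style call: the flag drives the loop and the
// comparand is already up to date after a failure.
std::uint64_t add_ibm(std::atomic<std::uint64_t>& dest, std::uint64_t delta) {
    std::uint64_t expected = dest.load();
    while (!cas_ibm(dest, &expected, expected + delta)) {
        // nothing to do: `expected` was refreshed by the failed attempt
    }
    return expected + delta;
}
```

Both loops compute the same result; the difference is only in how much bookkeeping the caller writes by hand, which is the point being made about ease of programming.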
> Thank you for your response. Using it, I was able to find the following
> code, which works perfectly for me. I'm using it instead of your code, for
> the perhaps unprincipled reason that it seems to involve fewer instructions,
> and I need the code in an inner-most loop. The code is slightly adapted,
> but the original was obtained from
> http://www.audiomulch.com/~rossb/code/lockfree/ATOMIC.H
Humm, they mention atomic reference-counted pointers in there. However, after taking a brief look, I realized that they are not atomic; they are only as safe as boost::shared_ptr... Also, there is no support for "fine-grained" memory barriers.
> > Thank you for your response. Using it, I was able to find the following
> > code, which works perfectly for me. I'm using it instead of your code, for
> > the perhaps unprincipled reason that it seems to involve fewer instructions,
> > and I need the code in an inner-most loop. The code is slightly adapted,
> > but the original was obtained from
> > http://www.audiomulch.com/~rossb/code/lockfree/ATOMIC.H
>
> Humm, they mention atomic reference-counted pointers in there. However,
> after taking a brief look, I realized that they are not atomic; they are
> only as safe as boost::shared_ptr... Also, there is no support for
> "fine-grained" memory barriers.
All that might be true, and the atomic.h file indeed seems to be directed to templated classes that are claimed to provide atomic reference-counted pointers based on boost::shared_ptr.
However, I was uninterested in any of that.
All I wanted was an in-line assembly function that used a "locked" cmpxchg8b instruction to perform an atomic 64-bit compare-and-exchange. I think I got that, from code that appears near the top fifth of the page, as indicated above.
> > Thank you for your response. Using it, I was able to find the following
> > code, which works perfectly for me. I'm using it instead of your code, for
> > the perhaps unprincipled reason that it seems to involve fewer instructions,
> > and I need the code in an inner-most loop. The code is slightly adapted,
> > but the original was obtained from
> > http://www.audiomulch.com/~rossb/code/lockfree/ATOMIC.H
>
> I use the IBM-style CAS because it's easier to program with. You should
> verify that you are actually executing fewer instructions. x86 doesn't
> exactly have a lot of registers for the compiler to work with. With
> the Microsoft-style CAS, you need to make a copy of the returned value,
> compare it to the comparand value, and copy it to the comparand value if
> a retry is needed. So possibly an extra compare and extra memory store/load
> instructions. Plus extra overhead for the parameter list, since
> you're passing 64-bit values rather than 32-bit references.
>
> --
> Joe Seigh
>
> When you get lemons, you make lemonade.
> When you get hardware, you make software.
You might be correct, and in all candor, I am simply unequipped to determine if fewer instructions are in fact executed.