I'm using the Intel compiler for Windows/Linux (IA32). I have a
change-stamp model that indicates a "garbage collection" event. I
would like to have a simple C++ object (part of the GC singleton
class) that atomically increments a member variable in such a way that
all future unprotected (no mutex) fetches of the variable's value see
the new value, even if the fetch happens immediately afterward on the
other CPU of a dual-CPU machine.
Is this a simple assembly instruction? Can anyone give me pointers on
how to write such a C++ class for the Intel Compiler? At least for
x86, which assembly instructions are needed?
Thanks,
Andy
What are you trying to do? COW, PCOW, STM?
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
Assuming the fetch on the other CPU is atomic, then an "inc mem" with
a lock prefix will do the trick.
If you have several cores, the terms future/further/previous are very
uncertain. You can't even define them in the general case. What is your
definition?
Dmitriy V'jukov
I'm trying to figure out how to write the following class on Windows/
Linux.
class AtomicCounter {
    long val;
public:
    // atomically increment val any way possible
    void Increment ();
    // return the val as quickly as possible in such a way
    // that even if another thread on another CPU changed it
    // just instantly before, the call to ValueOf() sees the change.
    // I don't want to use a mutex in this function
    // I don't want a smart compiler hoisting this function call!
    long ValueOf ();
};
Is "inc mem" some kind of assembly instruction on x86? The member
data type is a 32-bit integer, for example. Do I need an assembly
instruction for the fetch as well? What do you mean by a lock prefix?
Using an ordinary load? Such an instruction sequence does not
exist.
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Yes, that is what you asked for, after all.
Anyway, x86 inc can increment a register or a memory location, and can
do 8, 16 and 32 bit operations (and 64 bit on x86-64). The lock
prefix causes the CPU to execute the instruction with the bus locked,
so that the memory update is atomic (in practice on modern CPUs, it
usually doesn't do a bus lock as such, but rather acquires exclusive
ownership of the cache line in question first, which has the same
result).
Your OS probably provides some utility functions that do the atomic
increment for you. InterlockedIncrement() in Windows, for example.
If the other reader then accesses the variable atomically, you're
guaranteed that it will see either the old value or the new value, and
not some intermediate value. On the x86 an aligned 32 bit load or
store will be atomic, so if that's what your compiler generates,
you're all set.
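For what it's worth, here is a minimal sketch of the AtomicCounter class from the question. It assumes a C++11 compiler with <atomic>, which is an assumption on my part and not something stated in this thread. The CountFromThreads helper is a made-up name used only to exercise the class. On x86, compilers typically turn the fetch_add into a LOCK-prefixed instruction and the load into an ordinary aligned MOV.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Sketch of the AtomicCounter from the question, assuming C++11 <atomic>.
class AtomicCounter {
public:
    AtomicCounter() : val(0) {}

    // Atomic read-modify-write: no increment can be lost, even when
    // several threads call this at the same time.
    void Increment() { val.fetch_add(1); }

    // Atomic load, no mutex. In practice compilers do not cache or
    // hoist this the way they may a plain non-atomic load.
    long ValueOf() const { return val.load(); }

private:
    std::atomic<long> val;
};

// Hypothetical helper: hammer the counter from several threads and
// return the final value once all of them have been joined.
long CountFromThreads(int threads, int incrementsPerThread) {
    assert(threads > 0 && incrementsPerThread >= 0);
    AtomicCounter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) counter.Increment();
        });
    }
    for (auto& th : pool) th.join();
    return counter.ValueOf();
}
```

Calling CountFromThreads(4, 10000) should return 40000. With a plain long and ++val, some increments could be lost.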
As was mentioned elsewhere in this thread, your use of the terms
"future" and "immediately" needs some clarification. You can assume
that all stores by a single processor will be seen in the order issued
by the program running on that processor, when observed by all other
processors in the system.
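That store-ordering guarantee is what makes the usual publish/consume pattern work. Below is a minimal sketch, again assuming a C++11 compiler with <atomic>. The names payload, ready, Publish, Consume and RunPublishConsume are made up for illustration. The release/acquire pair asks the compiler to preserve the same in-order visibility that x86 stores already have in hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for the illustration.
std::atomic<int> payload(0);
std::atomic<bool> ready(false);

// Writer: two stores issued in program order.
void Publish(int value) {
    payload.store(value, std::memory_order_relaxed);
    // Release: the payload store above may not be reordered after this one.
    ready.store(true, std::memory_order_release);
}

// Reader: once it observes the second store, it also observes the first.
int Consume() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload.load(std::memory_order_relaxed);
}

// Runs one writer and one reader and returns what the reader saw.
int RunPublishConsume(int value) {
    assert(value != 0);  // must differ from the initial payload to be a meaningful check
    payload.store(0);
    ready.store(false);
    int seen = -1;
    std::thread reader([&seen] { seen = Consume(); });
    std::thread writer([value] { Publish(value); });
    writer.join();
    reader.join();
    return seen;
}
```

The reader never returns the stale initial payload: by the time it sees ready == true, the earlier store to payload is visible as well.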
1. --------------
# GCC asm-source file
# x86 Atomic Fetch-And-Add /w Fence MemBar
.GLOBL X86_FETCH_AND_ADD_MBFENCE
X86_FETCH_AND_ADD_MBFENCE:
MOVL 4(%ESP), %ECX
MOVL 8(%ESP), %EAX
LOCK XADDL %EAX, (%ECX)
RET
# please note that you may need to prefix function names
# with an underscore on certain platforms (e.g., gcc on windows - mingw)
------------
2. ------------
/* C++ include file; drop the extern "C" when compiling as plain C */
extern "C" void* X86_FETCH_AND_ADD_MBFENCE(void* volatile*, void*);
------------
3. ------------
/* abstract */
#define AtomicFetchAndAdd_mbFence X86_FETCH_AND_ADD_MBFENCE
------------
Assemble the asm source into an object file, and link your C app against it...
P.S.
--------
; code for Intel syntax
X86_FETCH_AND_ADD_MBFENCE:
MOV ECX, [ESP + 4]
MOV EAX, [ESP + 8]
LOCK XADD [ECX], EAX
RET
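If you would rather not maintain a separate asm file, the same fetch-and-add semantics can be sketched with a compiler builtin. This assumes a GCC-compatible compiler that provides __sync_fetch_and_add (GCC and Clang do; check your compiler's documentation before relying on it). FetchAndAdd is a made-up wrapper name, and it takes a long instead of the void* used in the prototype above. Like XADD, it hands back the value the location held before the addition.

```cpp
#include <cassert>

// Assumption: a GCC-compatible compiler providing __sync_fetch_and_add.
// Atomically adds addend to *target with a full memory barrier and
// returns the value *target held before the addition.
long FetchAndAdd(volatile long* target, long addend) {
    assert(target != 0);
    return __sync_fetch_and_add(target, addend);
}
```

For example, with a counter holding 5, FetchAndAdd(&counter, 3) returns 5 and leaves 8 in the counter.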
If it didn't return it "instantly", how would you know?
If the answer is you couldn't know, then the quickest way to do it is
to just do a plain load, maybe making it volatile if that's needed
to keep the compiler from optimizing it away. It also needs to be atomic.
You'd have to check if your compiler handles long atomically.
If the answer is you could know, then you will need some
synchronization and/or memory barriers to make your
"knowing" usage pattern work correctly. The synchronization
will slow things down somewhat.
> Future means that after incrementing the count atomically, another cpu
> that executes a fetch against that count will immediately "see" the
> change.
This is ill-defined. In your example, there is nothing that enforces
ordering across CPUs, so whether another instruction executes after or
before is arbitrary.
Suppose one CPU does this:
1) Some stuff.
2) Increment.
3) Some other stuff.
And another CPU does this:
1) Some stuff.
2) Read the value.
3) Some other stuff.
How can you determine whether the read is "before" or "after" the
increment? The answer is you can't, so there's no way you can say that
one value is right and the other is wrong.
On the flip side, if you acquire a mutex, you *can* determine what
happens first and what happens next because it is guaranteed that one
thread will get the mutex first and one will get it next.
You cannot use terms like "immediately" or "after" meaningfully unless
something defines the sequencing, timing, or ordering.
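To make the mutex case concrete, here is a small sketch, assuming C++11 <mutex> and <thread>. ObserveUnderMutex is a made-up name. Each thread records the counter value it sees while holding the lock, then increments it. Which thread gets the lock first is arbitrary, but the values recorded always form the sequence 0, 1, 2, ... because the mutex puts the critical sections in one definite order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration: returns the counter values observed by the
// threads, in the order in which they acquired the mutex.
std::vector<long> ObserveUnderMutex(int threads) {
    assert(threads > 0);
    std::mutex m;
    long counter = 0;
    std::vector<long> seen;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            std::lock_guard<std::mutex> hold(m);
            seen.push_back(counter);  // value observed by this thread
            ++counter;                // visible to whoever locks next
        });
    }
    for (auto& th : pool) th.join();
    return seen;
}
```

Here "before" and "after" are well defined: a thread that records n acquired the mutex after exactly n other threads had released it.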
DS