The question is related to a multiprocessor machine.
Let's say I have a couple of class member variables.
Is it enough to protect access to those variables with
Enter/LeaveCriticalSection so I can safely use C/C++
operators inside?
Or do I have to use Interlocked* functions instead of C/C++
operators anyway to guarantee this will work properly
on a multiprocessor machine?
In other words, can LeaveCriticalSection be considered
a memory barrier?
Vladimir.
That said, I don't think this is a problem, because the actual updating
of memory on one CPU (at least if the memory item is properly aligned,
e.g., a 4-byte ULONG on a 4-byte boundary updated by a 4-byte operation)
should cause other CPUs to invalidate their caches for that piece of
memory. See "Intel Architecture Software Developer's Manual," vol. 3,
1997, page 9-4, specifically in reference to "snooping."
Obviously, the compiler has to generate code to make proper use of the
x86-architecture cache, else all may be for naught. For example, it
would not do to fetch the value of variable X before entering the
critical section, since cache invalidation applies only at fetch and not
at some later arbitrary point. But in my test, the compiler behaved
correctly:
21: ULONG X = 0;
00401028 mov dword ptr [ebp-4],0
22: CRITICAL_SECTION critX;
23:
24: InitializeCriticalSection(&critX);
0040102F mov esi,esp
00401031 lea eax,[ebp-1Ch]
00401034 push eax
00401035 call dword ptr [__imp__InitializeCriticalSection@4 (0042a15c)]
0040103B cmp esi,esp
0040103D call __chkesp (004010a0)
25: EnterCriticalSection(&critX);
00401042 mov esi,esp
00401044 lea ecx,[ebp-1Ch]
00401047 push ecx
00401048 call dword ptr [__imp__EnterCriticalSection@4 (0042a158)]
0040104E cmp esi,esp
00401050 call __chkesp (004010a0)
26: X++;
00401055 mov edx,dword ptr [ebp-4]
00401058 add edx,1
0040105B mov dword ptr [ebp-4],edx
27: LeaveCriticalSection(&critX);
0040105E mov esi,esp
00401060 lea eax,[ebp-1Ch]
00401063 push eax
00401064 call dword ptr [__imp__LeaveCriticalSection@4 (0042a154)]
0040106A cmp esi,esp
0040106C call __chkesp (004010a0)
So only one thread of the program can be accessing X at any time (by
software protocol), and the CPU architecture ensures (by snooping or
some other mechanism) that the program sees the latest value of X.
Vladimir Petter wrote:
> Intel documentation (24547109.pdf and 24547209.pdf) both say that the
> LOCK signal guarantees that the bus will be locked during the prefixed
> operation, but it does not promise flushing the processor's cache. Am I
> missing something?
--
James Antognini
Windows DDK MVP
Thanks for the response.
> That said, I don't think this is a problem, because the actual updating
> of memory on one CPU (at least if the memory item is properly aligned,
> e.g., a 4-byte ULONG on a 4-byte boundary updated by a 4-byte operation)
> should cause other CPUs to invalidate their caches for that piece of
> memory. See "Intel Architecture Software Developer's Manual," vol. 3,
> 1997, page 9-4, specifically in reference to "snooping."
Yes! That was the missing part! I felt that I was missing something.
Would it be correct to say that inside a critical section it would NOT be a
problem to manipulate data that is not properly aligned (because the critical
section will make the operation atomic)?
Thanks,
Vladimir.
Now I admit I could be wrong on points like this, but I hew to the policy of
following practices like proper alignment to minimize the chance that my code,
running on an oddball CPU or some future super-duper CPU, might fail. And,
remember, when and if it does fail in that way, it's going to be very difficult
to diagnose the problem and near-impossible to reproduce it.
Vladimir Petter wrote:
> Would it be correct to say that inside a critical section it would NOT be a
> problem to manipulate data that is not properly aligned (because the critical
> section will make the operation atomic)?
[...]
> Is it enough to protect access to those variables with
> Enter/LeaveCriticalSection so I can safely use C/C++
> operators inside?
Yes.
> Or do I have to use Interlocked* functions instead of C/C++
> operators anyway to guarantee this will work properly
> on a multiprocessor machine?
No.
> In other words, can LeaveCriticalSection be considered
> a memory barrier?
Yes.
The locked instructions do not flush the cache. But they force strong
ordering. For example:
[begin quote 'IA-32 Software Dev. Manual', Vol. 3, Section 7.1.2.2]
Locked operations are atomic with respect to all other memory operations and
all externally visible events. Only instruction fetch and page table
accesses can pass locked instructions. Locked instructions can be used to
synchronize data written by one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding
load and store operations (that is, wait for them to complete). This rule is
also true for the Pentium 4 and Intel Xeon processors, with one exception:
load operations that reference weakly ordered memory types (such as the WC
memory type) may not be serialized.
[end quote 'IA-32 Software Dev. Manual', Vol. 3, Section 7.1.2.2]
On architectures other than IA-32, MS guarantees that critical sections will
be memory barriers.
S
"Slava M. Usov" wrote:
> > Is it enough to protect access to those variables with
> > Enter/LeaveCriticalSection so I can safely use C/C++
> > operators inside?
>
> Yes.
>
> > Or do I have to use Interlocked* functions instead of C/C++
> > operators anyway to guarantee this will work properly
> > on a multiprocessor machine?
>
> No.
>
> > In other words, can LeaveCriticalSection be considered
> > a memory barrier?
>
> Yes.
>
> The locked instructions do not flush the cache. But they force strong
> ordering.
I might be way off here, but AFAIK the hardware does not guarantee
atomicity. On real hardware, operations like that are merely sequentially
consistent. And thus strong ordering is important.
-Kirk
[...]
> The machine architecture ensures that fetch and store (on proper
> boundaries at least) will work atomically, as observed by other CPUs at
> the instant of fetch or store.
You're correct in saying "atomically", but "at the instant of fetch or store"
is somewhat lacking. The fetches and stores do not propagate as soon as they
happen, and as reads can be out of order, it can potentially happen that a
CPU modifies a few guarded words, then another CPU reads the same words and
receives, say, stale data for some and new for the others, even though each
word gets updated atomically. To prevent this from happening, the second CPU
will have to ensure that before it gets into the CS, no reads of the guarded
data have been attempted. And the lock prefix ensures just that.
S
> You're correct in saying "atomically", but "at the instant of fetch or store"
> is somewhat lacking. The fetches and stores do not propagate as soon as they
> happen, and as reads can be out of order, it can potentially happen that a
> CPU modifies a few guarded words, then another CPU reads the same words and
> receives, say, stale data for some and new for the others, even though each
> word gets updated atomically. To prevent this from happening, the second CPU
> will have to ensure that before it gets into the CS, no reads of the guarded
> data have been attempted. And the lock prefix ensures just that.
Does that mean that you would recommend using Interlocked* functions even
inside a critical section?
Vladimir.
> Does that mean that you would recommend using Interlocked* functions even
> inside a critical section?
Of course not! I thought I would clarify the issue but I only caused more
confusion. Sigh.
One locked instruction will prevent the CPU from speculative reads. That one
locked instruction happens in the CS entry sequence. That guarantees that
after a CS has been acquired by CPU A, all the writes that had been done by
any other CPU before it released the CS will be visible at CPU A. So it is a
memory barrier.
S
Vladimir.
Isn't that the definition of sequential consistency and not atomicity?
-Kirk
> Isn't that the definition of sequential consistency and not atomicity?
It is. I was stressing the point that atomicity had nothing to do with that,
except making things a bit easier. I guess that message of mine should be
just ignored, because everybody seems to misunderstand it. My bad.
S
James Antognini wrote:
> Strong ordering is, too, part of the architecture, but I don't see it
> having a role: There is no way strong ordering or the lack of it could
> make a difference, given the critical section and the atomic-operation
> guarantee.