Here's a toy example:

#include <windows.h>
#include <process.h>
#include <iostream>

HANDLE event;

unsigned __stdcall thread1(void*)
{
    std::cout << "this must appear first\n";
    SetEvent(event);
    // more thread work
    return 0;
}

unsigned __stdcall thread2(void*)
{
    WaitForSingleObject(event, INFINITE);
    std::cout << "this must appear second\n";
    // more thread work
    return 0;
}

int main()
{
    event = CreateEvent(0, TRUE, FALSE, 0);
    HANDLE threads[2];
    threads[1] = (HANDLE)_beginthreadex(0, 0, thread2, 0, 0, 0);
    threads[0] = (HANDLE)_beginthreadex(0, 0, thread1, 0, 0, 0);
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
}

The two prints to standard out must appear in the order indicated, and using
an event object does this nicely.
Here's my questionable alternative:

#include <windows.h>
#include <process.h>
#include <iostream>

volatile LONG event = 0;

unsigned __stdcall thread1(void*)
{
    std::cout << "this must appear first\n";
    event = 1;
    return 0;
}

unsigned __stdcall thread2(void*)
{
    while (event == 0)
        Sleep(0);
    std::cout << "this must appear second\n";
    return 0;
}

int main()
{
    HANDLE threads[2];
    threads[1] = (HANDLE)_beginthreadex(0, 0, thread2, 0, 0, 0);
    threads[0] = (HANDLE)_beginthreadex(0, 0, thread1, 0, 0, 0);
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
}

Is this OK, simply using a global variable and a while loop? I've read that
all reads and writes of suitably aligned DWORDs are atomic on the Win32
platform so I think that makes the code OK, but I'm not really sure. I'm not
too concerned about the busy wait loop because I know that in practice the
first thread will always get to the point where it can let the second thread
proceed very quickly.
Any advice appreciated.
john
My question would be: why would you want to do the second? In the first
version the second thread waits the appropriate amount of time and
wastes little CPU whilst doing so. The second version offers no
advantage whatsoever. You can guarantee the second version is atomic by
using the Interlocked* APIs.
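
For comparison, here is a minimal sketch of the same "flag plus spin loop" idea with the flag accesses made explicitly atomic. It is not the Win32 Interlocked* API suggested above; it swaps in standard C++ std::atomic and std::thread as a portable analogue so it can be tried anywhere. The helper name run_ordered and the log vector are hypothetical, introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread): runs two threads and returns
// the order in which their messages were recorded.
std::vector<std::string> run_ordered()
{
    std::atomic<int> flag{0};      // plays the role of the global "event"
    std::vector<std::string> log;  // touched by both threads, ordered by flag

    // Started first, like thread2 in the original: waits for the flag.
    std::thread second([&] {
        while (flag.load(std::memory_order_acquire) == 0)
            std::this_thread::yield();  // rough analogue of Sleep(0)
        log.push_back("second");
    });

    // Like thread1 in the original: does its work, then publishes the flag.
    std::thread first([&] {
        log.push_back("first");
        flag.store(1, std::memory_order_release);
        // more thread work
    });

    first.join();
    second.join();
    assert(flag.load() == 1);  // the flag was set exactly as intended
    return log;
}
```

Calling run_ordered() from main should yield a two-element log with "first" ahead of "second". The release store paired with the acquire load is what lets the second thread safely see what the first thread did before setting the flag; the busy-wait cost discussed above is unchanged.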
Partly I just want to learn what the possibilities are. One issue for me is
whether the second version is atomic already without using the Interlocked
API. I read that all reads and writes of suitably aligned DWORDs are atomic,
so doesn't that mean I don't need the Interlocked API in this case? That's
the part I'm really uncertain of.
Presumably there is an overhead in creating an event object, and there is
also an overhead in going into a CPU-intensive loop. Which overhead is
greater would seem to depend on circumstances.
john
There is nothing wrong with the way you had it (aside from, as
you noted, burning cycles). One of my pet peeves is throwing
around Interlocked or other synchronization mechanisms just
because you _think_ you _might_ need it. It'll make life much
easier in the future if you figure out right now exactly what
the implications of multithreaded situations like the one above are.
In your particular case, you don't need the synchronization even
if the variables _aren't_ properly aligned. Think about what
could possibly happen. The "event" variable must be 0 at program
startup. One single thread reads the value. One single thread
writes the value 1. Even with unaligned values, the single byte
of the value that contains the set bit _must_ be written in a
single cycle. Therefore your "second" thread is guaranteed to see
a 0 before the "first" thread does its thing, and to see a 1
afterward. There is no way to get a race condition, or corrupt
data, or any other problem.
Unless you're dealing with values other than 0 and 1
(specifically, values where the set bits will span more than a
single byte), or possibly are using a very peculiar
architecture/compiler (since we're on c.o.m.p.win32 we can
neglect the DS9000 and such), simply reading OR writing even an
unaligned DWORD value in any one thread, as you are doing, is
guaranteed to be atomic.
More generally, and as you suggested, reading OR writing an
aligned DWORD of any value is also guaranteed atomic and the
Interlocked functions are a waste of typing.
Cris
Thanks a lot. At the moment multithreading is very scary. Your response put
my mind at rest on this topic at least, but there are plenty more issues
I've got to get to grips with!
john
[snip]
>Unless you're dealing with values other than 0 and 1 (specifically,
>values where the set bits will span more than a single byte), or
>possibly are using a very peculiar architecture/compiler (since we're
>on c.o.m.p.win32 we can neglect the DS9000 and such), simply reading OR
>writing even an unaligned DWORD value in any one thread, as you are
>doing, is guaranteed to be atomic.
Go and slap yourself around, to save me the bother. :) What about the
situation where there are two or more processors? The first processor,
doing an unaligned write for thread 1, will write the low-addressed
part, then issue another memory cycle for the high-addressed part. The
second processor, doing the unaligned read of the same location in
thread 2, can interrupt the first in between its two memory cycles, and
might read inconsistent data. The only thing that is guaranteed atomic
without assistance[1] is a single byte.
[1] Aligning the data to the correct boundary is a form of assistance.
Also, if the unaligned data crosses a page boundary, the second page
might be marked "not present", and that will cause non-atomic references
to the data even on a uniprocessor system.
The moral of this sorry tale? Use aligned data, and make flag variables
as small as possible.
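
The two-memory-cycle scenario described above can be modelled in a single thread. This is only a sketch under a stated assumption: that an unaligned 32-bit store is split into a low-addressed part and a high-addressed part, and that a reader on another processor gets in between them. The function name value_between_cycles and the split parameter are hypothetical, and no real hardware is being measured here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of an unaligned 32-bit store that is split into two
// memory cycles: the first `split` bytes (lowest addresses) are written
// first, the remaining bytes later. Returns what a reader would see if it
// read the whole location between the two cycles.
std::uint32_t value_between_cycles(std::uint32_t old_value,
                                   std::uint32_t new_value,
                                   unsigned split)
{
    assert(split <= sizeof(std::uint32_t));
    unsigned char bytes[sizeof(std::uint32_t)];
    std::memcpy(bytes, &old_value, sizeof bytes);  // memory holds the old value
    std::memcpy(bytes, &new_value, split);         // first cycle: low-addressed part
    std::uint32_t seen;
    std::memcpy(&seen, bytes, sizeof seen);        // the other processor reads here
    return seen;
}
```

With old value 0x0000FFFF, new value 0x00010000 and a split after two bytes, the value seen is neither the old nor the new one, which is the inconsistent read being warned about. A split of 0 or 4 corresponds to reading entirely before or entirely after the store.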
--
SteveR
(throw away the dustbin, send to stever@... instead)
Humans are way too stupid to be dumb animals.
http://www.accidentalcreditor.org.uk/
I'll resist the urge to tell you to take your own advice - about
the slapping bit, I mean - because upon second reading, perhaps I
didn't make myself entirely clear.
Okay, the read or write of unaligned data is not, strictly
speaking, atomic. But you also neglected my caveat, that we're
dealing with flags with values only of zero or one. Though
writing a one into an unaligned memory location is not,
strictly-speaking, atomic, the result _cannot_ be inconsistent
data, regardless of where multiple processors may interleave
their memory accesses, since all the bytes except one are simply
writing back in the same value that is already there, namely
zero. In other words, we're effectively dealing with a single
byte here, access to which, as you note, is guaranteed atomic
without any help whatsoever.
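
The zero-or-one argument can be checked exhaustively with a small model. This is a sketch, assuming tearing happens only at byte granularity: it tries every combination of "already written" and "not yet written" bytes and asks whether a reader could ever see something other than the old or the new value. The function name never_torn is hypothetical. It models only torn reads and writes and says nothing about what a compiler or cache may do with the accesses.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check: returns true if every partially written state of
// old_value -> new_value (any subset of the four bytes already updated)
// is equal to either old_value or new_value.
bool never_torn(std::uint32_t old_value, std::uint32_t new_value)
{
    unsigned char from[4], to[4];
    std::memcpy(from, &old_value, 4);
    std::memcpy(to, &new_value, 4);

    for (unsigned mask = 0; mask < 16; ++mask) {  // each bit: byte i written yet?
        unsigned char mixed[4];
        for (unsigned i = 0; i < 4; ++i)
            mixed[i] = (mask & (1u << i)) ? to[i] : from[i];

        std::uint32_t seen;
        std::memcpy(&seen, mixed, 4);
        if (seen != old_value && seen != new_value)
            return false;  // a reader could observe a value nobody wrote
    }
    return true;
}
```

For the flag in this thread, never_torn(0, 1) holds because only one byte differs between the two values. For a change such as 0x00FF to 0x0100, where two bytes differ, it does not hold, which matches the caveat about set bits spanning more than a single byte.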
Two points to try to conclude this thread, since the OP seems to
have dropped out:
1) While use of the "zero-or-one-only" value in the above
analysis may be asking for trouble, it was really only to
illustrate to the OP what the issue was (possible inconsistent
data) and while you are correct that accessing unaligned data
can, in the general case, lead to inconsistent or invalid values,
in the case of flags with values of only zero or one, it cannot.
2) I'll concede that one should indeed generally use aligned
data. Relying on my above explanation would be very poor
practice, and I would be sorely tempted myself to slap around
anyone I caught trying it =)
Cris