----------------------------------------
Also, I've finally tested FlushProcessWriteBuffers() on a dual-core
machine with the following test:
#include <windows.h>
#include <process.h>
#include <intrin.h>
#include <stdio.h>

unsigned __stdcall thread(void* p)
{
    unsigned __int64 t1 = __rdtsc();
    volatile __int64 data = 0;
    unsigned __int64 const count = 1000000000;
    for (unsigned __int64 i = 0; i != count; ++i)
    {
        data *= data;
    }
    unsigned __int64 t2 = __rdtsc();
    // cycles per iteration, scaled by 1000
    printf("time=%u\n", (unsigned)((t2 - t1) * 1000 / count));
    return 0;
}
int main()
{
    HANDLE t = (HANDLE)_beginthreadex(0, 0, thread, 0,
        CREATE_SUSPENDED, 0);
    // pin the worker to CPU 1 and the flushing thread to CPU 0
    SetThreadAffinityMask(t, 2);
    SetThreadAffinityMask(GetCurrentThread(), 1);
    ResumeThread(t);
    unsigned __int64 tmin = 1000000000;
    unsigned __int64 tmax = 0;
    unsigned __int64 tsum = 0;
    unsigned __int64 tcount = 0;
    // hammer FlushProcessWriteBuffers() until the worker finishes
    while (WAIT_TIMEOUT == WaitForSingleObject(t, 0))
    {
#ifdef DO_FLUSH
        unsigned __int64 t1 = __rdtsc();
        FlushProcessWriteBuffers();
        unsigned __int64 t2 = __rdtsc() - t1;
        if (t2 < tmin)
            tmin = t2;
        if (t2 > tmax)
            tmax = t2;
        tsum += t2;
        tcount += 1;
#endif
    }
    printf("min=%u, max=%u, mean=%u, count=%u\n", (unsigned)tmin,
        (unsigned)tmax, tcount ? (unsigned)(tsum / tcount) : 0,
        (unsigned)tcount);
    CloseHandle(t);
    return 0;
}
Without DO_FLUSH I get (from 3 runs):
time=15991
time=15825
time=15865
With DO_FLUSH:
time=28950
min=418, max=18783970, mean=1611, count=10725801
time=28663
min=418, max=31881715, mean=1599, count=10706885
time=29029
min=418, max=56535079, mean=1592, count=11036976
It's clearly seen that FlushProcessWriteBuffers() affects not only the
current processor but all other processors as well. The average
overhead on the host processor is about 1600 cycles, which is probably
not too bad (though it's an open question what the numbers will look
like on a quad-core). The average overhead on the remote processor is
about 1300 cycles (if my math is correct).
I hope that on a quad-core processor the OS will issue the IPIs to all
processors in parallel. So the total overhead per
FlushProcessWriteBuffers() call can be roughly estimated as 1500
cycles * number_of_processors. If we take into account that a single
cache miss on a distributed system may cost 300-1000 cycles, the
per-epoch overhead mentioned above doesn't look so serious.
--
Dmitriy V'jukov
Yes; AFAICT active synchronization epoch detection via
`FlushProcessWriteBuffers()' works very well with asymmetric
synchronization patterns in general. I was mainly interested in how
the slow path (e.g., the writer in an asymmetric read-write lock)
responds to periods of load. IIRC the email had something to do with:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/abb3622071b0d52f
(a hack for passive sync epoch detection on Windows)
BTW, could you perhaps provide a patch for Relacy that supports
`FlushProcessWriteBuffers()' in the near future? I would be very
interested in modeling all of my algorithms that depend on automatic
sync epoch detection.