Hello,
My Scalable RWLocks were updated to version 4.32.
I now use the Windows FlushProcessWriteBuffers() API in my new scalable LW_Fast_RWLockX and in my new scalable Fast_RWLockX.
The FlushProcessWriteBuffers() API does the following:
- It implicitly executes a full memory barrier on all other processors.
- It generates an interprocessor interrupt (IPI) to all processors that
are part of the current process affinity.
- It uses the IPI to "synchronously" signal all processors.
- It guarantees the visibility of write operations performed on one
processor to the other processors.
- It is supported since Windows Vista and Windows Server 2008.
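To illustrate why such a call is useful, here is a minimal C++ sketch of the asymmetric-barrier pattern that FlushProcessWriteBuffers() is commonly used for: the frequent (reader) side uses only a compiler barrier, and the rare (writer) side pays for the process-wide flush. This is not the code of LW_Fast_RWLockX or Fast_RWLockX. The names heavy_barrier, try_enter_read, leave_read, try_enter_write, leave_write, reader_active and writer_pending are hypothetical, the sketch handles a single reader slot only, and the non-Windows branch is just a stand-in so that it compiles elsewhere; that branch does not give the cross-processor guarantee.

```cpp
#include <atomic>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: the expensive barrier, paid only on the rare side.
inline void heavy_barrier() {
#ifdef _WIN32
    FlushProcessWriteBuffers();  // real Windows API (Vista / Server 2008 and later)
#else
    // Stand-in so the sketch compiles on other systems. It only orders the
    // calling thread and does NOT flush the write buffers of other processors.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical state: one reader slot and one writer flag.
std::atomic<int> reader_active{0};
std::atomic<int> writer_pending{0};

// Fast side: announce the reader, then check for a writer.
// No hardware fence here, only a compiler barrier.
bool try_enter_read() {
    reader_active.store(1, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler-only barrier
    if (writer_pending.load(std::memory_order_acquire) != 0) {
        reader_active.store(0, std::memory_order_release);  // back out
        return false;
    }
    return true;
}

void leave_read() {
    reader_active.store(0, std::memory_order_release);
}

// Slow side: announce the writer, force every processor of the process to
// drain its pending writes, then check whether the reader got in first.
bool try_enter_write() {
    writer_pending.store(1, std::memory_order_relaxed);
    heavy_barrier();
    if (reader_active.load(std::memory_order_acquire) != 0) {
        writer_pending.store(0, std::memory_order_release);  // back out
        return false;
    }
    return true;
}

void leave_write() {
    writer_pending.store(0, std::memory_order_release);
}
```

A caller would typically retry or back off when try_enter_read() or try_enter_write() returns false. The point of the design is that the read path stays cheap, while the whole cost of the IPIs is moved to the write path.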
As for the cost of the IPIs, a simple synthetic test on a quad-core machine gave the following numbers:
- 420 cycles is the minimum cost of the FlushProcessWriteBuffers()
function on the issuing core.
- 1600 cycles is the mean cost of the FlushProcessWriteBuffers() function on
the issuing core.
- 1300 cycles is the mean cost of the FlushProcessWriteBuffers() function on
a remote core.
Note that, as far as I understand, the function issues an IPI to the remote
core, the remote core then acks it with another IPI, and the issuing core waits for
the ack IPI before it returns.
The IPIs also have the indirect cost of flushing the processor pipeline.
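A measurement of this kind on the issuing core could be sketched as follows in C++. This is not the test that produced the numbers above. CostStats, measure_cost and measure_flush_cost are hypothetical names, the sketch reports nanoseconds from std::chrono::steady_clock rather than cycles (converting to cycles needs the clock frequency of the machine), each sample includes the overhead of reading the clock, and it says nothing about the cost on the remote cores.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical result type: minimum and mean cost of one call, in nanoseconds.
struct CostStats {
    double min_ns;
    double mean_ns;
};

// Time `iterations` calls of fn one by one and keep the minimum and the mean.
// Assumes iterations > 0. Each sample includes the clock-reading overhead.
template <typename Fn>
CostStats measure_cost(Fn&& fn, int iterations) {
    using clock = std::chrono::steady_clock;
    double min_ns = std::numeric_limits<double>::max();
    double total_ns = 0.0;
    for (int i = 0; i < iterations; ++i) {
        auto t0 = clock::now();
        fn();
        auto t1 = clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        min_ns = std::min(min_ns, ns);
        total_ns += ns;
    }
    return CostStats{min_ns, total_ns / iterations};
}

#ifdef _WIN32
// Cost of FlushProcessWriteBuffers() as seen by the issuing thread.
inline CostStats measure_flush_cost(int iterations) {
    return measure_cost([] { FlushProcessWriteBuffers(); }, iterations);
}
#endif
```

On Windows, calling measure_flush_cost(100000) while other threads of the process are running on the other cores should give a rough idea of the minimum and mean cost on the issuing core.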
My new scalable LW_Fast_RWLockX and my new scalable Fast_RWLockX
are starvation-free and fair. If the write section and the read section are of the same size, they will scale to 1333x, and in the same scenario my other scalable RWLocks will also scale to 1333x.
You can download my scalable RWLocks from:
https://sites.google.com/site/scalable68/scalable-rwlock
Thank you,
Amine Moulay Ramdane.