
About the Linux sys_membarrier() expedited and the Windows FlushProcessWriteBuffers()..


amin...@gmail.com
Dec 4, 2019, 4:08:31 PM
Hello...


About the Linux sys_membarrier() expedited and the Windows FlushProcessWriteBuffers()..

I have just read the following webpage:

https://lwn.net/Articles/636878/


It is interesting, and it says:

---

Results in liburcu:

Operations in 10s, 6 readers, 2 writers:

memory barriers in reader: 1701557485 reads, 3129842 writes
signal-based scheme: 9825306874 reads, 5386 writes
sys_membarrier expedited: 6637539697 reads, 852129 writes
sys_membarrier non-expedited: 7992076602 reads, 220 writes

---


Look at how powerful "sys_membarrier expedited" is.
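
If you want to try the expedited call yourself, here is a minimal sketch in C of issuing it on Linux. It assumes a kernel of version 4.14 or later that supports MEMBARRIER_CMD_PRIVATE_EXPEDITED; glibc provides no wrapper, so it goes through syscall(). Everything beyond the syscall itself (the helper name, the error handling) is my own illustration:

---

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/* glibc has no wrapper for sys_membarrier, so call it directly */
static int membarrier(int cmd, unsigned int flags)
{
    return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
    /* a process must register before using the expedited command */
    if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0) {
        perror("membarrier register");
        return EXIT_FAILURE;
    }

    /* acts like a full memory barrier executed on every running
       thread of this process; the caller pays most of the cost */
    if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != 0) {
        perror("membarrier expedited");
        return EXIT_FAILURE;
    }

    puts("expedited membarrier issued");
    return EXIT_SUCCESS;
}

---

On Linux, MEMBARRIER_CMD_PRIVATE_EXPEDITED plays the same role as FlushProcessWriteBuffers() on Windows: the caller pays for the IPIs while the other threads pay very little.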

So, as you have noticed, I have already implemented my scalable Asymmetric RWLocks that use the Windows FlushProcessWriteBuffers(); they are called Fast_RWLockX and LW_Fast_RWLockX. They are limited to 400 threads by default, but you can extend the maximum number of threads by setting the NbrThreads parameter of the constructor. You have to start your threads once and for all and keep working with those threads; don't start a new thread every time and then exit from it. Fast_RWLockX and LW_Fast_RWLockX don't use any atomic operations or StoreLoad-style memory barriers on the reader side, so they are scalable and very fast, and I will soon port them to Linux, where they will support both sys_membarrier expedited and sys_membarrier non-expedited.
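
To make the idea concrete, here is a generic sketch in C of the asymmetric pattern that such locks typically rely on. This is not the actual Fast_RWLockX code (which you can download below); MAX_READERS, the slot layout, and the back-off policy are my own illustrative assumptions:

---

#include <windows.h>
#include <intrin.h>

#define MAX_READERS 400   /* mirrors the default thread limit above */

/* one padded slot per reader thread, to avoid false sharing */
typedef struct {
    volatile LONG in_read;
    char pad[60];
} reader_slot;

static reader_slot      g_reader[MAX_READERS];
static volatile LONG    g_writer_pending;
static CRITICAL_SECTION g_writer_cs;  /* serializes writers; call
                                         InitializeCriticalSection() once at startup */

void reader_lock(int id)
{
    for (;;) {
        g_reader[id].in_read = 1;   /* plain store, no StoreLoad fence */
        _ReadWriteBarrier();        /* compile-time ordering only (MSVC) */
        if (!g_writer_pending)      /* no writer pending: fast path done */
            return;
        g_reader[id].in_read = 0;   /* back off while a writer runs */
        while (g_writer_pending)
            Sleep(0);
    }
}

void reader_unlock(int id)
{
    _ReadWriteBarrier();
    g_reader[id].in_read = 0;       /* plain store again */
}

void writer_lock(void)
{
    EnterCriticalSection(&g_writer_cs);
    g_writer_pending = 1;
    /* IPI every core: after this returns, each reader has either made
       its in_read store visible to us or will see g_writer_pending */
    FlushProcessWriteBuffers();
    for (int i = 0; i < MAX_READERS; i++)
        while (g_reader[i].in_read)
            Sleep(0);
}

void writer_unlock(void)
{
    g_writer_pending = 0;
    LeaveCriticalSection(&g_writer_cs);
}

---

The correctness argument is the asymmetric-Dekker one: the writer's FlushProcessWriteBuffers() call substitutes for the StoreLoad fence that the readers never execute.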

You can download my inventions of scalable Asymmetric RWLocks that use
IPIs and that have almost no cost on the reader side from here:

https://sites.google.com/site/scalable68/scalable-rwlock

Cache-coherency protocols do not use IPIs, and as a user-space developer you normally do not care about IPIs at all; what matters most is the cost of cache-coherency itself. However, the Win32 API provides a function, FlushProcessWriteBuffers(), that issues IPIs to all processors in the affinity mask of the current process. You can use it to investigate the cost of IPIs.
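
Here is a small sketch in C of what such a timing test might look like, using the TSC via __rdtsc() from <intrin.h>; the iteration count and reporting are my own assumptions, not the exact test that produced the numbers below:

---

#include <windows.h>
#include <intrin.h>
#include <stdio.h>

int main(void)
{
    const int iters = 100000;
    unsigned long long best = ~0ULL, total = 0;

    for (int i = 0; i < iters; i++) {
        unsigned long long t0 = __rdtsc();
        FlushProcessWriteBuffers();   /* IPIs all cores in our affinity mask */
        unsigned long long dt = __rdtsc() - t0;
        total += dt;
        if (dt < best)
            best = dt;
    }

    printf("issuing core: min %llu cycles, mean %llu cycles\n",
           best, total / iters);
    return 0;
}

---

Measuring the cost on the remote core would additionally need a spinning worker thread that times the stall it observes, which this sketch omits.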

When I ran a simple synthetic test on a dual-core machine, I obtained the following numbers:

420 cycles is the minimum cost of the FlushProcessWriteBuffers() function on the issuing core.

1600 cycles is the mean cost of the FlushProcessWriteBuffers() function on the issuing core.

1300 cycles is the mean cost of the FlushProcessWriteBuffers() function on the remote core.

Note that, as far as I understand, the function issues an IPI to the remote core, the remote core acknowledges it with another IPI, and the issuing core waits for the ack IPI and then returns.

The IPIs also have the indirect cost of flushing the processor pipeline.



Thank you,
Amine Moulay Ramdane.

amin...@gmail.com
Dec 10, 2019, 4:30:18 PM

Hello,