
About FlushProcessWriteBuffers() and IPIs and C++..


Horizon68

Feb 9, 2019, 12:58:11 PM
Hello...



It seems that the sys_membarrier implementation introduced in Linux 4.3
is too slow. Starting with kernel 4.14, there is a new command,
MEMBARRIER_CMD_PRIVATE_EXPEDITED, that enables a much faster
implementation of the syscall using IPIs.
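
As a rough sketch of what using the expedited command looks like from C++ (my own illustration for Linux only, not code taken from any library mentioned here): the call goes through syscall() directly, and the helper names membarrier_call(), register_private_expedited() and heavy_barrier() are made up for this example. Per the membarrier(2) man page, the process has to register once with MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before MEMBARRIER_CMD_PRIVATE_EXPEDITED is accepted.

```cpp
#include <linux/membarrier.h>  // MEMBARRIER_CMD_* constants (kernel headers >= 4.14)
#include <sys/syscall.h>       // __NR_membarrier
#include <unistd.h>            // syscall()

// Hypothetical helper: invoke the raw membarrier system call.
// The third argument (cpu_id) is ignored unless a per-CPU flag is passed.
static long membarrier_call(int cmd, unsigned int flags) {
    return syscall(__NR_membarrier, cmd, flags, 0);
}

// Hypothetical helper: returns true if the kernel supports the expedited
// private command and this process was registered for it.
bool register_private_expedited() {
    long mask = membarrier_call(MEMBARRIER_CMD_QUERY, 0);
    if (mask < 0) return false;  // syscall missing or blocked
    if (!(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) return false;  // kernel older than 4.14
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Hypothetical helper: issue a memory barrier on every running thread of
// this process. Returns true on success.
bool heavy_barrier() {
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}
```

Call register_private_expedited() once at startup; if it returns false, fall back to ordinary fences on both sides instead of calling heavy_barrier().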

For details, see the following article about Userspace RCU, which also
uses IPIs:

membarrier system call performance and the future of Userspace RCU on Linux

https://lttng.org/blog/2018/01/15/membarrier-system-call-performance-and-userspace-rcu/


Cache-coherency protocols do not use IPIs, and as a user-space
developer you normally do not care about IPIs at all; you are mostly
interested in the cost of cache coherency itself. However, the Win32 API
provides a function, FlushProcessWriteBuffers(), that issues IPIs to all
processors in the affinity mask of the current process. You can use it
to investigate the cost of IPIs.

When I ran a simple synthetic test on a dual-core machine, I obtained
the following numbers.

420 cycles is the minimum cost of the FlushProcessWriteBuffers()
function on the issuing core.

1600 cycles is the mean cost of the FlushProcessWriteBuffers() function
on the issuing core.

1300 cycles is the mean cost of the FlushProcessWriteBuffers() function
on the remote core.
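
Figures like these can be collected with a small harness along the following lines. This is only a sketch of the issuing-core measurement, not the test actually used for the numbers above: the names Cost, ticks(), measure() and process_wide_barrier() are hypothetical, it reads the TSC with __rdtsc() on x86 (falling back to nanoseconds from std::chrono elsewhere), and outside Windows it substitutes an ordinary seq_cst fence as a placeholder that issues no IPIs.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

#if defined(_WIN32)
#include <windows.h>  // FlushProcessWriteBuffers()
#endif
#if defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>   // __rdtsc() on MSVC
#define SKETCH_HAVE_TSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang
#define SKETCH_HAVE_TSC 1
#else
#include <chrono>
#endif

// Timestamp source: TSC ticks on x86, nanoseconds elsewhere (fallback).
static inline std::uint64_t ticks() {
#if defined(SKETCH_HAVE_TSC)
    return __rdtsc();
#else
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
#endif
}

struct Cost {
    std::uint64_t min;  // cheapest single call observed
    double mean;        // average over all calls
};

// Time `op` on the calling (issuing) core and report min and mean cost.
template <class Op>
Cost measure(Op op, int iterations) {
    assert(iterations > 0);
    std::uint64_t lo = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        std::uint64_t t0 = ticks();
        op();
        std::uint64_t dt = ticks() - t0;
        lo = std::min(lo, dt);
        total += dt;
    }
    return {lo, static_cast<double>(total) / iterations};
}

// The operation under test.
void process_wide_barrier() {
#if defined(_WIN32)
    FlushProcessWriteBuffers();
#else
    // Placeholder only: a local fence, NOT an IPI-based barrier.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

Calling measure(process_wide_barrier, 100000) on Windows gives the minimum and mean cost on the issuing core. The remote-core cost needs a second thread that times its own loop while the first thread issues the calls.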

Note that, as far as I understand, the function issues an IPI to the
remote core, the remote core acknowledges it with another IPI, and the
issuing core waits for the acknowledging IPI before it returns.

The IPIs also have the indirect cost of flushing the processor pipeline.


My C++ synchronization objects library has been updated: I have invented
and added the scalable Fast_RWLock and the scalable Fast_RWLockX. They
are better than the scalable Asymmetric RWLocks that use IPIs: they are
costless on the reader side, they don't use IPIs on the writer side, and
they are starvation-free, so they are really powerful. They now work on
both Windows and Linux. I have tested my C++ synchronization objects
library thoroughly, and I think it is now much more stable and fast.

You can read about it and download it from my website here:

https://sites.google.com/site/scalable68/c-synchronization-objects-library

The source code is inside my zip files here (the files are called
Fast_RWLockX.pas and LW_Fast_RWLockX.pas):

https://sites.google.com/site/scalable68/scalable-rwlock



Thank you,
Amine Moulay Ramdane.