About the store buffer and memory visibility.

amin...@gmail.com

Aug 14, 2019, 9:34:03 PM

Hello,

About the store buffer and memory visibility.

I wrote before the following:

======================================================================

More about memory visibility.

I said before:

As you know, in parallel programming you have to take care not only
of memory ordering but also of memory visibility. Read this to notice it:

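A store barrier, the “sfence” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued. This makes the program state "visible" to other CPUs so they can act on it if necessary. (Note that for ordinary write-back memory, x86 already keeps stores in program order, so "sfence" matters mainly for non-temporal and write-combining stores; the fence that keeps a later load from executing before an earlier store has left the store buffer is "mfence", or any lock-prefixed instruction such as a CAS.)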

Read more here to understand it correctly:

"However, under x86-TSO, the stores are cached in the store buffers;
a load consults only shared memory and the store buffer of the given thread, which means it can load data from memory and ignore values from
the other thread."

Read more here:

https://books.google.ca/books?id=C2R2DwAAQBAJ&pg=PA127&lpg=PA127&dq=immediately+visible+and+m+fence+and+store+buffer+and+x86&source=bl&ots=yfGI17x1YZ&sig=ACfU3U2EYRawTkQmi3s5wY-sM7IgowDlWg&hl=en&sa=X&ved=2ahUKEwi_nq3duYPkAhVDx1kKHYoyA5UQ6AEwAnoECAgQAQ#v=onepage&q=immediately%20visible%20and%20m%20fence%20and%20store%20buffer%20and%20x86&f=false

========================================================================

Now we can ask: how long does it take for the
store buffer to drain?

Read here:

https://nicknash.me/2018/04/07/speculating-about-store-buffer-capacity/

As you can see, he inserts around 500 no-ops to allow the store
buffer to drain. I think the store buffer can drain in less time than that: I have observed this in my scalable MLock, where the store-buffer draining time is amortized by the atomic CAS and by the time it takes other cache lines to transfer from core to core. Here is my scalable MLock:

https://sites.google.com/site/scalable68/scalable-mlock

Thank you,
Amine Moulay Ramdane.