> On Fri, Oct 16, 2009 at 9:07 AM, John Dlugosz <JDlu...@tradestation.com> wrote:
> > Howdy!
> >
> > I happened upon your post
> > <http://groups.google.com/group/lock-free/browse_frm/thread/1efdc652571c6137>
> > and read others that were linked from it, and searched the web for
> > rcu_synchronize, etc.
> >
> > I've been doing quite a bit of code that avoids both locks and "atomic"
> > instructions, on the x86/x64 architecture under Windows.
> >
> > But I really don't understand what is going on here. What's all the
> > hullabaloo about sys_sych() etc.? I'm guessing that some processor
> > architecture will buffer writes in a way that they are not visible to
> > other cores. But my understanding is that all machines for sale
> > today are actually "ccNUMA", that is, the caches are coherent across all
> > cores, even on different nodes.
> >
> > So, what am I missing?
> >
> > Also, I see you have _ReadWriteBarrier() as the last statement in
> > reader_lock. That seems odd. It's to deal with inlining of that
> > function, right?
>
>
> Hi John,
>
> I will write a kind of article on asymmetric synchronization; I was going to do that anyway. However, it will take some time, so here are some quick answers.
>
> No, sys_sych() is not about cache coherency; it's about ordering. It's needed for the same things you use the MFENCE instruction for on x86.
>
> _ReadWriteBarrier() is required in reader_lock()/reader_unlock() so that code from the user's critical section does not intermix with the synchronization code in reader_lock()/reader_unlock(). If you have, say, an _InterlockedExchange() in a mutex acquire function, that provides some guarantee that user code will not be hoisted above it. But since there is basically no synchronization in my reader_lock()/reader_unlock(), user code could otherwise be hoisted above the acquire or sunk below the release.
>
> While I am writing my "article", please refer to David Dice et al., "Asymmetric Dekker Synchronization":
>
> http://home.comcast.net/~pjbishop/Dave/Asymmetric-Dekker-Synchronization.txt
>
> They describe basically the same idea. What they call SYNCHRONIZE(thread) is what I called sys_sych(). However, they applied the idea only to two-thread mutual exclusion, while I applied it to the reader-writer problem.
>
> If you have any questions after reading "Asymmetric Dekker Synchronization", feel free to ask me. However, I would prefer to have the discussion on the lock-free group:
>
> http://groups.google.com/group/lock-free
>
> Thank you for your interest.
>
> --
> Dmitriy V'jukov
>
> Relacy Race Detector: Make your synchronization correct!
>
> http://groups.google.ru/group/relacy
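
To illustrate Dmitriy's point about _ReadWriteBarrier(), here is a minimal sketch of only the reader fast path of an asymmetric reader lock. It uses C++11 atomics rather than the original MSVC code, and every name in it (reader_slot, g_writer_pending, reader_try_lock(), reader_unlock()) is hypothetical. The writer side, which would set g_writer_pending, issue a process-wide barrier such as sys_sych(), and then wait for every in_cs flag to clear, is omitted. A reader that finds a writer pending simply returns false instead of taking a slow path. std::atomic_signal_fence(std::memory_order_seq_cst) stands in for _ReadWriteBarrier(): on mainstream compilers it constrains compiler reordering only and emits no fence instruction. The relaxed accesses assume x86 hardware ordering plus the writer's barrier, which is outside what the C++ memory model itself guarantees.

```cpp
#include <atomic>

// Portable stand-in for MSVC's _ReadWriteBarrier(): in practice a
// compiler-only barrier; no fence instruction is emitted.
inline void compiler_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Hypothetical shared state, set by a writer (writer side not shown).
std::atomic<bool> g_writer_pending{false};

// Hypothetical per-reader-thread flag that a writer would scan.
struct reader_slot {
    std::atomic<bool> in_cs{false};
};

// Reader fast path. Returns false if a writer is pending; a real lock
// would then fall back to a slow path, which is omitted here.
bool reader_try_lock(reader_slot& s) {
    s.in_cs.store(true, std::memory_order_relaxed);
    // No MFENCE here. The writer is assumed to call a process-wide barrier
    // (sys_sych()) after setting g_writer_pending, guaranteeing that either
    // the writer sees in_cs == true or this thread sees the pending writer.
    // Only compiler ordering has to be enforced on this side.
    compiler_barrier();
    if (g_writer_pending.load(std::memory_order_relaxed)) {
        s.in_cs.store(false, std::memory_order_relaxed);
        return false;
    }
    // Keep critical-section accesses from being hoisted above this point.
    compiler_barrier();
    return true;
}

void reader_unlock(reader_slot& s) {
    // Keep critical-section accesses from sinking below the flag clear.
    compiler_barrier();
    s.in_cs.store(false, std::memory_order_relaxed);
}
```

A reader would bracket its critical section as `if (reader_try_lock(slot)) { /* read shared data */ reader_unlock(slot); }`.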

That's what I meant about sys_sych() and cache coherency: once the core
actually performs the write (which may first have been queued inside the
core), sending it "to memory", the write is visible to all cores, even
though it lands first in the top level of a hierarchy of caches before
reaching the main RAM chips.

So, given that once a write is actually issued (not just queued to be
performed later) it is visible to all cores, why require a thread on
every other core to perform some special operation? The write queue
exists only on this core, so making sure a write is really issued
before doing the read is local to the thread doing it.

Which architectures require this? The original writings here don't
mention any particular architecture, so the technique seems to be
general purpose. But I've never encountered such a concept on x86/x64.
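
The "ordering, not coherency" point in Dmitriy's answer is easiest to see in the store-buffering (Dekker) pattern that both MFENCE and sys_sych() are meant to handle. Below is a minimal sketch in C++11 atomics, not code from this thread; the names x, y, r1, r2 and count_both_zero() are illustrative. Each thread stores 1 to its own flag and then loads the other thread's flag. The caches stay coherent throughout, but on x86 a store can still be waiting in the core's store buffer while a later load from a different address is already satisfied, so without a StoreLoad barrier both threads may read 0. With the seq_cst fences shown (typically compiled to MFENCE or an equivalent locked instruction on x86), the C++ memory model rules that outcome out.

```cpp
#include <atomic>
#include <thread>

// Illustrative shared flags and per-thread results.
std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

// Runs the store-buffering pattern `iterations` times and counts how often
// both threads read 0. The seq_cst fences forbid that outcome.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad barrier
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad barrier
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();  // join() makes r1 and r2 safe to read below
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Dropping the two fences makes a nonzero count a legal result, and on real x86 hardware it does appear in a harness that keeps both threads running concurrently. The asymmetric scheme keeps the fence out of the frequent path and has the infrequent side issue a process-wide barrier that forces the other cores to serialize instead.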