Fwd: Performance and memory model

Fil Mackay

Aug 16, 2011, 9:23:13 PM
to disrup...@googlegroups.com
On Wed, Aug 17, 2011 at 11:19 AM, Olivier Deheurles <ma...@odeheurles.com> wrote:

> r260, the Sequence class now uses AtomicLongArray instead of volatile long.

Thanks Olivier. How did this translate to your .NET port?

One interesting thing I saw in the latest revisions was "lazySet". I can't find an equivalent in the .NET world; can you?

Regards, Fil.


Olivier Deheurles

Aug 16, 2011, 9:27:13 PM
to disrup...@googlegroups.com

I don't know of any equivalent. I'll search, but I'm not sure I'll find one.

Martin, what exactly is the behavior of lazySet?

Thanks,

Olivier

From: disrup...@googlegroups.com [mailto:disrup...@googlegroups.com] On behalf of Fil Mackay
Sent: Wednesday, August 17, 2011 02:23
To: disrup...@googlegroups.com
Subject: Fwd: Performance and memory model

Fil Mackay

Aug 16, 2011, 9:50:08 PM
to disrup...@googlegroups.com
On Wed, Aug 17, 2011 at 11:27 AM, Olivier Deheurles <ma...@odeheurles.com> wrote:

> I don't know of any equivalent. I'll search, but I'm not sure I'll find one.

Me too ;)

> Martin, what exactly is the behavior of lazySet?


lazySet places a fence before the write (i.e. the write won't be reordered ahead of earlier operations), but it allows the write to be reordered with later ones, and the write will not necessarily be visible to other readers immediately. It maps to memory_order_release in C++0x.
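
As a rough illustration of the pattern this ordering guarantee supports, here is a minimal Java sketch. The class, field, and method names are hypothetical and are not taken from the Disruptor source. A single producer writes a payload with a plain store and then publishes a sequence number with lazySet; the consumer spins on get() and only then reads the payload.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; this is not the Disruptor's Sequence class.
class LazySetPublish {
    private final AtomicLong sequence = new AtomicLong(-1L);
    private long payload; // plain (non-volatile) field

    // Producer: plain store to payload, then an ordered store to sequence.
    // The payload store cannot be reordered after the lazySet.
    void publish(long value, long seq) {
        payload = value;
        sequence.lazySet(seq);
    }

    // Consumer: volatile read of sequence until it reaches seq, then read payload.
    long awaitAndRead(long seq) {
        while (sequence.get() < seq) {
            Thread.yield();
        }
        return payload;
    }

    // Runs one producer/consumer hand-off and returns what the consumer saw.
    static long demo() {
        LazySetPublish p = new LazySetPublish();
        long[] seen = new long[1];
        Thread consumer = new Thread(() -> seen[0] = p.awaitAndRead(0L));
        consumer.start();
        p.publish(42L, 0L);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```

The sketch assumes a single writer to `sequence`; with several producers the publish step would need a different protocol.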

Regards, Fil.

Brandon

Aug 17, 2011, 5:50:25 PM
to Disruptor-net
It is a write/store release fence, with the same semantics that
you see with Thread.VolatileWrite. You can verify this by looking at
the Thread.VolatileWrite implementation and noting that the call to
Thread.MemoryBarrier is actually placed before the variable assignment.
As it so happens, the CLR memory model stipulates that all writes are
to follow store release fence semantics; so the same is actually
achieved without even using volatile (which actually JITs to a no-op
on x64/x86 platforms, but unfortunately you can't use the volatile
field specifier on longs) or Thread.VolatileWrite. I've verified this
CLR memory model property on x64 and you will actually see some fairly
significant performance gains by "loosening" the memory fencing in the
disruptor. Note that according to the CLR memory model, load/reads
follow much looser guidelines and require usages of volatile or
Thread.VolatileRead (which enforce read/load acquire semantics) to
ensure cross-platform consistency. However, on x64 CLR
implementations you may actually be able to avoid volatile and
Thread.VolatileRead, but you would run the risk of the JIT compiler
reordering memory accesses with optimizations, such as hoisting loads
out of loops, that it thinks maintain sequential consistency.
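
The hoisting hazard can be sketched in Java, the language of the original Disruptor; this is an analogue under stated assumptions, not CLR code, and the names are hypothetical. A spin loop on a plain field may legally be compiled to read the field once before the loop; declaring the field volatile forces a fresh read on every iteration, so the loop is guaranteed to observe the update.

```java
// Hypothetical Java analogue of the load-hoisting hazard; not CLR code.
class StopFlag {
    // If this field were not volatile, the JIT would be permitted to read it
    // once before the loop below and then spin forever.
    private volatile boolean stop;

    // Spins until another thread calls requestStop().
    long spinUntilStopped() {
        long iterations = 0;
        while (!stop) { // volatile read on every iteration
            iterations++;
        }
        return iterations;
    }

    void requestStop() {
        stop = true;
    }

    // Returns true if the worker observed the flag and terminated.
    static boolean demo() {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(flag::spinUntilStopped);
        worker.start();
        flag.requestStop();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```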


Martin Thompson

Aug 17, 2011, 6:25:28 PM
to Disruptor-net
The lazySet is basically a software memory barrier, i.e. an instruction
to the compiler not to reorder stores, plus whatever is required for
the hardware to ensure store ordering. On x86/x64 the store order is
already defined in the memory model, so a simple "mov" assembly
instruction is all that is required. The store buffer ordering and
cache coherency take care of things from there. On other platforms
some type of fence will be required to ensure stores are not
reordered with older stores. On Linux this would be "memory_barrier()"
in C/C++ if on x86/x64, or, with C++0x atomics, a store with
"memory_order_release".

Windows must have something similar for C/C++/C#.
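
For reference, a minimal Java sketch of the two kinds of store being contrasted. The class name, the array length, and the slot index are assumptions made for illustration and do not reproduce the Disruptor's r260 code. `lazySet` only needs store ordering, which on x86/x64 is typically a plain store; `set` is a full volatile store, which a JVM will typically follow with a StoreLoad barrier on that hardware.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; the layout is an assumption, not the Disruptor's code.
class PaddedSequence {
    // The value lives in a middle slot; the unused slots on either side are
    // intended as padding to reduce false sharing (assumed layout).
    private static final int VALUE_INDEX = 7;
    private final AtomicLongArray slots = new AtomicLongArray(15);

    PaddedSequence(long initial) {
        slots.set(VALUE_INDEX, initial);
    }

    // Volatile read.
    long get() {
        return slots.get(VALUE_INDEX);
    }

    // Ordered store: not reordered with earlier stores, no full fence required.
    void setOrdered(long value) {
        slots.lazySet(VALUE_INDEX, value);
    }

    // Full volatile store.
    void setVolatile(long value) {
        slots.set(VALUE_INDEX, value);
    }

    public static void main(String[] args) {
        PaddedSequence s = new PaddedSequence(-1L);
        s.setOrdered(5L);
        System.out.println(s.get()); // prints 5
        s.setVolatile(6L);
        System.out.println(s.get()); // prints 6
    }
}
```

Both stores leave the same value behind; the difference is only in how soon other threads are guaranteed to see it and in the cost of the fence.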


Brandon

Aug 17, 2011, 7:05:12 PM
to Disruptor-net
The CLR's specified memory model actually mirrors that of the x86/x64
specified memory model. That's why writing to volatile fields while
running on x86/x64 implementations of the CLR results in no-ops,
because, as you say, the x86/x64 memory model specifies store release
fence semantics. Tagging it as volatile simply prevents JIT compiler
level reordering in this case, as you also say. Unfortunately, as of
C# 4, you can't specify longs as being volatile and are required to
use the Thread.VolatileWrite/Read methods, which are quite costly by
comparison, because they emit full fences (through lock or/add
instructions, I believe). However, as it so happens, the current
x64 JIT compiler implementation will not reorder stores (it may
not even reorder loads at the moment, although I know the x86
implementation did, so I certainly wouldn't count on this). I
believe that Windows currently only runs on processors that support
full cache coherency (x86/x64 and IA64, which I think is fully cache
coherent), so memory fences aren't required to "flush" caches to make
their contents visible to other processors.

Brandon

Aug 17, 2011, 7:07:38 PM
to Disruptor-net
I should also add that the CLR could run on non-Windows platforms and
processor architectures (such as those found in some mobile devices)
that don't have full cache coherency, in which case the CLR would have
to emit memory fence/flush instructions to make writes visible to
other processors.