
correct use of mfence


Ram

Feb 27, 2009, 6:02:58 PM
Hi,

I am writing atomic inc/dec/cas operations in assembly.

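increment64(__int64* value, int val)
{
    __int64* temp = &val;
    _asm
    {
        push edi
        push ebx
        mov edi, value        ; edi = address of the 64-bit target
        mov eax, [edi]        ; edx:eax = current value (expected)
        mov edx, [edi+4]
    again:
        mov ecx, edx          ; ecx:ebx = edx:eax + 1 (desired)
        mov ebx, eax
        add ebx, 1
        adc ecx, 0
        mfence
        lock cmpxchg8b [edi]  ; on failure, edx:eax is reloaded from [edi]
        mfence
        jnz again
        mov eax, temp
        mov [eax], ebx
        mov [eax+4], ecx
        pop ebx
        pop edi
    }
}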

The requirement is that it should work on both single-core and multi-core
hardware.

1. The use of mfence is not clear to me, so I temporarily put an mfence
before and after cmpxchg8b. I am looking for the correct use of the mfence
instruction in this context. I would like to know how many preceding
instructions mfence serializes.
MFENCE:
This serializing operation guarantees that every load and store
instruction that precedes in program order the MFENCE instruction is
globally visible before any load or store instruction that follows the
MFENCE instruction is globally visible.

2. If I use mfence, do I still need the lock prefix before
cmpxchg8b?

What is the correct use of mfence in the above assembly code?

Thanks,
Ram

Chris M. Thomasson

Feb 27, 2009, 6:49:57 PM
"Ram" <shini...@gmail.com> wrote in message
news:915d2208-33de-467e...@w24g2000prd.googlegroups.com...

> Hi,
>
> I am writing atomic inc/dec/cas operations in assembly.
>
> increment64(__int64* value, int val)
> {
> [...]

> }
>
> The requirement is that it should work on both single-core and multi-core
> hardware.
>
> 1. The use of mfence is not clear to me, so I temporarily put an mfence
> before and after cmpxchg8b. I am looking for the correct use of the mfence
> instruction in this context. I would like to know how many preceding
> instructions mfence serializes.
> MFENCE:
> This serializing operation guarantees that every load and store
> instruction that precedes in program order the MFENCE instruction is
> globally visible before any load or store instruction that follows the
> MFENCE instruction is globally visible.

MFENCE makes all preceding loads/stores to write-back/combining memory
visible before any subsequent store/load instructions are rendered visible.
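
To illustrate the kind of ordering a full fence provides, here is a minimal sketch in C11 (not the inline assembly used in this thread) of a store followed by a load of a different location, which is the one reordering x86 permits on write-back memory. The names `want`, `try_enter` and `leave` are hypothetical, and it is an assumption that the compiler lowers `atomic_thread_fence(memory_order_seq_cst)` to MFENCE or an equivalent LOCK'ed instruction on x86.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread "I want in" flags for exactly two threads (0 and 1). */
static atomic_int want[2];

/* Announce intent, then check the other thread. Returns true if we may enter. */
static bool try_enter(int self)
{
    int other = 1 - self;

    /* Relaxed store: on x86 it may still sit in the store buffer. */
    atomic_store_explicit(&want[self], 1, memory_order_relaxed);

    /* Full fence: the store above must be globally visible before the
       load below is performed. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&want[other], memory_order_relaxed)) {
        /* The other thread got there first (or at the same time): back off. */
        atomic_store_explicit(&want[self], 0, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Withdraw our flag when done. */
static void leave(int self)
{
    atomic_store_explicit(&want[self], 0, memory_order_release);
}
```

Without the fence, each thread's load could be performed before its own store becomes visible, so both calls could return true at once. With it, at most one of two concurrent callers succeeds (both may fail and retry).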


> 2. If I use mfence, do I still need the lock prefix before
> cmpxchg8b?

Sure; the MFENCE has no bearing on the atomicity of a RMW instruction.


> What is the correct use of mfence in the above assembly code?

The LOCK prefix acts as a full memory barrier. You would only need MFENCE in
your `increment64()' function if you also wish to synchronize
write-combining (e.g., wrt SSE streaming loads/stores) along with write-back
memory (e.g., from malloc()/free()). If you need to support synchronization
of SSE streaming instructions, I would make separate functions, one for
write-back memory, and one for write-combining.
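
For the write-back case, the same compare-and-swap loop can be sketched without any explicit fence. The version below swaps the MSVC inline assembly for C11 `<stdatomic.h>`; that substitution is mine, not the original poster's code. It is an assumption that the compiler emits `lock cmpxchg8b` (32-bit x86) or `lock cmpxchg` (x86-64) for the compare-exchange, which is what mainstream compilers typically do.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically adds 1 to *value and returns the new value.
   No MFENCE is requested: the compare-exchange is sequentially consistent
   by default and is expected to map to a LOCK'ed instruction on x86. */
static int64_t increment64(_Atomic int64_t *value)
{
    int64_t old = atomic_load_explicit(value, memory_order_relaxed);
    int64_t desired;

    do {
        desired = old + 1;
        /* On failure, `old` is refreshed with the current contents of
           *value, much like EDX:EAX after a failed cmpxchg8b. */
    } while (!atomic_compare_exchange_weak(value, &old, desired));

    return desired;
}
```

Returning the new value avoids the separate output pointer used in the original routine. In practice `atomic_fetch_add(value, 1) + 1` would do the same job; the explicit loop is kept only to mirror the assembly.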

Dmitriy Vyukov

Feb 28, 2009, 4:32:37 AM
On Feb 28, 2:49 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Ram" <shining...@gmail.com> wrote in message

>
> news:915d2208-33de-467e...@w24g2000prd.googlegroups.com...
>
>
>
> > Hi,
>
> > I am writing atomic inc/dec/cas operations in assembly.
>
> > increment64(__int64* value, int val)
> > {
> > [...]
> > }
>
> > The requirement is that it should work on both single-core and multi-core
> > hardware.
>
> > 1. The use of mfence is not clear to me, so I temporarily put an mfence
> > before and after cmpxchg8b. I am looking for the correct use of the mfence
> > instruction in this context. I would like to know how many preceding
> > instructions mfence serializes.
> > MFENCE:
> > This serializing operation guarantees that every load and store
> > instruction that precedes in program order the MFENCE instruction is
> > globally visible before any load or store instruction that follows the
> > MFENCE instruction is globally visible.
>
> MFENCE makes all preceding loads/stores to write-back/combining memory
> visible before any subsequent store/load instructions are rendered visible.
>
> > 2. If I use mfence, do I still need the lock prefix before
> > cmpxchg8b?
>
> Sure; the MFENCE has no bearing on the atomicity of a RMW instruction.

Well, but if you use the LOCK prefix, then you no longer need
MFENCE :)

> > What is the correct use of mfence in the above assembly code?
>
> The LOCK prefix acts as a full memory barrier. You would only need MFENCE in
> your `increment64()' function if you also wish to synchronize
> write-combining (e.g., wrt SSE streaming loads/stores) along with write-back
> memory (e.g., from malloc()/free()). If you need to support synchronization
> of SSE streaming instructions, I would make separate functions, one for
> write-back memory, and one for write-combining.

Hmmm... Chris, doesn't LOCK also synchronize non-temporal stores and
WC memory?

--
Dmitriy V'jukov

Chris M. Thomasson

Feb 28, 2009, 5:28:52 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:45fa7b28-fa83-472e...@h5g2000yqh.googlegroups.com...

On Feb 28, 2:49 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> > [...]

> Hmmm... Chris, doesn't LOCK also synchronize non-temporal stores and
> WC memory?

Does the LOCK prefix ensure synchronization of WC memory if the LOCK'ed
location is already in the cache? There can be certain optimizations wrt this
specific scenario. I am not sure if this affects WC memory.

Chris M. Thomasson

Feb 28, 2009, 5:37:58 AM

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:mk8ql.40320$g63....@newsfe24.iad...

IIRC, if the target address is already in the cache, then asserting the
LOCK# signal on the bus can be avoided.

Chris M. Thomasson

Feb 28, 2009, 5:39:59 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:mk8ql.40320$g63....@newsfe24.iad...

I am probably missing something here!

;^(


Please enlighten me Dmitriy.

Dmitriy Vyukov

Feb 28, 2009, 9:47:02 AM
On Feb 28, 1:39 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> news:mk8ql.40320$g63....@newsfe24.iad...
>
> > "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message

> >news:45fa7b28-fa83-472e...@h5g2000yqh.googlegroups.com...
> > On Feb 28, 2:49 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> >> > [...]
> >> Hmmm... Chris, doesn't LOCK also synchronize non-temporal stores and
> >> WC memory?
>
> > Does the LOCK prefix ensure synchronization of WC memory if the LOCK'ed
> > location is already in the cache? There can be certain optimizations wrt
> > this specific scenario. I am not sure if this affects WC memory.
>
> I am probably missing something here!
>
> ;^(
>
> Please enlighten me Dmitriy.


Sorry about that. I had always thought that LOCK is a kind of superset
of MFENCE. It turns out that this is architecture-dependent:

------------------------------
For the P6 family processors, locked operations serialize all
outstanding load and store operations (that is, wait for them to
complete). This rule is also true for the Pentium 4 and Intel Xeon
processors, with one exception. Load operations that reference weakly
ordered memory types (such as the WC memory type) may not be
serialized.
------------------------------

--
Dmitriy V'jukov

Dmitriy Vyukov

Mar 1, 2009, 3:03:16 AM


But for synchronizing non-temporal stores and WC memory, SFENCE is
enough, and it is somewhat cheaper.
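
A minimal sketch of that pattern in C, assuming an x86 target with SSE2 and a compiler that provides the Intel intrinsics headers: data is written with non-temporal stores, SFENCE drains them, and only then is a flag published. The names `payload`, `ready` and `publish` are hypothetical, and the 16-element buffer size is arbitrary.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2) */
#include <xmmintrin.h>   /* _mm_sfence (SSE) */

/* Hypothetical shared buffer and "data is ready" flag. */
static int payload[16];
static atomic_int ready;

/* Copies n ints (n <= 16) into payload with non-temporal stores,
   then publishes the flag. */
static void publish(const int *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        _mm_stream_si32(&payload[i], src[i]);   /* weakly ordered store */

    /* SFENCE: all earlier stores, including the non-temporal ones,
       become globally visible before any later store. */
    _mm_sfence();

    /* Ordinary store to write-back memory; consumers poll this flag. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}
```

A consumer that reads `ready` with an acquire load and then reads `payload` through ordinary loads should see the complete data. Non-temporal or WC loads on the consumer side are a separate matter, as the manual excerpt quoted earlier in the thread points out.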

--
Dmitriy V'jukov
