volatile keyword and memory barriers

Leigh Johnston

Jan 7, 2010, 6:57:43 PM

I am confused: the Wikipedia article
http://en.wikipedia.org/wiki/Double-checked_locking claims that the VC++
volatile keyword includes a memory barrier. However, if I compile the
following program:

volatile int n1;
volatile int n2;

int main()
{
    ++n1;
    ++n2;
}

I get the following output:

_main PROC ; COMDAT

; 19 : ++n1;

00000 b8 01 00 00 00 mov eax, 1
00005 01 05 00 00 00
00 add DWORD PTR ?n1@@3HC, eax ; n1

; 20 : ++n2;

0000b 01 05 00 00 00
00 add DWORD PTR ?n2@@3HC, eax ; n2

; 21 : }

00011 33 c0 xor eax, eax
00013 c3 ret 0
_main ENDP

I cannot see any memory barrier instructions here, unless I am missing
something. So my question is: does the VC++ volatile keyword provide a
memory barrier or not? I am using VS2008.

/Leigh

Igor Tandetnik

Jan 7, 2010, 8:03:47 PM

Leigh Johnston <le...@i42.co.uk> wrote:
> I am confused, the Wikipedia article
> http://en.wikipedia.org/wiki/Double-checked_locking claims that VC++
> volatile keyword includes a memory barrier however if I compile the
> following program:
>

> I get the following output:
>

> 00000 b8 01 00 00 00 mov eax, 1
> 00005 01 05 00 00 00
> 00 add DWORD PTR ?n1@@3HC, eax ; n1
>

> I cannot see any memory barrier instructions

x86 CPUs don't have memory barrier instructions and, architecturally, don't need them. You'd need to compile for IA64 to see them in action.
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925

Leigh Johnston

Jan 8, 2010, 7:49:57 AM

"Igor Tandetnik" <itand...@mvps.org> wrote in message
news:e97jv5$jKHA...@TK2MSFTNGP05.phx.gbl...

> Leigh Johnston <le...@i42.co.uk> wrote:
>> I am confused, the Wikipedia article
>> http://en.wikipedia.org/wiki/Double-checked_locking claims that VC++
>> volatile keyword includes a memory barrier however if I compile the
>> following program:
>>
>> I get the following output:
>>
>> 00000 b8 01 00 00 00 mov eax, 1
>> 00005 01 05 00 00 00
>> 00 add DWORD PTR ?n1@@3HC, eax ; n1
>>
>> I cannot see any memory barrier instructions
>
> x86 CPUs don't have memory barrier instructions and, architecturally,
> don't need them. You'd need to compile for IA64 to see them in action.

Not true: LFENCE, SFENCE, MFENCE and the LOCK prefix all exist on x86 and
are useful in multi-threaded programs.

/Leigh

Leigh Johnston

Jan 8, 2010, 8:06:15 AM

Targeting x64 makes no difference, still no memory barrier instructions
output.

Igor Tandetnik

Jan 8, 2010, 8:35:55 AM

http://www.linuxjournal.com/article/8211

The x86 CPU provides processor consistency, where writes by one CPU are observed in order by all other CPUs. For this reason, it doesn't need explicit memory barrier instructions.

LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because certain other SSE instructions are asynchronous. I must admit I'm not very familiar with SSE, but your example doesn't issue SSE instructions anyway, so this is moot.

LOCK is not an instruction by itself, but a prefix to other instructions that renders them atomic (e.g. instructions like ADD which need to read, modify and write a memory location). Note that "volatile" doesn't promise or guarantee atomicity: ++n1 is still not atomic even though n1 is declared volatile.
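
To make the atomicity point concrete, here is a minimal sketch assuming a C++11 or later compiler (std::atomic is not available in VS2008, so this is not what the thread's code could have used). The function name count_with_atomic and its parameters are hypothetical. It replaces the volatile counter with std::atomic<int>, whose fetch_add is an atomic read-modify-write; on x86, compilers typically implement it with a LOCK-prefixed instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each increment one shared counter
// `per_thread` times; the final counter value is returned.
int count_with_atomic(int threads, int per_thread)
{
    assert(threads > 0 && per_thread >= 0);

    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                // Relaxed ordering is enough because only the count matters.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling count_with_atomic(4, 100000) always yields 400000. The same loop written as ++n on a plain volatile int shared between threads may lose updates, because the read, the add and the write are not guaranteed to happen as one indivisible step.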

Igor Tandetnik

Jan 8, 2010, 8:38:46 AM

Leigh Johnston wrote:
> Targeting x64 makes no difference, still no memory barrier instructions
> output.

x64 provides the same strong consistency model as x86. That's why I said you need to compile for IA64 (aka Itanium): as far as I know, it's the only CPU supported by the MSVC compiler that has a weak consistency model and actually needs memory barriers.

Leigh Johnston

Jan 8, 2010, 9:03:04 AM

>
> http://www.linuxjournal.com/article/8211
>
> x86 CPU provides processor consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.
>
> LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because
> certain other SSE instructions are asynchronous. I must admit I'm not very
> familiar with SSE, but your example doesn't issue SSE instructions anyway,
> so this is moot.
>

The FENCE instructions are not "SSE" instructions; it seems they are
required for the following cases
(http://www.intel.com/Assets/PDF/manual/253668.pdf):
Writes to memory are not reordered with other writes, with the following
exceptions:
- writes executed with the CLFLUSH instruction;
- streaming stores (writes) executed with the non-temporal move instructions
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
- string operations (see Section 8.2.4.1).

But yeah for my simple ADD example it looks like you are correct, no fence
required.

> LOCK is not an instruction by itself, but a prefix to other instructions
> that renders them atomic (e.g. instructions like ADD which need to read,
> modify and write a memory location). Note that "volatile" doesn't promise
> or guarantee atomicity: ++n1 is still not atomic even though n1 is
> declared volatile.

I am aware that LOCK is a prefix, and I have read elsewhere that it *also*
acts as a memory barrier when used in conjunction with a compatible
instruction. I am also well aware that volatile does not promise atomicity;
I never said that it does.

/Leigh

Leigh Johnston

Jan 8, 2010, 9:20:10 AM

>
> x86 CPU provides processor consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.
>

What about store forwarding? MFENCE may help I think (from
http://www.intel.com/Assets/PDF/manual/253668.pdf):

The memory-ordering model allows concurrent stores by two processors to be
seen in different orders by those two processors; specifically, each
processor may perceive its own store occurring before that of the other.
This is illustrated by the following example:

Example 8-5. Intra-Processor Forwarding is Allowed

Processor 0          Processor 1
mov [_x], 1          mov [_y], 1
mov r1, [_x]         mov r3, [_y]
mov r2, [_y]         mov r4, [_x]

Initially x == y == 0
r2 == 0 and r4 == 0 is allowed

The memory-ordering model imposes no constraints on the order in which the
two stores appear to execute by the two processors. This fact allows
processor 0 to see its store before seeing processor 1's, while processor 1
sees its store before seeing processor 0's. (Each processor is self
consistent.) This allows r2 == 0 and r4 == 0.
In practice, the reordering in this example can arise as a result of
store-buffer forwarding. While a store is temporarily held in a processor's
store buffer, it can satisfy the processor's own loads but is not visible to
(and cannot satisfy) loads by other processors.
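
The quoted Example 8-5 can be sketched in portable C++ as a small litmus harness. This assumes a C++11 or later compiler, and the function name count_both_zero is hypothetical. It uses sequentially consistent std::atomic operations, for which the language forbids the r2 == 0 and r4 == 0 outcome. On x86, compilers typically get that guarantee by emitting the store as XCHG or as MOV followed by MFENCE, which is the kind of barrier the plain MOV version lacks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus harness modelled on Example 8-5. Returns how many of
// `trials` runs finished with r2 == 0 and r4 == 0.
int count_both_zero(int trials)
{
    assert(trials >= 0);

    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};   // Initially x == y == 0
        int r2 = -1, r4 = -1;

        std::thread p0([&] {
            x.store(1, std::memory_order_seq_cst);     // mov [_x], 1
            (void)x.load(std::memory_order_seq_cst);   // mov r1, [_x]
            r2 = y.load(std::memory_order_seq_cst);    // mov r2, [_y]
        });
        std::thread p1([&] {
            y.store(1, std::memory_order_seq_cst);     // mov [_y], 1
            (void)y.load(std::memory_order_seq_cst);   // mov r3, [_y]
            r4 = x.load(std::memory_order_seq_cst);    // mov r4, [_x]
        });

        // join() makes the threads' writes to r2 and r4 visible here.
        p0.join();
        p1.join();

        if (r2 == 0 && r4 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With seq_cst the function returns 0 for any number of trials. Because each trial starts fresh threads, the race window is small, so weakening the orderings to memory_order_relaxed would permit the both-zero outcome without necessarily making it easy to observe; the sketch illustrates the guarantee rather than measuring the reordering.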

Igor Tandetnik

Jan 8, 2010, 9:19:19 AM

Leigh Johnston wrote:
> The FENCE instructions are not "SSE" instructions they are required for the
> following cases it seems
> (http://www.intel.com/Assets/PDF/manual/253668.pdf):
> Writes to memory are not reordered with other writes, with the following
> exceptions:
> - writes executed with the CLFLUSH instruction;
> - streaming stores (writes) executed with the non-temporal move instructions
> (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
> - string operations (see Section 8.2.4.1).

All these are in fact from the SSE[2] instruction set:

http://en.wikipedia.org/wiki/X86_instruction_listings

Igor Tandetnik

Jan 8, 2010, 9:34:31 AM

Leigh Johnston wrote:
>> x86 CPU provides processor consistency, where writes by one CPU are observed
>> in order by all other CPUs. For this reason, it doesn't need explicit
>> memory barrier instructions.
>>
>
> What about store forwarding?

I must admit you are digging deeper than my understanding extends. Hopefully, someone more knowledgeable will chime in.

Leigh Johnston

Jan 8, 2010, 9:44:53 AM

>>
>> What about store forwarding?
>
> I must admit you are digging deeper than my understanding extends.
> Hopefully, someone more knowledgeable will chime in.

Read
http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

Leigh Johnston

Jan 8, 2010, 9:58:40 AM

I guess the use cases for MFENCE on x86 are rare, so Microsoft decided that
its overhead could not be justified and did not make volatile use it.

/Leigh

Leigh Johnston

Jan 8, 2010, 10:28:35 AM

Or rather the LOCK prefix, which is used by Enter/LeaveCriticalSection.

/Leigh

Bo Persson

Jan 8, 2010, 5:42:35 PM

Because volatile has nothing to do with threads, just with memory-mapped
hardware?

Bo Persson

Igor Tandetnik

Jan 8, 2010, 5:49:35 PM

That would be the case for most C++ compilers, yes. But MS made volatile have something to do with threads as of VC8 (VS 2005), by claiming that reading and writing volatile variables now has acquire/release semantics a la Java, and that the compiler would emit barrier instructions as necessary.
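
For readers who want the acquire/release behaviour without relying on the MSVC-specific meaning of volatile, here is a minimal sketch assuming a C++11 or later compiler. The function name publish_and_read is hypothetical. A std::atomic<bool> flag plays the role of the volatile variable: the release store publishes the earlier plain write, and the acquire load that sees the flag set is guaranteed to see that write too.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: hand a plain int from one thread to another through
// an atomic flag, using release/acquire ordering.
int publish_and_read(int value)
{
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // stands in for the volatile flag
    int seen = -1;

    std::thread producer([&] {
        payload = value;                                // 1. write the data
        ready.store(true, std::memory_order_release);   // 2. publish it
    });
    std::thread consumer([&] {
        // Spin until the flag is set; the acquire load pairs with the
        // release store above.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;   // guaranteed to observe the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

On x86 a compiler can usually implement both the release store and the acquire load as plain MOV instructions, which is consistent with the listing at the top of the thread showing no fence; on a weakly ordered target the same source would need real barrier instructions.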