Why do we use xchg rather than lock mov to implement atomic.StoreX?

Cholerae Hu

Apr 26, 2020, 4:31:04 AM
to golang-nuts
atomic.StoreX doesn't return the old value stored at the given pointer, so a lock mov would be enough. Why do we use an xchg instead? Is there a performance issue?

Ian Lance Taylor

Apr 27, 2020, 7:26:15 PM
to Cholerae Hu, golang-nuts
On Sun, Apr 26, 2020 at 1:31 AM Cholerae Hu <chole...@gmail.com> wrote:
>
> atomic.StoreX doesn't return the old value stored at the given pointer, so a lock mov would be enough. Why do we use an xchg instead? Is there a performance issue?

I assume that you are talking about Intel processors. Intel
processors do not have a lock mov instruction.

From the Intel architecture manual:

The LOCK prefix can be prepended only to the following instructions and
only to those forms of the instructions where the destination operand is
a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC,
INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.

Ian

Cholerae Hu

Apr 28, 2020, 7:03:00 AM
to golang-nuts
But with gcc 9.3, an atomic store with seq_cst ordering is compiled to mov+fence rather than xchg; see https://gcc.godbolt.org/z/ucbQt6 . Why do we use xchg rather than mov+fence in Go?

On Tuesday, April 28, 2020 at 7:26:15 AM UTC+8, Ian Lance Taylor wrote:

keith....@gmail.com

Apr 28, 2020, 7:42:26 PM
to golang-nuts
It looks like the mechanism used by C++'s std::atomic would not be useful for us.

We require release semantics on atomic stores.  That is, if one thread does:

.. some other writes ...
atomic.StoreInt32(p, 1)

and another thread does

if atomic.LoadInt32(p) == 1 {
   .. some other reads ...
}

If the load sees the store, then the "other reads" must see all of the "other writes". For the C atomic you cited, it does:

regular write
mfence

That doesn't provide the guarantee we need. A write before the atomic could be reordered with the regular write, causing the reader to not see one of the writes it was required to.

For our use case, it would have to be

mfence
regular write

and the semantics of mfence would need to prevent write-write reorderings (does it do that? Not sure.)

We'd need some indication that changing it would be faster, as well.
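The release requirement described in this message can be sketched in Go roughly as follows. The mailbox type and its publish/consume methods are hypothetical names chosen for the example; the point is only that a plain write made before the atomic store must be visible to a reader that observes the store.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// mailbox is a hypothetical type for this sketch.
type mailbox struct {
	data  int   // ordinary, non-atomic field (the "other writes")
	ready int32 // flag written and read only through sync/atomic
}

// publish performs the plain write first, then the atomic store.
func (m *mailbox) publish(v int) {
	m.data = v
	atomic.StoreInt32(&m.ready, 1)
}

// consume spins until it observes the flag, then does the plain read.
// Because the load saw the store, it must also see the write to data.
func (m *mailbox) consume() int {
	for atomic.LoadInt32(&m.ready) != 1 {
		runtime.Gosched() // let the publisher run
	}
	return m.data
}

func main() {
	m := &mailbox{}
	go m.publish(42)
	fmt.Println(m.consume())
}
```

If the store to ready could become visible before the write to data, consume could return a stale value; that is the reordering the atomic store has to rule out.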

Cholerae Hu

Apr 28, 2020, 11:23:31 PM
to golang-nuts
Under the x86-TSO model, it seems that we don't need any mfence to achieve acquire-release semantics. Acquire-release semantics only need a compiler barrier to prevent compiler reordering; see https://godbolt.org/z/7JcX-d .

On Wednesday, April 29, 2020 at 7:42:26 AM UTC+8, keith....@gmail.com wrote:

Keith Randall

Apr 30, 2020, 3:42:07 PM
to Cholerae Hu, golang-nuts
Ah, so I guess we don't need a barrier at all on x86 for the release semantics.
Presumably we still need something for Dekker-style algorithms, although I don't think we use those anywhere in the stdlib, at least.
I guess it's just a question of which is faster?
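For reference, the Dekker-style pattern mentioned here is the classic store-buffering litmus test: each side stores to its own flag and then loads the other's. A minimal Go sketch follows; storeBuffering is a hypothetical function name, and the sketch assumes sync/atomic operations behave as sequentially consistent, as the Go memory model describes. Under that assumption the outcome r1 == 0 && r2 == 0 should never be observed. With plain stores on x86, each store could still sit in a store buffer while the following load executes, which is why a store needs xchg or mov+mfence to forbid that outcome.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeBuffering runs one round of the litmus test and returns what each
// goroutine read from the other goroutine's flag.
func storeBuffering() (r1, r2 int32) {
	var x, y int32
	var wg sync.WaitGroup
	wg.Add(2)

	go func() {
		defer wg.Done()
		atomic.StoreInt32(&x, 1)  // announce "I am here"
		r1 = atomic.LoadInt32(&y) // check whether the other side is too
	}()
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&y, 1)
		r2 = atomic.LoadInt32(&x)
	}()

	wg.Wait() // results are only read after both goroutines finish
	return r1, r2
}

func main() {
	for i := 0; i < 10000; i++ {
		if r1, r2 := storeBuffering(); r1 == 0 && r2 == 0 {
			fmt.Println("both goroutines read 0 at iteration", i)
			return
		}
	}
	fmt.Println("never observed r1 == 0 && r2 == 0")
}
```

A run that finds no violation is not a proof of anything; the test only illustrates which outcome a Dekker-style algorithm depends on being impossible.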
