Must an atomic write always 'happen' at all?

Richard Hodges

unread,

Sep 9, 2016, 6:27:19 AM9/9/16

to std-dis...@isocpp.org

For reference, I refer you to this interesting (at least to me) discussion in the comments of a stack overflow answer:

http://stackoverflow.com/a/39394173/2015579

Here is the question at hand:

I conject that in this following program, I have a right to expect to see different numbers emerge on stdout. My counterpart asserts that this is merely a quality of implementation detail, and that the expressions:

++num;

—num;

could be legally optimised away by the compiler.

Can anyone shed any light on:-

a. what the standard actually says

b. what it really means

c. what it was supposed to say

on the subject?

Thanks in advance

code:

#include <thread>
#include <atomic>

int main()
{
    for (int iter = 0 ; iter < 20 ; ++iter)
    {
        std::atomic<int> num = { 0 };
        std::thread t1([&] {
            for (int i = 0 ; i < 10000000 ; ++i)
            {
                ++num;
                --num;
            }
        });
        std::thread t2([&] {
            for (int i = 0 ; i < 10000000 ; ++i)
            {
                num = 100;
            }
        });

        t2.join();
        t1.join();
        std::cout << num << std::endl;
    }
}

sample output:

Anthony Williams

unread,

Sep 9, 2016, 7:19:01 AM9/9/16

to std-dis...@isocpp.org

On 09/09/16 11:27, Richard Hodges wrote:
> For reference, I refer you to this interesting (at least to me)
> discussion in the comments of a stack overflow answer:
>
> http://stackoverflow.com/a/39394173/2015579
>
> Here is the question at hand:
>
> I conject that in this following program, I have a right to expect to
> see different numbers emerge on stdout. My counterpart asserts that this
> is merely a quality of implementation detail, and that the expressions:
>
> ++num;
> —num;
>
> could be legally optimised away by the compiler.
>
> Can anyone shed any light on:-
>
> a. what the standard actually says
> b. what it really means
> c. what it was supposed to say
>
> on the subject?

Everything in the C++ standard is worded in terms of behaviour of the
abstract machine. The compiler is allowed to generate any code that
matches a possible behaviour of the abstract machine for the provided
source code (the as-if rule).

The only *required* observable behaviour is the I/O and accesses to
volatile variables of the abstract machine.

Since "num" is not declared volatile, the compiler is allowed to look at
the increment/decrement pair, say "it is a possible execution of the
abstract machine to have no other thread observe the intermediate state,
so let's enforce that and combine the operations".

Note: there was a paper about optimizing atomics published just before
the last C++ standards meeting:

http://wg21.link/p0062

Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++11 thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Anthony Williams

unread,

Sep 9, 2016, 7:20:06 AM9/9/16

to std-dis...@isocpp.org

On 09/09/16 12:18, Anthony Williams wrote:
> Note: there was a paper about optimizing atomics published just before
> the last C++ standards meeting:
>
> http://wg21.link/p0062

Richard Hodges

unread,

Sep 9, 2016, 8:55:05 AM9/9/16

to std-dis...@isocpp.org

Very helpful Anthony, thank you.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.

Faisal Vali

unread,

Sep 9, 2016, 6:24:52 PM9/9/16

to std-dis...@isocpp.org

On Fri, Sep 9, 2016 at 5:27 AM, Richard Hodges <hodg...@gmail.com> wrote:
> For reference, I refer you to this interesting (at least to me) discussion
> in the comments of a stack overflow answer:
>
> http://stackoverflow.com/a/39394173/2015579
>
> Here is the question at hand:
>
> I conject that in this following program, I have a right to expect to see
> different numbers emerge on stdout. My counterpart asserts that this is
> merely a quality of implementation detail, and that the expressions:
>
> ++num;
> —num;
>
> could be legally optimised away by the compiler.
>
> Can anyone shed any light on:-
>
> a. what the standard actually says
> b. what it really means
> c. what it was supposed to say
>

It's tough (if not impossible) to improve upon any of Anthony's
answers in the concurrency arena - so I won't claim to do that. But
here's another phrasing of the answer - that may have some value for
the bored:

One way to think of the c++ standard - at its highest level of
fundamental abstraction - is attempting to describe aspects of two
infinite sets of turing-complete machines (TMs):

1) for the first set of TMs, it describes the set of countable
infinite strings (a C++ program) that any of those *deterministic*
turing-complete machines (our compilers) will 'Recognize' [Not
'Decide', given: ill-formed no diagnostic required, constexpr and
template instantiation based infinite recursion...]. It places no
restrictions on the intermediate or the eventual configurations of the
machine - just that these machines must recognize a valid set of
strings as described in the standard.

2) for the second set, the standard defines a set of
*nondeterministic* universal turing-complete machines (our computers)
that 'Recognize' string encodings of any of the turing-machines from
the first set (our compilers) concatenated with a string (C++
programs) that they (our compilers) recognize, concatenated with the
input strings to those C++ programs. The standard places restrictions
on the structure/mechanics of these universal-TMs. It only allows for
those universal TM's that, as they process their input must contain
certain configurations (current state and tape/RAM-status), having
executed a certain number of steps or having halted. These
restrictions are minimal and per the standard, if Bjarne ;) should
ever randomly choose to inspect the configuration of any of these
machines at any given moment of his day (or at any given step of the
machine), he must be able to determine:
- that any volatile memory location that was described as being
accessed in the C++ program was indeed accessed (i.e. the tape records
every access) in an order consistent with happens before (although
volatile accesses to independent locations can be re-ordered with
respect to each other according to some experts - since 'strictly' is
poorly defined!)
- and that any prompting output occurred before any corresponding
input was processed.

Additionally, if the universal TM ever halts - only then is he allowed
to examine the file contents (i.e a certain tape of a multi-tape TM) -
and the contents must be in a certain state (where the valid eventual
states of the tape are constrained by the standard-provided
semantic-descriptions of each expression and the happens-before
relation between each of their side-effects).

That's it! That's all that your system can be held accountable for *by
the standard*.

Also, given that the standard describes the behavior (configurations)
of a non-deterministic universal TM (and not a probabilistic TM) -
there can be (and indeed frequently are when multiple
threads-of-execution are created) many valid configuration sequences
that eventually lead to acceptance (or halting) - and any sequence is
as good as any other sequence as long as it doesn't violate the above
constraints. Given that the machine described by the standard is not
probabilistic, there is no requirement that any of these sequences
must ever occur with any frequency - just that one of them must occur
- and the *nondeterministic* (but not probabilistic) machine can
always choose that same sequence/path - and the standard would be OK
with it.

So, in summary - since your program has a valid halting configuration
that allows t1 to have completed before t2 (regardless of the joins),
your program could be optimized to the following:

int main() {
for (int iter = 0; iter < 20; ++iter)

std::cout << 100 << std::endl;
}

and the standard should have nothing bad to say about that (since
inspection of the output-file - upon halting - would be consistent
with the constraints placed by the standard). Your management may
have something to say about you using that compiler/optimizer - but
they wouldn't be able to lean on the standard for authority ;)

> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ISO C++ Standard - Discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an

> email to std-discussio...@isocpp.org.

Richard Hodges

unread,

Sep 9, 2016, 7:00:03 PM9/9/16

to std-dis...@isocpp.org

Thank you Faisal, that is indeed enlightening. It tells me that my thinking about computer programs has always been driven by empirical observation (of both software and circuits) and not by extrapolating the (lack of) defined consequences of the model that stands behind the standard.

So my final question would be, "is it possible to write an efficient spinlock in standard c++ which is guaranteed to be 'fast' and fair"? My suspicion is no, since there is no guarantee about when another thread (running concurrently rather than pre-emptively) might observe a change in the counter of the spinlock. This puts the author of a spinlock back in the position of having to rely on "known quirks" ( otherwise known as implementation defined behaviour) of a compiler.

This seems to me to be counter to the purpose of the standard memory model, and seems to implicitly thwart the thousands of developers who seem to constantly strive to create lock-free concurrent containers in user code (a practice I strongly discourage in my own projects in any case).

R

Sent from my iPad

Reply all

Reply to author

Forward