
Memory model of C++ considered limited


michael.po...@gmail.com
Jan 9, 2019, 11:17:18 PM
If I am not mistaken, I am going to show that there is a class of concurrent algorithms which is incompatible with the C++ memory model.

An excerpt from https://en.cppreference.com/w/cpp/language/memory_model:

"If a data race occurs, the behavior of the program is undefined."

And a data race is a "simultaneous" read and write (or two writes) of non-atomic data from two threads.

Now, let's consider a special kind of single-writer multiple-readers algorithm which never blocks the writer (a fixed, dedicated thread which delivers data updates). I define it outside the context of C++, as pure processor instructions with appropriate memory barriers. We expect that our processor may read/write some memory cells in an "atomic" way, i.e. as a whole, and with whatever memory barriers we may need. The other memory cells may be read/written without such "atomic" protection.

Suppose we have a buffer of memory "B" which is to be updated by the writer and read by the multiple readers. We may add an atomic counter (initialized to zero) to this buffer and make the writer increment the counter before it starts to update the buffer and again after it finishes updating it. As a result, if the reader reads this counter and sees an odd value, it knows that the buffer is currently being updated. On the other hand, if the counter is even, the reader may read the buffer "B", then read the counter again, and if it finds that the counter did not change, the data read from the buffer may be considered valid. Note that the readers may need to retry their read operation or wait while the counter is odd, but the writer is never blocked.
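For concreteness, the protocol just described can be sketched in C++ syntax (the struct and all names below are mine). The counter is atomic, but the buffer accesses are deliberately plain, which is exactly the part this post argues the C++ standard does not permit under concurrent access; the barrier placement is indicative only, not a production-grade seqlock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

struct SeqBuf {
    std::atomic<unsigned> seq{0}; // even = stable, odd = update in progress
    char data[64] = {};

    void write(const char* src, std::size_t n) {
        seq.fetch_add(1, std::memory_order_relaxed); // counter becomes odd
        std::atomic_thread_fence(std::memory_order_release);
        std::memcpy(data, src, n);                   // plain, non-atomic writes
        std::atomic_thread_fence(std::memory_order_release);
        seq.fetch_add(1, std::memory_order_relaxed); // counter even again
    }

    // Returns true iff a consistent snapshot was copied into dst;
    // a caller would retry on false. The writer is never blocked.
    bool try_read(char* dst, std::size_t n) const {
        unsigned s1 = seq.load(std::memory_order_acquire);
        if (s1 & 1) return false;                    // writer is mid-update
        std::memcpy(dst, data, n);                   // plain, non-atomic reads
        std::atomic_thread_fence(std::memory_order_acquire);
        return s1 == seq.load(std::memory_order_relaxed);
    }
};
```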

Now, of course, we do not need our buffer to be an array of atomic memory cells; our protocol appears to handle all the possible conflicts and simultaneous accesses to the buffer memory. Making the buffer atomic would just make our code less efficient and would disallow working with it as with an array of chars, for instance.

And here we can clearly say that this algorithm, as defined, may not be implemented in C++. It does allow simultaneous reads and writes of the same non-atomic buffer, but C++ says that such a case must be qualified as a "data race" and, as a result, "the behavior of the program is undefined".

Chris M. Thomasson
Jan 9, 2019, 11:43:42 PM
On 1/9/2019 8:17 PM, michael.po...@gmail.com wrote:
> If I am not mistaken, I am going to show that there is a class of concurrent algorithms which are incompatible with C++ memory model.
>
> An excerpt from https://en.cppreference.com/w/cpp/language/memory_model:
>
> "If a data race occurs, the behavior of the program is undefined."
>
> And a data race is "simultaneous" read and write operation from two threads (or two writes as well) of the non-atomic data.
>
> Now, let's consider a special kind of single-writer multiple-readers algorithm which never blocks the writer (which is a fixed dedicated thread which delivers data updates). I define it out of the context of C++, just pure processor instructions with appropriate memory barriers. We expect that our processor may read/write some memory cells in an "atomic" way i.e. as a whole and with arbitrary memory barriers we may need. The other memory cells may be read/written without such "atomic" protection.
>
> Suppose, we have a buffer of memory "B" which is to be updated by the writer and read by the multiple readers. We may add an atomic counter (initialized to zero) to this buffer and make the writer increment the counter before it starts to update the buffer and after it finishes to update it. As a result, if the reader reads this counter and sees an odd value, it knows that the buffer is currently updated. On the other side, if the counter is even, the reader may read the buffer "B", then read the counter again and if it finds it did not change from the beginning, the data read from the buffer may be considered valid. Note that the readers may need to retry their read operation or wait while the counter is odd, but the writer is never blocked.

This sounds just like a seqlock.


> Now, of course we do not need our buffer to be an array of atomic memory cells, our protocol looks to care well about all the possible conflicts and simultaneous access to the buffer memory. Making the buffer atomic will just make our code less effective and will disallow workig with it as with an array of chars, for instance.
>
> And here we can clearly say that this algorithm, as defined, may not be implented in C++. It does allow simultaneous reads and writes of the same non-atomic buffer, but the C++ says, that such a case must be qualified as "data race" and as a result "the behavior of the program is undefined".
>

We can have multiple reads of non-atomic data; think read-write lock.
Not 100% sure about writes; that should be a data race. Need to think on
this. Fwiw, have you seen the following "distributed" seqlock:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/improved-lock-free-seqlock

One way, might be to hide the state behind a pointer load, and use
data-dependent consume wrt the readers loading. Something like:

struct state
{
    ///[...]
};

state g_state = { ... };
std::atomic<state*> m_state_ptr = &g_state;


We can load state in one shot via memory order consume:

state* s = m_state_ptr.load(std::memory_order_consume);

Still, if the state can be concurrently read and written from/to, then
its members should be atomic, wrt relaxed memory order loads.

michael.po...@gmail.com
Jan 10, 2019, 1:16:42 AM
On Wednesday, January 9, 2019 at 11:43:42 PM UTC-5, Chris M. Thomasson wrote:


>
> This sounds just like a seqlock.
>

That's right.

> Still, if the state can be concurrently read and written from/to, then
> its members should be atomic, wrt relaxed memory order loads.

Yes, we'll need to define the data buffer as an array of atomics in the seqlock to comply with C++ rules, then use the weakest "relaxed" memory order to read or write this buffer. By doing this we just surrender to C++ limitations. Had we written the code in assembly, we would not have cared much about how we access the same buffer.
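A minimal sketch of that conforming variant (struct and names are mine): the counter plus an array of std::atomic&lt;char&gt; accessed with memory_order_relaxed, so none of the buffer accesses constitute a data race. The element-wise loop, rather than a single memcpy, is precisely the surrender being discussed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

struct AtomicSeqBuf {
    std::atomic<unsigned> seq{0};
    std::atomic<char> data[64];

    void write(const char* src, std::size_t n) {
        seq.fetch_add(1, std::memory_order_release);          // odd
        for (std::size_t i = 0; i < n; ++i)
            data[i].store(src[i], std::memory_order_relaxed); // race-free
        seq.fetch_add(1, std::memory_order_release);          // even
    }

    bool try_read(char* dst, std::size_t n) const {
        unsigned s1 = seq.load(std::memory_order_acquire);
        if (s1 & 1) return false;                             // writer active
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = data[i].load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        return s1 == seq.load(std::memory_order_relaxed);
    }
};
```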

Alf P. Steinbach
Jan 10, 2019, 2:08:16 AM
The conclusion “may not be implemented in C++” is too general.

“May not be 100% portably implemented in C++” is a more reasonable
conclusion.

The C++17 standard notes, in the relevant section, that there may, at
least hypothetically, be systems with hardware race detection:

C++17 §4.7.1/23:
“Transformations that introduce a speculative read of a potentially
shared memory location may not preserve the semantics of the C++
program as defined in this International Standard, since they
potentially introduce a data race. However, they are typically valid in
the context of an optimizing compiler that targets a specific machine
with well-defined semantics for data races. They would be invalid for a
hypothetical machine that is not tolerant of races or provides hardware
race detection”

So it seems that the unqualified UB is in support of possible future
systems that are not tolerant of data races.

Or, it /could/ be a defect: perhaps there should be some qualification
that if the data read in a race is used for anything, then you have UB.
But that language would preclude the mentioned hypothetical
architectures, unless that too was made part of the qualification. Which
would be unusual.


Cheers!,

- Alf

Chris M. Thomasson
Jan 10, 2019, 4:29:23 AM
Well, I agree with you, Michael. Fwiw, Relacy Race Detector definitely
does not approve of its non-atomics (e.g., VAR_T) being concurrently
written to by multiple threads; multiple reads are fine all day long, as
long as there are no conflicting writes. It tries to model the standard... ;^)

Chris Vine
Jan 10, 2019, 11:35:54 AM
Interesting, although I don't think I accept your proof, at least as
stated. However, it is highly refreshing to have someone post a question
about a C++ related issue to this newsgroup rather than the perpetual
stream of religious postings: we seem to have about 10 off-topic posts
(religion-oriented, atheism-oriented, science-fiction-related or other
ancillary nonsense) for every on-topic post at present, so thank you.

According to the standard:

"The execution of a program contains a data race if it contains two
potentially concurrent conflicting actions, at least one of which is
not atomic, and neither happens before the other, except for the
special case for signal handlers described below".

and

"Two expression evaluations conflict if one of them modifies a
memory location (4.4) and the other one reads or modifies the same
memory location."

There is nothing wrong with having a buffer comprising an array of
non-atomic ints (or of non-atomic anything else) protected by some form
of synchronization such as a mutex or semaphore, including some custom
semaphore you have created yourself from memory barriers, atomic values
and spinlocks. The question to be asked is whether there is some
synchronization which prevents any non-atomic memory location being
modified concurrently with a read or another write.

What you might be thinking of is avoiding spinlocks on values which are
atomic at the hardware level, and doing what we used to do in the old
(pre-C++11 memory model) days: use volatile built-in types (say, ints)
which you know to be atomic at the hardware level, supplemented as
necessary by fences to ensure visibility on your platform. But if
you can do that you can achieve the identical effect with atomic ints
with relaxed memory ordering (including relaxed memory ordering with an
external memory barrier - there is a use for that). You can even (I
think) have a std::array of std::atomic ints or a std::atomic of
std::array of ints with relaxed memory ordering. (In the case of an
std::atomic array of ints you must ensure you have relaxed memory
ordering set for it, otherwise the implementation will insert a mutex
to protect the array, which you don't want).

In other words, I think that anything you can do with volatile integer
types with additional fences can be done identically with a std::atomic
integer types with relaxed memory ordering and additional fences.

michael.po...@gmail.com
Jan 10, 2019, 1:11:40 PM
On Thursday, January 10, 2019 at 11:35:54 AM UTC-5, Chris Vine wrote:

> According to the standard:
>
> "The execution of a program contains a data race if it contains two
> potentially concurrent conflicting actions, at least one of which is
> not atomic, and neither happens before the other, except for the
> special case for signal handlers described below".
>
> and
>
> "Two expression evaluations conflict if one of them modifies a
> memory location (4.4) and the other one reads or modifies the same
> memory location."
>
> There is nothing wrong with having a buffer comprising an array of
> non-atomic ints (or of non-atomic anything else) protected by some form
> of synchronization such as a mutex or semaphore, including some custom
> semaphore you have created yourself from memory barriers, atomic values
> and spinlocks. The question to be asked is whether there is some
> synchronization which prevents any non-atomic memory location being
> modified concurrently with a read or another write.

I agree with all that you said above. But that is outside the context of my original post, which discusses an algorithm which simply does not need to "protect" access to non-atomic data with the standard synchronization primitives and achieves correctness by very different means.

> What you might be thinking of is avoiding spinlocks on values which are
> atomic at the hardware level and doing what we used to do in the old
> (pre-C++11 memory model) days and use volatile built-in types (say,
> ints) which you know to be atomic at the hardware level, supplemented
> as necessary by fences to ensure visibility on your platform.

Well, I don't see how this is related to my post either. My algorithm does not care about spinlocks, and if it did, I would have no problem making a spinlock around C++11 atomic data.



> But if
> you can do that you can achieve the identical effect with atomic ints
> with relaxed memory ordering (including relaxed memory ordering with an
> external memory barrier - there is a use for that). You can even (I
> think) have a std::array of std::atomic ints or a std::atomic of
> std::array of ints with relaxed memory ordering. (In the case of an
> std::atomic array of ints you must ensure you have relaxed memory
> ordering set for it, otherwise the implementation will insert a mutex
> to protect the array, which you don't want).
>
> In other words, I think that anything you can do with volatile integer
> types with additional fences can be done identically with a std::atomic
> integer types with relaxed memory ordering and additional fences.

Well... It is true I could implement "my" algorithm (known commonly as a "seqlock") by defining my data buffer as an array of atomic types, say atomic<int> or atomic<char>, and then accessing it with memory_order_relaxed.

But

1. The point is that the standard does not give me any guarantee that such an access has the same efficiency as an access to non-atomic memory.
2. Also, I cannot use strcpy() or sprintf() to write into my buffer.
3. Also, if my buffer is structured, say, has a format

struct Buffer
{
    std::atomic<int> a;
    std::atomic<double> ddd;
};

then access to "ddd" may demand either a mutex/spinlock or a "bus locked" operation even for the memory_order_relaxed model (surprise?), just to guarantee atomicity.

So, there is no reasonable and efficient implementation here. And the cause is that our buffer is simply NOT atomic by the algorithm's design; that we need to make it atomic is the result of C++ memory model limitations.

As for using std::atomic<std::array<int,1000>>, this is simply extremely inefficient, causes massive memory copying and synchronization (mutex/spinlock), and cannot be considered a resolution candidate at all.

michael.po...@gmail.com
Jan 10, 2019, 1:34:39 PM
On Wednesday, January 9, 2019 at 11:17:18 PM UTC-5, michael.po...@gmail.com wrote:
> If I am not mistaken, I am going to show that there is a class of concurrent algorithms which are incompatible with C++ memory model.

Replying to myself as a sort of continuation.

Interestingly, I found an article "Can Seqlocks Get Along with Programming Language Memory Models?" by Hans-J. Boehm

http://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf

From what I can see, Boehm does not consider exactly the problem I raised here (so his angle and the range of issues discussed are rather different), yet it is still worth reading for those interested in the topics discussed here.

Chris Vine
Jan 10, 2019, 1:46:26 PM
On Thu, 10 Jan 2019 10:11:30 -0800 (PST)
michael.po...@gmail.com wrote:
> On Thursday, January 10, 2019 at 11:35:54 AM UTC-5, Chris Vine wrote:
> > According to the standard:
> >
> > "The execution of a program contains a data race if it contains two
> > potentially concurrent conflicting actions, at least one of which is
> > not atomic, and neither happens before the other, except for the
> > special case for signal handlers described below".
> >
> > and
> >
> > "Two expression evaluations conflict if one of them modifies a
> > memory location (4.4) and the other one reads or modifies the same
> > memory location."
> >
> > There is nothing wrong with having a buffer comprising an array of
> > non-atomic ints (or of non-atomic anything else) protected by some form
> > of synchronization such as a mutex or semaphore, including some custom
> > semaphore you have created yourself from memory barriers, atomic values
> > and spinlocks. The question to be asked is whether there is some
> > synchronization which prevents any non-atomic memory location being
> > modified concurrently with a read or another write.
>
> I agree with all that you said above. But that is out of the context of
> my original post which discusses an algorithm which simply does not need
> to "protect" an access to non-atomic data with the standard
> synchronization primitives and achieves the correctness by the very
> different means.

I was covering my bases. Your "Now, of course we do not need our
buffer to be an array of atomic memory cells, our protocol looks to
care well about all the possible conflicts and simultaneous access to
the buffer memory" seemed to involve some synchronization "protocol"
which obviated the need for atomics.

> > What you might be thinking of is avoiding spinlocks on values which are
> > atomic at the hardware level and doing what we used to do in the old
> > (pre-C++11 memory model) days and use volatile built-in types (say,
> > ints) which you know to be atomic at the hardware level, supplemented
> > as necessary by fences to ensure visibility on your platform.
>
> Well, I don't see if this is related to my post either. I don't care
> in my algorithm about spinlocks and if I cared, I would not have a
> problem to make a spinlock around a C++11 atomic data.

See above.

> > But if
> > you can do that you can achieve the identical effect with atomic ints
> > with relaxed memory ordering (including relaxed memory ordering with an
> > external memory barrier - there is a use for that). You can even (I
> > think) have a std::array of std::atomic ints or a std::atomic of
> > std::array of ints with relaxed memory ordering. (In the case of an
> > std::atomic array of ints you must ensure you have relaxed memory
> > ordering set for it, otherwise the implementation will insert a mutex
> > to protect the array, which you don't want).
> >
> > In other words, I think that anything you can do with volatile integer
> > types with additional fences can be done identically with a std::atomic
> > integer types with relaxed memory ordering and additional fences.
>
> Well... It is true I could implement "my" algorithm (known commonly as
> "seqlock") by defining my data buffer as an array of atomic types, say
> atomic<int> or atomic<char> then accessing it with memory_order_relaxed.
>
> But
>
> 1. the point is the standard does no give me any guarantee such an access
> have the same effectiveness as an access to non-atomic memory.

The standard provides no guarantee about volatile either as regards
threads, which is your only alternative if you have no mutual
exclusion and/or semaphores and/or spinlocks and don't trust
std::atomic. However, for built-in types which on the particular
platform in question are atomic at the hardware level, relaxed memory
ordering will involve no synchronization operations in practice.

Would it be nice to have something requiring that in the standard
instead of being left as a quality of implementation issue? Yes, I
think it would.

> 2. Also, I cannot use strcpy() or sprintf() to write into my buffer.

True.

> 3. Also, if my buffer is structured, say, has a format
>
> struct Buffer
> {
> std::atomic<int> a;
> std::atomic<double> ddd;
> }
>
> then access to the "ddd" may demand either mutex/spinlock or "bus locked"
> operation even for memory_order_relaxed model (surprise?) just to
> guarantee the atomicity.

Yes it might, but what alternative is there?

> So, there is no a reasonable and effective implementation here. And the
> cause is our buffer is simply NOT atomic by the algorithm design and
> that we need to make it atomic is the result of C++ memory mode
> limitations.
>
> As for using std::atomic<std::array<int,1000>>, this is simply extremelly
> non-effective, causes massive memory copying and sychronozation
> (mutex/spinlock) and cannot be considered as a resolution candidate at
> all.

Memory copying is something completely orthogonal to your original
posting so I don't really understand what your point here is. (Copying
an array would almost certainly require some explicit locking anyway).
In any event you could use plain arrays and rely on pointer decay to
avoid copying if you want to pass by pointer. Possibly also an array of
std::atomic<int> would suit you better than an atomic array of int. But
as I say I don't understand your point here.

However, the overarching issue is that an array of volatile ints will
have the same advantages and disadvantages as an array of
std::atomic<int> with relaxed memory ordering, save that the first is
not standard conforming (but works) and the second is standard
conforming and also works. The code emitted will be identical.

michael.po...@gmail.com
Jan 10, 2019, 2:32:06 PM
On Thursday, January 10, 2019 at 1:46:26 PM UTC-5, Chris Vine wrote:

>
> However, the overarching issue is that an array of volatile ints will
> have the same advantages and disadvantages as an array of
> std::atomic<int> with relaxed memory ordering, save that the first is
> not standard conforming (but works) and the second is standard
> conforming and also works. The code emitted will be identical.

Well, my point was to discuss the C++ standard, not what can be done if we deviate to non-standard and platform-dependent things. If programming outside the standard, I can try to use "volatiles", rely on the atomicity of the "int" type, do whatever else is possible, and then claim that my code is 100% safe after analyzing the disassembly output. Or I can use std::atomic<int> [which may still be a bit awkward and too low-level if I actually have other types of data in my buffer, like doubles] and see that it produces no additional burden when accessed with relaxed operations on my platform.

If I program based on the C++ standard (and not based on particular platform features), I cannot rely on that, and my point is that the C++ standard does not cover well the needs of the seqlock algorithm.

Chris Vine
Jan 10, 2019, 4:17:47 PM
Well I think the C++ memory model deals with the issue as well as it can
apart from the point (with which I agree) that perhaps more could be
said in the standard about the requirement for relaxed memory ordering
not to synchronize where the hardware is atomic on the data type in
question.

The Boehm paper you posted was interesting but it came up with two
solutions within the C++ standard, the first of which is the one which
would occur to me (the second was too subtle). The point about the
overconstraining nature of acquire/release atomics comes up in other
contexts. Take this example of the humble use of an atomic flag to
spin on:

std::string A;
std::atomic<bool> Ready(false);

Thread 1:

A = "42";
Ready.store(true, std::memory_order_release);

Thread 2:

while (!Ready.load(std::memory_order_acquire));
std::string B = A; // B is guaranteed to hold "42"

This is correct and looks OK but on every iteration in the while loop a
fence instruction may be emitted, depending on the processor
architecture. You can eliminate that with relaxed ordering and a
separate fence. Here an acquire synchronization is only executed once:

std::string A;
std::atomic<bool> Ready(false);

Thread 1:

A = "42";
std::atomic_thread_fence(std::memory_order_release);
Ready.store(true, std::memory_order_relaxed);

Thread 2:

while (!Ready.load(std::memory_order_relaxed));
std::atomic_thread_fence(std::memory_order_acquire);
std::string B = A; // B is guaranteed to hold "42"

This is basically the same approach as you would take in the days of
being stuck with volatile ints for flags.

hjkh...@gmail.com
Jan 10, 2019, 5:03:05 PM
[Michael P. pointed me at this thread. ]

A few comments:

This kind of issue has been discussed repeatedly by WG21/SG1, the concurrency subgroup of the C++ committee. See e.g. wg21.link/P0690.

Although it's surprising to most people, there are good reasons to require the seqlock data accesses to be atomic:

1) If the reader code does more than just copy data, it usually does care about accesses being indivisible. If I read a pointer and its target in the reader critical section, my code is likely to trap if I read the concatenation of two half-pointers rather than a real pointer. That failure isn't likely on most mainstream implementations, but it's otherwise allowed by the standard. We can sometimes live with byte-level atomicity, but the code has to be written very defensively. If you're accessing individual scalar or pointer fields, memory_order_relaxed accesses allow you to express your atomicity requirements, and I think they're commonly the right tool.

2) If the accesses are not marked atomic, you're lying to the compiler about what's going on. The consequences of that may be unfortunate. If I write in the reader critical section, where unsigned_int_field and ten_element_array are part of the shared data protected by the seqlock, and tmp is local:

tmp = unsigned_int_field;
if (tmp < 10) {
    ...
    ... = ten_element_array[tmp];
}

the compiler may discover that it doesn't have room to keep tmp in a register. It can legitimately (assuming no intervening synchronization) decide that there is no point in saving the old value of tmp on the stack, when it can just be reloaded from unsigned_int_field again. That can result in an out-of-bounds array access and a segmentation fault. You would have discarded the result at the end of the seq_lock reader critical section, but you may not get that far.
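A hedged sketch of the usual way to close that hole: make the shared field atomic, so tmp is the result of exactly one relaxed load and the compiler may not invent a second load of the field behind the program's back. The declarations below reconstruct the names used in the example; the function wrapper is mine.

```cpp
#include <atomic>
#include <cassert>

std::atomic<unsigned> unsigned_int_field{0};
int ten_element_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

int read_element() {
    // One load; the bounds check and the index always use the same value,
    // even if the compiler spills tmp to the stack.
    unsigned tmp = unsigned_int_field.load(std::memory_order_relaxed);
    if (tmp < 10)
        return ten_element_array[tmp];
    return -1; // the seqlock retry would discard this snapshot anyway
}
```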

Re: replacing acquire loads with explicit fences.
There are cases when that makes sense. But aside from seq_locks they seem to be getting less common. Acquire loads impose less of an ordering constraint than a fence, and thus may be cheaper to implement in hardware. The most recent ARM processors seem to be starting to realize that potential. And on x86, the code is likely to be the same either way.



Chris M. Thomasson
Jan 10, 2019, 5:17:02 PM
Wrt strictly following the C++ standard, a seqlock does need its
state to be comprised of atomics using relaxed memory order. The readers
in such an algorithm simply do not care if there are any concurrent writes.
This is because of the way the seqlock algorithm handles its version
numbers. When a reader notices that the version numbers read are valid
wrt the rules of the algorithm itself, then it knows what it read is
100% coherent...

Readers do not give a damn about any concurrent writes.

Readers use the version number scheme to determine whether they read a
100% coherent view of the state.

I agree with Michael.

Chris M. Thomasson
Jan 10, 2019, 7:12:47 PM
For some reason, "non-trivial" processing within the "very chaotic"
readers' region, _before_ they know whether the read data is even in a
coherent, 100% atomic state, seems a little sketchy to me; it almost
sounds a bit like transactional memory doing IO within a transaction
region...

Should the state always be a POD? Should readers be generally advised to
only process the whole state, instead of fractions of a state within the
read loop itself?

hjkh...@gmail.com
Jan 10, 2019, 7:57:45 PM
On Thursday, January 10, 2019 at 4:12:47 PM UTC-8, Chris M. Thomasson wrote:
I personally think it's desirable to add a version of memcpy that's defined to do a byte-at-a-time, at least optionally memory_order_relaxed, atomic copy. I think most standard memcpy implementations would already be usable as an implementation. And that would solve this problem for the case in which there is no processing inside the reader critical section, and the data is trivially copyable. But so far that's only a personal opinion. And not all the details are clear.

Chris M. Thomasson
Jan 10, 2019, 11:27:58 PM
That sort of memcpy would be very useful. And I agree that a lot of
existing memcpy impls should be okay, or at least rather easily adapted to work.

Scott Lurndal
Jan 14, 2019, 9:23:02 AM
Why a memcpy? Just code it directly as an assignment loop. Byte-wise bulk copies are horribly
inefficient.

hjkh...@gmail.com
Jan 14, 2019, 12:34:57 PM
On Mon, Jan 14, 2019 at 6:23 AM Scott Lurndal wrote:
>
> Why a memcpy? Just code it directly as an assignment loop. byte-wise bulk copies are horribly
> inefficient.

I was talking about semantics, not implementation. Copying with larger granularity, as memcpy implementations usually do, is fine. Taking advantage of compiler knowledge about alignment is also fine. But we would only promise byte-level atomicity.

You can already code it as an assignment loop using memory_order_relaxed atomic accesses. That will become easier in some cases with atomic_ref. Coding it as non-atomic accesses is unlikely to ever be fine. By doing so, you're telling the compiler that it's OK to assume the underlying value won't be changed concurrently by another thread, which is incorrect. I don't think there is much (any?) interest on the committee for defining a "seqlock subset" of C++ that's OK to use in a racey way inside seqlocks. I don't see any way to do that without horribly complicating compiler optimization rules.

But that doesn't necessarily make the resulting code slower, except by disabling compiler optimizations that are likely to be incorrect in this context.

Chris M. Thomasson
Jan 14, 2019, 4:43:36 PM
It would be useful, or perhaps more "convenient", within the context of
any type of algorithm where the readers do not care about concurrent
writes because they use other means to determine 100% coherent atomic
reads of a state. This "very special" memcpy would ultimately behave as
if each "unit" was read using std::atomic::load with
std::memory_order_relaxed ordering. Very simple.

<quick pseudo-code, sorry for any typos>
_________________________________
// A reader's local view of the state
struct local_view
{
    unsigned long a;
    short b;
    unsigned int c;
};


// The shared state
struct seqlock_user_state
{
    std::atomic<unsigned long> a;
    std::atomic<short> b;
    std::atomic<unsigned int> c;

    // Take a local view of the state...
    void snapshot(local_view& v) const
    {
        // Let's read the sucker!
        v.a = a.load(std::memory_order_relaxed);
        v.b = b.load(std::memory_order_relaxed);
        v.c = c.load(std::memory_order_relaxed);
    }
};
_________________________________


seqlock_user_state::snapshot works within the readers' critical section
wrt the seqlock. This does not show the version numbers, but instead
focuses on the state itself.
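For completeness, here is one way the omitted version-number loop might wrap such a snapshot. The counter protocol and the write side are my additions, not part of the original example; the fencing is indicative rather than a polished implementation.

```cpp
#include <atomic>
#include <cassert>

struct local_view {
    unsigned long a;
    short b;
    unsigned int c;
};

struct seqlock_user_state {
    std::atomic<unsigned> version{0}; // even = stable, odd = writer active
    std::atomic<unsigned long> a{0};
    std::atomic<short> b{0};
    std::atomic<unsigned int> c{0};

    void snapshot(local_view& v) const {
        v.a = a.load(std::memory_order_relaxed);
        v.b = b.load(std::memory_order_relaxed);
        v.c = c.load(std::memory_order_relaxed);
    }

    // Reader: retry until the version is even and unchanged across the snapshot.
    local_view read() const {
        local_view v;
        for (;;) {
            unsigned v1 = version.load(std::memory_order_acquire);
            if (v1 & 1) continue;       // writer active; retry
            snapshot(v);
            std::atomic_thread_fence(std::memory_order_acquire);
            if (v1 == version.load(std::memory_order_relaxed))
                return v;               // coherent view
        }
    }

    // Single writer: never blocked.
    void write(unsigned long na, short nb, unsigned int nc) {
        version.fetch_add(1, std::memory_order_release); // odd
        a.store(na, std::memory_order_relaxed);
        b.store(nb, std::memory_order_relaxed);
        c.store(nc, std::memory_order_relaxed);
        version.fetch_add(1, std::memory_order_release); // even
    }
};
```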

Chris M. Thomasson
Jan 14, 2019, 4:53:15 PM
On 1/14/2019 1:43 PM, Chris M. Thomasson wrote:
> On 1/14/2019 6:22 AM, Scott Lurndal wrote:
>> "Chris M. Thomasson" <invalid_chr...@invalid.invalid> writes:
>>> On 1/10/2019 4:57 PM, hjkh...@gmail.com wrote:
>>>> On Thursday, January 10, 2019 at 4:12:47 PM UTC-8, Chris M.
>>>> Thomasson wrote:
>>>>> On 1/10/2019 2:02 PM, I wrote:
>>>>>> [Michael P. pointed me at this thread. ]
[...]
Now, imagine a memcpy that just copies seqlock_user_state over to a
local_view in one simple shot, yet still has all of the guarantees of
the snapshot function. Imvho, that would be pretty darn convenient, at
least.

hjkh...@gmail.com
Jan 14, 2019, 8:12:17 PM
On Monday, January 14, 2019 at 1:53:15 PM UTC-8, Chris M. Thomasson wrote:
>
> Now, imagine a memcpy that just copies seqlock_user_state over to a
> local_view in one simple shot, yet still had all of the guarantees of
> the snapshot function? Imvho, that would be pretty darn convenient, at
> least.
>
The memcpy function I was proposing would do roughly that, but it would not give you the guarantee that the load of e.g. the member "a" is atomic. You could instead read parts of "a" written by different concurrent writes. It would guarantee that each byte read corresponds to a byte of "a" that was actually written at some point.

This greatly reduces the cost on some hardware. And it's sufficient for simple seq_lock critical sections. If the read of "a" reads from multiple writes, you're going to throw the result away anyway.

The actual implementation of this memcpy would read at whatever granularity was convenient for the hardware, just like memcpy does now.

Chris M. Thomasson
Jan 14, 2019, 11:55:08 PM
On 1/14/2019 5:12 PM, hjkh...@gmail.com wrote:
> On Monday, January 14, 2019 at 1:53:15 PM UTC-8, Chris M. Thomasson wrote:
>>
>> Now, imagine a memcpy that just copies seqlock_user_state over to a
>> local_view in one simple shot, yet still had all of the guarantees of
>> the snapshot function? Imvho, that would be pretty darn convenient, at
>> least.
>>
> The memcpy function I was proposing would do roughly that, but it would not give you the guarantee that the load of e.g. the member "a" is atomic. You could instead read parts of "a" written by different concurrent writes. It would guarantee that each byte read corresponds to a byte of "a" that was actually written at some point.
>
> This greatly reduces the cost on some hardware. And it's sufficient for simple seq_lock critical sections. If the read of "a" reads from multiple writes, you're going to throw the result away anyway.

Sounds good enough, will work fine for a seq_lock.


> The actual implementation of this memcpy would read at whatever granularity was convenient for the hardware, just like memcpy does now.

Should it be a compiler barrier? I think so. Humm...

Chris M. Thomasson
Jan 15, 2019, 1:54:24 AM
A memcpy doing byte-level relaxed atomic loads can work on bytes, or on
whatever is fastest for the underlying system.

A memcpy doing member-by-member relaxed atomic loads always works on
actual members, just like my simple snapshot example code.

They would be different, but it does not matter wrt just reading in a
seq_lock read-side critical section. However, if there were some
processing within the section, the member-by-member load, like my
snapshot, might be "better", in a sense.

All are compiler barriers, just like a std::atomic operation.

Or, perhaps even a magical atomic_memcpy<std::memory_order> wrt relaxed,
acquire or seq_cst? All of the membars would be compatible with
std::atomic::load...

Chris M. Thomasson
Jan 18, 2019, 11:18:39 PM
This discussion is very informative, real, and nice. Thanks again
Michael. It brings back some memories.