Clarification of memory model behavior within a single goroutine


Peter Rabbitson

Jan 21, 2023, 12:36:20 PM
to golan...@googlegroups.com, Filippo Valsorda
Greetings,

I am trying to understand the exact mechanics of memory write ordering from within the same goroutine. I wrote a self-contained runnable example with the question inlined here: https://go.dev/play/p/ZXMg_Qq3ygF and am copying its header here:

// Below is a complete example, with the question starting on line 38:
// how do I ensure that a *separate Linux OS process* observing `IPCfile`
// (either via pread() or mmap()) can *NEVER* observe W2 before W1.
// The only permissible states are:
// 1. no changes visible
// 2. only W1 is visible
// 3. both W1 and W2 are visible

I did read through https://go.dev/ref/mem and https://github.com/golang/go/discussions/47141 + links, but could not find a definitive answer to my specific use-case.

Would really appreciate any help getting to the bottom of this!

burak serdar

Jan 21, 2023, 1:48:12 PM
to Peter Rabbitson, golan...@googlegroups.com, Filippo Valsorda
On Sat, Jan 21, 2023 at 10:36 AM Peter Rabbitson <riba...@gmail.com> wrote:
Greetings,

I am trying to understand the exact mechanics of memory write ordering from within the same goroutine. I wrote a self-contained runnable example with the question inlined here: https://go.dev/play/p/ZXMg_Qq3ygF and am copying its header here:

// Below is a complete example, with the question starting on line 38:
// how do I ensure that a *separate Linux OS process* observing `IPCfile`
// (either via pread() or mmap()) can *NEVER* observe W2 before W1.
// The only permissible states are:
// 1. no changes visible
// 2. only W1 is visible
// 3. both W1 and W2 are visible

This is based on my interpretation of the go memory model:

Atomic memory operations are sequentially consistent, so here:

(*mmapBufAtomic.Load())[fSize-1] = 255 // W1
(*mmapBufAtomic.Load())[0] = 42        // W2

The first atomic load happens before the second load. That also implies the first write (W1) happens before the second (W2). However, there is no guarantee that W2 will be observed by another goroutine.

I think what is really needed here is an atomic store-byte operation. If this is the only goroutine writing to this buffer, you can emulate that with atomic.LoadUint32, setting the highest/lowest byte, then atomic.StoreUint32.
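A sketch of that emulation (not from the thread; `setByteAtomic` and its shift handling are illustrative, and it is only safe while a single goroutine writes the containing word):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// setByteAtomic emulates an atomic single-byte store by atomically
// rewriting the 32-bit word that contains the byte. Safe only when
// this goroutine is the sole writer of that word.
func setByteAtomic(word *uint32, shift uint, b byte) {
	old := atomic.LoadUint32(word)
	neu := (old &^ (0xFF << shift)) | uint32(b)<<shift
	atomic.StoreUint32(word, neu)
}

func main() {
	var w uint32
	setByteAtomic(&w, 0, 42)   // lowest byte
	setByteAtomic(&w, 24, 255) // highest byte
	fmt.Printf("%#08x\n", w)   // 0xff00002a
}
```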

 

I did read through https://go.dev/ref/mem and https://github.com/golang/go/discussions/47141 + links, but could not find a definitive answer to my specific use-case.

Would really appreciate any help getting to the bottom of this!

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAMrvTSKXb5JQMR9PcCXwYhcT4rq8O_5hiTHrOChk6sUeOrbagw%40mail.gmail.com.

Peter Rabbitson (ribasushi)

Jan 21, 2023, 2:11:30 PM
to golang-nuts
On Saturday, January 21, 2023 at 7:48:12 PM UTC+1 bse...@computer.org wrote:
On Sat, Jan 21, 2023 at 10:36 AM Peter Rabbitson <riba...@gmail.com> wrote:
Greetings,

I am trying to understand the exact mechanics of memory write ordering from within the same goroutine. I wrote a self-contained runnable example with the question inlined here: https://go.dev/play/p/ZXMg_Qq3ygF and am copying its header here:

// Below is a complete example, with the question starting on line 38:
// how do I ensure that a *separate Linux OS process* observing `IPCfile`
// (either via pread() or mmap()) can *NEVER* observe W2 before W1.
// The only permissible states are:
// 1. no changes visible
// 2. only W1 is visible
// 3. both W1 and W2 are visible

This is based on my interpretation of the go memory model:

Atomic memory operations are sequentially consistent, so here:

(*mmapBufAtomic.Load())[fSize-1] = 255 // W1
(*mmapBufAtomic.Load())[0] = 42        // W2

The first atomic load happens before the second load. That also implies the first write (W1) happens before the second (W2). However, there is no guarantee that W2 will be observed by another goroutine.

This is perfectly acceptable ( see point 2. above ). Also note that there is no other goroutine that is looking at this: the observers are separate ( possibly not even go-based ) OS processes. I am strictly trying to get to a point where the writer process exemplified in the playground will issue the CPU write instructions in the order I expect.
 

I think what is really needed here is an atomic store byte operation. If this is the only goroutine writing to this buffer, you can emulate that by atomic.LoadUint32, set the highest/lowest byte, then atomic.StoreUint32

This would not be viable: the W1 write is a single byte only for the sake of brevity. In practice it will be a multi-GiB write, with a multi-KiB write following it, followed by a single-uint write. All of this is part of a lock-free "ratcheted" transactional implementation, allowing for incomplete writes but no dirty reads: the "root pointer" is the last thing being updated, so an observer process sees either "old state" or "new state" and nothing in between. Hence my quest to understand the precise behavior and guarantees of the resulting compiled program.

Peter Rabbitson

Jan 21, 2023, 2:24:32 PM
to burak serdar, golan...@googlegroups.com, Filippo Valsorda
( apologies for the previous mangled message, re-posting from a saner UI )

On Sat, Jan 21, 2023 at 7:47 PM burak serdar <bse...@computer.org> wrote:


On Sat, Jan 21, 2023 at 10:36 AM Peter Rabbitson <riba...@gmail.com> wrote:
Greetings,

I am trying to understand the exact mechanics of memory write ordering from within the same goroutine. I wrote a self-contained runnable example with the question inlined here: https://go.dev/play/p/ZXMg_Qq3ygF and am copying its header here:

// Below is a complete example, with the question starting on line 38:
// how do I ensure that a *separate Linux OS process* observing `IPCfile`
// (either via pread() or mmap()) can *NEVER* observe W2 before W1.
// The only permissible states are:
// 1. no changes visible
// 2. only W1 is visible
// 3. both W1 and W2 are visible

This is based on my interpretation of the go memory model:

Atomic memory operations are sequentially consistent, so here:

(*mmapBufAtomic.Load())[fSize-1] = 255 // W1
(*mmapBufAtomic.Load())[0] = 42        // W2

The first atomic load happens before the second load. That also implies the first write (W1) happens before the second (W2). However, there is no guarantee that W2 will be observed by another goroutine.

This is perfectly acceptable ( see point 2. above ). Also note that there is no other goroutine that is looking at this: the observers are separate ( possibly not even go-based ) OS processes. I am strictly trying to get to a point where the writer process exemplified in the playground will issue the CPU write instructions in the order I expect.
 
I think what is really needed here is an atomic store byte operation. If this is the only goroutine writing to this buffer, you can emulate that by atomic.LoadUint32, set the highest/lowest byte, then atomic.StoreUint32

burak serdar

Jan 21, 2023, 4:46:09 PM
to Peter Rabbitson (ribasushi), golang-nuts
You realize, if W1 is a multi-GB write, another process may observe partial writes of W1. But, I believe, if another process observes W2, then it is guaranteed that all of W1 has been written.

I think the Go memory model does not really apply here, because you are talking about other processes reading shared memory. What you are really relying on is that on x86 there will be a memory barrier associated with atomic loads. I don't know how this works on ARM, so I am not sure how portable this solution would be. The memory model is explicit about observing the effects of an atomic write operation and about sequential consistency of atomic memory operations, so it sounds like an unprotected W1 followed by an atomic store of W2 would still work the same way.
 


Keith Randall

Jan 21, 2023, 8:31:37 PM
to golang-nuts
On the write side, you write your multi-GB data using normal writes, then an atomic.Store for the final flag uint. On the read side, you use an atomic.Load for the flag uint followed by regular loads for the remaining multi-GB of data.
Reading a particular flag value ensures that the following loads see all the writes from before the writer wrote that particular flag value. This is guaranteed by the memory model, as the atomic read seeing the atomic write introduces the synchronized-before edge you need.
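A minimal in-process sketch of that write/read protocol (illustrative names; a plain byte slice stands in for the mmap'd region, and atomic.Uint64 needs Go 1.19+):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	buf   = make([]byte, 1<<20) // stands in for the mmap'd region
	ready atomic.Uint64         // the final flag uint / "root pointer"
)

func writer() {
	for i := range buf {
		buf[i] = 0xAB // plain stores: the multi-GB payload
	}
	ready.Store(1) // atomic store publishes all the stores above
}

func reader() {
	if ready.Load() == 1 { // atomic load creates the synchronized-before edge
		fmt.Println(buf[len(buf)-1] == 0xAB) // guaranteed true once the flag is seen
	}
}

func main() {
	writer()
	reader()
}
```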

I agree that the Go memory model doesn't directly address multi-process communication like this, but assuming both ends are Go this is guaranteed to work by the Go memory model. YMMV on what operations/barriers/etc. you need in other languages.

Peter Rabbitson (ribasushi)

Jan 22, 2023, 2:12:11 AM
to golang-nuts
This question is focused exclusively on the writer side.

So are you saying that this will also work (based on https://go.dev/play/p/ZXMg_Qq3ygF )?
        mmapBufRaw[fSize-1] = 255       // W1
        (*mmapBufAtomic.Load())[0] = 42 // W2

How about this: would it work as "everything before the atomic has to appear as a CPU instruction first"?
        mmapBufRaw[fSize-1] = 255    // W1
        atomic.LoadInt64(&randomVal) // any atomic access acts as a barrier?
        mmapBufRaw[0] = 42           // W2

This is the exact mechanism I am trying to understand: what is the minimum that Go needs in order to guarantee "as written" ordering within a specific single goroutine.

Ian Lance Taylor

Jan 22, 2023, 11:50:16 AM
to Peter Rabbitson (ribasushi), golang-nuts
On Sat, Jan 21, 2023, 11:12 PM Peter Rabbitson (ribasushi) <riba...@gmail.com> wrote:
This question is focused exclusively on the writer side.

Perhaps I misunderstand, but it doesn't make sense to ask a question about the memory model only about one side or the other.  The memory model is about communication between two goroutines.  It has very little to say about the behavior of a single goroutine.

Ian


Peter Rabbitson

Jan 22, 2023, 12:12:50 PM
to Ian Lance Taylor, golang-nuts
On Sun, Jan 22, 2023 at 5:49 PM Ian Lance Taylor <ia...@golang.org> wrote:
On Sat, Jan 21, 2023, 11:12 PM Peter Rabbitson (ribasushi) <riba...@gmail.com> wrote:
This question is focused exclusively on the writer side.

Perhaps I misunderstand, but it doesn't make sense to ask a question about the memory model only about one side or the other.  The memory model is about communication between two goroutines.  It has very little to say about the behavior of a single goroutine.

I might be using the wrong term then; although a lot of the text in https://go.dev/ref/mem is relevant, it just does not answer my very specific question. Let me try from a different angle:

I want to write a single-threaded Go program which, once compiled, has a certain behavior from the point of view of the OS. More specifically, from the Linux point of view this program should, in very strict order:
1. grab a memory region
2. write an arbitrary amount of data to the region's end
3. write some more data to the region start

By definition 1) will happen first ( you have to grab in order to write ), but it is also critical that the program does all of 2) before it starts doing 3).

Modern compilers are wicked smart, and often emit assembly that would have some of 3) interleaved with 2). Moreover, due to kernel thread preemption, 2) and 3) could execute simultaneously on different physical CPUs, requiring a NUMA-global synchronization signal to prevent this ( I believe it is LOCK on x86 ). I am trying to understand how to induce all of this ordering from within a Go program in the most lightweight manner possible ( refer to the problem statement and attempts starting on line 38 at https://go.dev/play/p/ZXMg_Qq3ygF )


If I were in C-land, I believe I would use something like:

#include <asm/system.h>

void wmb(void);


The question is - what is the go equivalent.

robert engels

Jan 22, 2023, 1:40:23 PM
to Peter Rabbitson, Ian Lance Taylor, golang-nuts
The atomic store will force a memory barrier - as long as the reader (in the other process) atomically reads the “new value”, all other writes prior will also be visible.

BUT you can still have an inter-process race condition if you are updating the same memory mapped file regions - and you need an OS mutex to protect against this, or use other advanced append/sequence number techniques.

You can look at projects like https://github.com/OpenHFT/Chronicle-Queue for ideas.

Still, large-scale shared memory systems are usually not required. I would use a highly efficient message system like Nats.io and not reinvent the wheel. Messaging systems are also far more flexible.



Peter Rabbitson

Jan 22, 2023, 1:54:27 PM
to robert engels, Ian Lance Taylor, golang-nuts
On Sun, Jan 22, 2023 at 7:39 PM robert engels <ren...@ix.netcom.com> wrote:
The atomic store will force a memory barrier - as long as the reader (in the other process) atomically reads the “new value”, all other writes prior will also be visible.

Could you translate this to specific go code? What would constitute what you called "the atomic store" in the playground example I gave?  
 
BUT you can still have an inter-process race condition if you are updating the same memory mapped file regions - and you need an OS mutex to protect against this

Correct. This particular system is multiple-reader single-threaded-writer, enforced by a Fcntl POSIX advisory lock. Therefore as long as I make the specific writer consistent - I am done.
 
You can look at projects like https://github.com/OpenHFT/Chronicle-Queue for ideas.

Still, large-scale shared memory systems are usually not required. I would use a highly efficient message system like Nats.io and not reinvent the wheel. Messaging systems are also far more flexible.


Nod, the example you linked is vaguely in line with what I want. You are also correct that reinventing a wheel is bad form, and is to be avoided at all costs. Yet the latency sensitivity of the particular IPC unfortunately does call for an even rounder wheel. My problem isn't about "what to do" nor "is there another way", but rather "how do I do this from within the confines of go". 
 

robert engels

Jan 22, 2023, 6:42:26 PM
to Peter Rabbitson, Ian Lance Taylor, golang-nuts
Write data to memory mapped file/shared memory. Keep track of last written byte as new_length;

Use atomic.StoreUint64(pointer to header.length, new_length);

readers read header.length atomically to determine the last valid byte (using whatever facilities their language has).

A reader then knows that bytes up to header.length are valid to consume.

This assumes you are always appending to the buffer - never reusing the earlier buffer space. If you desire to do this, then it is much more complicated as you need to determine that all readers have consumed the data before the writer reuses it.

The above must work in order for Go to have a happens-before relationship with the atomics: all prior writes must be visible to a reader that sees the updated value in the header.
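A single-process sketch of that length-header protocol (hypothetical names; the atomic `length` variable stands in for the header.length field that would live in the shared mapping):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	data   = make([]byte, 1024) // stands in for the append-only mapped region
	length atomic.Uint64        // stands in for header.length
)

// appendRecord copies p past the current valid length with plain
// stores, then atomically publishes the new length. Single writer only.
func appendRecord(p []byte) {
	n := length.Load()
	copy(data[n:], p)
	length.Store(n + uint64(len(p)))
}

// consume returns the currently valid prefix; a reader must never
// look past the length value it loaded.
func consume() []byte {
	return data[:length.Load()]
}

func main() {
	appendRecord([]byte("hello "))
	appendRecord([]byte("world"))
	fmt.Printf("%q\n", consume()) // "hello world"
}
```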



Peter Rabbitson

Jan 22, 2023, 9:12:57 PM
to robert engels, Ian Lance Taylor, golang-nuts
On Mon, Jan 23, 2023 at 12:42 AM robert engels <ren...@ix.netcom.com> wrote:
Write data to memory mapped file/shared memory. Keep track of last written byte as new_length;

Use atomic.StoreUint64(pointer to header.length, new_length);


This does not answer the question I posed, which boils down to:

How does one insert the equivalent of smp_wmb() / asm volatile("" ::: "memory") into a go program.

 
readers read ...

Please don't focus on the reader ;) 
 
This assumes you are always appending ,,, then it is much more complicated ... all readers have consumed the data before the writer reuses it.

Yes, it is much more complicated :) I am making a note to post the result back to this thread in a few weeks when it is readable enough.

Robert Engels

Jan 22, 2023, 9:38:15 PM
to Peter Rabbitson, Ian Lance Taylor, golang-nuts
The atomic store functions force a memory barrier when used in conjunction with an atomic read of the same value.

You could use CGO to call a C function to do what you desire - but it shouldn’t be necessary. 

Not sure what else I can tell you. 

On Jan 22, 2023, at 8:12 PM, Peter Rabbitson <riba...@gmail.com> wrote:



David Klempner

Jan 22, 2023, 11:24:04 PM
to Robert Engels, Peter Rabbitson, Ian Lance Taylor, golang-nuts
On Mon, Jan 23, 2023 at 11:38 AM Robert Engels <ren...@ix.netcom.com> wrote:
The atomic functions force a memory barrier when atomically in conjunction with the atomic read of the same value. 

You could use CGO to call a C function to do what you desire - but it shouldn’t be necessary. 

C doesn't solve this problem, unless your "C function" is something like an "asm volatile". C has the exact same underlying issue as Go here -- the language does not define interprocess atomics, because doing so is hard.

In practice you can probably do the equivalent of a sequentially consistent store on the writer side and a sequentially consistent read on the other, and things will likely work.

With that said, that doesn't strictly work even for two C programs. As a concrete example, if you're implementing sequentially consistent atomics on x86, while read-modify-write instructions don't need any extra fencing, you have to choose whether to make reads expensive (by adding a fence on reads) or writes expensive (by adding a fence on writes). While in practice every implementation chooses to make the writes expensive, that is an implementation decision and nothing stops someone from making a reads-expensive-writes-cheap compiler.

If a "writes cheap compiler" program does an atomic write over shared memory to a "reads cheap compiler" program, there won't be any fences on either side and you won't get sequential consistent synchronization. (That's fine for a simple producer-consumer queue as in this example which only requires release-acquire semantics, but fancier algorithms that rely on sequential consistency will be quite unhappy.)
 


Ian Lance Taylor

Jan 23, 2023, 1:05:03 AM
to Peter Rabbitson, golang-nuts
Memory ordering only makes sense in terms of two different execution
threads using shared memory. In order to answer your question
precisely, you need to tell us what the process reading the memory
region is going to do to access the memory. In order to know how to
write the memory, it's necessary to know how the memory is going to be
read.

That said, it's fairly likely that if you use an atomic store for the
very first memory write in step 3 that the right thing will happen.
On the Go side, that will ensure that all memory operations before
that memory write have at least been executed. And if the reader does
an atomic read, it should ensure that when the reader sees that memory
write, it will also see all the earlier memory writes.


>
> If I were in C-land I believe would use something like:
>
> #include <asm/system.h>
>
> void wmb(void);
>
>
> The question is - what is the go equivalent.

There is no Go equivalent to a pure write memory barrier. Of course
you could use cgo to call a C function that does exactly what you
want.

Ian

Peter Rabbitson

Jan 23, 2023, 2:36:56 AM
to Ian Lance Taylor, golang-nuts
On Mon, Jan 23, 2023 at 7:04 AM Ian Lance Taylor <ia...@golang.org> wrote:
Memory ordering only makes sense in terms of two different execution
threads using shared memory.  In order to answer your question
precisely, you need to tell us what the process reading the memory
region is going to do to access the memory.  In order to know how to
write the memory, it's necessary to know how the memory is going to be
read.

That's a fair point. I avoided going into details not to risk tickling latent design-urges of the readers ;) 

Setup:
- Single-writer multiple-readers scenario
- Writer is always exclusively single threaded, no concurrency whatsoever. Only possible sources of operation reordering are: a) the discrete CPU execution pipeline b) the compiler itself c) OS preemption/ SMP migration.
- Communication is over a single massive mmaped file-backed region. 
- Exploits the fact that on Linux the VFS cache in front of the named file and the mmaped "window" within every process are all literally the same kernel memory.
- Communication is strictly one-way: writer does not know nor care about the amount of readers, what are they looking at, etc. 
- Readers are expected to accommodate the above, be prepared to look at stale data, etc.
- For simplicity assume that the file/mmap is of unreachable size ( say 1PiB ) and that additions are all appends, with no garbage collection - stale data which is not referenced by anything just sticks around indefinitely. 

Writer pseudocode ( always only one thread, has exclusive write access )
1. Read current positioning from mmap offset 0 - no locks needed since I am the one who modified things last
2. Do the payload writes, several GiB append within the unused portion of the mmap
3. Write out the necessary indexes and pointers to the contents of 2, another append, this time several KiB
4. {{ MY QUESTION }} Emit an SFENCE/LOCK (amd64) or DMB (arm64) to ensure sequencing consistency and that all CPUs see the same state of the kernel memory backing the mmap
5. Write a single uint64 at mmap offset 0, pointing to the new "state of the world" written during 3. which in turn points at various pieces of data written in 2.
6. goto 1

Readers pseudocode ( many readers, various implementation languages not just go,  utterly uncoordinated, happy to see "old transaction", but expect 5 => 3 => 2 to be always consistent )
1. Read current positioning from mmap offset 0 - no locks as I am equally happy to see the new or old uint64. I do assume that a word-sized read is always atomic, and I won't see a "torn" u64
2. Walk around either the new or old network of pointers. The barrier 4. in the writer ensures I can't see a pointer to something that doesn't yet exist.

The end.
 
There is no Go equivalent to a pure write memory barrier. 

Ian, I recognize I am speaking to one of the language creators and that you know *way* more than me about this subject. Nevertheless I find it really hard to accept your statement. There got to be a set of constructs that have the desired side-effects described in 4 above. I also still maintain that the memory model should discuss this, in the compilation guarantees section at the bottom. After all a standalone go program is nothing more than a list of instructions for a CPU mediated by an OS. The precise sequencing of these instructions in special circumstances should be clear/controllable.

I guess I will spend some time to learn how to poke around the generated assembly tomorrow... 
 

fge...@gmail.com

Jan 23, 2023, 5:13:20 AM
to Peter Rabbitson, Ian Lance Taylor, golang-nuts
On 1/23/23, Peter Rabbitson <riba...@gmail.com> wrote:
...
> I guess I will spend some time to learn how to poke around the generated
> assembly tomorrow...

If I understand correctly, you are trying to force your model of the
world onto the Go memory model. The models are different, so this
won't work.

Please also note that your model of current execution environments is
probably valid today, but it could change at any time. The Go memory
model is restrictive in a different way, to accommodate that future.

Of course you can implement what you want using any tool available,
but the correct execution can't be ensured by the Go memory model if
you don't build on that.

Keith Randall

Jan 23, 2023, 11:57:37 AM
to golang-nuts
Just to be clear: to get what you want, just write the data normally for steps 1-4 and use an atomic store for step 5. That guarantees that other processes will see steps 1-4 all done if they see the write from step 5. (But you *do* have to use an atomic read, appropriate to your language, for reader step 1. A plain read will not do.)

Go does not provide separate "pure" memory barriers. The compiler and/or runtime include them when needed to ensure the required semantics for locks, atomic operations, etc.

Ian Lance Taylor

Jan 23, 2023, 4:54:51 PM
to Peter Rabbitson, golang-nuts
On Sun, Jan 22, 2023 at 11:36 PM Peter Rabbitson <riba...@gmail.com> wrote:
>
> That's a fair point. I avoided going into details not to risk tickling latent design-urges of the readers ;)
>
> Setup:
> - Single-writer multiple-readers scenario
> - Writer is always exclusively single threaded, no concurrency whatsoever. Only possible sources of operation reordering are: a) the discrete CPU execution pipeline b) the compiler itself c) OS preemption/ SMP migration.
> - Communication is over a single massive mmaped file-backed region.
> - Exploits the fact that on Linux the VFS cache in front of the named file and the mmaped "window" within every process are all literally the same kernel memory.
> - Communication is strictly one-way: writer does not know nor care about the amount of readers, what are they looking at, etc.
> - Readers are expected to accommodate above, be prepared to look at stale data, etc
> - For simplicity assume that the file/mmap is of unreachable size ( say 1PiB ) and that additions are all appends, with no garbage collection - stale data which is not referenced by anything just sticks around indefinitely.
>
> Writer pseudocode ( always only one thread, has exclusive write access )
> 1. Read current positioning from mmap offset 0 - no locks needed since I am the one who modified things last
> 2. Do the payload writes, several GiB append within the unused portion of the mmap
> 3. Writeout necessary indexes and pointers to the contents of 2, another append this time several KiB
> 4. {{ MY QUESTION }} Emit a SFENCE/LOCK(amd64) or DMB(arm64) to ensure sequencing consistency and that all CPUs see the same state of the kernel memory backing the mmap
> 5. Write a single uint64 at mmap offset 0, pointing to the new "state of the world" written during 3. which in turn points at various pieces of data written in 2.
> 6. goto 1
>
> Readers pseudocode ( many readers, various implementation languages not just go, utterly uncoordinated, happy to see "old transaction", but expect 5 => 3 => 2 to be always consistent )
> 1. Read current positioning from mmap offset 0 - no locks as I am equally happy to see the new or old uint64. I do assume that a word-sized read is always atomic, and I won't see a "torn" u64
> 2. Walk around either the new or old network of pointers. The barrier 4. in the writer ensures I can't see a pointer to something that doesn't yet exist.
>
> The end.
>
>>
>> There is no Go equivalent to a pure write memory barrier.
>
>
> Ian, I recognize I am speaking to one of the language creators and that you know *way* more than me about this subject. Nevertheless I find it really hard to accept your statement. There got to be a set of constructs that have the desired side-effects described in 4 above. I also still maintain that the memory model should discuss this, in the compilation guarantees section at the bottom. After all a standalone go program is nothing more than a list of instructions for a CPU mediated by an OS. The precise sequencing of these instructions in special circumstances should be clear/controllable.
>
> I guess I will spend some time to learn how to poke around the generated assembly tomorrow...

There is a model of memory behavior in which programs use pure write
memory barriers and pure read memory barriers. However, Go does not
use that model. Go uses a different model, in which writers and
readers are expected to cooperate using atomic loads and stores.

As such, Go does not provide a pure write memory barrier. I promise.
(As I noted earlier, Go does permit calling into C, and therefore
permits you to do anything that C permits you to do. It's worth
noting that because C code can do anything, a call into a C function
is a full compiler (but not hardware) memory barrier.)

Ian

Peter Rabbitson

Jan 23, 2023, 5:20:24 PM
to Ian Lance Taylor, golang-nuts
On Mon, Jan 23, 2023 at 10:54 PM Ian Lance Taylor <ia...@golang.org> wrote:
There is a model of memory behavior in which programs use pure write
memory barriers and pure read memory barriers.  However, Go does not
use that model.  Go uses a different model, in which writers and
readers are expected to cooperate using atomic loads and stores.

Understood. Thank you (and Keith) for stating this enough times that it clicked for me: it's not that what I am trying to do is impossible, but rather there is nothing in go that exposes these types of barriers to userspace by design. Having this mentioned explicitly in the memory model doc would have helped me build the correct frame of reference early on. 
 
As such, Go does not provide a pure write memory barrier.  I promise.
(As I noted earlier, Go does permit calling into C, and therefore
permits you to do anything that C permits you to do.  It's worth
noting that because C code can do anything, a call into a C function
is a full compiler (but not hardware) memory barrier.)

The cgo compile-time ergonomics overhead is a bit of a concern in my case. Does calling into a non-inlineable(?) go-asm function achieve the same compiler-ordering barrier? I.e. is the following kosher / reasonably forward-compatible: https://stackoverflow.com/q/42911884 ?

Thank you!

Ian Lance Taylor

Jan 23, 2023, 5:46:18 PM
to Peter Rabbitson, golang-nuts
Yes, calling an assembler function will also give you a compiler (not
hardware) memory barrier. Something along the lines of that Stack
Overflow question ought to work OK.

Ian