On AArch64, what invariant requires publicationBarrier placement in mallocgc?


Qingwei Li

8:17 AM
to golang-nuts
The discussion is restricted to AArch64.

Question: On arm64, publicationBarrier in mallocgc is implemented as DMB ST.
What is the invariant that requires it to execute at its current position?

Specifically:
- Must it execute before the allocated object becomes visible to another P/M?
- Must it execute before GC metadata becomes visible?
- Or is it required for maintaining the tri-color invariant under concurrent GC?

My reasoning (please correct me if wrong)

The comment in runtime/stubs.go says that the purpose of publicationBarrier is to ensure that other processors observe the fully initialized object before it becomes reachable by the GC.

If that is the case, it seems that as long as:

1) the allocated object is not yet accessible by another goroutine, and
2) the allocating goroutine is not preempted and does not reschedule itself onto another P/M (through chanrecv or other operations),

then the barrier might be deferrable.

Under this reasoning, it appears possible that a single DMB ST could be shared across multiple consecutive mallocgc calls.

However, I'm unsure whether this reasoning overlooks some GC or scheduler invariants, and that is what I would like to understand.

---

Background:

The current order in mallocgc (simplified) is:

```go
alloc
publicationBarrier   // DMB ST
update GC metadata
```

According to measurements in issue comment https://github.com/golang/go/issues/63640#issuecomment-3661284210, the barrier can account for ~35–40% of mallocgc time on arm64 microbenchmarks.

I experimented with amortizing the barrier across multiple consecutive allocations (i.e., sharing the DMB ST). The design is omitted here to keep the question concise. Microbenchmark results show a mixed performance impact:

```
goos: linux
goarch: arm64
pkg: runtime
                     │ default.txt │              batch.txt              │
                     │   sec/op    │   sec/op     vs base                │
Malloc8-64             22.11n ± 0%   21.82n ± 0%   -1.31% (p=0.000 n=10)
Malloc16-64            38.79n ± 0%   33.76n ± 0%  -12.98% (p=0.000 n=10)
MallocTypeInfo8-64     28.49n ± 0%   31.37n ± 0%  +10.11% (p=0.000 n=10)
MallocTypeInfo16-64    38.19n ± 0%   39.57n ± 0%   +3.61% (p=0.000 n=10)
MallocLargeStruct-64   417.9n ± 1%   400.8n ± 1%   -4.10% (p=0.000 n=10)
geomean                52.27n        51.62n        -1.24%
```

However, my main concern is correctness: I would like to understand the exact memory-ordering guarantee enforced by this barrier on AArch64.

Thanks.