Alignment guarantees of heap allocations


Timo Beckers

Oct 10, 2024, 8:23:36 AM
to golang-nuts
Hello,

I've been searching for existing info or conversations on this topic, but that hasn't turned up anything useful so far. I have a question about some implicit behaviour of Go's heap allocator.

I'm working on implementing BPF map operations through direct shared memory access (without going through syscalls). To make the API somewhat ergonomic, I've settled on mmapping BPF map (kernel) memory over a part of the Go heap using MAP_FIXED. Feel free to tell me how bad of an idea this is, I can elaborate if needed.

In order for this to work and to minimize allocations, I need a heap allocation that starts on a page boundary. Initially, I experimented with //go:linkname'ing mallocgc(), but I figured allocating a regular slice the size of a page has basically the same effect. Here's a playground link: https://go.dev/play/p/ua2NJ-rEIlC. As long as the slice/backing array is a multiple of the architecture's page size, it seems to start on a page boundary. I've tried allocating a bunch of ballast, forcing GCs, etc. and it hasn't failed once.
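
For reference, a minimal sketch of that experiment (not the exact playground code, just the same idea): allocate page-sized byte slices and check whether each backing array starts on a page boundary.

```
package main

import (
	"fmt"
	"os"
	"unsafe"
)

func main() {
	pageSize := os.Getpagesize()
	for i := 0; i < 1000; i++ {
		buf := make([]byte, pageSize) // one page worth of bytes, no pointers
		addr := uintptr(unsafe.Pointer(&buf[0]))
		if addr%uintptr(pageSize) != 0 {
			fmt.Printf("allocation %d not page-aligned: %#x\n", i, addr)
			return
		}
	}
	fmt.Println("every allocation started on a page boundary")
}
```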

Here's my question: which property of the Go allocator is this banking on? Intuitively, this makes sense to me. An 8-byte alloc needs to be 8-byte aligned in case it's a pointer. Does a 4k allocation need to be 4k-aligned as well (e.g. in case it's a struct), since a compiler would align members relative to the start of the struct? I'm reading that larger (> 8 or 16 bytes depending on arch?) allocs only have an alignment guarantee of 1 byte, and unsafe.Alignof([4096]byte) tells me the same, but that's not the behaviour exhibited by the allocator.

So, is this just luck or is this behaviour we can rely on?

Thanks,

Timo

Robert Engels

Oct 10, 2024, 8:35:32 AM
to Timo Beckers, golang-nuts
I’m not going to tell you how bad an idea it is… 

But I would like to ask "why"? Why not just map the address using mmap - why does it need to be in the heap memory space?


Jan Mercl

Oct 10, 2024, 8:48:40 AM
to Timo Beckers, golang-nuts
On Thu, Oct 10, 2024 at 2:23 PM 'Timo Beckers' via golang-nuts
<golan...@googlegroups.com> wrote:

> I've been searching around for some info or existing conversations around this topic, but that hasn't turned up anything useful so far. I had a question around some implicit behaviour of Go's heap allocator.

From the spec's point of view, there's no heap and no allocator. The
language specification covers allocation at
https://go.dev/ref/spec#Allocation, but the allocator is an
implementation detail with no particular guarantees.

> I'm working on implementing BPF map operations through direct shared memory access (without going through syscalls). To make the API somewhat ergonomic, I've settled on mmapping BPF map (kernel) memory over a part of the Go heap using MAP_FIXED. Feel free to tell me how bad of an idea this is, I can elaborate if needed.

I'm not sure what exactly you mean by "over", as there is more than
one possibility I can think of. Go-managed memory and manually mmapped
memory can intermix under certain conditions. One of them is that the
runtime must know which is which, i.e. be able to distinguish pointers
to memory allocated by itself from memory allocated by something else
- manually mmapped, C allocated, etc.

> In order for this to work and to minimize allocations, I need a heap allocation that starts on a page boundary. Initially, I experimented with //go:linkname'ing mallocgc(), but I figured allocating a regular slice the size of a page has basically the same effect. Here's a playground link: https://go.dev/play/p/ua2NJ-rEIlC. As long as the slice/backing array is a multiple of the architecture's page size, it seems to start on a page boundary. I've tried allocating a bunch of ballast, forcing GCs, etc. and it hasn't failed once.

Those are guarantees possibly provided by the kernel. No promises from Go.

> Here's my question: which property of the Go allocator is this banking on? Intuitively, this makes sense to me. An 8-byte alloc needs to be 8-byte aligned in case it's a pointer. Does a 4k allocation need to be 4k-aligned as well (e.g. in case it's a struct), since a compiler would align members to the start of the struct? I'm reading larger (> 8 or 16 bytes depending on arch?) allocs have an alignment of 1 byte, and unsafe.Alignof([4096]byte) tells me the same, but that's not the behaviour exhibited by the allocator.

These are implementation details that can change; you cannot rely on
them. You have to handle it manually and work out the platform details yourself.

-j

Timo Beckers

Oct 10, 2024, 10:22:48 AM
to golang-nuts
On Thu, 10 Oct 2024 at 14:34, Robert Engels <ren...@ix.netcom.com> wrote:
>
> I’m not going to tell you how bad an idea it is…
>
> But I would like to ask ”why”? Why not just map the address using mmap - why does it need to be in the heap memory space?

Ha! I had a feeling someone would bite. :) The full explanation is in
memory.go in this PR: https://github.com/cilium/ebpf/pull/1572/files.

Essentially, what this does is (roughly sketched below):
- create a bpf map
- allocate a slice on the Go heap
- mmap the contents of the bpf map (by fd) over this slice's backing array
- set a finalizer on the backing array to undo the mapping (mmap(MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS)) and patch the hole in the slab
- synthesize Go objects around parts of this mmapped backing array, in particular sync/atomic types and structs
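
A rough sketch of those steps (Linux/amd64 only, hypothetical helper name, and leaning on allocator behaviour that, as discussed elsewhere in this thread, Go does not guarantee; the real code is in memory.go in the PR above):

```
import (
	"runtime"
	"syscall"
	"unsafe"
)

// mapOverHeap allocates a page-multiple slice on the Go heap and replaces
// the pages backing it with the (shared) pages of fd via mmap(MAP_FIXED).
// The slice stays a normal heap object, so the GC keeps tracking pointers
// into it.
func mapOverHeap(fd int, size int) ([]byte, error) {
	buf := make([]byte, size) // assumed to start on a page boundary (the subject of this thread)
	addr := uintptr(unsafe.Pointer(&buf[0]))

	_, _, errno := syscall.Syscall6(syscall.SYS_MMAP,
		addr, uintptr(size),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_SHARED|syscall.MAP_FIXED,
		uintptr(fd), 0)
	if errno != 0 {
		return nil, errno
	}

	// Once the last pointer into buf is gone, undo the mapping by putting
	// private anonymous memory back, patching the hole in the slab.
	runtime.SetFinalizer(&buf[0], func(p *byte) {
		syscall.Syscall6(syscall.SYS_MMAP,
			uintptr(unsafe.Pointer(p)), uintptr(size),
			syscall.PROT_READ|syscall.PROT_WRITE,
			syscall.MAP_PRIVATE|syscall.MAP_FIXED|syscall.MAP_ANONYMOUS,
			^uintptr(0), 0) // fd = -1 for an anonymous mapping
	})
	return buf, nil
}
```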

This has the nice property of being completely hands-off wrt. resource management. Since the GC tracks any pointers into this part of the heap, it will call the finalizer for us when all pointers into the region have disappeared.

If we'd instead create this mapping off-heap, we'd need to tie its lifecycle to a 'proxy' Go object/accessor. The first version of this implementation featured a Memory object that received this finalizer calling the 'undo' mmap(), but this proved dangerous since access to the underlying memory took place through one of those synthesized atomic.Uint64s that were not tracked by the GC. For example:

```
type Variable struct {
	*atomic.Uint64
	m *Memory
}
```

Technically, when calling Variable.Uint64.Add(), the Variable and its underlying Memory can become unreachable before the atomic access takes place, which is pretty disastrous as it would cause an access to unmapped memory. Any such call would need to be followed by a runtime.KeepAlive(v) to be perfectly safe, which doesn't make for a great API.

Hopefully this response satisfies your curiosity!

Timo

Timo Beckers

Oct 10, 2024, 10:32:01 AM
to Jan Mercl, golang-nuts
On Thu, 10 Oct 2024 at 14:48, Jan Mercl <0xj...@gmail.com> wrote:
>
> On Thu, Oct 10, 2024 at 2:23 PM 'Timo Beckers' via golang-nuts
> <golan...@googlegroups.com> wrote:
>
> > I've been searching around for some info or existing conversations around this topic, but that hasn't turned up anything useful so far. I had a question around some implicit behaviour of Go's heap allocator.
>
> From the specs point of view, there's no heap and no allocator. The
> language specification covers allocation at
> https://go.dev/ref/spec#Allocation, but the allocator is an
> implementation detail with no particular guarantees.

Thanks, I didn't realize this was mentioned in the spec. I guess that
puts an end to the conversation immediately. :)
I'll allocate an extra page to make sure we can always align to a page boundary.
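
(Something like this, as a hypothetical sketch rather than what ended up in the PR: over-allocate by one page and slice from the first page-aligned offset. Needs "os" and "unsafe".)

```
// pageAlignedSlice returns a size-byte slice whose backing memory starts
// on a page boundary, regardless of where the allocator placed it.
func pageAlignedSlice(size int) []byte {
	pageSize := os.Getpagesize()
	buf := make([]byte, size+pageSize)
	off := 0
	if rem := int(uintptr(unsafe.Pointer(&buf[0])) % uintptr(pageSize)); rem != 0 {
		off = pageSize - rem
	}
	return buf[off : off+size]
}
```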

>
> > I'm working on implementing BPF map operations through direct shared memory access (without going through syscalls). To make the API somewhat ergonomic, I've settled on mmapping BPF map (kernel) memory over a part of the Go heap using MAP_FIXED. Feel free to tell me how bad of an idea this is, I can elaborate if needed.
>
> I'm not sure what exactly do you mean by "over" as there is more than
> one possibility I can think of. Go managed memory and manually mmaped
> memory can intermix under certain conditions. One of them is that the
> runtime must know which is which, ie. be able to distinguish pointers
> to memory allocated by itself from memory allocated by something else
> - manually mmaped, C allocated etc.
>

Precisely, we want this memory to be both tracked by the GC _and_
backed by shared kernel memory.
I can only hope this isn't too common. :)

> > In order for this to work and to minimize allocations, I need a heap allocation that starts on a page boundary. Initially, I experimented with //go:linkname'ing mallocgc(), but I figured allocating a regular slice the size of a page has basically the same effect. Here's a playground link: https://go.dev/play/p/ua2NJ-rEIlC. As long as the slice/backing array is a multiple of the architecture's page size, it seems to start on a page boundary. I've tried allocating a bunch of ballast, forcing GCs, etc. and it hasn't failed once.
>
> Those are guarantees possibly provided by the kernel. No promises from Go.

The kernel guarantees handing out page-aligned slabs of memory to Go,
but after that point, isn't it up to Go to slice that up however it
pleases?

>
> > Here's my question: which property of the Go allocator is this banking on? Intuitively, this makes sense to me. An 8-byte alloc needs to be 8-byte aligned in case it's a pointer. Does a 4k allocation need to be 4k-aligned as well (e.g. in case it's a struct), since a compiler would align members to the start of the struct? I'm reading larger (> 8 or 16 bytes depending on arch?) allocs have an alignment of 1 byte, and unsafe.Alignof([4096]byte) tells me the same, but that's not the behaviour exhibited by the allocator.
>
> Implementation details that can change. You cannot rely on it. You
> have to code it manually and look for platform details to work out.
>
> -j

Yep, clear. Thanks again for the pointer to the spec, not sure why I
didn't check there first.

Take care,

Timo

Robert Engels

Oct 10, 2024, 3:41:45 PM
to Timo Beckers, Jan Mercl, golang-nuts
I think you had a bug in your Memory implementation. That is the correct way to implement it, and it's straightforward. Not sure why you ran into a problem.


Keith Randall

Oct 10, 2024, 7:31:24 PM
to golang-nuts
type Variable struct {
	*atomic.Uint64
	m *Memory
}

I agree that if you have a finalizer on a Variable that unmaps m, and you've given out a reference to the internal Uint64 to callers, then you can get in a situation where the Variable is dead, its finalizer is run, and the memory is unmapped while someone is doing operations on the atomic Uint64.

So don't embed like that. Instead just have a private field and implement the methods you need. Like:

type Variable struct {
	u *atomic.Uint64
	m *Memory
}

func (v *Variable) Load() uint64 {
	x := v.u.Load()
	runtime.KeepAlive(v)
	return x
}

... and ditto for other methods of atomic.Uint64 that your clients need. It's a bit of boilerplate, but it should work without having to mmap Go heap pages.

Robert Engels

Oct 10, 2024, 9:15:30 PM
to Keith Randall, golang-nuts
That’s what I was thinking. Thanks for providing the details. Rule of thumb - if you’re trying to do something and you’re the only one that’s trying it - you’re probably doing something wrong - or at least there is a much easier way. 

Timo Beckers

Oct 11, 2024, 3:25:35 AM
to Robert Engels, Keith Randall, golang-nuts
On Fri, 11 Oct 2024 at 03:15, Robert Engels <ren...@ix.netcom.com> wrote:
>
> That’s what I was thinking. Thanks for providing the details. Rule of thumb - if you’re trying to do something and you’re the only one that’s trying it - you’re probably doing something wrong - or at least there is a much easier way.

I wish I could take the credit for the idea, but one of our
contributors referred me to this package:
https://github.com/Jille/gcmmap. Yes, it doesn't have any importers
and it's unpolished, but we thought the idea had merit, so we adapted
it to our use case.

As for there being an easier way -- it's hard to argue against KISS,
but the example I gave leaves out many details that made us prefer
this approach, see below.

> On Oct 10, 2024, at 6:31 PM, 'Keith Randall' via golang-nuts <golan...@googlegroups.com> wrote:
>
> type Variable struct {
> *atomic.Uint64
> m *Memory
> }
>
> I agree that if you have a finalizer on a Variable that unmaps m, and you've given out a reference to the internal Uint64 to callers, then you can get in a situation where the Variable is dead, its finalizer is run, and the memory is unmapped while someone is doing operations on the atomic Uint64.
>
> So don't embed like that. Instead just have a private field and implement the methods you need. Like:
>
> type Variable struct {
> u *atomic.Uint64
> m *Memory
> }
> func (v *Variable) Load() uint64 {
> x := v.u.Load()
> runtime.KeepAlive(v)
> return x
> }

Yes, we couldn't allow the caller to obtain a reference to the Uint64
because of the risk of losing the Memory pointer. My initial
implementation was slightly different; it defined unexported type
aliases for all integer types in sync/atomic and embedded those
instead, so the caller could only invoke the methods promoted from
e.g. atomic.Uint64 to Variable. However, that still left the
possibility of Memory becoming unreachable at the start of
atomic.Uint64.Load(), so we'd need to wrap all accessors with
runtime.KeepAlive() as you suggested above.
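
(For illustration, a hypothetical sketch of that earlier design, not the actual code:)

```
type atomicUint64 = atomic.Uint64 // unexported alias: the embedded field below can't be referenced from other packages

type Variable struct {
	atomicUint64 // promotes Load, Store, Add, ...
	m            *Memory
}

// The race remains: v and v.m can become unreachable while a promoted
// atomic.Uint64 method is still running against the shared memory, so
// every accessor would still need a runtime.KeepAlive(v) wrapper.
```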

The reason this wasn't feasible is that the size of a Variable is
not known (by us, the library) at compile time. We parse an eBPF ELF
into a bunch of objects, including a slice of Variables, which
essentially includes all global vars and consts in the ELF. A Variable
has a size and an offset into a Memory.

We can't make a Variable generic, because again, types are not known
at Go compile time and different-size Variables need to be put into a
slice.

Additionally, Variables can be structs as well, and don't necessarily
need to be atomic. My initial implementation contained a bunch of
boilerplate for atomics, but this relegated all struct operations to
reflection-based marshaling using Variable.Set() and .Get() (some
amalgamation of binary.Read and .Write underneath).
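
(As a purely hypothetical illustration of that fallback, with made-up fields on Variable rather than the library's real ones:)

```
// Get unmarshals the Variable's bytes from the shared memory region into
// out via encoding/binary's reflection; Set would be the binary.Write
// counterpart. mem, off and size are illustrative field names only.
func (v *Variable) Get(out any) error {
	r := bytes.NewReader(v.mem.buf[v.off : v.off+v.size])
	return binary.Read(r, binary.LittleEndian, out)
}
```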

At the end of the day, to keep things somewhat flexible and ergonomic,
this was the only option we were left with. Now, the caller can do:

```
type foo struct{ a, b uint64 }
var v *Variable
s, _ := VariablePointer[foo](v) // returns (*foo, error)

s.a = 123
```

VariablePointer performs a bounds check on the underlying Memory and
the size of foo, and ensures it doesn't contain any pointers.
(binary.Size will enforce this)

Accesses through s now go to shared kernel memory. s can be passed
around the Go app without fear of the underlying memory getting
unmapped. *Memory and *Variable can be lost without issues.
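
(Roughly the kind of check meant here, again as a hypothetical sketch with made-up field names, not the actual cilium/ebpf implementation:)

```
// VariablePointer returns a *T pointing into the shared memory backing v,
// after verifying that T is fixed-size and pointer-free (binary.Size
// returns -1 otherwise) and that it fits within the Variable's bounds.
func VariablePointer[T any](v *Variable) (*T, error) {
	var t T
	n := binary.Size(t)
	if n < 0 {
		return nil, fmt.Errorf("%T is not a fixed-size, pointer-free type", t)
	}
	if n > v.size {
		return nil, fmt.Errorf("%T (%d bytes) exceeds variable size %d", t, n, v.size)
	}
	return (*T)(unsafe.Pointer(&v.mem.buf[v.off])), nil
}
```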

Robert Engels

Oct 11, 2024, 12:03:03 PM
to Timo Beckers, Keith Randall, golang-nuts
Aren’t all of the maps/variables mapped and pinned in the kernel side anyway?

I think I would make this more portable and resilient by using accessors, maintaining a reference to the memory, and not exposing the direct memory address. The compiler should optimize/inline the atomic accesses anyway, so there's no performance overhead.

David Finkel

Oct 12, 2024, 4:06:34 PM
to Timo Beckers, Robert Engels, Keith Randall, golang-nuts
In cases like this where I don't know the type at compile time, I've usually opted to define an interface that I can assign any instantiation of a generic type to.
This way, you make your return value (or members of a slice, struct, etc.) that interface type and the user can type-assert wherever necessary
(which has better safety properties than needing to do additional safety checks on your library's side).
I'd generally recommend having some interface-value accessors, though.
(Note: interface method calls are wholly distinct from reflection, although both do involve the runtime resolving types.)


e.g.
```
type Fizzle interface {
	fizzleImpl()
}

type Foobar[A any] struct {
	field A
	Ptr   atomic.Pointer[A]
}

func (f *Foobar[A]) fizzleImpl() {}

func (f *Foobar[A]) SomeOtherMethod() A { return f.field }

func MakeFizzle(i int) Fizzle {
	switch i {
	case 0:
		return &Foobar[int]{}
	case 1:
		return &Foobar[int64]{}
	default:
		return &Foobar[string]{}
	}
}
```
