Go allocator: allocates arenas, reclaims pages?

Vitaly Isaev

Jun 20, 2022, 11:48:02 AM
to golang-nuts
The Go allocator requests memory from the OS in large arenas (on Linux x86_64 an arena is 64 MB), then splits each arena into 8 KB pages, then merges pages into spans of different sizes (from 8 KB to 80 KB according to https://go.dev/src/runtime/sizeclasses.go). This process is well described in various blog posts and presentations.

But there is much less information about the scavenger. Is it true that, in contrast to the allocation process, the scavenger returns to the OS not arenas but the pages underlying idle spans? This is performed with madvise(MADV_DONTNEED).

If so, am I correct that after a while the virtual address space of a Go application resembles a "layered cake" of interleaved used and reclaimed memory regions (a kind of classic memory fragmentation problem)? It looks as if, when the application needs more virtual memory after some time, the OS won't be able to reuse these page-sized regions to allocate contiguous space sufficient for an arena.

Are there any consequences of this design for runtime performance, especially for RSS consumption?

Finally, how does the runtime decide what to use, munmap or madvise, for memory reclamation?

Thank you

Michael Knyszek

Jun 20, 2022, 5:46:36 PM
to golang-nuts
Thanks for the question. The scavenger isn't as publicly visible as other parts of the runtime. You've got it mostly right, but I'm going to repeat some things you've already said to make it clear what's different.

The Go runtime maps new heap memory (specifically: a new virtual memory mapping for the heap) as read/write in increments called arenas. (Note: my use of "heap" here is a little loose; that pool of memory is also used for e.g. goroutine stacks.) The concept of arena is carried forward to how GC metadata is managed (chunk of metadata per arena) but is otherwise orthogonal to everything else I'm about to describe. To the scavenger, the concept of an arena doesn't really exist.

The platform (OS + architecture) has some underlying physical page size (typically between 4 and 64 KiB, inclusive), but Go has an internal page size of 8 KiB. It divides all of memory up into these 8 KiB pages, including heap memory.

The runtime assumes, in general, that new virtual memory is not backed by physical memory until first use (or an explicit system call on some platforms, like Windows). As free pages get allocated for the heap (for spans, as you say), they are assumed to be backed by physical memory. Once those pages are released, they are still assumed to be backed by physical memory.

This is where the scavenger comes in: it tells the OS that these free regions of the address space, which it assumes are backed by physical pages, are no longer needed in the short term. So, the OS is free to take the physical memory back. "Telling the OS" is the madvise system call on Linux platforms. Note that the Go runtime could be wrong about whether the region is backed by physical memory; that's fine, madvise is just a hint anyway (a really useful one). (Also, it's really unlikely to be wrong, because memory needs to be zeroed before it's handed to the application. Still, it's theoretically possible.)
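
As a rough illustration of that hint from user space, here is a minimal Linux-only sketch. It is not the runtime's code: the helper name releaseAndReread is made up for this example, and it assumes the MADV_DONTNEED advice mentioned in the question. It maps anonymous memory, touches it so the kernel backs it with a physical page, issues the hint, and reads the same address again. The mapping stays valid throughout; only the backing memory is given up.

```go
package main

import (
	"fmt"
	"syscall"
)

// releaseAndReread is a hypothetical helper (Linux only). It maps an anonymous
// read/write region, writes to it so the kernel backs it with physical memory,
// then tells the kernel via madvise(MADV_DONTNEED) that the contents are no
// longer needed. It returns the first byte as seen before and after the hint.
func releaseAndReread(size int) (byte, byte, error) {
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mem)

	mem[0] = 0xAB // first touch: the page is now backed by physical memory
	before := mem[0]

	// The address range remains mapped; only the physical backing is released.
	if err := syscall.Madvise(mem, syscall.MADV_DONTNEED); err != nil {
		return before, 0, err
	}

	// On Linux, a private anonymous page read after MADV_DONTNEED is a fresh
	// zero-filled page.
	after := mem[0]
	return before, after, nil
}

func main() {
	before, after, err := releaseAndReread(1 << 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("before=%#x after=%#x\n", before, after)
}
```

The zero-fill on the second read also shows why reusing such a region is safe for an allocator that has to hand out zeroed memory anyway.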

The scavenger doesn't really have any impact on fragmentation, because the Go runtime is free to allocate a span out of a mix of scavenged and unscavenged pages. When it's actively scavenging, it briefly takes those pages out of the allocation pool, which can affect fragmentation, but the system is organized such that such a collision (and thus potentially some fragmentation) is less likely.

The result is basically just fewer physical pages consumed by Go applications (what "top" reports as "RSS") at the cost of about 1% of total CPU time. The CPU cost, however, is usually much less; 1% is just the target while it's active, but in the steady-state there's typically not too much work to do.
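
One way to watch this from inside a program is through runtime.MemStats. The sketch below is only an assumption about how one might observe it, not something the runtime requires: the heapSnapshot type and snapshot function are names invented here, and debug.FreeOSMemory is used just to force a collection and an immediate attempt to return memory rather than waiting for the background scavenger. HeapReleased counts bytes already returned to the OS, and HeapIdle minus HeapReleased estimates free heap memory the runtime is still holding.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapSnapshot is a hypothetical helper type holding the few MemStats fields
// that are relevant to returning memory to the OS.
type heapSnapshot struct {
	Sys      uint64 // heap memory obtained from the OS (virtual address space)
	Idle     uint64 // bytes in idle (unused) spans
	Released uint64 // bytes of physical memory already returned to the OS
}

// snapshot reads the current heap statistics.
func snapshot() heapSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSnapshot{Sys: m.HeapSys, Idle: m.HeapIdle, Released: m.HeapReleased}
}

// sink keeps the allocation reachable so the compiler cannot elide it.
var sink []byte

func main() {
	// Grow the heap, touch the memory, then drop the reference.
	sink = make([]byte, 64<<20)
	for i := range sink {
		sink[i] = 1
	}
	fmt.Printf("while in use:  %+v\n", snapshot())

	sink = nil
	// Forces a GC, then attempts to return as much memory to the OS as possible.
	debug.FreeOSMemory()
	fmt.Printf("after release: %+v\n", snapshot())
}
```

The exact numbers depend on the Go version and platform, so none are shown here. What to look for is HeapSys staying roughly where it was while HeapReleased grows.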

The Go runtime also never unmaps heap memory, because virtual memory that's guaranteed to not be backed by physical memory is very cheap (likely just a single interval in some OS bookkeeping). Unmapping virtual address space is also fairly expensive in comparison to madvise, so it's worthwhile to avoid.

I don't fully understand what you mean by "layered cake" in this context. The memory allocator in general is certainly a "layered cake," but the scavenger just operates directly on the pool of free pages (which, again, doesn't have much to do with arenas other than that an arena happens to be the increment in which new pages are added to the pool).

There are also two additional complications to all of this:
(1) Because the Go runtime's page size doesn't usually match the system's physical page size, the scavenger needs to be careful to only return contiguous and aligned runs of pages that add up to the physical page size. This makes it less effective on platforms with physical pages larger than 8 KiB because fragmentation can prevent an entire physical page from being free. This is fine, though; the scavenger is most useful when, for example, the heap size shrinks significantly. Then there's almost always a large swathe of available free pages. Note also that platforms with smaller physical page sizes are fine, because every scavenge operation releases some multiple of the physical page size.
(2) The Go runtime tries to take into account transparent huge pages as well. That's its own can of worms that I won't go into for now.
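
The alignment constraint in (1) can be sketched with a small pure function. This is an illustration under stated assumptions, not the runtime's implementation: releasableRanges, byteRange and the []bool representation of free pages are invented for this example, the region is assumed to start at an address aligned to the physical page size, and the physical page size is assumed to be a power of two.

```go
package main

import "fmt"

// runtimePageSize is Go's internal page size of 8 KiB.
const runtimePageSize = 8 << 10

// byteRange is a half-open range [Start, End) of byte offsets into a region.
type byteRange struct{ Start, End int }

// releasableRanges is a hypothetical helper. free[i] reports whether runtime
// page i is free. For every run of free runtime pages it returns the part of
// the run that covers only whole physical pages, since only those could be
// handed back to the OS. physPageSize must be a power of two.
func releasableRanges(free []bool, physPageSize int) []byteRange {
	var out []byteRange
	mask := physPageSize - 1
	for i := 0; i < len(free); {
		if !free[i] {
			i++
			continue
		}
		j := i
		for j < len(free) && free[j] {
			j++
		}
		// Shrink the free run [i, j) inward to physical page boundaries.
		start := (i*runtimePageSize + mask) &^ mask // round up
		end := (j * runtimePageSize) &^ mask        // round down
		if start < end {
			out = append(out, byteRange{start, end})
		}
		i = j
	}
	return out
}

func main() {
	// Eight runtime pages (64 KiB): pages 1-4 and page 6 are free.
	free := []bool{false, true, true, true, true, false, true, false}

	// With 16 KiB physical pages only the aligned middle of the first run
	// qualifies, and the lone free page 6 cannot be released at all.
	fmt.Println(releasableRanges(free, 16<<10)) // [{16384 32768}]

	// With 4 KiB physical pages every free runtime page is releasable.
	fmt.Println(releasableRanges(free, 4<<10)) // [{8192 40960} {49152 57344}]
}
```

In the 16 KiB case, 32 KiB of memory is free in the first run but only 16 KiB of it can be returned, which is the loss of effectiveness described above.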

Michael Knyszek

Jun 20, 2022, 9:50:23 PM
to golang-nuts
Just to clarify, when I said "publicly visible" I meant via blog posts and talks. There are a few design documents and runtime-internal comments that go into more depth.

Vitaly Isaev

Jun 21, 2022, 4:30:21 PM
to golang-nuts
Michael, many thanks for such a comprehensive description!

On Tuesday, June 21, 2022 at 04:50:23 UTC+3, Michael Knyszek wrote: