Why doesn't the Go garbage collector implement a generational, compacting GC?


canji...@qq.com

May 16, 2017, 8:54:23 AM
to golang-nuts
Generational and compacting GC are widely considered best practice, but Go doesn't adopt them. Can someone tell me the reason?

Ian Lance Taylor

May 16, 2017, 10:06:25 AM
to canji...@qq.com, golang-nuts
On Tue, May 16, 2017 at 2:01 AM, <canji...@qq.com> wrote:
>
> Generational and compacting GC are widely considered best practice, but
> Go doesn't adopt them. Can someone tell me the reason?

This has been discussed in the past.

Ignoring details, the basic advantages of a compacting GC are 1) avoid
fragmentation, and 2) permit the use of a simple and efficient bump
allocator. However, modern memory allocation algorithms, like the
tcmalloc-based approach used by the Go runtime, have essentially no
fragmentation issues. And while a bump allocator can be simple and
efficient for a single-threaded program, in a multi-threaded program
like Go it requires locks. In general it's likely to be more
efficient to allocate memory using a set of per-thread caches, and at
that point you've lost the advantages of a bump allocator. So I would
assert that, in general, with many caveats, there is no real advantage
to using a compacting memory allocator for a multi-threaded program
today. I don't mean that there is anything wrong with using a
compacting allocator, I'm just claiming that it doesn't bring any big
advantage over a non-compacting one.

Now let's consider a generational GC. The point of a generational GC
relies on the generational hypothesis: that most values allocated in a
program are quickly unused, so there is an advantage for the GC to
spend more time looking at recently allocated objects. Here Go
differs from many garbage collected languages in that many objects are
allocated directly on the program stack. The Go compiler uses escape
analysis to find objects whose lifetime is known at compile time, and
allocates them on the stack rather than in garbage collected memory.
So in general, in Go, compared to other languages, a larger percentage
of the quickly-unused values that a generational GC looks for are
never allocated in GC memory in the first place. So a generational GC
would likely bring less advantage to Go than it does for other
languages.

More subtly, the implicit point of most generational GC
implementations is to reduce the amount of time that a program pauses
for garbage collection. By looking at only the youngest generation
during a pause, the pause is kept short. However, Go uses a
concurrent garbage collector, and in Go the pause time is independent
of the size of the youngest generation, or of any generation. Go is
basically assuming that in a multi-threaded program it is better
overall to spend slightly more total CPU time on GC, by running GC in
parallel on a different core, rather than to minimize GC time but to
pause overall program execution for longer.

All that said, generational GC could perhaps still bring significant
value to Go, by reducing the amount of work the GC has to do even in
parallel. It's a hypothesis that needs to be tested. Current GC work
in Go is actually looking closely at a related but different
hypothesis: that Go programs may tend to allocate memory on a
per-request basis. This is described at
https://docs.google.com/document/d/1gCsFxXamW8RRvOe5hECz98Ftk-tcRRJcDFANj2VwCB0/view
. This is work in progress and it remains to be seen whether it will
be advantageous in reality.

Ian

Brian Hatfield

May 16, 2017, 10:17:31 AM
to Ian Lance Taylor, canji...@qq.com, golang-nuts
This is a really great response. I appreciated the high-level overview in one place like this, and I feel like I learned something. Thanks for writing it up, Ian.



Zellyn

May 16, 2017, 12:48:38 PM
to golang-nuts, canji...@qq.com
Thanks for the enlightening and interesting reply, Ian.

One quick question: do you have a link or a short description of why “modern memory allocation algorithms, like the tcmalloc-based approach used by the Go runtime, have essentially no fragmentation issues”?

I was curious, but a quick search for [tcmalloc fragmentation] yielded mostly people struggling with fragmentation issues when using tcmalloc.

Zellyn

Eddie Ringle

May 16, 2017, 2:36:47 PM
to Ian Lance Taylor, canji...@qq.com, golang-nuts
On Tue, May 16, 2017 at 9:06 AM Ian Lance Taylor <ia...@golang.org> wrote:
On Tue, May 16, 2017 at 2:01 AM,  <canji...@qq.com> wrote:
>
> Generational and compacting GC are widely considered best practice, but
> Go doesn't adopt them. Can someone tell me the reason?

This has been discussed in the past.

Perhaps, then, the information from this write-up should be added to the FAQ on golang.org?

- Eddie 

r...@google.com

May 16, 2017, 3:26:42 PM
to golang-nuts, canji...@qq.com
The Johnstone / Wilson paper "The memory fragmentation problem: solved?" [1] is the original source.

Modern malloc systems, including Google's TCMalloc, Hoard [2], and Intel's Scalable Malloc (aka McRT-Malloc [3]), all owe much to that paper, and along with other memory managers they all segregate objects by size. Many languages, most notably C/C++, use these fragmentation-avoiding memory managers to build large systems without the need for copying compaction.

[1] Mark S. Johnstone and Paul R. Wilson. 1998. The memory fragmentation problem: solved?. In Proceedings of the 1st international symposium on Memory management (ISMM '98). ACM, New York, NY, USA, 26-36. DOI=http://dx.doi.org/10.1145/286860.286864

[2] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: a scalable memory allocator for multithreaded applications. SIGPLAN Not. 35, 11 (November 2000), 117-128. DOI=http://dx.doi.org/10.1145/356989.357000

[3] Richard L. Hudson, Bratin Saha, Ali-Reza Adl-Tabatabai, and Benjamin C. Hertzberg. 2006. McRT-Malloc: a scalable transactional memory allocator. In Proceedings of the 5th international symposium on Memory management (ISMM '06). ACM, New York, NY, USA, 74-83. DOI=http://dx.doi.org/10.1145/1133956.1133967

David Chase

May 16, 2017, 4:22:42 PM
to golang-nuts, canji...@qq.com
See also: Norman R. Nielsen. Dynamic memory allocation in computer simulation. Communications of the ACM, 20(11):864–873, November 1977.
This was the first place I saw this result. A later improvement was the realization that this allowed a headerless BIBOP organization of allocated memory.

I think the first malloc/free-compatible GC/allocator that used BIBOP like this was the Boehm-Weiser conservative collector:
Hans Boehm and Mark Weiser. Garbage collection in an uncooperative environment. Software, Practice and Experience, pages 807–820, September 1988.
I used the technique in a performance-tuned malloc at Sun back in the early 90s, and its fragmentation was entirely acceptable:
not as good as Cartesian trees (the best non-compacting fragmentation at the time) but not much worse, and far faster.

Zellyn

May 16, 2017, 5:04:16 PM
to golang-nuts, canji...@qq.com
What a fantastic discussion. Thanks so much, folks!

canji...@qq.com

May 16, 2017, 10:07:46 PM
to golang-nuts, canji...@qq.com
Thanks for your patient and wonderful reply. 

On Tuesday, May 16, 2017 at 10:06:25 PM UTC+8, Ian Lance Taylor wrote:

leven...@gmail.com

May 17, 2017, 12:54:02 AM
to golang-nuts
It's not clear why using "a set of per-thread caches" means you "lose the advantages of a bump allocator". At any point in time, a single goroutine executes on a thread. The points when a goroutine gains and loses the execution context of a thread, and when it is transferred from one thread to another, are known to the runtime. At those points a goroutine could cache (e.g. in a register) the current thread's bump-allocation address and use it for very fast bump allocation during execution.

Ian Lance Taylor

May 17, 2017, 12:58:13 AM
to leven...@gmail.com, golang-nuts
On Tue, May 16, 2017 at 8:27 PM, <leven...@gmail.com> wrote:
>
> It's not clear why when you use "a set of per-thread caches" you "lose advantages of bump allocator". At any point of time, a single goroutine is executed on a thread. The points when a goroutine gains and loses the execution context of a thread, and when it is transferred from one thread to another are known to runtime. At those points a goroutine could cache (eg in a register) the current thread's bump allocation address and use it for very fast bump allocation during execution.

Fair enough, although it's considerably more complicated, as you have
to allocate a chunk of address space for each thread, you have to
replenish those chunks, you go back to worrying about fragmentation,
etc.

Ian

Sokolov Yura

May 18, 2017, 3:57:59 AM
to golang-nuts
To perform concurrent compaction/copying you need read barriers, and read barriers are very expensive. They are less expensive in Java, because pointers there always point to the beginning of an allocation (so the forwarding pointer is in a known position). But they would be very expensive in Go because of interior pointers.

wan...@gmail.com

May 18, 2017, 5:23:04 PM
to golang-nuts, leven...@gmail.com
My feeling is that a per-P bump-pointer allocation space could fit into Go's current GMP model, giving faster allocation than the current path with its span calculation and free-slot search.

But fragmentation remains a big pain either way. I'm not sure whether the non-moving property of Go's GC was a design motivation or a consequence of forgoing generations and compaction. Read barriers and moving are still expensive even in Java, as in Shenandoah, where throughput numbers remain fairly low.

Konstantin Shaposhnikov

May 19, 2017, 11:48:41 AM
to golang-nuts
The following article, which shows that a compacting GC can make an application run faster in some cases, may be of interest as well: https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/

Jan Ziak

May 30, 2017, 3:44:19 PM
to golang-nuts, canji...@qq.com
On Tuesday, May 16, 2017 at 4:06:25 PM UTC+2, Ian Lance Taylor wrote:
On Tue, May 16, 2017 at 2:01 AM,  <canji...@qq.com> wrote:
>
> Generational and compacting GC are widely considered best practice, but
> Go doesn't adopt them. Can someone tell me the reason?

Now let's consider a generational GC.  The point of a generational GC
relies on the generational hypothesis: that most values allocated in a
program are quickly unused, so there is an advantage for the GC to
spend more time looking at recently allocated objects.

From a different angle, the point of a generational GC is to skip redundant work: multiple runs of a non-generational GC may repeat work that can be avoided if the memory gets partitioned. It seems to me that looking at a generational GC from the work-redundancy viewpoint is slightly more general than the more commonly cited generational hypothesis that most objects die young.

Ian Lance Taylor

May 30, 2017, 3:53:46 PM
to Jan Ziak, golang-nuts, 李岚清
That is a good point. Go's current GC is clearly doing extra work,
but it's doing it in parallel with other work, so on a system with
spare CPU capacity Go is making a reasonable choice. But see
https://golang.org/issue/17969 .

Ian