[generics] Data structures and the garbage collector

241 views
Skip to first unread message

Markus Heukelom

unread,
Jul 9, 2020, 12:40:55 PM7/9/20
to golang-nuts
Hi,

Many data structures are implemented using pointers. How does this perform wrt garbage collecting? For example take any data structure that uses some tree of nodes, or even simpler just a singly linked list: there is one pointer from the left node to the next. Wouldn't this incur huge GC (unavoidable) cycles for large linked lists that will look at all pointers? I can imagine that a large AVL tree would then cause potentially large hick-ups in the GC cycle? 

Of course, luckily we have map where you can just store the values. But if people start using generics for their own map types, or trees, wouldn't this potentially be a bigger problem? Wouldn't this defy the use of generics for data structures aimed to manage a lot of data?

I don't know how Java does this for example, or if it is no practical issue at all. In C++ you can you ref counting for the nodes, in C you would use a arena allocator probably for the nodes etc. I was wondering how this relates to Go.

-Markus






Ian Lance Taylor

unread,
Jul 9, 2020, 3:58:21 PM7/9/20
to Markus Heukelom, golang-nuts
There is no special connection between the builtin map type and the
garbage collector. The builtin map type uses pointers too. The
garbage collector has to trace those pointers.

Go uses a concurrent garbage collector, so a large linked list won't
cause any sort of program pause. The garbage collector does use CPU
time, and it will use more CPU time for data structures that use more
pointers. But I don't see that as being particularly tied to
generics. If a program needs to build data structures with lots of
pointers, it will build those data structures whether or not the
language has generics.

In fact, there is a good argument that generics will reduce the number
of pointers overall. Without generics, a general purpose data
structure will use interface types, and in the current implementation
interface types always contain two pointers. With generics, a general
purpose data structure will store the value directly; if the value
does not contain any pointers, then the overall effect will be fewer
pointers and less GC work.

Ian

Jesper Louis Andersen

unread,
Jul 10, 2020, 5:57:45 AM7/10/20
to Markus Heukelom, golang-nuts
On Thu, Jul 9, 2020 at 6:40 PM Markus Heukelom <markus....@gain.pro> wrote:
Many data structures are implemented using pointers. How does this perform wrt garbage collecting? For example take any data structure that uses some tree of nodes, or even simpler just a singly linked list: there is one pointer from the left node to the next. Wouldn't this incur huge GC (unavoidable) cycles for large linked lists that will look at all pointers? I can imagine that a large AVL tree would then cause potentially large hick-ups in the GC cycle? 


Is your worry latency in the form of pause times while doing GC, or bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear.

In any case, you are right that these things require attention in a garbage collector, both of them. But Go has addressed this for many years now.

As an example, imagine you have a binary AVL or Red/Black tree taking up 128 Gigabyte of memory space. You now delete the root node of said tree.

Example 1: in C++ we have reference counted the tree. Deleting the root node sets its childrens refcounts to 0, so they get deleted, and so on recursively down the tree. It might take some time to delete 128 Gigabyte of data this way. So C++ isn't exempt from memory reclamation pauses either.

Example 2: Imagine a language using Cheney's copying collection as the main garbage collection scheme. It will use 256 gigabyte of memory for its two copying spaces, but collection time will be 0, since the tree's data isn't live. You are paying with memory for faster collection in this case.

Example 3: Concurrent collectors, like the one Go uses nowadays, will collect the tree while the program is running and doing productive work. While doing so incurs a bandwidth loss, latency pause times do not suffer. If you have spare CPU cores, they'll be used to handle the collection while the remaining cores run the program (more or less, this is a gross simplification).

As for generics:

When you use a generic, you substitute in a type to generate a special-case of a function for that type. If the implementation uses monomorphization, it will generate a new piece of code specifically for that type invocation. This might mean other optimizations trigger for the code path, notably stripping pointers away.

As for general worry:

In some cases it is beneficial to construct data structures such that they pack data well and don't use excessive amounts of pointers. One reason is that collecting pointers takes time in any language. Another reason is that pointer indirections cost memory loads, which can be expensive. A third reason is that packed data has cache locality, so access to data close together is several orders of magnitude faster than following yet another pointer indirection. And finally, it might avoid memory pressure in copying data around.

Markus Heukelom

unread,
Jul 10, 2020, 12:48:05 PM7/10/20
to golang-nuts
Many data structures are implemented using pointers. How does this perform wrt garbage collecting? For example take any data structure that uses some tree of nodes, or even simpler just a singly linked list: there is one pointer from the left node to the next. Wouldn't this incur huge GC (unavoidable) cycles for large linked lists that will look at all pointers? I can imagine that a large AVL tree would then cause potentially large hick-ups in the GC cycle? 


Is your worry latency in the form of pause times while doing GC, or bandwidth in the form of CPU cycles when doing GC? It isn't entirely clear.


My "worry" is about CPU cycles that would not be (necessarily) needed in let's say C++ or C. The fact that Go has a concurrent GC resulting in no "stop the world"-hick-ups is really awesome, but was not my main point. 

(btw I don't really "worry" about it for any real applications; it was more a question out of interest)

The reason I asked is because I use a custom tree data-structure that manages parent-child, and leaf node prev-next, relationships using pointers. I use the tree for multiple value types, so the payload type is interface{} now. It would be a great place for me to use generics; this would make my code more clear and safer. 

In any case, you are right that these things require attention in a garbage collector, both of them. But Go has addressed this for many years now.

As an example, imagine you have a binary AVL or Red/Black tree taking up 128 Gigabyte of memory space. You now delete the root node of said tree.

Example 1: in C++ we have reference counted the tree. Deleting the root node sets its childrens refcounts to 0, so they get deleted, and so on recursively down the tree. It might take some time to delete 128 Gigabyte of data this way. So C++ isn't exempt from memory reclamation pauses either.


In C/C++ you could better use an arena-allocator and not refcounts. Clearing the tree would then use only (nodes / (nodes per region)) deallocation calls. As the region is typically large, this would probably be orders of magnitude quicker. I believe compilers use arena allocators for this reason. (Deleting individual nodes would be cheaper maybe as well.) 

Refcounting I think is best reserved for situations where the lifetime of an object is not determined by, or escapes, the package (class internals in C++), or for reasons  of simplicity of implementation of course.

I was under the (false) impression that one of the reasons of having a built-in array, slice and map type in Go (besides it just being really handy) is that for example a map[string]int would relieve the GC needing to trace the keys and values (in contrast to for example map[int]*string). But apparently (see Ian's answer) this is not the case, and the GC always needs to trace all elements, keys and values. In that case there is (luckily) no reason to choose build-in map over a custom map written with generics.

t hepudds

unread,
Jul 11, 2020, 12:04:49 PM7/11/20
to golang-nuts
Hello Markus,

> "But apparently (see Ian's answer) this is not the case, and the GC always needs to trace all elements, keys and values."

I think Ian said something a bit different from what you said there.

As far as I understand, the GC does not scan the keys and values of a map if they are not pointers and do not contain pointers. See for example https://go-review.googlesource.com/c/go/+/3288

> "so the payload type is interface{} now."

For your case, it sounds like with generics you could substantially reduce the total count of pointers used by your data structure by storing the value directly as Ian said, rather than using interface{}.

Separately, a great overview of the GC system is in this golang.org blog post by Rick Hudson:


Also, the code itself is very readable, including there are some large comments that give overviews, such as:

  
Those are worth a quick read if you are interested in more details of how the GC works.

Regards,
thepudds
Reply all
Reply to author
Forward
0 new messages