Slice reuse + GC


robfig

Mar 26, 2020, 1:39:50 PM3/26/20
to golang-nuts
Reducing a slice's length makes the elements unreachable, but the GC appears to still treat them as live, based on this snippet:

I would have expected the HeapInUse to go down in the second measurement.

Why is that?

I presume that the GC is traversing the slice's capacity rather than length, but I'm not sure why this is necessary. Is there a way (without using unsafe) to access slice elements past the length?

Axel Wagner

Mar 26, 2020, 1:48:37 PM3/26/20
to robfig, golang-nuts

 

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/c2bea525-0636-4365-b551-59d1fceb5959%40googlegroups.com.

Axel Wagner

Mar 26, 2020, 1:51:53 PM3/26/20
to robfig, golang-nuts
By the way, I don't think your snippet shows what you think it's showing. In particular, the output stays the same, even if you reduce the capacity to 0: https://play.golang.org/p/r2YvnDNsoBg
I also always assumed the GC wouldn't do this optimization (throwing away memory if it's past the capacity of a slice) anyway. Not because it can't (I think), but just because I didn't think anyone had built it to do so.

Leszek Kubik

Mar 26, 2020, 1:55:23 PM3/26/20
to golang-nuts
AFAIK it is allowed to re-slice past the len, but that's rather not how slices should ever be used. The word "slice" implies taking a portion of a bigger thing, and that's always safe. When you append to a slice, however, there's no new allocation if the underlying buffer still has space past the current slice's len.

You missed the point about slices, I guess. Many slices can refer to the same internal buffer, giving you a sort of window view of the data, so as long as any slice still refers to the internal buffer, the GC won't collect it.

robfig

Mar 26, 2020, 2:01:34 PM3/26/20
to golang-nuts
I see, thank you.

RE reducing the capacity, I want to distinguish freeing the memory (1) used for the slice and (2) referred to by the slice elements. I can easily see that freeing (1) is hard and not so beneficial, but I can't see why (2) would be difficult, and the benefit seems potentially much larger. In our case, each slice element has a handle to a sizable []byte, so that's why I was interested to know if they would remain live.

As an aside, (1) was addressed by Ian here:

Leszek Kubik

Mar 26, 2020, 2:19:16 PM3/26/20
to golang-nuts
Let me offer an example:

s := make([]*T, 100)
s1, s2, s3 := s[:50], s[50:], s[:]
(.... x lines of code)
s1 = s1[:5]

Would you like the GC to free the elements past s1's new len? What if s2 and s3 are still used somewhere?

Ian Lance Taylor

Mar 26, 2020, 2:28:41 PM3/26/20
to robfig, golang-nuts
It's kind of the same problem for (2). The garbage collector would
have to find every slice that referred to the memory in question and
check the capacity. This is hard because the GC has no natural way to
go from the backing array pointer to the capacity associated with that
pointer. In particular when slices are stored on the stack, there is
no requirement that the capacity be near the backing array pointer or
even to exist at all. I suppose the GC could simply assume that all
backing array pointers on the stack referred to the entire array, but
then compiler optimizations would affect which elements were freed,
and the behavior would be somewhat unintuitive.

All in all, if you want to make sure that the slice elements are
cleared, you'll have to clear them yourself. Or use a slice copy
rather than a simple slice expression.

Ian

Leszek Kubik

Mar 26, 2020, 2:31:40 PM3/26/20
to golang-nuts
Actually *T was not a good example to make my point, as we could start debating about the GC walking the references and freeing the data pointed to. However, when you have []T and refer to some contiguous buffer of memory with chunks like [0:5] (gap) [50:100], etc., there's not much benefit in "freeing" the elements in between. Anyway, I take slices to be abstract pointers to arrays, and thus they should only change the visibility, not the array.

Axel Wagner

Mar 26, 2020, 3:04:48 PM3/26/20
to Leszek Kubik, golang-nuts
On Thu, Mar 26, 2020 at 6:57 PM Leszek Kubik <leszek...@gmail.com> wrote:
> AFAIK it is allowed to re-slice past the len but it's rather not the way slices should be ever used.

I disagree. I do that all the time. It's also how `append` was implemented before it existed as a predeclared function. It's also, FWIW, used in bytes.Buffer. I agree that unless your API is very clear about it, you shouldn't really access a slice past the length for one of your arguments. But there *are* APIs where it's clear that's happening (e.g. anything having "append" in its doc-string), and there are use-cases where you control the slice for its entire lifetime and where it can make code more readable.

> You missed the point about slices, I guess.

What you describe as "their point" certainly is one important point about slices. But it's not "the" point. Amortized O(1) append is definitely also an important aspect and it requires being able to use them past their length (and it's why they carry a capacity, as opposed to strings, for example).

> Many slices can refer to the same internal buffer, giving you sort of window view of the data, thus as long as any slice still refer to the internal buffer GC won't collect it.



Leszek Kubik

Mar 26, 2020, 3:29:04 PM3/26/20
to golang-nuts



> I disagree. I do that all the time. It's also how `append` was implemented before it existed as a predeclared function. It's also, FWIW, used in bytes.Buffer. I agree that unless your API is very clear about it, you shouldn't really access a slice past the length for one of your arguments. But there *are* APIs where it's clear that's happening (e.g. anything having "append" in it's doc-string) and there are use-cases where you control the slice for its entire lifetime and where it can make code more readable.

I didn't even know that append wasn't always there, but since we have it, isn't it meant to be used in the first place?
Of course, if we check the capacity, expanding past len is safe, like s[:cap(s)]. I would check the capacity only when I really don't want to end up with a new backing buffer as a result of appending more than the available space.

Keith Randall

Mar 26, 2020, 8:06:16 PM3/26/20
to golang-nuts
It's common practice to *write* to elements between the length and capacity of a slice.  Usually, you use append to do that.
It's bad practice to *read* elements between the length and capacity. Which you can't do with a simple indexing op, of course. You would have to reslice larger and then index.
In that sense, it would be nice to have the GC not trace elements between len and cap. They should be dead if you never read them, only write them. It's hard to do in the GC, it requires a language change, etc. But it would be nice.
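A tiny illustration of this write-vs-read distinction (my own example, not from the thread):

```go
package main

import "fmt"

func main() {
	s := make([]int, 0, 4)

	// Writing past len via append: normal practice.
	s = append(s, 1, 2)

	// Reading past len requires reslicing larger first. It's legal,
	// but it observes elements the program never put there:
	peek := s[:cap(s)]
	fmt.Println(peek[2], peek[3]) // 0 0 (Go zeroes new allocations)
}
```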

Leszek Kubik

Mar 27, 2020, 3:19:26 AM3/27/20
to golang-nuts
Wait, when you say it would be nice to have the GC not trace elements between len and cap, do you mean it would be nice to have the GC recognize unreferenced elements of a slice between len and cap and free them (presumably if they are pointer or interface types)?

OK, I would say Go has quirks, but I'm the type of engineer who doesn't expect things to work the way I'd like them to; I dig into how they actually work and live with that. In my opinion, Go (or the Go tutorials) advertise slices too much; perhaps the problem is that there's no handy builtin list type with nice language sugar.

A fix for the situation in this topic could be implemented by the slice type itself. Imagine that a slice value shared not only the pointer to the underlying buffer but also a list of references to all other slices holding on to that buffer; then an operation such as s = s[x:y] could walk the list and find and free all unreferenced values in the underlying buffer (perhaps even assigning the zero value, to have consistent behavior regardless of []T or []*T). Such an operation would be deterministic. Of course, there could be an optimization in place: whenever the slice was derived from a named array, just keep a flag instead of the list of references to all other slices. I don't see how the GC could achieve this in a consistent way, since the GC may not even do a sweep between calls like s = s[:x]; s = s[:cap(s)]. Unfortunately, I'm afraid this would break some existing code (even if we also trimmed the cap property), so keeping in mind the original notion of a slice as a window into some array makes sense to me.

As a side note, I think the append function has quirks too, so it's easy to make mistakes until you dig into it. The function signature shows it returns a new slice, so it's easy to jump to the conclusion that you can safely modify the new slice without affecting the original one, or, on the other hand, to the conclusion that writes made to this slice will be seen in other slices. It all depends on how you initially form your idea of what the slice type is.
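The append quirk described here is easy to demonstrate (an invented example): whether the result aliases the input depends entirely on spare capacity.

```go
package main

import "fmt"

func main() {
	a := make([]int, 3, 6)

	// Fits within a's spare capacity, so b shares a's backing array:
	b := append(a, 1)
	b[0] = 99
	fmt.Println(a[0]) // 99: the write is visible through a

	// Overflows the capacity, so c gets a fresh backing array:
	c := append(b, 2, 3, 4)
	c[0] = -1
	fmt.Println(b[0]) // still 99
}
```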

Jan Mercl

Mar 27, 2020, 5:37:32 AM3/27/20
to Leszek Kubik, golang-nuts
On Fri, Mar 27, 2020 at 8:19 AM Leszek Kubik <leszek...@gmail.com> wrote:

> A fix for the situation of the topic could be implemented by the slice type. Imagine, if a slice value not only shared the pointer to the underlying buffer but also a list with references to all other slices holding on to the underlying buffer then such operation as s = s[x:y] could walk the list, find and free all unreferenced values in the underlying buffer (perhaps even assigning zero value to have consistent behavior regardless of []T or []*T). Such operation would be deterministic. Of course there could be an optimization in place, whenever the slice was derived from a named array, just keep some flag instead of the list of references to all other slices. I don't see how GC could achieve that in a consistent way since the GC may not even do the swipe between calls like s = s[:x]; s = s[:cap(x)]. Unfortunately I'm afraid that this would break some existing code (even if we also trim the cap property) thus keeping in mind the original notion of a slice as a window to some array makes sense to me.

For many, if not most, programs the current approach is quite OK. The suggested solution is nontrivial, especially in terms of runtime and memory-consumption costs. Not only would _all_ programs get slower, but the added memory overhead could sometimes dwarf the memory savings.

If a specific program has the problem of "zombie" slice elements, which is BTW the same problem as keeping a short string sliced from a huge string, then in both cases the solution is to make a copy of the part that needs to be kept, thus allowing the GC to take care of the original backing array.
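The copy-out pattern suggested here can be sketched as follows (names and sizes are my own):

```go
package main

import "fmt"

func main() {
	huge := make([]byte, 1<<20) // imagine this came from a large read
	huge[0] = 'x'

	// A sub-slice keeps the entire 1 MiB backing array reachable:
	pinned := huge[:4]

	// Copying just the needed bytes breaks that link; once huge and
	// pinned go out of scope, the GC can reclaim the big array.
	small := append([]byte(nil), pinned...)

	fmt.Println(len(pinned), len(small), string(small[:1])) // 4 4 x
}
```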