Better GC and Memory Allocator for Go

2327 views
Skip to first unread message

Dmitry Vyukov

unread,
May 22, 2013, 12:57:30 PM5/22/13
to golang-dev
Hi,

I want to share my rough plan on improving Go GC and memory allocator:
https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
I don't have all the details, and I don't have a timeline.
But since these are very significant changes I want share it early,
hear your thoughts and feedback, and ideally get a buy in from
somebody for compiler/linker part (GC info generation) and parts of
runtime work.
Thanks!

Dmitry Vyukov

unread,
May 24, 2013, 8:15:57 AM5/24/13
to golang-dev
Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer) will conflict with interface -- the data word can be a pointer and not a pointer for the same object.

The point is that it would be very-very nice if we can use just
1 bit to represent type info. It would reduce memory consumption and
working set, simplify and speedup GC code.

One possible solution is to not store data inline (always allocate, as gccgo does).
Another solution is to store inline only pointers (and zero-size structs). I suspect that lots of interface values are pointers.
It's also possible to allocate object with sizes 1..7 inline if we set high bits to 1 so that it does not look like pointer (is it true for all OSes?). But it's probably too complex.
Will interface representation change break any public APIs?

Keith Randall

unread,
May 24, 2013, 9:41:32 AM5/24/13
to Dmitry Vyukov, golang-dev
On Fri, May 24, 2013 at 5:15 AM, Dmitry Vyukov <dvy...@google.com> wrote:
On Wed, May 22, 2013 at 8:57 PM, Dmitry Vyukov <dvy...@google.com> wrote:
Hi,

I want to share my rough plan on improving Go GC and memory allocator:
https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
I don't have all the details, and I don't have a timeline.
But since these are very significant changes I want share it early,
hear your thoughts and feedback, and ideally get a buy in from
somebody for compiler/linker part (GC info generation) and parts of
runtime work.
Thanks!


Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer) will conflict with interface -- the data word can be a pointer and not a pointer for the same object.


So what kind of objects does this happen with?  I would imagine most objects do not fall into this category.  Certainly stacks, maybe interface values, others?  Perhaps a special case can be made for these (for stacks, there already is).
 
The point is that it would be very-very nice if we can use just
1 bit to represent type info. It would reduce memory consumption and
working set, simplify and speedup GC code.

One possible solution is to not store data inline (always allocate, as gccgo does).
Another solution is to store inline only pointers (and zero-size structs). I suspect that lots of interface values are pointers.
It's also possible to allocate object with sizes 1..7 inline if we set high bits to 1 so that it does not look like pointer (is it true for all OSes?). But it's probably too complex.
Will interface representation change break any public APIs?

--
 
---
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Dmitry Vyukov

unread,
May 24, 2013, 9:46:15 AM5/24/13
to Keith Randall, golang-dev
On Fri, May 24, 2013 at 5:41 PM, Keith Randall <k...@google.com> wrote:

On Fri, May 24, 2013 at 5:15 AM, Dmitry Vyukov <dvy...@google.com> wrote:
On Wed, May 22, 2013 at 8:57 PM, Dmitry Vyukov <dvy...@google.com> wrote:
Hi,

I want to share my rough plan on improving Go GC and memory allocator:
https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
I don't have all the details, and I don't have a timeline.
But since these are very significant changes I want share it early,
hear your thoughts and feedback, and ideally get a buy in from
somebody for compiler/linker part (GC info generation) and parts of
runtime work.
Thanks!


Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer) will conflict with interface -- the data word can be a pointer and not a pointer for the same object.


So what kind of objects does this happen with?  I would imagine most objects do not fall into this category.  Certainly stacks, maybe interface values, others?  Perhaps a special case can be made for these (for stacks, there already is).


I think for stacks we can get precise info. In each safepoint it's either pointer or not, even if we reuse stack space across scopes.

Can it happen inside of maps?

Keith Randall

unread,
May 24, 2013, 9:48:35 AM5/24/13
to Dmitry Vyukov, golang-dev
On Fri, May 24, 2013 at 6:46 AM, Dmitry Vyukov <dvy...@google.com> wrote:
On Fri, May 24, 2013 at 5:41 PM, Keith Randall <k...@google.com> wrote:

On Fri, May 24, 2013 at 5:15 AM, Dmitry Vyukov <dvy...@google.com> wrote:
On Wed, May 22, 2013 at 8:57 PM, Dmitry Vyukov <dvy...@google.com> wrote:
Hi,

I want to share my rough plan on improving Go GC and memory allocator:
https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
I don't have all the details, and I don't have a timeline.
But since these are very significant changes I want share it early,
hear your thoughts and feedback, and ideally get a buy in from
somebody for compiler/linker part (GC info generation) and parts of
runtime work.
Thanks!


Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer) will conflict with interface -- the data word can be a pointer and not a pointer for the same object.


So what kind of objects does this happen with?  I would imagine most objects do not fall into this category.  Certainly stacks, maybe interface values, others?  Perhaps a special case can be made for these (for stacks, there already is).


I think for stacks we can get precise info. In each safepoint it's either pointer or not, even if we reuse stack space across scopes.

Can it happen inside of maps?

No.  Every slot in a map data structure is either a pointer or not for its entire lifetime.  We can tell the GC what the layout is at allocation time. 

roger peppe

unread,
May 24, 2013, 9:51:47 AM5/24/13
to Dmitry Vyukov, golang-dev
On 24 May 2013 13:15, Dmitry Vyukov <dvy...@google.com> wrote:
> On Wed, May 22, 2013 at 8:57 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>>
>> Hi,
>>
>> I want to share my rough plan on improving Go GC and memory allocator:
>>
>> https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
>> I don't have all the details, and I don't have a timeline.
>> But since these are very significant changes I want share it early,
>> hear your thoughts and feedback, and ideally get a buy in from
>> somebody for compiler/linker part (GC info generation) and parts of
>> runtime work.
>> Thanks!
>
>
>
> Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer)
> will conflict with interface -- the data word can be a pointer and not a
> pointer for the same object.
>
> The point is that it would be very-very nice if we can use just
> 1 bit to represent type info. It would reduce memory consumption and
> working set, simplify and speedup GC code.

This is probably a stupid idea, but I wonder if you could make use
of the fact that the *second* pointer in an interface is always
of one or two well known kinds (a type or a method-table+type)

If the second of a pair of pointers is one of those kinds, we
know it's an interface and can potentially work out if the
first of the pair is a pointer.

We could store types and method tables in a separate arena
to make the check cheap.

Dmitry Vyukov

unread,
May 24, 2013, 9:55:08 AM5/24/13
to roger peppe, golang-dev
Thanks!
This is a possible solution.
But it also complicates and slows down GC. It would be nice to scan w/o any exceptions.

Daniel Morsing

unread,
May 24, 2013, 9:57:44 AM5/24/13
to Dmitry Vyukov, golang-dev
I don't think you gain a lot by doing this. People tend to use int as
a placeholder type to attach methods to, so you end up doing just as
much allocating, but with more complexity.

> Will interface representation change break any public APIs?
>

reflect has (Value).InterfaceData() for direct access to the interface
words. I don't think you can use that call for anything useful, so
maybe it's ok to break it?

Will each memory section have it's own copy of the bitmask? Then a
possibility would be to have the interface generation functions
clear/set the bitmask, depending on whether it's word contains a value
or a pointer.

Keith Randall

unread,
May 24, 2013, 10:01:22 AM5/24/13
to Dmitry Vyukov, golang-dev
I had a different idea about improving the way we record type info in the heap.  Perhaps it would compliment your plan.

The idea is that most of the heap is probably just a few object types.  So we can reserve some MSpans to hold just one type of object.  Kind of like the size classes we have now, but divided up by type instead of size class.  Of course, that means a lot more classes, so we can't do this for every type.  Instead, we keep a count of allocations by type and when the count of allocations exceed a threshold, we promote that type to have its own "class", meaning from then on it gets allocated as a separate class (and no longer allocated by size class).

The benefit is that we can record type info at the MSpan level instead of at the individual object level.  At least, for MSpans which correspond to a type class.  We also get the side benefit of saving some space if the type doesn't fit exactly in a size class.  We need type info per object only for the old size class MSpans.

There are still some problems to work through.  How are arrays handled?  And what if the number of type classes grows too large (maybe large = as many as size classes we have now = 61)?  Can we "unpromote" a type class later?  We could reset (or decay) the allocation counts at each GC and rechoose the type classes based on the top N allocations.

Dmitry Vyukov

unread,
May 24, 2013, 10:39:30 AM5/24/13
to Daniel Morsing, golang-dev
One more reason to use struct{} ;)

 

> Will interface representation change break any public APIs?
>

reflect has (Value).InterfaceData() for direct access to the interface
words. I don't think you can use that call for anything useful, so
maybe it's ok to break it?

Does it document the exact layout? AFAIK gccgo always use non-inlined data, so there must be no actual promise on representation...



Will each memory section have it's own copy of the bitmask? Then a
possibility would be to have the interface generation functions
clear/set the bitmask, depending on whether it's word contains a value
or a pointer.


Yeah, I though about it, but it will slowdown interface assignment...
 

Daniel Morsing

unread,
May 24, 2013, 11:23:53 AM5/24/13
to Dmitry Vyukov, golang-dev
It just returns an array of 2 uintptrs. I doubt there is anything you
could actually do with them.

>
>
>> Will each memory section have it's own copy of the bitmask? Then a
>> possibility would be to have the interface generation functions
>> clear/set the bitmask, depending on whether it's word contains a value
>> or a pointer.
>>
>
> Yeah, I though about it, but it will slowdown interface assignment...
>

Right, on top of that, It will make it a lot more difficult to
represent arrays of interfaces compactly. We can scrap that idea.

I'm leaning towards either just assuming that anything in an interface
is a pointer or allocating when it's a value type.

Dmitry Vyukov

unread,
May 24, 2013, 11:29:35 AM5/24/13
to Daniel Morsing, golang-dev
We must aim for 100% GC in long term.
It's OK to treat the interface data conservative as a pointer for now. But we need at least some idea how to implement it in a precise way later (to not reimplement the whole thing from scratch again).

Yes, for now I am inclined to allocate int's separately and store pointers.

Dmitry Vyukov

unread,
May 24, 2013, 11:33:42 AM5/24/13
to Keith Randall, golang-dev
On Fri, May 24, 2013 at 6:01 PM, Keith Randall <k...@google.com> wrote:
I had a different idea about improving the way we record type info in the heap.  Perhaps it would compliment your plan.

The idea is that most of the heap is probably just a few object types.  So we can reserve some MSpans to hold just one type of object.  Kind of like the size classes we have now, but divided up by type instead of size class.  Of course, that means a lot more classes, so we can't do this for every type.  Instead, we keep a count of allocations by type and when the count of allocations exceed a threshold, we promote that type to have its own "class", meaning from then on it gets allocated as a separate class (and no longer allocated by size class).

The benefit is that we can record type info at the MSpan level instead of at the individual object level.  At least, for MSpans which correspond to a type class.  We also get the side benefit of saving some space if the type doesn't fit exactly in a size class.  We need type info per object only for the old size class MSpans.

There are still some problems to work through.  How are arrays handled?  And what if the number of type classes grows too large (maybe large = as many as size classes we have now = 61)?  Can we "unpromote" a type class later?  We could reset (or decay) the allocation counts at each GC and rechoose the type classes based on the top N allocations.



If I correctly understand your proposal, I think the additional complexity outweigh the potential memory savings. It will also slow-down either allocation of GC somewhat, because we will need to count number of alive objects by type.

Dmitry Vyukov

unread,
May 29, 2013, 3:01:11 AM5/29/13
to golan...@googlegroups.com
Hi,

Here is a sketch of the scan loop that uses:
1. heap object type info (instead of points-to type info)
2. simple bitmasks for type info
3. object headers

#define BitAllocated 1
#define BitMarked 2
#define BitsPerTypeInfoWord 64
#define EmbedTypeInfoBits (BitsPerTypeInfoWord-2)
#define EmbedTypeInfoObjectSize (EmbedTypeInfoBits*PtrSize)
#define PAGETABLE (mheap.pagetab)  // can be const
#define SPANTABLE (mheap.spantab)  // can be const

void scan(void)
{
uintptr *p;
uintptr n, i, idx, v, vsize, tisize, tipos0, tipos1, spanstart;
uint64 *ti, *ti0;
Pagetab *pt;
Type *t;
Span *span;

for(;;) {
p = dequeue(&n);  // next object to scan
if(p == nil)
break;
// setup type info
if(n <= EmbedTypeInfoObjectSize) {
// embed type info
tisize = BitsPerTypeInfoWord;
tipos0 = tipos1 = 2;  // first 2 bits are allocated and marked
ti0 = ti = p;
} else {
// satellite type info
t = (Type*)(*p & ~3);
tisize = t->tisize;  // DEREF
tipos0 = tipos1 = 0;
ti0 = ti = t->ti;
}
for(i = 1; i < n/PtrSize; i++) {
if((*ti & (1<<tipos1)) && (v = p[i])) {
// have non-nil pointer v
// find object beginning
pt = PAGETABLE[v/PageSize];  // DEREF
if(pt.size) {
// small object
vsize = pt.size;
spanstart = (v&~(PageSize-1)) - pt.offset*PageSize;
idx = (v-spanstart)/vsize;  // DIV
v = spanstart + idx*vsize;
} else {
// large object
span = SPANTABLE[v/PageSize];  // DEREF
v = span->start;  // DEREF
vsize = span->size;
}
// now v - memory block, vsize - memory block size
// check if already marked
// (*v&BitAllocated) == 0 should not happen with precise GC
if((*v&BitMarked) == 0) {  // DEREF
// deliberately sloppy about atomicity, can scan the same object twice
*v |= BitMarked;
// enqueue for scanning
enqueue(v, vsize);
}
}
// update type info position
tipos0++;
tipos1++;
if(tipos0 == tisize) {
// move to the next array element
ti = ti0;
tipos1 = tipos0 = 0;
} else if(tipos1 == BitsPerTypeInfoWord) {
// move to the next type info word
ti++;
tipos1 = 0;
}
}
}
}

For objects of size <= 496 there is only 1 memory dereference (except the object itself) -- PAGETABLE.
For objects of size <= MaxSmallSize (32K) there is 1 additional memory dereference -- satellite type info.
For large object there are 2 additional memory dereferences -- SPANTABLE and span.
I afraid counting dereferences in the current code, I would expect this to be at least 2 times faster than what we have now.

Hopefully this will be less error-prone as well. The recent GC bugs were caused by (1) wrong points-to type info, (2) dangling pointers and (3) complexity, corner cases and accumulated state. (1) points-to type info is not used in this algo. (2) a dangling pointer can cause a leak at worst. (3) here is no corner cases and accumulated state, a bug is either in this handful of lines of code or in type info bitmasks.

With this algo it's also trivial to calculate alive heap size during scanning (basically impossible now), and this opens the path to concurrent sweeping (another 1/3 stop-the-world reduction).




 

unread,
May 29, 2013, 7:24:01 AM5/29/13
to golan...@googlegroups.com
From the document:


2. GC redesign.

Problems:

2.1 GC is complex

2.2 type info is complex, fat and slow to process

2.3 lots of scattered memory accesses and indirection

My opinion:

2.1 True
2.2 The current GC hasn't been optimized yet
2.3 The current GC hasn't been optimized yet

Dmitry Vyukov

unread,
May 29, 2013, 7:26:27 AM5/29/13
to ⚛, golang-dev
On Wed, May 29, 2013 at 3:24 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> From the document:
>
>
> 2. GC redesign.
>
> Problems:
>
> 2.1 GC is complex
>
> 2.2 type info is complex, fat and slow to process
>
> 2.3 lots of scattered memory accesses and indirection
>
> My opinion:
>
> 2.1 True
> 2.2 The current GC hasn't been optimized yet
> 2.3 The current GC hasn't been optimized yet

Do you think it can be substantially faster (to outweigh complexity)
than what I propose?



> On Wednesday, May 22, 2013 6:57:30 PM UTC+2, Dmitry Vyukov wrote:
>>
>> Hi,
>>
>> I want to share my rough plan on improving Go GC and memory allocator:
>>
>> https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
>> I don't have all the details, and I don't have a timeline.
>> But since these are very significant changes I want share it early,
>> hear your thoughts and feedback, and ideally get a buy in from
>> somebody for compiler/linker part (GC info generation) and parts of
>> runtime work.
>> Thanks!
>

unread,
May 29, 2013, 7:59:01 AM5/29/13
to golan...@googlegroups.com, ⚛
On Wednesday, May 29, 2013 1:26:27 PM UTC+2, Dmitry Vyukov wrote:
On Wed, May 29, 2013 at 3:24 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> From the document:
>
>
> 2. GC redesign.
>
> Problems:
>
> 2.1 GC is complex
>
> 2.2 type info is complex, fat and slow to process
>
> 2.3 lots of scattered memory accesses and indirection
>
> My opinion:
>
> 2.1 True
> 2.2 The current GC hasn't been optimized yet
> 2.3 The current GC hasn't been optimized yet

Do you think it can be substantially faster (to outweigh complexity)
than what I propose?

I do not know the answer. I only know that the current GC can be improved in multiple ways.

Note: The reason why the current GC does not store typeinfo in the beginning of an allocated block is because it has been rejected by rsc. I don't plan to change this decision.
Message has been deleted

Dmitry Vyukov

unread,
May 29, 2013, 8:54:03 AM5/29/13
to ⚛, golang-dev
My plan is to wait until Carl implement it :)
He already has precise args scanning for arguments.

On Wed, May 29, 2013 at 4:00 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> How do you propose to scan stack frames in a precise way?

Ian Lance Taylor

unread,
May 29, 2013, 9:27:25 AM5/29/13
to Dmitry Vyukov, Daniel Morsing, golang-dev
On Fri, May 24, 2013 at 8:29 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> We must aim for 100% GC in long term.
> It's OK to treat the interface data conservative as a pointer for now. But
> we need at least some idea how to implement it in a precise way later (to
> not reimplement the whole thing from scratch again).
>
> Yes, for now I am inclined to allocate int's separately and store pointers.

That is what gccgo does, but for some types of code it introduces
considerable overhead. Some code, like encoding/binary, uses small
values with methods and stores them in interfaces. encoding/binary
only uses a couple of values, but I've seen code that uses a lot more.
Forcing all those values into separate heap locations means a lot of
unnecessary heap allocations.

Ian

Dmitry Vyukov

unread,
May 30, 2013, 2:35:13 AM5/30/13
to Ian Lance Taylor, Daniel Morsing, golang-dev
Bad news...
We can make objects with size < PtrSize embedded as well (by setting
unused bits to 1).
Is it possible to rewrite that code in some other way so that it does
not use such interfaces?

Russ Cox

unread,
May 30, 2013, 11:18:36 AM5/30/13
to Dmitry Vyukov, Ian Lance Taylor, Daniel Morsing, golang-dev
On Thu, May 30, 2013 at 2:35 AM, Dmitry Vyukov <dvy...@google.com> wrote:
Is it possible to rewrite that code in some other way so that it does
not use such interfaces?

I get very scared when I see discussion like this. Our job as language implementers is to implement the language. It is not to change the language because we are not good enough to implement it. Sometimes new information about implementation does feed back into language design, of course, but that must be done deliberately and treated as a language change, not a one-off comment in the middle of an implementation design.

Russ

Dmitry Vyukov

unread,
May 30, 2013, 11:46:41 AM5/30/13
to Russ Cox, Ian Lance Taylor, Daniel Morsing, golang-dev
I just want to collect information for now.
Good design and implementation of memory
allocator/GC/interfaces/chans/maps is not a simple task, and some
restrictions can make it much simpler to solve. Win-win changes are
great, but sometimes we need to make trade-off decisions. This is
where information matters.

Probably I formulated the question badly. What I want to understand is:
Is that code a good idiomatic Go? Or maybe the author will now say
that there is a better way to write it and the better way does not use
"integers in interfaces". I do not see "integers in interfaces" used
often.

Russ Cox

unread,
May 30, 2013, 11:49:11 AM5/30/13
to Dmitry Vyukov, Ian Lance Taylor, Daniel Morsing, golang-dev
It is very important to me that we do not penalize existing uses of the language. Non-pointer values in interfaces do happen in common cases, such as enumerated constant types with String methods.

Russ

Dmitry Vyukov

unread,
May 30, 2013, 11:52:31 AM5/30/13
to ⚛, golang-dev
On Wed, May 29, 2013 at 3:59 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> On Wednesday, May 29, 2013 1:26:27 PM UTC+2, Dmitry Vyukov wrote:
>
>> On Wed, May 29, 2013 at 3:24 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
>> > From the document:
>> >
>> >
>> > 2. GC redesign.
>> >
>> > Problems:
>> >
>> > 2.1 GC is complex
>> >
>> > 2.2 type info is complex, fat and slow to process
>> >
>> > 2.3 lots of scattered memory accesses and indirection
>> >
>> > My opinion:
>> >
>> > 2.1 True
>> > 2.2 The current GC hasn't been optimized yet
>> > 2.3 The current GC hasn't been optimized yet
>>
>> Do you think it can be substantially faster (to outweigh complexity)
>> than what I propose?
>
>
> I do not know the answer. I only know that the current GC can be improved in
> multiple ways.

There are several reasons why I think it won't be very fast:
1. Scanblock algorithm is complex (read - slow).
2. The complexity does not allow to implement other important
optimizations (e.g. concurrent sweep, and if complex changes will be
implemented, we will be hunting bugs for weeks).
3. There are some inevitable additional memory accesses during scan --
e.g. bitmap, span, span type info array.
4. The data structures are expensive to maintain (e.g. settype,
compare it to 1 store to the first word of the object).

unread,
May 30, 2013, 12:11:57 PM5/30/13
to golan...@googlegroups.com, ⚛
1. The proposal increases memory consumption by adding an uintptr to the start of each object. We can get a fairly accurate idea about the final memory consumption of the proposal by changing the case MTypes_Empty in function runtime·settype() to allocate MTypes_Words instead of MTypes_Bytes.

2. How does the proposal deal with alignment guarantees? If a program allocates 16 bytes, the expectation is for the data to be aligned to 16 bytes.

Russ Cox

unread,
May 30, 2013, 12:23:08 PM5/30/13
to ⚛, golang-dev
On Thu, May 30, 2013 at 12:11 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
1. The proposal increases memory consumption by adding an uintptr to the start of each object. We can get a fairly accurate idea about the final memory consumption of the proposal by changing the case MTypes_Empty in function runtime·settype() to allocate MTypes_Words instead of MTypes_Bytes.

2. How does the proposal deal with alignment guarantees? If a program allocates 16 bytes, the expectation is for the data to be aligned to 16 bytes.

This is a serious problem. Per-object headers are not okay, for this reason.

Russ

Dmitry Vyukov

unread,
May 30, 2013, 12:25:58 PM5/30/13
to ⚛, golang-dev
On Thu, May 30, 2013 at 8:11 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> 1. The proposal increases memory consumption by adding an uintptr to the
> start of each object. We can get a fairly accurate idea about the final
> memory consumption of the proposal by changing the case MTypes_Empty in
> function runtime·settype() to allocate MTypes_Words instead of MTypes_Bytes.

It is not precise.
The real memory consumption can be lower because when you allocate 24
bytes, the header adds no overhead (fits into 32 byte size class).
On the other hand, for 4096 byte allocation we will need to allocate
much more to add the header.

I have little data now. I've measured memory consumption on
test/bench/garbage/parser. Overall RSS increase was about 5% (heap
size was about 580MB, 900MB Sys).
I need to do more investigation.


> 2. How does the proposal deal with alignment guarantees? If a program
> allocates 16 bytes, the expectation is for the data to be aligned to 16
> bytes.

I understand that. Do we guarantee the alignment somewhere? I would
expect only 8-byte alignment guarantee (enough for any Go type).
Larger alignment can be achieved by manual offset (e.g. for SSE one
would allocate a slightly larger []float32 and then subslice from
beginning to get the necessary alignment).


> On Thursday, May 30, 2013 5:52:31 PM UTC+2, Dmitry Vyukov wrote:
>>
>> On Wed, May 29, 2013 at 3:59 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
>> > On Wednesday, May 29, 2013 1:26:27 PM UTC+2, Dmitry Vyukov wrote:
>> >>
>> >> Do you think it can be substantially faster (to outweigh complexity)
>> >> than what I propose?
>> >
>> >
>> > I do not know the answer. I only know that the current GC can be
>> > improved in
>> > multiple ways.
>>
>> There are several reasons why I think it won't be very fast:
>> 1. Scanblock algorithm is complex (read - slow).
>> 2. The complexity does not allow to implement other important
>> optimizations (e.g. concurrent sweep, and if complex changes will be
>> implemented, we will be hunting bugs for weeks).
>> 3. There are some inevitable additional memory accesses during scan --
>> e.g. bitmap, span, span type info array.
>> 4. The data structures are expensive to maintain (e.g. settype,
>> compare it to 1 store to the first word of the object).
>

Dmitry Vyukov

unread,
May 30, 2013, 12:27:50 PM5/30/13
to Russ Cox, ⚛, golang-dev
Where do we guarantee it?

Dmitry Vyukov

unread,
May 30, 2013, 12:30:30 PM5/30/13
to Russ Cox, Ian Lance Taylor, Daniel Morsing, golang-dev
I understand that. But rejecting to give up any bit of performance for
all primitive operations does not look like a good way forward.

Russ Cox

unread,
May 30, 2013, 12:34:44 PM5/30/13
to Dmitry Vyukov, ⚛, golang-dev
It is not guaranteed by the spec, but it has historically been guaranteed by the implementation, and we have stated on mailing lists and such that it will continue to hold. I believe this is an important property of the current allocator.

I realize that these constraints mean it is not possible to just reuse implementations of other languages. But I don't see that as necessarily a bad thing. If you could implement Go in terms of other languages, what would be the point of having Go?

Russ

Dmitry Vyukov

unread,
May 30, 2013, 12:37:30 PM5/30/13
to Russ Cox, ⚛, golang-dev
On Thu, May 30, 2013 at 8:34 PM, Russ Cox <r...@golang.org> wrote:
>> >> 1. The proposal increases memory consumption by adding an uintptr to
>> >> the
>> >> start of each object. We can get a fairly accurate idea about the final
>> >> memory consumption of the proposal by changing the case MTypes_Empty in
>> >> function runtime·settype() to allocate MTypes_Words instead of
>> >> MTypes_Bytes.
>> >>
>> >> 2. How does the proposal deal with alignment guarantees? If a program
>> >> allocates 16 bytes, the expectation is for the data to be aligned to 16
>> >> bytes.
>> >
>> >
>> > This is a serious problem. Per-object headers are not okay, for this
>> > reason.
>
>
>>
>> Where do we guarantee it?
>
>
> It is not guaranteed by the spec, but it has historically been guaranteed by
> the implementation, and we have stated on mailing lists and such that it
> will continue to hold. I believe this is an important property of the
> current allocator.

OK

Robert Griesemer

unread,
May 30, 2013, 12:49:45 PM5/30/13
to Russ Cox, Dmitry Vyukov, ⚛, golang-dev
On Thu, May 30, 2013 at 9:34 AM, Russ Cox <r...@golang.org> wrote:
I realize that these constraints mean it is not possible to just reuse implementations of other languages. But I don't see that as necessarily a bad thing. If you could implement Go in terms of other languages, what would be the point of having Go?

+1

unread,
May 30, 2013, 1:13:35 PM5/30/13
to golan...@googlegroups.com, Russ Cox, ⚛
It is possible to put the extra uintptr just before the start of an object.

void **p;
uintptr info = (uintptr)p[-1];

Dmitry Vyukov

unread,
May 30, 2013, 1:30:35 PM5/30/13
to ⚛, golang-dev, Russ Cox
This is a good question. Russ, what exactly alignment do you mean?
Would it be OK to put header in the end of the block?

Russ Cox

unread,
May 30, 2013, 1:47:00 PM5/30/13
to Dmitry Vyukov, ⚛, golang-dev
If I allocate a power-of-two size block, I expect it to be aligned to that power of two, up to page size.
You could put a header at the end of the block, but then they wouldn't pack very tightly so you'd waste a lot of memory due to fragmentation.

Russ

Carl Shapiro

unread,
May 30, 2013, 3:26:07 PM5/30/13
to Russ Cox, Dmitry Vyukov, ⚛, golang-dev
On Thu, May 30, 2013 at 9:34 AM, Russ Cox <r...@golang.org> wrote:
On Thu, May 30, 2013 at 12:27 PM, Dmitry Vyukov <dvy...@google.com> wrote:
On Thu, May 30, 2013 at 8:23 PM, Russ Cox <r...@golang.org> wrote:
> On Thu, May 30, 2013 at 12:11 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
>> 2. How does the proposal deal with alignment guarantees? If a program
>> allocates 16 bytes, the expectation is for the data to be aligned to 16
>> bytes.
>
>
> This is a serious problem. Per-object headers are not okay, for this reason.
 
Where do we guarantee it?

It is not guaranteed by the spec, but it has historically been guaranteed by the implementation, and we have stated on mailing lists and such that it will continue to hold. I believe this is an important property of the current allocator.

How is this property currently observable by a Go prograrm?

Carl Shapiro

unread,
May 30, 2013, 4:57:07 PM5/30/13
to Ian Lance Taylor, Dmitry Vyukov, Daniel Morsing, golang-dev
On Wed, May 29, 2013 at 6:27 AM, Ian Lance Taylor <ia...@golang.org> wrote:
That is what gccgo does, but for some types of code it introduces
considerable overhead.  Some code, like encoding/binary, uses small
values with methods and stores them in interfaces.  encoding/binary
only uses a couple of values, but I've seen code that uses a lot more.
 Forcing all those values into separate heap locations means a lot of
unnecessary heap allocations.

I just looked at encoding/binary.  The small values are global structures so there should be no need for a heap allocation as the interface can point to the structure directly.  If this is not done, it should be relatively easy to implement.

I am curious about code which makes more aggressive use of this idiom, can you send me a pointer to the code you are thinking of?


Ian Lance Taylor

unread,
May 30, 2013, 5:16:52 PM5/30/13
to Carl Shapiro, Dmitry Vyukov, Daniel Morsing, golang-dev
On Thu, May 30, 2013 at 1:57 PM, Carl Shapiro <csha...@google.com> wrote:
>
> I just looked at encoding/binary. The small values are global structures so
> there should be no need for a heap allocation as the interface can point to
> the structure directly. If this is not done, it should be relatively easy
> to implement.
>
> I am curious about code which makes more aggressive use of this idiom, can
> you send me a pointer to the code you are thinking of?

I sent Carl a pointer to some internal Google code. Externally, there
is syscall.Errno: a uintptr that implements the Error method and is
often converted to the built-in error interface. It's nice that we
don't need to do an allocation every time a function in syscall
returns an error.

Ian

Russ Cox

unread,
May 30, 2013, 5:29:30 PM5/30/13
to Carl Shapiro, Dmitry Vyukov, ⚛, golang-dev
Allocate memory and use it with something that requires alignment.

Russ

Carl Shapiro

unread,
May 30, 2013, 5:40:57 PM5/30/13
to Russ Cox, Dmitry Vyukov, ⚛, golang-dev
On Thu, May 30, 2013 at 2:29 PM, Russ Cox <r...@golang.org> wrote:
Allocate memory and use it with something that requires alignment.

I am not aware of anything in Go that requires alignment greater than the natural alignment of the type.  Can you provide me with an example?

Russ Cox

unread,
May 30, 2013, 5:51:56 PM5/30/13
to Carl Shapiro, Dmitry Vyukov, ⚛, golang-dev
Nothing in the language per se requires alignment, but since Go is a systems programming language it is common to run into system alignment requirements. For example, Linux I/O on a file opened with syscall.O_DIRECT requires page alignment, and on modern CPUs various vector instructions require 16- or 32-byte alignment even though the basic values are smaller. I am sure there are more examples.

Russ

Carl Shapiro

unread,
May 30, 2013, 10:53:02 PM5/30/13
to Russ Cox, Dmitry Vyukov, ⚛, golang-dev
On Thu, May 30, 2013 at 2:51 PM, Russ Cox <r...@golang.org> wrote:
Nothing in the language per se requires alignment, but since Go is a systems programming language it is common to run into system alignment requirements. For example, Linux I/O on a file opened with syscall.O_DIRECT requires page alignment, and on modern CPUs various vector instructions require 16- or 32-byte alignment even though the basic values are smaller. I am sure there are more examples.

The compiler does not generate code that uses vector types.  If the compiler supported vector types, the constraint of aligning a 16-byte allocation on a 16-byte boundary would alone be insufficient to provide the needed alignment gaurantees.  All memory intended to be loaded by a vector instruction would need to be associated with a type that has an alignment constraint of 16-bytes for the calling convention, local variable allocation, and structure packing to work.

In general, the alignment constraint of an object is best communicated through its type and not by the size of its allocation.  Many 16-byte objects are allocated in Go programs today, few of those objects have an alignment constrain greater than a machine word, but all objects have to pay for this constraint.  With the allocation mechanism we have today this might be cheap, but with other allocation mechanisms, such as bump-pointer, this can be a drag on efficiency.  However, by making alignment decisions with the type information, only the minority of objects with an allocation constraint pay extra.

Providing page aligned memory for O_DIRECT is simple within this framework.  Handling the asynchronous behavior of reads and writes performed to an fd opened with O_DIRECT without O_SYNC is still just as complicated.

Russ Cox

unread,
May 31, 2013, 12:23:58 AM5/31/13
to Carl Shapiro, Dmitry Vyukov, ⚛, golang-dev
It all works today - you can write your own assembly routines using special instructions, and you can call syscall.Read on O_DIRECT files. Please keep it working.

Russ

unread,
May 31, 2013, 2:32:14 AM5/31/13
to golan...@googlegroups.com, Russ Cox, ⚛
Another observation: If info is at p[0] and user data is at p[1..N] then there is an alignment issue on 32-bit platforms because float64 has to be aligned to 8 bytes.

Dmitry Vyukov

unread,
May 31, 2013, 2:35:17 AM5/31/13
to Carl Shapiro, Ian Lance Taylor, Daniel Morsing, golang-dev
On Fri, May 31, 2013 at 12:57 AM, Carl Shapiro <csha...@google.com> wrote:
On Wed, May 29, 2013 at 6:27 AM, Ian Lance Taylor <ia...@golang.org> wrote:
That is what gccgo does, but for some types of code it introduces
considerable overhead.  Some code, like encoding/binary, uses small
values with methods and stores them in interfaces.  encoding/binary
only uses a couple of values, but I've seen code that uses a lot more.
 Forcing all those values into separate heap locations means a lot of
unnecessary heap allocations.

I just looked at encoding/binary.  The small values are global structures so there should be no need for a heap allocation as the interface can point to the structure directly.  If this is not done, it should be relatively easy to implement.


Store to interface *must* make a copy. If the global var changes, the value in the interface must not.

Dmitry Vyukov

unread,
May 31, 2013, 2:35:55 AM5/31/13
to ⚛, golang-dev, Russ Cox
The plan was to use 8-byte headers always.

Keith Randall

unread,
May 31, 2013, 5:25:49 PM5/31/13
to Dmitry Vyukov, ⚛, golang-dev, Russ Cox

How about putting types at the end of the span? Something like this:


span: [obj0 obj1 ... objN type0 type1 ... typeN]


that way objects are packed and aligned correctly, but we can still get to the types efficiently. I'm thinking we use a page table like what Dmitry proposed:


struct PTEntry

{

 uint8 sizeclass;  // element size class (we have 61 classes right now)

 uint8 offset;  // offset from span beginning in pages (at most 15 right now - there's a few unused bits here)

};


from sizeclass we can get object size and span size (for small objects - for large objects we'd need to consult the MSpan). From those two, we can compute where the type should be.

This would mean spans would have fewer than the maximum number of objects in them, to account for the space for types. I'm thinking we could get away with 32 bits for the type info, as Type structs live in low memory. So that's 50% overhead for 8-byte objects, and goes down from there. We could also steal the low 2 bits of the type pointer to represent the allocated and marked bits so we wouldn't need a separate bit table like we have now (the noptr and boundary bits wouldn't be necessary anymore, and we can probably do special some other way).

This could dovetail with my proposal to have one-type-only spans. Special values in the PTEntry could indicate this so we can have just one type for the whole span and pack more objects in.



Carl Shapiro

unread,
May 31, 2013, 8:04:38 PM5/31/13
to Ian Lance Taylor, Dmitry Vyukov, Daniel Morsing, golang-dev
On Thu, May 30, 2013 at 2:16 PM, Ian Lance Taylor <ia...@golang.org> wrote:
I sent Carl a pointer to some internal Google code.  Externally, there
is syscall.Errno: a uintptr that implements the Error method and is
often converted to the built-in error interface.  It's nice that we
don't need to do an allocation every time a function in syscall
returns an error.

An errno value is usually a very small integer.  We can preallocate a range of small integer values around zero. When assigning an integer to an interface, if the value is within range, the preallocated integer can be used instead.

I suspect the enum case that Russ mentioned would benefit from this as well.  If we allocate storage for the names and values of constants for reflection, these integer values might be usable for interface assignment, too.

How hard would it be for someone to try this experiment in gccgo?

Ian Lance Taylor

unread,
May 31, 2013, 8:17:30 PM5/31/13
to Carl Shapiro, Dmitry Vyukov, Daniel Morsing, golang-dev
Not that easy, unfortunately, because right now the compiler always
allocates for integer values stored into interfaces. The code
generation would have to change to call a runtime function that could
either return a pointer to one of these static values or do the
allocation if necessary.

Ian

Russ Cox

unread,
May 31, 2013, 9:31:46 PM5/31/13
to Carl Shapiro, Ian Lance Taylor, Dmitry Vyukov, Daniel Morsing, golang-dev
On Fri, May 31, 2013 at 8:04 PM, Carl Shapiro <csha...@google.com> wrote:
On Thu, May 30, 2013 at 2:16 PM, Ian Lance Taylor <ia...@golang.org> wrote:
I sent Carl a pointer to some internal Google code.  Externally, there
is syscall.Errno: a uintptr that implements the Error method and is
often converted to the built-in error interface.  It's nice that we
don't need to do an allocation every time a function in syscall
returns an error.

An errno value is usually a very small integer.  We can preallocate a range of small integer values around zero. When assigning an integer to an interface, if the value is within range, the preallocated integer can be used instead.

And then you have a performance cliff as soon as someone uses a large enum value. 

What is the problem with storing non-pointers in interface values? Can't you use the other half of the interface value to supply type information?

Russ

Russ Cox

unread,
May 31, 2013, 9:32:18 PM5/31/13
to Keith Randall, Dmitry Vyukov, ⚛, golang-dev
On Fri, May 31, 2013 at 5:25 PM, Keith Randall <k...@google.com> wrote:

How about putting types at the end of the span?


SGTM. In fact I think this is what Atom's code already does. Look for s->types.compression.

Russ

Keith Randall

unread,
May 31, 2013, 10:53:32 PM5/31/13
to Russ Cox, Dmitry Vyukov, ⚛, golang-dev
Atom's code stores types out-of-line in a separately allocated buffer, linked off of the MSpan.

Dmitry Vyukov

unread,
Jun 1, 2013, 6:09:28 PM6/1/13
to Keith Randall, ⚛, golang-dev, Russ Cox
On Sat, Jun 1, 2013 at 1:25 AM, Keith Randall <k...@google.com> wrote:
> How about putting types at the end of the span? Something like this:
>
>
> span: [obj0 obj1 ... objN type0 type1 ... typeN]

Yes, this looks like the next best option after object headers.
I've prototyped it, and it showed 5% memory usage reduction (with no GC bitmap).

Probably the types must go in reverse order -- typeN ... type0.

> that way objects are packed and aligned correctly, but we can still get to
> the types efficiently. I'm thinking we use a page table like what Dmitry
> proposed:
>
>
> struct PTEntry
>
> {
>
> uint8 sizeclass; // element size class (we have 61 classes right now)
>
> uint8 offset; // offset from span beginning in pages (at most 15 right now
> - there's a few unused bits here)
>
> };

If we have type info at the end of the span, it must also contain span
size in pages. It will be needed to compute type info address.

I would prefer to store size instead of sizeclass. There is no easy
way to compute size from sizeclass. Currently we just do an additional
memory load.


> from sizeclass we can get object size and span size (for small objects - for
> large objects we'd need to consult the MSpan). From those two, we can
> compute where the type should be.
>
> This would mean spans would have fewer than the maximum number of objects in
> them, to account for the space for types. I'm thinking we could get away
> with 32 bits for the type info, as Type structs live in low memory. So
> that's 50% overhead for 8-byte objects, and goes down from there. We could
> also steal the low 2 bits of the type pointer to represent the allocated and
> marked bits so we wouldn't need a separate bit table like we have now (the
> noptr and boundary bits wouldn't be necessary anymore, and we can probably
> do special some other way).


50% is still too much and is not actually required, see the "design doc":
------
- instead of global GC bitmap use fater per-span metadata (this makes
the matainfo per-object, instead of per-word)
- for small objects embed pointer bitmark directly into the metainfo:
- for up to 48-byte objects metainfo is 1 byte: 1 bit - allocated,
1 bit - marked, 6 bits - pointer bitmask)
- for up to 112-byte objects metainfo is 2 bytes: 1 bit -
allocated, 1 bit - marked, 14 bits - pointer bitmask)
- for up to 240-byte objects metainfo is 4 bytes
- for up to 496-byte objects metainfo is 8 bytes
------

Currently Type's are also allocated form heap in reflect package. But
we may need to allocate them in a special way to handle interfaces
anyway. That may allow to fit them into 4 bytes as well.

Embed type info is critical. It must not be a Type pointer all the time.
I am not sure whether we want 8-byte embed type info, or limit it to 4
bytes. Probably it's not very important space/time tradeoff.

We will also need few (2 or 4) type info slots in the MSpan itself.
Because when we allocate 4096, we want to fit it into 1 page.

Yes, we can do 'special' bit in the following way:
MSpan contains a sorted list of special objects in the span. When we
sweep the span, we have a pointer into the list which initially points
to the head (a special object with lowest address). Then, compare each
object with the pointer. If not equal -- it's not special; if equal --
it's special and move the pointer to the next element in the special
list.

Dmitry Vyukov

unread,
Jun 1, 2013, 6:14:13 PM6/1/13
to roger peppe, golang-dev
On Fri, May 24, 2013 at 5:51 PM, roger peppe <rogp...@gmail.com> wrote:
>>> Hi,
>>>
>>> I want to share my rough plan on improving Go GC and memory allocator:
>>>
>>> https://docs.google.com/document/d/1HCPu3WKyCX3ZRYxmIMKTk0Ik1dePxKW1p02k3uhcft4/edit?usp=sharing
>>> I don't have all the details, and I don't have a timeline.
>>> But since these are very significant changes I want share it early,
>>> hear your thoughts and feedback, and ideally get a buy in from
>>> somebody for compiler/linker part (GC info generation) and parts of
>>> runtime work.
>>> Thanks!
>>
>>
>>
>> Carl pointed that GC bitmasks (bit per word saying - pointer/not pointer)
>> will conflict with interface -- the data word can be a pointer and not a
>> pointer for the same object.
>>
>> The point is that it would be very-very nice if we can use just
>> 1 bit to represent type info. It would reduce memory consumption and
>> working set, simplify and speedup GC code.
>
> This is probably a stupid idea, but I wonder if you could make use
> of the fact that the *second* pointer in an interface is always
> of one or two well known kinds (a type or a method-table+type)
>
> If the second of a pair of pointers is one of those kinds, we
> know it's an interface and can potentially work out if the
> first of the pair is a pointer.
>
> We could store types and method tables in a separate arena
> to make the check cheap.


For now it looks like the best option to me.
We will need to make Type/Itab addresses distinguishable from
everything else. Update reflect package to allocate types via runtime.
Do not mark interface value word as pointer, mark interface type/itab
word as pointer. During scan if a pointer is not nil and does not
point into heap area, check if it's a type/itab pointer, if so consult
type/itab and decide whether the next word is a pointer or not.

Keith Randall

unread,
Jun 1, 2013, 7:14:24 PM6/1/13
to Dmitry Vyukov, ⚛, golang-dev, Russ Cox
On Sat, Jun 1, 2013 at 3:09 PM, Dmitry Vyukov <dvy...@google.com> wrote:
On Sat, Jun 1, 2013 at 1:25 AM, Keith Randall <k...@google.com> wrote:
> How about putting types at the end of the span? Something like this:
>
>
> span: [obj0 obj1 ... objN type0 type1 ... typeN]

Yes, this looks like the next best option after object headers.
I've prototyped it, and it showed 5% memory usage reduction (with no GC bitmap).

Probably the types must go in reverse order -- typeN ... type0.

Sure, that works also.
 

> that way objects are packed and aligned correctly, but we can still get to
> the types efficiently. I'm thinking we use a page table like what Dmitry
> proposed:
>
>
> struct PTEntry
>
> {
>
>  uint8 sizeclass;  // element size class (we have 61 classes right now)
>
>  uint8 offset;  // offset from span beginning in pages (at most 15 right now
> - there's a few unused bits here)
>
> };

If we have type info at the end of the span, it must also contain span
size in pages. It will be needed to compute type info address.

I would prefer to store size instead of sizeclass. There is no easy
way to compute size from sizeclass. Currently we just do an additional
memory load.

sizeclass is all we need.  From sizeclass you can get both object size and span size from a small (fits in L1) table.  We can also do fun tricks like strength-reduce the divide-by-size computation using pre-computed multipliers for each sizeclass.

If you don't put sizeclass in PTEntry, you'll need both size and span size.  I think that will make PTEntry twice as big as it would be with just sizeclass.
Sure.

Dmitry Vyukov

unread,
Jun 2, 2013, 4:02:12 AM6/2/13
to Keith Randall, ⚛, golang-dev, Russ Cox
OK, I think we need to prototype both options.
Yes, most likely it will be twice as big with size.

Carl Shapiro

unread,
Jun 3, 2013, 3:09:55 PM6/3/13
to Russ Cox, Ian Lance Taylor, Dmitry Vyukov, Daniel Morsing, golang-dev
On Fri, May 31, 2013 at 6:31 PM, Russ Cox <r...@golang.org> wrote:
And then you have a performance cliff as soon as someone uses a large enum value. 

If you are truly worred about such cases, we can instead use the integers that are already allocated as part of the reflection data.  This is easily measured.
 
What is the problem with storing non-pointers in interface values? Can't you use the other half of the interface value to supply type information?

The common experience with other languages is that pointer maps needed for precise garbage collection will increase the object code size by around 30% when 1-bit per pointer is used.  Requiring additional bits per pointer (needed to do what you suggest) will at the least double that.

Russ Cox

unread,
Jun 4, 2013, 2:35:47 PM6/4/13