Go 1.4+ Garbage Collection (GC) Plan and Roadmap

Rick Hudson

Aug 7, 2014, 12:18:02 PM
to golan...@googlegroups.com
golang.org/s/go14gc holds our current thinking about Garbage Collection (uppercase GC).

Comments are welcome here.

- Rick

Brendan Tracey

Aug 7, 2014, 12:53:01 PM
to golan...@googlegroups.com
When you say "mutator (Go application code)", this refers to all normal Go code, correct? Basically, any Go code I would write as a user of Go.

The Go code that I run on a daily basis consists of GOMAXPROCS (or more) CPU-bound goroutines. My code has taken care to avoid generating garbage to keep GC pause times low (relative to the compute time). Currently, all CPUs are running at roughly 100% for perhaps 95% of execution time. When you say "the majority of the concurrent GC is done on one or more dedicated CPUs", it sounds like you are referring to P and not M. Will I experience a 25% slowdown of my code on a 4-core machine (as I will only have GOMAXPROCS-1 cores to do the computation)?

Ian Lance Taylor

Aug 7, 2014, 1:24:20 PM
to Brendan Tracey, golang-dev
On Thu, Aug 7, 2014 at 9:53 AM, Brendan Tracey <tracey....@gmail.com> wrote:
>
> When you say "mutator (Go application code)", this refers to all normal Go
> code, correct? Basically, any Go code I would write as a user of Go.

Yes.

From the perspective of the garbage collector, there is the virtuous
collector and memory allocator, and there is a lot of pesky annoying
code that goes around changing things. The collector's job would be
much simpler without all that other code, which is known as mutator
code. Most of us, without the advantage of the garbage collector's
perspective, just call that code the running program.


> The Go code that I run on a daily basis consists of GOMAXPROCS (or more)
> CPU-bound goroutines. My code has taken care to avoid generating garbage to
> keep GC pause times low (relative to the compute time). Currently, all CPUs
> are running at roughly 100% for perhaps 95% of execution time. When you say "the
> majority of the concurrent GC is done on one or more dedicated CPUs", it
> sounds like you are referring to P and not M. Will I experience a 25%
> slowdown of my code on a 4-core machine (as I will only have GOMAXPROCS-1
> cores to do the computation)?

I believe it is true that this proposal will lead to a small slowdown
of programs that do not generate any garbage. However, I can't see
why it would be anything like 25% if there is nothing for the garbage
collector to do. But I guess I shouldn't speak for Rick.

Ian

sj...@x61.eu

Aug 7, 2014, 1:34:48 PM
to golan...@googlegroups.com
Losing a core sounds fine for anything but current laptops. Very often laptops still have only two cores (four threads, though). I don't know how often Go is used for desktop apps, but it seems like this could be expensive for laptops.

Vlad Didenko

Aug 7, 2014, 1:43:58 PM
to golan...@googlegroups.com
Rick, Ian,

Are there plans to provide more documentation and clarity around STW phase expectations? Related issue/comment https://code.google.com/p/go/issues/detail?id=7868#c8

Vlad

Brendan Tracey

Aug 7, 2014, 1:44:00 PM
to Ian Lance Taylor, golang-dev

On Aug 7, 2014, at 10:24 AM, Ian Lance Taylor <ia...@golang.org> wrote:

> On Thu, Aug 7, 2014 at 9:53 AM, Brendan Tracey <tracey....@gmail.com> wrote:
>>
>> When you say "mutator (Go application code)", this refers to all normal Go
>> code, correct? Basically, any Go code I would write as a user of Go.
>
> Yes.
>
> From the perspective of the garbage collector, there is the virtuous
> collector and memory allocator, and there is a lot of pesky annoying
> code that goes around changing things. The collector's job would be
> much simpler without all that other code, which is known as mutator
> code. Most of us, without the advantage of the garbage collector's
> perspective, just call that code the running program.

It seems like the GC algorithm would be simpler and more elegant without the mutator code. Maybe we should work on eliminating mutator code instead?

Thanks for the clarification.

>
>
>> The Go code that I run on a daily basis consists of GOMAXPROCS (or more)
>> CPU-bound goroutines. My code has taken care to avoid generating garbage to
>> keep GC pause times low (relative to the compute time). Currently, all CPUs
>> are running at roughly 100% for perhaps 95% of execution time. When you say "the
>> majority of the concurrent GC is done on one or more dedicated CPUs", it
>> sounds like you are referring to P and not M. Will I experience a 25%
>> slowdown of my code on a 4-core machine (as I will only have GOMAXPROCS-1
>> cores to do the computation)?
>
> I believe it is true that this proposal will lead to a small slowdown
> of programs that do not generate any garbage. However, I can't see
> why it would be anything like 25% if there is nothing for the garbage
> collector to do. But I guess I shouldn't speak for Rick.

I read the text “the majority of the concurrent GC work is done on one or more dedicated CPUs while the mutators run on the remaining CPUs” as meaning that the GC would continually live on one (or N) P, while mutator code Gs would only ever run on the remaining P-N processors. Am I mistaken, and in fact, the CPU would only be dedicated while the actual GC is occurring? If so, how would that differ from running on a non-dedicated CPU? There can only be one G running on a P at any time.

Luna Duclos

Aug 7, 2014, 2:59:27 PM
to Vlad Didenko, golan...@googlegroups.com
I'm rather surprised a whole array of code that currently works (any Go pointers being passed to C) will suddenly be rendered incompatible. Doesn't this break the Go compatibility guarantee in a fairly major way? Or is this code already incompatible? (I've never seen it misbehave at all.)


--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dmitry Vyukov

Aug 7, 2014, 3:01:43 PM
to Luna Duclos, Vlad Didenko, golang-dev
The GC already runs concurrently with C code, even though the GC is not
designed to be concurrent and C code does not really cooperate with it
anyway. So at least mutations of Go objects in C are seriously broken already.

Luna Duclos

Aug 7, 2014, 3:14:28 PM
to Dmitry Vyukov, Vlad Didenko, golang-dev
If there has been a pointer passed to C code, doesn't this imply that there is a pointer to it on the Go side too ? (As it was passed in in the first place). 

This should guarantee that nothing passed to C code gets collected, no ? (This is assuming the C code doesn't store the Go pointer directly, which fits my use case).

Dmitry Vyukov

Aug 7, 2014, 3:18:10 PM
to Luna Duclos, Vlad Didenko, golang-dev
This is not about freeing memory, this is about scanning memory.
If GC scans memory that gets mutated concurrently, it will miss some
pointers, this in turn will lead to freeing of still in-use objects.
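
The missed-pointer hazard described above can be shown with a deterministic toy model. This is only a sketch: the `object` type, `mark`, and `collect` are hypothetical names, the "concurrent" mutation is simulated by running it between two scan steps, and the barrier shown (mark the newly stored pointer) is one assumed way for the collector to learn about the update, not a statement of what the Go runtime does.

```go
package main

import "fmt"

// object is a toy heap object with a single pointer field.
type object struct {
	ref    *object
	marked bool
}

// mark marks o and follows its pointer field as it is at this instant.
func mark(o *object) {
	for o != nil && !o.marked {
		o.marked = true
		o = o.ref
	}
}

// collect simulates one marking pass over roots A and B with a mutation
// squeezed in between the two scans. It reports whether C, which stays
// reachable the whole time, ends up marked.
func collect(barrier bool) bool {
	c := &object{}
	a := &object{}
	b := &object{ref: c}

	mark(a) // collector scans root A, which holds no pointer yet

	// Mutator step: move the only pointer to C from B into A.
	a.ref = b.ref
	if barrier {
		mark(a.ref) // assumed write barrier: tell the collector about the store
	}
	b.ref = nil

	mark(b) // collector scans root B, whose pointer to C is already gone
	return c.marked
}

func main() {
	fmt.Println("without barrier, C marked:", collect(false))
	fmt.Println("with barrier, C marked:", collect(true))
}
```

Without the barrier, C is still reachable through A but is never marked, so a sweep would free an object that is in use. C code that stores pointers into Go memory cannot run such a barrier, which is the concern raised here.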



minux

Aug 7, 2014, 3:20:14 PM
to Luna Duclos, Dmitry Vyukov, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 3:14 PM, Luna Duclos <luna....@palmstonegames.com> wrote:
> If there has been a pointer passed to C code, doesn't this imply that there is a pointer to it on the Go side too ? (As it was passed in in the first place).
>
> This should guarantee that nothing passed to C code gets collected, no ? (This is assuming the C code doesn't store the Go pointer directly, which fits my use case).

It might not get collected, but once we have a moving collector, it might be moved, and the C code will be accessing stale pointers.

For concurrent GC, the GC needs to know about updates to any pointer fields in the Go heap (see Dmitry's explanation), so you can't pass
Go objects with pointers to the C world. Passing other pure data to C might be fine (albeit with a performance hit); see issue 8310.

minux

Aug 7, 2014, 3:21:46 PM
to Luna Duclos, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 2:59 PM, Luna Duclos <luna....@palmstonegames.com> wrote:
> I'm rather surprised a whole array of code that currently works (any Go pointers being passed to C) will suddenly be rendered incompatible. Doesn't this break the Go compatibility guarantee in a fairly major way? Or is this code already incompatible? (I've never seen it misbehave at all.)

Please note that cgo is not included in the Go 1 compatibility contract (it only covers Go code, and
explicitly excludes any tooling support).

r...@google.com

Aug 7, 2014, 3:24:34 PM
to golan...@googlegroups.com, ia...@golang.org
If the GC is dormant all CPUs will be available to run goroutines. When the GC is triggered it will reacquire the CPU(s).  

Ian Lance Taylor

Aug 7, 2014, 3:55:00 PM
to Luna Duclos, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 11:59 AM, Luna Duclos
<luna....@palmstonegames.com> wrote:
>
> I'm rather surprised a whole array of code that currently works (any Go
> pointers being passed to C) will suddenly be rendered incompatible. Doesn't
> this break the Go compatibility guarantee in a fairly major way? Or is this
> code already incompatible? (I've never seen it misbehave at all.)

Technically the Go compatibility guarantee doesn't promise anything
about cgo.

That said, this is clearly a concern. We need to make cgo code as
backward compatible as we can, while still keeping our main goal: a
faster and lower-latency GC. There is some discussion on
http://golang.org/issue/8310, and more discussion here or in a
different thread is fine too.

Ian

janm...@gmail.com

Aug 7, 2014, 4:16:39 PM
to golan...@googlegroups.com

Does this mean the "Hello World" minimal executable size will double yet again?

I just tested the 32-bit version. On Windows it is now at 1.3MB just to show "Hello, world" at the command line.

Brendan Tracey

Aug 7, 2014, 4:17:37 PM
to Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
The use case we have in Gonum is to allow the user to specify alternative BLAS libraries. Many of these libraries have been developed and tuned over a number of years, and can be tuned for specific architectures. Attempting to beat the Intel MKL is not on our development roadmap. Efficiency for such operations can be a big factor, and so we support a cgo interface. The C interface requires some integers and doubles, but importantly a double* that points to a possibly very large array of doubles. We pass this pointer with unsafe.Pointer(&x[0]), where x is a []float64. The C code modifies the values of the doubles in that array, which Go can then use for whatever. This is the use case we would like to be able to support. I’m not expecting any feedback or decisions at present, just trying to make the needs known.

Brendan Tracey

Aug 7, 2014, 4:26:21 PM
to r...@google.com, golan...@googlegroups.com, ia...@golang.org
Would you mind elaborating on what is meant by “dedicated CPU” for the GC while it’s running? Does this mean that the scheduler won’t be able to interrupt it? As far as I understand, there’s one active M per P and at most one G can be active on that M, so how would the GC goroutine not have a dedicated CPU when it is running?

r...@google.com

Aug 7, 2014, 4:49:32 PM
to golan...@googlegroups.com, r...@google.com, ia...@golang.org
One reasonable implementation would be that when it is time to run the GC, the runtime will move the goroutine running on the P the GC wants to the scheduler queue and then use that P. The mechanics are implementation details; the high bit is that while the concurrent GC is running there will be up to a 25% reduction in CPU cycles available to run the mutators.
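
As a rough worked example of that figure, here is a small sketch of the arithmetic for the 4-core case asked about earlier. The function names are hypothetical, and the assumption that a cycle is running 10% of the time is purely illustrative; the thread only states the upper bound while a cycle is running and that all CPUs are available while the GC is dormant.

```go
package main

import "fmt"

// gcCPUShare is the fraction of total CPU capacity the collector occupies
// while a concurrent cycle is running, if it holds gcProcs of procs Ps.
func gcCPUShare(procs, gcProcs int) float64 {
	if procs <= 0 || gcProcs <= 0 {
		return 0
	}
	if gcProcs > procs {
		gcProcs = procs
	}
	return float64(gcProcs) / float64(procs)
}

// overallCost scales that share by the fraction of wall-clock time during
// which a cycle is actually running (0 when the collector is dormant).
func overallCost(procs, gcProcs int, activeFraction float64) float64 {
	return gcCPUShare(procs, gcProcs) * activeFraction
}

func main() {
	// One of four Ps held by the collector during a cycle.
	fmt.Printf("during a cycle: %.0f%%\n", 100*gcCPUShare(4, 1))
	// Illustrative assumption: a cycle is running 10% of the time.
	fmt.Printf("GC active 10%% of the time: %.1f%%\n", 100*overallCost(4, 1, 0.10))
	// A program that generates no garbage never triggers a cycle.
	fmt.Printf("GC dormant: %.1f%%\n", 100*overallCost(4, 1, 0))
}
```

Under these assumptions the 25% applies only while a cycle runs, so a program that rarely triggers the collector would see a much smaller overall cost.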

Dave Cheney

Aug 7, 2014, 7:24:03 PM
to Rick Hudson, golan...@googlegroups.com

Thank you for this overview, Rick.

At the risk of taking the discussion on a tangent, are there any plans to control the upper size of the heap on a per-program basis? I'm thinking of an analogue of Java's -Xmx parameter.

I know that other users have asked for the maximum heap size to be raised; I am asking for the opposite: the ability to limit the maximum heap without having to modify (and rebuild) the runtime for each program.

The use case I have is embedded systems with a moderate amount of memory (think Raspberry Pi) and no facility to swap. In that situation it is preferable that the program fault (and presumably be restarted) than continue to grow until the kernel is forced to intervene.

Thanks for your time.

Dave

Andrew Gerrand

Aug 7, 2014, 7:27:21 PM
to Dave Cheney, Rick Hudson, golang-dev

On 8 August 2014 09:23, Dave Cheney <da...@cheney.net> wrote:

> I know that other users have asked for the maximum heap size to be raised; I am asking for the opposite: the ability to limit the maximum heap without having to modify (and rebuild) the runtime for each program.
>
> The use case I have is embedded systems with a moderate amount of memory (think Raspberry Pi) and no facility to swap. In that situation it is preferable that the program fault (and presumably be restarted) than continue to grow until the kernel is forced to intervene.

ulimit?

Dave Cheney

Aug 7, 2014, 7:31:41 PM
to Andrew Gerrand, r...@golang.org, golang-dev

I must admit that never occurred to me. I mainly don't trust it to do the right thing on Linux, and hence never tried.

I guess I need to do some experimentation.

Ian Lance Taylor

Aug 7, 2014, 7:42:28 PM
to Andrew Gerrand, Dave Cheney, Rick Hudson, golang-dev
It should work, but I don't think it does. See
http://golang.org/issue/5049 .

Ian

Dave Cheney

Aug 7, 2014, 7:48:47 PM
to Ian Lance Taylor, Andrew Gerrand, r...@golang.org, golang-dev

Well, that's a bummer.

What about other ways of controlling MaxHeap, possibly a -D value passed during linking (with a sensible default, obviously)?

Andrew Gerrand

Aug 7, 2014, 7:58:42 PM
to Dave Cheney, Ian Lance Taylor, r...@golang.org, golang-dev
Today I learned that ulimit is enforced by the process, not the OS. Weird.

minux

Aug 7, 2014, 8:10:15 PM
to Andrew Gerrand, Dave Cheney, Ian Lance Taylor, r...@golang.org, golang-dev
On Thu, Aug 7, 2014 at 7:58 PM, Andrew Gerrand <a...@golang.org> wrote:
> Today I learned that ulimit is enforced by the process, not the OS. Weird.

It's enforced by the OS; that issue just intends to make Go play nice under the limit
(e.g. reject known bad allocations without making the OS take down the whole process
when decoding a corrupted gob).

Rick Hudson

Aug 7, 2014, 8:22:26 PM
to minux, Andrew Gerrand, Dave Cheney, Ian Lance Taylor, golang-dev
You might consider GOGC. If you know max live data size you can come pretty close to what you are asking for. 

Russ Cox

Aug 7, 2014, 8:32:56 PM
to Brendan Tracey, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 4:17 PM, Brendan Tracey <tracey....@gmail.com> wrote:
> The use case we have in Gonum is to allow the user to specify alternative BLAS libraries. Many of these libraries have been developed and tuned over a number of years, and can be tuned for specific architectures. Attempting to beat the Intel MKL is not on our development roadmap. Efficiency for such operations can be a big factor, and so we support a cgo interface. The C interface requires some integers and doubles, but importantly a double* that points to a possibly very large array of doubles. We pass this pointer with unsafe.Pointer(&x[0]), where x is a []float64. The C code modifies the values of the doubles in that array, which Go can then use for whatever. This is the use case we would like to be able to support. I'm not expecting any feedback or decisions at present, just trying to make the needs known.

Thanks for letting us know. Passing a Go pointer like that to C is problematic because eventually we will want the garbage collector to be able to move things. A collector must update all the references when it does the move, and it cannot update any references stored in C. Java allows pinning such objects so they cannot move, but that handcuffs the collector quite a bit and we'd like to avoid that. 

Where does your []float64 come from? The intended use case has always been that for cgo you would call C.malloc to get memory and then share *that* pointer between C and Go (and then call C.free when done). C doesn't care. If you could allocate your doubles that way, you'd certainly avoid any restrictions that might be necessary.

Russ

brainman

Aug 7, 2014, 9:02:56 PM
to golan...@googlegroups.com
Rick,

You might have problems with timers on Windows. At the moment the Go runtime clock ticks at around 15ms on Windows (see our struggle here: https://codereview.appspot.com/108700045). You might get 1ms if you are lucky (if you are running as administrator on windows/amd64).

You also say that your target platform is "a generic $1000 desktop box running Linux". What about other platforms? Will they run slower or faster?

Alex

Brendan Tracey

Aug 7, 2014, 9:13:09 PM
to Russ Cox, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev, Dan Kortschak
On Aug 7, 2014, at 5:32 PM, Russ Cox <r...@golang.org> wrote:

> On Thu, Aug 7, 2014 at 4:17 PM, Brendan Tracey <tracey....@gmail.com> wrote:
>> The use case we have in Gonum is to allow the user to specify alternative BLAS libraries. Many of these libraries have been developed and tuned over a number of years, and can be tuned for specific architectures. Attempting to beat the Intel MKL is not on our development roadmap. Efficiency for such operations can be a big factor, and so we support a cgo interface. The C interface requires some integers and doubles, but importantly a double* that points to a possibly very large array of doubles. We pass this pointer with unsafe.Pointer(&x[0]), where x is a []float64. The C code modifies the values of the doubles in that array, which Go can then use for whatever. This is the use case we would like to be able to support. I'm not expecting any feedback or decisions at present, just trying to make the needs known.
>
> Thanks for letting us know. Passing a Go pointer like that to C is problematic because eventually we will want the garbage collector to be able to move things. A collector must update all the references when it does the move, and it cannot update any references stored in C. Java allows pinning such objects so they cannot move, but that handcuffs the collector quite a bit and we'd like to avoid that.
>
> Where does your []float64 come from?

Our Matrix type [0] is as follows

type Dense struct {
    mat RawMatrix
}

type RawMatrix struct {     // [1]
    Rows, Cols int
    Stride int
    Data []float64
}

RawMatrix.Data is always generated from Go. Users then populate their matrix from wherever the data comes (machine learning features, linear algebra samples, random numbers, whatever).

> The intended use case has always been that for cgo you would call C.malloc to get memory and then share *that* pointer between C and Go (and then call C.free when done). C doesn't care. If you could allocate your doubles that way, you'd certainly avoid any restrictions that might be necessary.

This doesn’t seem to be desirable or possible, even with a complete restructuring of how the package works. Reasoning:

- I’d like to allow pure Go code. The BLAS libraries are shared libraries (written in C, or in Fortran with a C wrapper); they can be a pain to install and are often non-portable. Pure Go makes that easy for unsophisticated users (and maybe the Go code will eventually be close enough to its counterparts). At the very least we’d have to sometimes allocate in Go and sometimes allocate in C. We represent the BLAS implementation as an interface [2], and we could possibly do some type switching to choose a malloc language, but that seems error-prone.

- The BLAS functions only operate on data; they do not allocate. For example, in matrix multiply (C = A*B), pointers are passed for the locations of A, B and C; the result of the matrix multiplication is stored in the already-allocated memory for C. When we pass unsafe.Pointer(&C[0]), the float64s in RawMatrix.Data are modified in the same way, and all of the other methods of Matrix (including extracting the data slice) remain available.

- There are a number of good reasons for the user to provide []float64 themselves. This would have to be forbidden if all of the data must be allocated in C.

- I don’t see how to call Free correctly. Destructors are not possible in Go, so we would have to add a Free method to Matrix. Unlike something like a file, which is often opened, read from, and then closed, it’s pretty frequent for a Matrix to be created somewhere and used elsewhere (possibly in a concurrent environment). This is effectively manual memory management in a GCed language (and code that uses the Go interface will never observe a memory leak, because the Go GC takes care of it).

I don’t know how Java pinning works, but is it possible there could be some non-permanent GC pin? For the BLAS use case, the time the pointer is in C is temporary (though much longer than 50 milliseconds sadly). One idea would be that we pin those pointers during the call to C, and release them once the call is complete. There would be gaps where the pointer would exist purely in Go, and the data could be moved. 

Dan Kortschak

Aug 7, 2014, 9:28:48 PM
to Brendan Tracey, Russ Cox, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
I'll chime in as the original author of the cblas code.

Brendan's concerns here are well founded. If we need to allocate matrix
data in C we lose a significant level of flexibility and ease of reading
code; we need to allocate up front and then keep a Go value that would
care for C allocations via cgo. Because of the design criteria in the
gonum matrix packages that we be able to replace BLAS implementations
(either between C <-> Go or different C impls) the impact of this change
would flow back to any Go implementations (currently being worked on by
Brendan). This would in turn mean that we would have a Go implementation
that had to fake all the rigmarole required for handling the C case, and
client code would need to include the boilerplate of dealing with that
as well.

These issues will need to be addressed by us in the longer term as we
grapple with similar issues when dealing with CUDA/OpenCL
implementations, but we are nowhere near that at this stage.

Dan

Stephen Gutekanst

Aug 7, 2014, 10:10:36 PM
to golan...@googlegroups.com, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
I'd like to chime in as well, as this is of real concern for me. What about the performance of interacting with existing Go packages?

Today, I and many others use the image package to decode various image types into image.RGBA data. That data can then be fed directly into OpenGL or other C APIs. If sending slice data from Go to C is broken in this way, then we are left with two fixes for this specific case:
  • Use C.malloc to allocate memory, copy the entire image data into that C buffer, then tell OpenGL to upload that C buffer to the graphics hardware. This could be drastically less performant.
  • Fork all of the in-use image packages to allocate memory in C and not in Go, so that a copy doesn't occur. Ew.
I also share others' concerns about ease of use and readability; passing Go slices into C is extremely convenient.

Stephen




minux

Aug 8, 2014, 12:35:02 AM
to janm...@gmail.com, golang-dev
On Thu, Aug 7, 2014 at 4:16 PM, <janm...@gmail.com> wrote:
> Does this mean the "Hello World" minimal executable size will double yet again?

No, of course not. Where did you infer that conclusion?

Ian Lance Taylor

Aug 8, 2014, 12:45:52 AM
to janm...@gmail.com, golang-dev
On Thu, Aug 7, 2014 at 1:16 PM, <janm...@gmail.com> wrote:
>
> Does this mean the "Hello World" minimal executable size will double yet
> again?

I don't see any reason why this would affect the executable size at
all. What makes you concerned about that?

Ian

Russ Cox

Aug 8, 2014, 10:06:43 AM
to Dan Kortschak, Brendan Tracey, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 9:28 PM, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:
> These issues will need to be addressed by us in the longer term as we
> grapple with similar issues when dealing with CUDA/OpenCL
> implementations, but we are nowhere near that at this stage.

The issue with Go vs C ownership of data sounds a lot like the same issue you'd end up with in a CUDA/OpenCL implementation. Maybe you are closer than you think.

Russ

David Crawshaw

Aug 8, 2014, 10:09:49 AM
to Brendan Tracey, Russ Cox, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev, Dan Kortschak
On Thu, Aug 7, 2014 at 9:13 PM, Brendan Tracey <tracey....@gmail.com> wrote:
> Our Matrix type [0] is as follows
>
> type Dense struct {
> mat RawMatrix
> }
>
> type RawMatrix struct { // [1]
> Rows, Cols int
> Stride int
> Data []float64
> }

I have been facing the same problem designing the OpenGL ES API for
the Android port. One option I explored is Go managed C memory. It
doesn't fit my API well, but yours is much closer.

If you have users use a *RawMatrix instead of a RawMatrix, you can
offer them a function like this:

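// Assumes a cgo file whose preamble has #include <stdlib.h>,
// and which imports "C", "runtime" and "unsafe".
func New(rows, cols int) *RawMatrix {
    m := &RawMatrix{
        Rows: rows,
        Cols: cols,
    }
    const float64Size = 8
    n := rows * cols
    // C.malloc already returns an unsafe.Pointer.
    d := C.malloc(C.size_t(float64Size * n))
    runtime.SetFinalizer(m, func(m *RawMatrix) {
        // A slice cannot be converted to a pointer; compare its first element's address.
        if len(m.Data) == 0 || unsafe.Pointer(&m.Data[0]) != d {
            // tell the user they did something terrible, don't modify Data directly
        }
        C.free(d)
    })
    // View the C memory as a Go slice of exactly n elements.
    m.Data = (*[1 << 28]float64)(d)[:n:n]
    return m
}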

You are now free to pass raw pointers of m.Data into C in your
implementation, and the garbage collector is responsible for cleaning
up your C memory.

(Unfortunately for me, the OpenGL API really wants to take a raw
[]float32 as a parameter, which doesn't give me a Go object to use for
finalization. Without some extra support from the runtime, I'm
probably going to have to introduce some messy gl.Floats and gl.Bytes
containers.)

d.

ma...@kevac.org

Aug 8, 2014, 10:20:08 AM
to golan...@googlegroups.com
> 10 msec out of every 50 msec

Do I understand it correctly? Does it mean the GC will use 1/5 of CPU time? It's... bad...

Dmitry Vyukov

Aug 8, 2014, 10:34:19 AM
to Stephen Gutekanst, golang-dev, Brendan Tracey, Russ Cox, Ian Taylor, Luna Duclos, Vlad Didenko
Just another consideration.
Syscalls are very similar to cgo in this respect: they run
concurrently with GC.
Now add to this the generic io.Reader/Writer interfaces, which abstract
away the real data source or destination.
Now the question is: does net.Conn.Write need to allocate C memory,
copy the data from the slice, call syscall.Write and free the C
memory? Or do all users of io.Writer need to use C memory in case the
data ends up in syscall/cgo?
Neither option sounds good to me.

Michael Jones

Aug 8, 2014, 10:51:32 AM
to Dmitry Vyukov, Stephen Gutekanst, golang-dev, Brendan Tracey, Russ Cox, Ian Taylor, Luna Duclos, Vlad Didenko
The ideal solution to the issue that Brendan raises would be to support real runtime arrays in Go and to have BLAS-like facilities defined for them. There could be a simple BLAS-in-Go in the tree as well as architecture-dependent local implementations such as the Intel code, which is tuned not just for every CPU/Cache/Vector configuration but also for invisible access to MIC coprocessors. This would be ideal for those of us who do significant computations in Go.

Compare this approach with the alternative of trying to make access and processing of gigabytes of data happen in some remote and inefficient way. I would think that the GC would profit from understanding about large coherent allocations that either should not be moved or that need to be moved at certain key points ("after an instruction" where "after" means after a 1000x1000 matrix inversion). Go would certainly profit from being natively first-class at matrix math.





--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

minux

Aug 8, 2014, 2:46:54 PM
to Dmitry Vyukov, Stephen Gutekanst, golang-dev, Brendan Tracey, Russ Cox, Ian Taylor, Luna Duclos, Vlad Didenko
How about we add some buffers that are not managed by the GC, and have per-M caching
for Read/Write syscalls? (E.g. for small Read/Writes, a per-M buffer will suffice, and for
extremely large Read/Writes, we can either allocate a large buffer through C.malloc or
just emulate it with smaller buffers [both not ideal].) I can't think of a way that could
remove the extra copying without special-casing syscall.Syscall*.

But then we have all those Windows syscalls.

Stephen Gutekanst

Aug 8, 2014, 3:52:05 PM
to golan...@googlegroups.com, dvy...@google.com, stephen....@gmail.com, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
Disclaimer: I don't even pretend to understand the runtime/compiler/linker internals at all.

Is there any way we could detect when a Go allocated object would end up being passed into a syscall or CGO call and have a different, specific to that case, garbage collector manage that memory? That is, if a Go allocated object would ever end up being passed into syscall or CGO then the old GC pattern could be used (or something along those lines?).

I'm sure that there is probably a huge flaw with that line of thinking, so, what is it? Nearly impossible? Overly complex? 

Stephen

minux

Aug 8, 2014, 4:03:41 PM
to Stephen Gutekanst, golang-dev, Dmitry Vyukov, Brendan Tracey, Russ Cox, Ian Taylor, Luna Duclos, Vlad Didenko
It means we will have two completely different GC engines, and to make matters worse, they have to cooperate.

Ian Lance Taylor

Aug 8, 2014, 4:51:48 PM
to Stephen Gutekanst, golang-dev, Dmitry Vyukov, Brendan Tracey, Russ Cox, Luna Duclos, Vlad Didenko
On Fri, Aug 8, 2014 at 12:52 PM, Stephen Gutekanst
<stephen....@gmail.com> wrote:
>
> Is there any way we could detect when a Go allocated object would end up
> being passed into a syscall or CGO call and have a different, specific to
> that case, garbage collector manage that memory? That is, if a Go allocated
> object would ever end up being passed into syscall or CGO then the old GC
> pattern could be used (or something along those lines?).

In the general case, no, because values cross package boundaries. At
the time of allocating memory we can't know whether that memory will
be passed to a syscall or cgo.

Ian

Stephen Gutekanst

Aug 8, 2014, 5:03:32 PM
to Ian Lance Taylor, golang-dev, Dmitry Vyukov, Brendan Tracey, Russ Cox, Luna Duclos, Vlad Didenko
I see, that makes sense. Am I correct in thinking that the concurrently compacting GC means memory can move under the feet of any concurrently executing CGO/syscall code?

What if for short Reads/Writes, we maintain per-M buffers as Minux suggested:
  - Copy the data into per-M buffer.
  - Make the syscall using the per-M buffer.
    - CGO users would maintain their own buffers for this purpose, perhaps.

Then for larger Reads/Writes, where an intermediate copy would cause a large performance degradation:
  - Have the GC consider performing a garbage collection immediately (if these larger reads/writes occur very often, this could prove crucial to keeping memory usage low).
  - Disable the GC.
  - Make the syscall, and wait for it to complete.
  - Enable GC again.

My main concern resides with large copies of data into CGO land (the image example I posted here earlier).
Stephen
--
Follow me on twitter @slimsag.

Keith Randall

unread,
Aug 8, 2014, 5:42:33 PM8/8/14
to Stephen Gutekanst, Ian Lance Taylor, golang-dev, Dmitry Vyukov, Brendan Tracey, Russ Cox, Luna Duclos, Vlad Didenko
Even in a future compacting collector we probably don't want to compact large objects, as they are expensive to move.  So maybe we can just assert that big (non-pointer-containing) objects won't move and thus can be passed to syscall/cgo directly.

We still would have to deal with small objects.  Maybe per-M buffers would work, although IMO pinning small objects would be a more workable solution.


Ian Lance Taylor

unread,
Aug 8, 2014, 9:12:46 PM8/8/14
to ma...@kevac.org, golang-dev
That's a maximum, not a minimum.

Ian

andrey mirtchovski

unread,
Aug 8, 2014, 9:12:49 PM8/8/14
to ma...@kevac.org, golang-dev
> It's... bad...

But it feels so... good!

(hint: it's an upper bound)

Dan Kortschak

unread,
Aug 8, 2014, 9:34:14 PM8/8/14
to Russ Cox, Brendan Tracey, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Fri, 2014-08-08 at 10:06 -0400, Russ Cox wrote:
> The issue with Go vs C ownership of data sounds a lot like the same
> issue you'd end up with in a CUDA/OpenCL implementation. Maybe you are
> closer than you think.

Yes, you are absolutely right it is pretty much exactly the same. I
should rephrase: It's not somewhere I am really ready to be right now;
it is the difference between having a car rush head long at you and
choosing to go for a serious free climb. One feels OK and the other
doesn't.

The other issue that factors into the decision here is that the effort
needed to be put into managing CUDA/OpenCL objects will be well paid
back by the performance of the GPU processing. I'm not convinced that
that is true for the same amount of work to retain the 3x benefit we get
from cgo calls into cblas. Again, this is a psychological argument, not
an engineering one.

Dan

Dan Kortschak

unread,
Aug 9, 2014, 1:13:40 AM8/9/14
to David Crawshaw, Brendan Tracey, Russ Cox, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
I don't think this works for us. We have the notion of a view
(essentially a 2D slice of the matrix - very much like the tables
proposal). That means it's possible for the original RawMatrix to be
garbage collected and there still be a desire to have the C-allocated
memory remain allocated. It would be possible to deal with this, but
that would mean we would be implementing some kind of garbage collector
ourselves.

Dan

gvdschoot

unread,
Aug 9, 2014, 3:14:23 AM8/9/14
to golan...@googlegroups.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
What if you add a flag?

func New(rows, cols int, alloc bool) *RawMatrix { }

That way you can say up front whether it is a view or a memory allocated / freed slice.

However if you want Go to allocate the memory, how about a lock/unlock combination before/after each C call?
Would that do the trick of preventing the GC from moving the memory? If that is true and not very costly in terms of benchmark time (those are two assumptions; I am on a slippery path), the only question left is how to wrap it up.

Dan Kortschak

unread,
Aug 9, 2014, 3:19:58 AM8/9/14
to gvdschoot, golan...@googlegroups.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
If it comes to it, I would rather just reference count. It's getting
messy though.

gvdschoot

unread,
Aug 9, 2014, 3:28:39 AM8/9/14
to golan...@googlegroups.com, gvds...@gmail.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
Why is a flag such a bad thing that you want to implement a whole bunch of code instead? The flag is there only once (at the New() call), and is easy for anyone to understand.

gvdschoot

unread,
Aug 9, 2014, 3:52:41 AM8/9/14
to golan...@googlegroups.com, gvds...@gmail.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
Still, if Go did have a function, let's say in runtime, to lock the memory of a slice, that would answer quite a few questions.

slice := make([]float64, size)
runtime.LockMemory(slice)

And why not? With C.malloc() the same is happening.

Or maybe an even better suggestion:

C.malloc_gc()

A function that works like C.malloc() with the exception that the memory is garbage collected.

Dan Kortschak

unread,
Aug 9, 2014, 4:55:06 AM8/9/14
to gvdschoot, golan...@googlegroups.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
How does the code know when to free the C-allocated block? The flag
specifies allocation or not, that's fine, but what we need to handle is
when a block can be freed. This can happen when no matrix is holding a
view on it (either the original allocator or any sub matrix view that
has been created against it). I'm not opposed to flags, I just don't see
how it answers this particular question.
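
A rough sketch of the reference-counting alternative mentioned earlier, to make the lifetime question concrete. Everything here is hypothetical: `block`, `Matrix`, `View` and `Release` are illustrative names, a Go slice stands in for the C-allocated memory, and the `free` callback stands in for `C.free`. It is not goroutine-safe.

```go
package main

import "fmt"

// block models one C-allocated buffer shared by a matrix and its views.
type block struct {
	refs int       // number of live matrices/views using the buffer
	data []float64 // stands in for the C memory
	free func()    // stands in for C.free
}

// Matrix is a stand-in for a RawMatrix or a view onto one.
type Matrix struct {
	blk    *block
	off, n int
}

// New allocates a block and returns the matrix that owns the first reference.
func New(n int, free func()) *Matrix {
	b := &block{refs: 1, data: make([]float64, n), free: free}
	return &Matrix{blk: b, n: n}
}

// View returns a sub-matrix sharing the same block and takes a reference.
func (m *Matrix) View(off, n int) *Matrix {
	m.blk.refs++
	return &Matrix{blk: m.blk, off: m.off + off, n: n}
}

// Release drops this matrix's reference; the block is freed when the last
// reference goes away, whether that is the original or a view.
func (m *Matrix) Release() {
	if m.blk == nil {
		return // already released
	}
	b := m.blk
	m.blk = nil
	b.refs--
	if b.refs == 0 {
		b.free()
	}
}

func main() {
	freed := false
	m := New(16, func() { freed = true })
	v := m.View(4, 8)
	m.Release()
	fmt.Println("after releasing original:", freed)
	v.Release()
	fmt.Println("after releasing view:", freed)
}
```

The cost is visible in the API: every holder has to call `Release` explicitly, which is the manual bookkeeping a garbage collector would otherwise do.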

Gustavo Niemeyer

unread,
Aug 9, 2014, 5:16:38 AM8/9/14
to Dan Kortschak, golang-dev, Russ Cox, Ian Taylor
I'll surely have relevant issues in a few packages as well. For
example, the qml package must be able to represent a Go value within
C/C++ space or the whole concept will break down. We might use
something like a map[interface{}]uintptr to assign unique ids to the
values so that the pointers get moved but the ids don't, but then
these values become uncollectable.

Perhaps, rather than pinning memory down, the runtime might offer a
weak-referenced map implementation, resembling a bit the idea of
sync.Pool but with different semantics.



--

gustavo @ http://niemeyer.net

Gustavo Niemeyer

unread,
Aug 9, 2014, 5:20:55 AM8/9/14
to Dan Kortschak, golang-dev, Russ Cox, Ian Taylor
On Sat, Aug 9, 2014 at 11:16 AM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
> Perhaps, rather than pinning memory down, the runtime might offer a
> weak-referenced map implementation, resembling a bit the idea of
> sync.Pool but with different semantics.

We of course need to be able to map back from the id to the pointer as
well, so maybe a sync.WeakPointer or similar might be a better model.


gustavo @ http://niemeyer.net

gvdschoot

unread,
Aug 9, 2014, 5:31:27 AM8/9/14
to golan...@googlegroups.com, gvds...@gmail.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
You can use defer C.free() in the same function where you call the New() function. That is why you need to know if the New() function allocates memory or not.

daniel...@learnosity.com

unread,
Aug 9, 2014, 7:51:12 AM8/9/14
to golan...@googlegroups.com
The tone of this discussion is somewhat odd. A casual reader could be forgiven for coming away with the impression that Go has a complex, subtle memory model.

Here's the simplification of the Go memory model that I carry around in my head:

* if a variable is in scope, it is safe to use that variable (perhaps not concurrently)
* if a variable is not in scope, I probably don't have to worry about it taking up memory

I think I understand the problem with cgo complicating this, but I don't feel that anyone has actually adequately stated the problem. Is it simply that some people want to be able to hold a reference to memory allocated by Go in C code, where the reference to it in Go has fallen out of scope?

On a side note, I think the Go team can be congratulated for their foresight in bringing some of the overhead into 1.4, so that the performance of 1.5 is not surprising. Very good thinking.

⚛

unread,
Aug 9, 2014, 10:19:19 AM8/9/14
to golan...@googlegroups.com
Question: Why not provide a command-line option to select the GC method at compile-time? Is the reason the complexity of implementation?

On Thursday, August 7, 2014 6:18:02 PM UTC+2, Rick Hudson wrote:
> golang.org/s/go14gc holds the our current thinking about Garbage Collection (uppercase GC).
>
> Comments are welcome here.
>
> - Rick

rjeczalik

unread,
Aug 9, 2014, 10:31:15 AM8/9/14
to ⚛, golan...@googlegroups.com
On 9 August 2014 16:19, ⚛ <0xe2.0x...@gmail.com> wrote:
>
> Question: Why not provide a command-line option to select the GC method at compile-time? Is the reason the complexity of implementation?

Which option would you choose, if your application depends on two packages: first was designed to work with low-latency GC and the second to work with all the CPU cores?

⚛

unread,
Aug 9, 2014, 11:59:21 AM8/9/14
to golan...@googlegroups.com, 0xe2.0x...@gmail.com, rjec...@gmail.com
Impossible to solve problems remain impossible to solve whatever you do.

rjeczalik

unread,
Aug 9, 2014, 12:09:29 PM8/9/14
to ⚛, golang-dev
In general, yes. In this particular example the problem exists because it could exist - users optimize packages for different GC strategies because they can make a pick. No choice, no problem.

⚛

unread,
Aug 9, 2014, 2:02:01 PM8/9/14
to golan...@googlegroups.com, daniel...@learnosity.com
On Saturday, August 9, 2014 1:51:12 PM UTC+2, daniel...@learnosity.com wrote:
> The tone of this discussion is somewhat odd. A casual reader could be forgiven for coming away with the impression that Go has a complex, subtle memory model.
>
> Here's the simplification of the Go memory model that I carry around in my head:
>
> * if a variable is in scope, it is safe to use that variable (perhaps not concurrently)
> * if a variable is not in scope, I probably don't have to worry about it taking up memory

In my opinion, a fundamental concept in a GC is reachability of objects.

> I think I understand the problem with cgo complicating this, but I don't feel that anyone has actually adequately stated the problem. Is it simply that some people want to be able to hold a reference to memory allocated by Go in C code, where the reference to it in Go has fallen out of scope?

The underlying problem is that the new garbage collector will want to see all memory operations involving a *T pointer passed to C. If the C code stores the pointer into, say, a []*T (viewed as **T by C code), the concurrent GC wants to know about all such stores. C code can be made compatible with a concurrent GC provided that it cooperates with the GC. When C code stores the pointer into a memory location, it would need to tell the Go runtime about it by calling a function, such as: copied(dst, ptr).

In case C's usage of the pointer is limited, C code does not need to inform the Go runtime about load/store operations involving the pointer.
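
To illustrate why a concurrent collector cares about such stores, here is a toy model, not the Go runtime's mechanism. The `object` and `collector` types are invented for the example, and `copied` is implemented as a simple insertion-style barrier that marks the stored pointer. The scenario moves a pointer into an object the collector has already scanned, which is the case a collector can miss without the notification.

```go
package main

import "fmt"

// object is a toy heap object; marked means the collector has seen it.
type object struct {
	marked bool
	refs   []*object
}

// collector is a toy incremental marker with a worklist of grey objects.
type collector struct {
	grey []*object
}

// shade marks o and queues it for scanning if it has not been seen yet.
func (c *collector) shade(o *object) {
	if o != nil && !o.marked {
		o.marked = true
		c.grey = append(c.grey, o)
	}
}

// copied is the notification named in the post: the mutator has just stored
// ptr into dst. This insertion-style barrier only needs ptr, so dst is unused.
func (c *collector) copied(dst, ptr *object) {
	c.shade(ptr)
}

// step scans one grey object and reports whether there was work to do.
func (c *collector) step() bool {
	if len(c.grey) == 0 {
		return false
	}
	o := c.grey[len(c.grey)-1]
	c.grey = c.grey[:len(c.grey)-1]
	for _, r := range o.refs {
		c.shade(r)
	}
	return true
}

// run moves x from holder to an already-scanned root in the middle of a mark
// phase and reports whether x ended up marked (that is, kept alive).
func run(notify bool) bool {
	x := &object{}
	holder := &object{refs: []*object{x}}
	root := &object{refs: []*object{holder}}

	gc := &collector{}
	gc.shade(root)
	gc.step() // scans root: holder becomes grey, root is finished

	// The mutator (standing in for C code) moves the pointer.
	root.refs = append(root.refs, x)
	holder.refs = nil
	if notify {
		gc.copied(root, x)
	}

	for gc.step() {
	}
	return x.marked
}

func main() {
	fmt.Println("with notification:", run(true))
	fmt.Println("without notification:", run(false))
}
```

Without the call, `x` is reachable from `root` yet never marked, so a real collector would free a live object.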

minux

unread,
Aug 9, 2014, 2:10:15 PM8/9/14
to ⚛, golang-dev, daniel...@learnosity.com
On Sat, Aug 9, 2014 at 2:02 PM, ⚛ <0xe2.0x...@gmail.com> wrote:
> On Saturday, August 9, 2014 1:51:12 PM UTC+2, daniel...@learnosity.com wrote:
>> The tone of this discussion is somewhat odd. A casual reader could be forgiven for coming away with the impression that Go has a complex, subtle memory model.
>>
>> Here's the simplification of the Go memory model that I carry around in my head:
>>
>> * if a variable is in scope, it is safe to use that variable (perhaps not concurrently)
>> * if a variable is not in scope, I probably don't have to worry about it taking up memory
>
> In my opinion, a fundamental concept in a GC is reachability of objects.
>
>> I think I understand the problem with cgo complicating this, but I don't feel that anyone has actually adequately stated the problem. Is it simply that some people want to be able to hold a reference to memory allocated by Go in C code, where the reference to it in Go has fallen out of scope?
>
> The underlying problem is that the new garbage collector will want to see all memory operations involving a *T pointer passed to C. If the C code stores the pointer into, say, a []*T (viewed as **T by C code), the concurrent GC wants to know about all such stores. C code can be made compatible with a concurrent GC provided that it cooperates with the GC. When C code stores the pointer into a memory location, it would need to tell the Go runtime about it by calling a function, such as: copied(dst, ptr).

One crazy idea: provide our own gcc plugin to make gcc emit appropriate write barriers for us.
It's probably even doable with simple MELT plugins?

Although, that scheme will still break down when we have a moving GC.

Ulrich Kunitz

unread,
Aug 9, 2014, 3:17:44 PM8/9/14
to golan...@googlegroups.com, tracey....@gmail.com, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
I'm definitely in favor of implementing a parallel GC with extremely short Stop-the-World pauses. However, the GC should still allow pointers to Go objects to be passed to syscalls and cgo calls. Referring to C.malloc and C.free puts the burden for addressing the issue outside of the GC and will result in a lot of complicated code by different library developers that will break in a number of interesting ways.

I see three options to address the issue:

1) Non-Compacting GC as in use right now.
2) Pinning of allocated objects that are referenced in a syscall or CGO call.
3) Support of a non-compacting area where objects are moved before the syscall or cgo call happens.

Such a non-compacting area makes sense anyway for large objects, to prevent the repeated copying of large amounts of data by the GC.

Uli

Dan Kortschak

unread,
Aug 9, 2014, 6:16:57 PM8/9/14
to gvdschoot, golan...@googlegroups.com, craw...@golang.org, tracey....@gmail.com, r...@golang.org, ia...@golang.org, luna....@palmstonegames.com, busi...@didenko.com
On Sat, 2014-08-09 at 02:31 -0700, gvdschoot wrote:
> You can use defer C.free() in the same function where you call the
> New() function. That is why you need to know if the New() function
> allocates memory or not.

That assumes the value persists only for the duration of the function.
That is not a valid assumption.

Dan Kortschak

unread,
Aug 9, 2014, 6:22:29 PM8/9/14
to daniel...@learnosity.com, golan...@googlegroups.com
My concern is that the allocated data will be moved underfoot by the GC
while C code is running (this is a stretch fear since compacting GC is a
way off), even though the reference is in scope.

Maxim Khitrov

unread,
Aug 9, 2014, 10:30:08 PM8/9/14
to Russ Cox, Brendan Tracey, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Thu, Aug 7, 2014 at 8:32 PM, Russ Cox <r...@golang.org> wrote:
> On Thu, Aug 7, 2014 at 4:17 PM, Brendan Tracey <tracey....@gmail.com>
> wrote:
>>
>> The use case we have in Gonum is to allow the user to specify alternative
>> BLAS libraries. Many of these libraries have been developed and tuned over a
>> number of years, and can be tuned for specific architectures. Attempting to
>> beat the Intel MKL is not on our development roadmap. Efficiency for such
>> operations can be a big factor, and so we support a cgo interface. The C
>> interface requires some integers and doubles, but importantly a double* that
>> points to a possibly very large array of doubles. We pass this pointer with
>> unsafe.Pointer(&x[0]), where x is a []float64. The C code modifies the
>> values of the doubles in that array, which Go can then use for whatever.
>> This is the use case we would like to be able to support. I'm not expecting
>> any feedback or decisions at present, just trying to make the needs known.
>
>
> Thanks for letting us know. Passing a Go pointer like that to C is
> problematic because eventually we will want the garbage collector to be able
> to move things. A collector must update all the references when it does the
> move, and it cannot update any references stored in C. Java allows pinning
> such objects so they cannot move, but that handcuffs the collector quite a
> bit and we'd like to avoid that.

To add yet another example, preventing C from using Go-allocated
memory would break my sqlite package. Not only would you no longer be
able to bind Go data to SQL queries without copying, but you would
also lose the ability to execute callback functions. For callbacks to
work, SQLite needs to store a pointer to a Go struct. If that struct
moves, we're in trouble. I think the same would be true of most other
Go -> C -> Go call sequences.

In my opinion, the current behavior should be covered by the
compatibility guarantee. In Go 1, passing pointers to Go-allocated
memory into C was not a problem. Thankfully, you didn't go down the
JNI route, which is a feature that helps Go's adoption. Please think
twice before throwing it away.

Andy Balholm

unread,
Aug 9, 2014, 11:41:42 PM8/9/14
to Ulrich Kunitz, golang-dev, Brendan Tracey, Ian Lance Taylor, luna....@palmstonegames.com, busi...@didenko.com

On Aug 9, 2014, at 12:17 PM, Ulrich Kunitz <uli.k...@gmail.com> wrote:

> 3) Support of a non-compacting area where objects are moved before the syscall or cgo call happens.
>
> Such a non-compacting area makes sense anyway for large objects, to prevent the repeated copying of large amounts of data by the GC.

This sounds like a good idea to me. If a pointer is passed to a cgo function or a syscall, the runtime could check to see if it points into memory managed by the moving collector. If so, it would be moved to a non-moving GC arena, and all pointers to it would be updated. (But how much more overhead would this add to cgo calls?)

Ian Lance Taylor

unread,
Aug 10, 2014, 1:19:37 PM8/10/14
to Maxim Khitrov, Russ Cox, Brendan Tracey, Luna Duclos, Vlad Didenko, golang-dev
On Sat, Aug 9, 2014 at 7:29 PM, Maxim Khitrov <m...@mxcrypt.com> wrote:
>
> To add yet another example, preventing C from using Go-allocated
> memory would break my sqlite package. Not only would you no longer be
> able to bind Go data to SQL queries without copying, but you would
> also lose the ability to execute callback functions. For callbacks to
> work, SQLite needs to store a pointer to a Go struct. If that struct
> moves, we're in trouble. I think the same would be true of most other
> Go -> C -> Go call sequences.

I expect that that sort of code will have to maintain a mapping
between the value passed to C and the Go pointer, and then reverse the
mapping in the call back.


> In my opinion, the current behavior should be covered by the
> compatibility guarantee. In Go 1, passing pointers to Go-allocated
> memory into C was not a problem. Thankfully, you didn't go down the
> JNI route, which is a feature that helps Go's adoption. Please think
> twice before throwing it away.

The Go 1 compatibility guarantee was not intended to cover the Go -> C
interface.

We certainly don't intend to throw away the ability to call from Go to
C and back again. But GC performance is of course extremely
important, and improving it is going to require adjustments to what is
currently permitted.

Ian

Gustavo Niemeyer

unread,
Aug 10, 2014, 1:35:48 PM8/10/14
to Ian Taylor, Brendan Tracey, Russ Cox, Maxim Khitrov, golang-dev, Vlad Didenko, Luna Duclos

Hi Ian,

For that mapping to work we need some kind of weak-referencing mechanism, otherwise there's no way to preserve existing semantics.

gustavo @ http://niemeyer.net

Ian Lance Taylor

unread,
Aug 10, 2014, 1:39:28 PM8/10/14
to Gustavo Niemeyer, Brendan Tracey, Russ Cox, Maxim Khitrov, golang-dev, Vlad Didenko, Luna Duclos
On Sun, Aug 10, 2014 at 10:35 AM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
>
> For that mapping to work we need some kind of weak-referencing mechanism,
> otherwise there's no way to preserve existing semantics.

Why? The existing semantics say that you must preserve a pointer on
the Go side when you pass it to the C side. So when you preserve it
on the Go side, stick it into a map[uintptr]*T. When you get the
pointer back from the C side, look it up in the map.

Ian

Gustavo Niemeyer

unread,
Aug 10, 2014, 1:51:09 PM8/10/14
to Ian Taylor, Russ Cox, Brendan Tracey, Maxim Khitrov, golang-dev, Luna Duclos, Vlad Didenko

Because right now we can track the lifetime of objects, and flag them as dead when they are being collected. Putting them in a map means holding a strong reference forever, and having no means besides an explicit request to take them out, which becomes manual memory management.

Forcing people to manage the memory behind every observed value manually when using packages such as qml is a deal breaker. It would spoil the experience enough that doing it might not be worth it.

There are still improvements I have to make in that area. If a decision around these points has already been taken, I'd appreciate knowing so I can take it into account.

gustavo @ http://niemeyer.net

Maxim Khitrov

unread,
Aug 10, 2014, 9:33:10 PM8/10/14
to Ian Lance Taylor, Gustavo Niemeyer, Brendan Tracey, Russ Cox, golang-dev, Vlad Didenko, Luna Duclos
That map would have to be global (can't get any context from C), so
you also need a mutex around it and, as Gustavo mentioned, there is
more cleanup to be done. Seems like an ugly hack that complicates the
code and degrades performance when Go should be moving in the exact
opposite direction with respect to the C interface.

Also, what's the key? It can't be the address of T, because the
original T may be moved by the GC and then a new one placed at the
same location. Two different structs, same uintptr key. Instead, you
need a global counter that gets incremented for each insertion. Sorry,
but I don't think it's a reasonable solution.
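
For concreteness, the requirements listed above (a global table, a mutex, and a counter rather than an address as the key) amount to something like the following sketch. The `registry` type and the `handles` variable are hypothetical names, not an existing API.

```go
package main

import (
	"fmt"
	"sync"
)

// registry maps opaque keys, which are safe to hand to C, back to Go values.
type registry struct {
	mu   sync.Mutex
	next uintptr // counter: a key is never derived from an address
	vals map[uintptr]interface{}
}

func newRegistry() *registry {
	return &registry{vals: make(map[uintptr]interface{})}
}

// Add stores v and returns a fresh key for it.
func (r *registry) Add(v interface{}) uintptr {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.vals[r.next] = v
	return r.next
}

// Get returns the value stored under key, if any.
func (r *registry) Get(key uintptr) (interface{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.vals[key]
	return v, ok
}

// Remove drops the entry; until this is called the value stays reachable.
func (r *registry) Remove(key uintptr) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.vals, key)
}

// handles is global because a C callback carries no Go context.
var handles = newRegistry()

func main() {
	k := handles.Add("callback state")
	v, ok := handles.Get(k)
	fmt.Println(k, v, ok)
	handles.Remove(k)
	_, ok = handles.Get(k)
	fmt.Println(ok)
}
```

Every callback pays for a lock and a map lookup, and every entry needs an explicit `Remove`, which is the cleanup burden being objected to.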

Gustavo Niemeyer

unread,
Aug 10, 2014, 11:25:16 PM8/10/14
to Maxim Khitrov, Russ Cox, Brendan Tracey, golang-dev, Luna Duclos, Vlad Didenko, Ian Lance Taylor

To be clear, I don't mind implementing the mapping itself. This is going to be hidden as an implementation detail, and is a minor issue all things considered. The concern is only being able to offer a convenient Go-like experience somehow.

gustavo @ http://niemeyer.net

andrey mirtchovski

unread,
Aug 10, 2014, 11:59:27 PM8/10/14
to Gustavo Niemeyer, Maxim Khitrov, Russ Cox, Brendan Tracey, golang-dev, Luna Duclos, Vlad Didenko, Ian Lance Taylor
> To be clear, I don't mind implementing the mapping itself. This is going to
> be hidden as an implementation detail, and is a minor issue all things
> considered.

I have done this for a heavily used piece of code here. It worked fine
until I hit https://code.google.com/p/go/issues/detail?id=7978...
unfortunately bugs at the cgo interface do not lend themselves to
being debugged easily, unlike the rest of Go...

Liam

unread,
Aug 11, 2014, 7:50:07 AM8/11/14
to golan...@googlegroups.com
Are there references regarding objectives and likely tactics for compacting GC in golang?
Specifically, why would pinning "handcuff the collector"?
Is the objective perfectly-compacted memory post-sweep?

Dmitry Vyukov

unread,
Aug 11, 2014, 9:49:28 AM8/11/14
to Maxim Khitrov, Russ Cox, Brendan Tracey, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
This was never fully working. Only some individual cases were working.

Ian Lance Taylor

unread,
Aug 11, 2014, 10:15:27 AM8/11/14
to Gustavo Niemeyer, Russ Cox, Brendan Tracey, Maxim Khitrov, golang-dev, Luna Duclos, Vlad Didenko
On Sun, Aug 10, 2014 at 10:51 AM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
>
> Because right now we can track the lifetime of objects, and flag them as
> dead when they are being collected. Putting them in a map means holding a
> strong reference forever, and having no means besides an explicit request to
> take them out, which becomes manual memory management.
>
> Forcing people to manage the memory behind every observed value manually
> when using packages such as qml is a deal breaker. It would spoil the
> experience enough that doing it might not be worth it.
>
> There are still improvements I have to make in that area. If a decision around
> these points has already been taken, I'd appreciate knowing so I can take it
> into account.

You can still manage the memory using finalizers. However, I don't
know the memory usage of your library so I don't know what the effect
would be.

For example, you write your Go code to use *T. You write

type T struct {
	qml *qmlVal
	key uintptr
}

Clients use a value of *T, and call methods on *T that do the
appropriate operations on the qml field. When creating a *T, you add
the qml field to the map[uintptr]*qmlVal using the key field. When
calling C, you pass the key field. In the callbacks to Go, you run
the key through the map to get the *qmlVal. Also when creating a *T,
you give it a finalizer that removes the pointer from the map, again
using the key field.

The complexity is not good, but the situation is unusual, and the
complexity can be hidden entirely within the library.
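
A minimal sketch of the scheme described above, under stated assumptions: `qmlVal` is reduced to a stub, and `NewT`, `lookup` and `release` are hypothetical names. The important detail is that the map holds the `*qmlVal`, not the `*T`, so the `*T` can still become unreachable and have its finalizer run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// qmlVal is a stub for the value that C-side code refers to by key.
type qmlVal struct{ name string }

// T is what clients hold; only key ever crosses into C.
type T struct {
	qml *qmlVal
	key uintptr
}

var (
	mu      sync.Mutex
	nextKey uintptr
	vals    = map[uintptr]*qmlVal{} // holds *qmlVal, never *T
)

// NewT registers the qml field under a fresh key and arranges for the
// entry to be removed once the *T itself is garbage.
func NewT(name string) *T {
	mu.Lock()
	nextKey++
	t := &T{qml: &qmlVal{name: name}, key: nextKey}
	vals[t.key] = t.qml
	mu.Unlock()
	runtime.SetFinalizer(t, (*T).release)
	return t
}

// release removes the map entry; it is safe to call more than once.
func (t *T) release() {
	mu.Lock()
	delete(vals, t.key)
	mu.Unlock()
}

// lookup is what a callback from C would use to get back to the Go value.
func lookup(key uintptr) (*qmlVal, bool) {
	mu.Lock()
	defer mu.Unlock()
	v, ok := vals[key]
	return v, ok
}

func main() {
	t := NewT("button")
	v, ok := lookup(t.key)
	fmt.Println(v.name, ok)
	t.release() // what the finalizer would do once t is unreachable
	_, ok = lookup(t.key)
	fmt.Println(ok)
}
```

The runtime makes no promise about when, or whether, a finalizer runs before the program exits, so the map entry may outlive the `*T` for some time.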

Ian

Ian Lance Taylor

unread,
Aug 11, 2014, 10:18:26 AM8/11/14
to Maxim Khitrov, Gustavo Niemeyer, Brendan Tracey, Russ Cox, golang-dev, Vlad Didenko, Luna Duclos
On Sun, Aug 10, 2014 at 6:32 PM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> On Sun, Aug 10, 2014 at 1:39 PM, Ian Lance Taylor <ia...@golang.org> wrote:
>> On Sun, Aug 10, 2014 at 10:35 AM, Gustavo Niemeyer <gus...@niemeyer.net> wrote:
>>>
>>> For that mapping to work we need some kind of weak-referencing mechanism,
>>> otherwise there's no way to preserve existing semantics.
>>
>> Why? The existing semantics say that you must preserve a pointer on
>> the Go side when you pass it to the C side. So when you preserve it
>> on the Go side, stick it into a map[uintptr]*T. When you get the
>> pointer back from the C side, look it up in the map.
>>
> That map would have to be global (can't get any context from C), so
> you also need a mutex around it and, as Gustavo mentioned, there is
> more cleanup to be done. Seems like an ugly hack that complicates the
> code and degrades performance when Go should be moving in the exact
> opposite direction with respect to the C interface.

Go needs to have a solid and usable C interface. But I don't agree
with you about the direction. I think Go should be moving toward a
very fast garbage collector, one that stops the entire program for the
shortest possible amount of time, and one that reduces the cost of
memory allocation as much as reasonably possible. I believe that is
more important than simplifying the C interface.

Ian

Ian Lance Taylor

unread,
Aug 11, 2014, 10:20:24 AM8/11/14
to Liam, golang-dev
On Mon, Aug 11, 2014 at 4:50 AM, Liam <networ...@gmail.com> wrote:
>
> Are there references regarding objectives and likely tactics for compacting
> GC in golang?
> Specifically, why would pinning "handcuff the collector"?
> Is the objective perfectly-compacted memory post-sweep?

This discussion was started by the document at
http://golang.org/s/go14gc .

Ian

Dave Cheney

unread,
Aug 11, 2014, 10:20:42 AM8/11/14
to Ian Lance Taylor, Russ Cox, Brendan Tracey, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko

Well said Ian.

Brendan Tracey

unread,
Aug 11, 2014, 10:46:46 AM8/11/14
to Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
Is it necessary to choose? It was said elsewhere that large objects would be unlikely to be moved because the cost of doing so would be high. I don’t know GC algorithms, but this implies that the area of open memory is not a tower that "slides down” when objects are freed, and instead, objects will be inserted when there are holes. If object size is one way that objects could effectively get pinned, couldn’t cgo be another reason? I am thinking that cgo could expose a Pin() and Unpin() function, which would act as a counter under the hood. If the object is pinned by cgo, it won’t move. Most objects live and die without leaving Go, so a relatively small number of objects will be pinned allowing for an effective bump-the-pointer GC. 
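
The "counter under the hood" could look something like this sketch. It is purely illustrative: `pinTable` is an invented type, objects are identified by a hypothetical integer ID rather than by anything the runtime actually exposes, and nothing here stops a real collector from moving memory.

```go
package main

import (
	"fmt"
	"sync"
)

// pinTable counts outstanding pins per object.
type pinTable struct {
	mu     sync.Mutex
	counts map[uintptr]int
}

func newPinTable() *pinTable {
	return &pinTable{counts: make(map[uintptr]int)}
}

// Pin records one more reason the object must not move.
func (t *pinTable) Pin(id uintptr) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[id]++
}

// Unpin removes one pin; unpinning an object that is not pinned is ignored.
func (t *pinTable) Unpin(id uintptr) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.counts[id] <= 1 {
		delete(t.counts, id)
		return
	}
	t.counts[id]--
}

// Pinned is what a moving collector would consult before relocating id.
func (t *pinTable) Pinned(id uintptr) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[id] > 0
}

func main() {
	pins := newPinTable()
	pins.Pin(42)
	pins.Pin(42) // two concurrent cgo calls using the same object
	pins.Unpin(42)
	fmt.Println(pins.Pinned(42))
	pins.Unpin(42)
	fmt.Println(pins.Pinned(42))
}
```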

Dmitry Vyukov

unread,
Aug 11, 2014, 10:53:41 AM8/11/14
to Brendan Tracey, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
On Mon, Aug 11, 2014 at 6:46 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
> Is it necessary to choose? It was said elsewhere that large objects would be
> unlikely to be moved because the cost of doing so would be high. I don’t
> know GC algorithms, but this implies that the area of open memory is not a
> tower that "slides down” when objects are freed, and instead, objects will
> be inserted when there are holes. If object size is one way that objects
> could effectively get pinned, couldn’t cgo be another reason?

Size is known at allocation time, so the object can be directly placed
in non-movable area. Cgo pin is not known at allocation time.


> I am thinking
> that cgo could expose a Pin() and Unpin() function, which would act as a
> counter under the hood. If the object is pinned by cgo, it won’t move. Most
> objects live and die without leaving Go, so a relatively small number of
> objects will be pinned allowing for an effective bump-the-pointer GC.

A small number of pinned objects scattered through the heap is enough to
prevent bump-the-pointer allocation. .NET hit this problem with C++/CLI
object pinning in the past. As far as I remember, in their case a moderate
number of pinned objects caused OOM crashes.

I am not saying that pinning is impossible. I am saying that it's more
complex than one can think it is.
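
A back-of-the-envelope model of the fragmentation effect, with made-up numbers. The helper `largestFreeRun` is hypothetical and assumes the best case for the collector: every unpinned object has been moved out of the way, so only the pinned slots remain occupied.

```go
package main

import (
	"fmt"
	"sort"
)

// largestFreeRun returns the longest stretch of contiguous free slots in a
// heap of heapSize slots in which only the slots listed in pinned are occupied.
func largestFreeRun(heapSize int, pinned []int) int {
	pins := append([]int(nil), pinned...) // copy so the caller's slice is untouched
	sort.Ints(pins)
	best, prev := 0, -1
	for _, p := range pins {
		if run := p - prev - 1; run > best {
			best = run
		}
		prev = p
	}
	if run := heapSize - prev - 1; run > best {
		best = run
	}
	return best
}

func main() {
	// Five pinned slots in a 1000-slot heap: 99.5% of the heap is free.
	pinned := []int{100, 300, 500, 700, 900}
	fmt.Println(largestFreeRun(1000, pinned)) // 199
}
```

In this model a request for 200 contiguous slots fails even though 995 slots are free, because a bump allocator can only hand out space from a single contiguous region.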

Brendan Tracey

unread,
Aug 11, 2014, 10:57:29 AM8/11/14
to Dmitry Vyukov, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
On Aug 11, 2014, at 7:53 AM, Dmitry Vyukov <dvy...@google.com> wrote:

> On Mon, Aug 11, 2014 at 6:46 PM, Brendan Tracey
> <tracey....@gmail.com> wrote:
>> Is it necessary to choose? It was said elsewhere that large objects would be
>> unlikely to be moved because the cost of doing so would be high. I don’t
>> know GC algorithms, but this implies that the area of open memory is not a
>> tower that "slides down” when objects are freed, and instead, objects will
>> be inserted when there are holes. If object size is one way that objects
>> could effectively get pinned, couldn’t cgo be another reason?
>
> Size is known at allocation time, so the object can be directly placed
> in non-movable area.

Ah, I see.

> Cgo pin is not known at allocation time.

When a cgo call is made, could those objects be then moved to the non-movable area (at runtime)?


>> I am thinking
>> that cgo could expose a Pin() and Unpin() function, which would act as a
>> counter under the hood. If the object is pinned by cgo, it won’t move. Most
>> objects live and die without leaving Go, so a relatively small number of
>> objects will be pinned allowing for an effective bump-the-pointer GC.
>
> A small number of pinned objects scattered through the heap is enough to
> prevent bump-the-pointer allocation. .NET hit this problem with C++/CLI
> object pinning in the past. As far as I remember, in their case a moderate
> number of pinned objects caused OOM crashes.
>
> I am not saying that pinning is impossible. I am saying that it's more
> complex than one can think it is.

I’m sure it is. Thanks for educating me to some of the complexities.

Dmitry Vyukov

unread,
Aug 11, 2014, 11:12:32 AM8/11/14
to Brendan Tracey, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
On Mon, Aug 11, 2014 at 6:57 PM, Brendan Tracey
<tracey....@gmail.com> wrote:
> On Aug 11, 2014, at 7:53 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
>> On Mon, Aug 11, 2014 at 6:46 PM, Brendan Tracey
>> <tracey....@gmail.com> wrote:
>>> Is it necessary to choose? It was said elsewhere that large objects would be
>>> unlikely to be moved because the cost of doing so would be high. I don’t
>>> know GC algorithms, but this implies that the area of open memory is not a
>>> tower that "slides down” when objects are freed, and instead, objects will
>>> be inserted when there are holes. If object size is one way that objects
>>> could effectively get pinned, couldn’t cgo be another reason?
>>
>> Size is known at allocation time, so the object can be directly placed
>> in non-movable area.
>
> Ah, I see.
>
>> Cgo pin is not known at allocation time.
>
> When a cgo call is made, could those objects be then moved to the non-movable area (at runtime)?


They probably can be moved. But I don't know yet what the right answer is.
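
The asymmetry between size and pinning can be sketched as follows. This is only an illustration under assumed names: `placeBySize`, `needsRelocation`, the two regions, and the 32 KiB cutoff are all hypothetical and are not taken from the Go runtime.

```go
package main

import "fmt"

// region names the two hypothetical heap areas discussed in the thread.
type region int

const (
	movable region = iota
	nonMovable
)

func (r region) String() string {
	if r == nonMovable {
		return "non-movable"
	}
	return "movable"
}

// largeObjectThreshold is an arbitrary illustrative cutoff, not a value
// taken from the Go runtime.
const largeObjectThreshold = 32 << 10

// placeBySize chooses a region from the only fact available when the
// object is created: its size.
func placeBySize(size int) region {
	if size >= largeObjectThreshold {
		return nonMovable
	}
	return movable
}

// needsRelocation reports whether an object that was placed by size would
// have to be moved (or pinned where it is) once it is later handed to cgo.
func needsRelocation(size int) bool {
	return placeBySize(size) == movable
}

func main() {
	for _, size := range []int{64, 1 << 20} {
		fmt.Printf("%d bytes: %v, relocate on cgo call: %v\n",
			size, placeBySize(size), needsRelocation(size))
	}
}
```

The allocator never sees a "will be passed to C" input, so any small object that later reaches cgo starts out in the movable area.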

Dmitry Vyukov

Aug 11, 2014, 11:30:30 AM8/11/14
to Brendan Tracey, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
On Mon, Aug 11, 2014 at 7:12 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> On Mon, Aug 11, 2014 at 6:57 PM, Brendan Tracey
> <tracey....@gmail.com> wrote:
>> On Aug 11, 2014, at 7:53 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>>
>>> On Mon, Aug 11, 2014 at 6:46 PM, Brendan Tracey
>>> <tracey....@gmail.com> wrote:
>>>> Is it necessary to choose? It was said elsewhere that large objects would be
>>>> unlikely to be moved because the cost of doing so would be high. I don’t
>>>> know GC algorithms, but this implies that the area of open memory is not a
>>>> tower that "slides down” when objects are freed, and instead, objects will
>>>> be inserted when there are holes. If object size is one way that objects
>>>> could effectively get pinned, couldn’t cgo be another reason?
>>>
>>> Size is known at allocation time, so the object can be directly placed
>>> in non-movable area.
>>
>> Ah, I see.
>>
>>> Cgo pin is not known at allocation time.
>>
>> When a cgo call is made, could those objects be then moved to the non-movable area (at runtime)?
>
>
> They probably can be moved. But I don't know yet what the right answer is.

Another complication here: garbage collectors usually cannot move
objects at arbitrary times. They can usually move objects only during a
compaction phase, which cannot be enabled instantly (e.g. it requires a
short STW). So moving one object on every cgo call may be infeasible.

Liam

Aug 11, 2014, 3:27:46 PM8/11/14
to golan...@googlegroups.com, networ...@gmail.com
On Monday, August 11, 2014 7:20:24 AM UTC-7, Ian Lance Taylor wrote:
>
>> Are there references regarding objectives and likely tactics for compacting GC in golang?
>> Specifically, why would pinning "handcuff the collector"?
>> Is the objective perfectly-compacted memory post-sweep?
>
> This discussion was started by the document at http://golang.org/s/go14gc

/s/go14gc does not mention compaction (I did read it and search for other refs before posting :)

I'm following a debate about compaction without any refs to mark-and-compact algorithms, e.g.
   http://useless-factor.blogspot.com/2008/08/new-mark-compact-algorithms.html

Ian Lance Taylor

Aug 11, 2014, 3:56:19 PM8/11/14
to Liam, golang-dev
On Mon, Aug 11, 2014 at 12:27 PM, Liam <networ...@gmail.com> wrote:
> On Monday, August 11, 2014 7:20:24 AM UTC-7, Ian Lance Taylor wrote:
>>
>>> Are there references regarding objectives and likely tactics for
>>> compacting GC in golang?
>>> Specifically, why would pinning "handcuff the collector"?
>>> Is the objective perfectly-compacted memory post-sweep?
>>
>> This discussion was started by the document at http://golang.org/s/go14gc
>
> /s/go14gc does not mention compaction (I did read it and search for other
> refs before posting :)

Sorry, you're quite right that no specific tactics are described.
Some sort of moving collector is implied in the paragraph about the
1.6 goals about adding bump pointer allocation and generational copy.
At that point the GC will have to be able to move pointers around,
presumably including compacting memory. But that's well in the future
at this point.

Ian
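
For readers unfamiliar with the term, bump pointer allocation can be sketched in a few lines. This is a minimal illustration with hypothetical names (`bumpArena`, `alloc`, `reset`), not the design proposed in the roadmap document.

```go
package main

import "fmt"

// bumpArena hands out memory from one contiguous buffer by advancing a
// single cursor. There is no free list and no per-object free.
type bumpArena struct {
	buf  []byte
	next int
}

func newBumpArena(size int) *bumpArena {
	return &bumpArena{buf: make([]byte, size)}
}

// alloc returns n bytes whose offset is a multiple of align (which must be
// a power of two), or nil if the arena cannot fit the request.
func (a *bumpArena) alloc(n, align int) []byte {
	start := (a.next + align - 1) &^ (align - 1)
	if n < 0 || start+n > len(a.buf) {
		return nil
	}
	a.next = start + n
	return a.buf[start:a.next:a.next]
}

// reset reclaims the whole arena at once. A collector could only do this
// after every live object had been copied somewhere else, which is why
// this style of allocation implies a moving collector.
func (a *bumpArena) reset() { a.next = 0 }

func main() {
	a := newBumpArena(16)
	fmt.Println(len(a.alloc(3, 1)), len(a.alloc(4, 8)), a.alloc(8, 1) == nil)
	a.reset()
	fmt.Println(len(a.alloc(16, 1)))
}
```

Allocation is just an add and a bounds check, but the free space has to stay contiguous for it to work.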

Liam Breck

Aug 11, 2014, 5:27:37 PM8/11/14
to Ian Lance Taylor, golang-dev

On Mon, Aug 11, 2014 at 12:56 PM, Ian Lance Taylor <ia...@golang.org> wrote:
>> /s/go14gc does not mention compaction (I did read it and search for other
>> refs before posting :)
>
> Sorry, you're quite right that no specific tactics are described.
> Some sort of moving collector is implied in the paragraph about the
> 1.6 goals about adding bump pointer allocation and generational copy.
> At that point the GC will have to be able to move pointers around,
> presumably including compacting memory. But that's well in the future
> at this point.

I've worked on Node.js bindings for Xapian, SQLite, and other C/C++ libs. Bindings for the arcane V8 API must accommodate movable V8 handles; the result is not pretty. I'm planning to move from Node to Go for an ARMv7-hosted project, so I hope a simple API for library-accessible Go pointers will be provided.

A first draft could simply disable heap movement while any foreign-language code holds a pointer into the heap. Such no-move periods would be brief for apps passing data into a library (e.g. packets comprising a blob). Other apps could fall back on C.malloc to gain the benefit of a compactable Go heap.
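
A rough sketch of that first-draft idea, assuming a single global counter of outstanding foreign calls. The `moveGate` type and its methods are hypothetical names for illustration and do not correspond to any existing cgo or runtime API.

```go
package main

import (
	"fmt"
	"sync"
)

// moveGate counts foreign calls that currently hold a pointer into the Go
// heap. While the count is non-zero, compaction is refused.
type moveGate struct {
	mu      sync.Mutex
	foreign int
}

// Enter records that foreign code is about to receive a Go pointer.
func (g *moveGate) Enter() {
	g.mu.Lock()
	g.foreign++
	g.mu.Unlock()
}

// Leave records that foreign code no longer holds the pointer.
func (g *moveGate) Leave() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.foreign == 0 {
		panic("moveGate: Leave without matching Enter")
	}
	g.foreign--
}

// TryCompact runs compact only if no foreign code holds a pointer. The lock
// is held for the duration, so a concurrent Enter waits until compaction
// has finished.
func (g *moveGate) TryCompact(compact func()) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.foreign > 0 {
		return false
	}
	compact()
	return true
}

func main() {
	var g moveGate
	g.Enter() // a foreign call now holds a Go pointer
	fmt.Println("compacted during call:", g.TryCompact(func() {}))
	g.Leave()
	fmt.Println("compacted after call:", g.TryCompact(func() {}))
}
```

The obvious weakness is that one long-running foreign call postpones compaction for the whole program, which is why the no-move periods would need to be brief.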

Zizon Qiu

Aug 11, 2014, 5:35:32 PM8/11/14
to Dmitry Vyukov, Brendan Tracey, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
And then the move would have to be recursive, since objects may hold references to other objects.
This may cause other problems.
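
To illustrate how far such a recursive move could spread, here is a small sketch that collects everything reachable from one object. It assumes foreign code would be allowed to follow pointers stored inside the object it receives; the `object` type and `reachable` function are made up for this example.

```go
package main

import "fmt"

// object is a stand-in for a heap object that may point at other objects.
type object struct {
	name string
	refs []*object
}

// reachable returns root and every object transitively referenced from it,
// each exactly once, in depth-first order. Cycles are handled by the seen
// set.
func reachable(root *object) []*object {
	seen := map[*object]bool{}
	var out []*object
	var visit func(o *object)
	visit = func(o *object) {
		if o == nil || seen[o] {
			return
		}
		seen[o] = true
		out = append(out, o)
		for _, r := range o.refs {
			visit(r)
		}
	}
	visit(root)
	return out
}

func main() {
	c := &object{name: "c"}
	b := &object{name: "b", refs: []*object{c}}
	a := &object{name: "a", refs: []*object{b, c}}
	c.refs = []*object{a} // a cycle back to the root

	for _, o := range reachable(a) {
		fmt.Println(o.name)
	}
}
```

Passing a single object could therefore drag its whole reachable graph into the non-movable area.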

Brendan Tracey

Aug 11, 2014, 5:36:04 PM8/11/14
to Dmitry Vyukov, Dave Cheney, Ian Lance Taylor, Russ Cox, Maxim Khitrov, golang-dev, Gustavo Niemeyer, Luna Duclos, Vlad Didenko

On Aug 11, 2014, at 8:30 AM, Dmitry Vyukov <dvy...@google.com> wrote:

> On Mon, Aug 11, 2014 at 7:12 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> On Mon, Aug 11, 2014 at 6:57 PM, Brendan Tracey
>> <tracey....@gmail.com> wrote:
>>> On Aug 11, 2014, at 7:53 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>>>
>>>> On Mon, Aug 11, 2014 at 6:46 PM, Brendan Tracey
>>>> <tracey....@gmail.com> wrote:
>>>>> Is it necessary to choose? It was said elsewhere that large objects would be
>>>>> unlikely to be moved because the cost of doing so would be high. I don’t
>>>>> know GC algorithms, but this implies that the area of open memory is not a
>>>>> tower that "slides down” when objects are freed, and instead, objects will
>>>>> be inserted when there are holes. If object size is one way that objects
>>>>> could effectively get pinned, couldn’t cgo be another reason?
>>>>
>>>> Size is known at allocation time, so the object can be directly placed
>>>> in non-movable area.
>>>
>>> Ah, I see.
>>>
>>>> Cgo pin is not known at allocation time.
>>>
>>> When a cgo call is made, could those objects be then moved to the non-movable area (at runtime)?
>>
>>
>> They probably can be moved. But I don't know yet what the right answer is.
>
> Another complication here: garbage collectors usually cannot move
> objects at arbitrary times. They can usually move objects only during a
> compaction phase, which cannot be enabled instantly (e.g. it requires a
> short STW). So moving one object on every cgo call may be infeasible.


Thanks for the explanation Dmitry. I see the issues more clearly now.

Niklas Schnelle

Aug 11, 2014, 6:22:54 PM8/11/14
to golan...@googlegroups.com, dvy...@google.com, da...@cheney.net, ia...@golang.org, r...@golang.org, m...@mxcrypt.com, gus...@niemeyer.net, luna....@palmstonegames.com, busi...@didenko.com
Would it be possible to annotate types, or even individual allocations, as non-movable? Pinning would then be decided at compile time or allocation time, so that such objects can
be placed in a separate heap area. I think most of the time developers will know which memory will be passed to libraries. Annotated types in particular sound less invasive,
and the type system could check at compile time that no pointer to movable memory is passed to C code. Whichever way turns out best, I believe a low-overhead,
usable C binding is immensely important for any language, and even more so for a language striving to be used for systems programming.

Dan Kortschak

Aug 11, 2014, 7:13:30 PM8/11/14
to Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Thu, 2014-08-07 at 12:54 -0700, Ian Lance Taylor wrote:
> Technically the Go compatibility guarantee doesn't promise anything
> about cgo.

So far all the discussion has been around the interaction with cgo. Can
someone explain how moving GC may interact with asm?

Ian Lance Taylor

Aug 11, 2014, 7:25:40 PM8/11/14
to Dan Kortschak, Luna Duclos, Vlad Didenko, golang-dev
Good question.

Assembler functions that store pointers in memory will have to use
some sort of write barrier.

Some other changes may be required but I can't think of any.

Ian

Ian Lance Taylor

Aug 11, 2014, 7:32:49 PM8/11/14
to Niklas Schnelle, golang-dev, Dmitry Vyukov, Dave Cheney, Russ Cox, Maxim Khitrov, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
That is an interesting idea. I don't think annotating types is very
safe, because it would break horribly if some value was converted to
that type. But one could imagine adding a new runtime or reflect
function to allocate pinned, garbage-collected memory. Passing that
memory to a cgo function would then be permitted without requiring a copy.

Ian
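
As a sketch of what such an allocation function might look like from the caller's side: `AllocPinned`, `Alloc`, and `passToForeign` below are purely hypothetical and only model the rule that pinned memory may cross the boundary while movable memory must be copied first. No such function is being claimed to exist in the runtime or reflect packages.

```go
package main

import (
	"errors"
	"fmt"
)

// block models a heap allocation whose pinned status is fixed when it is
// created and never changes afterwards.
type block struct {
	data   []byte
	pinned bool
}

// Alloc stands in for ordinary allocation in the movable heap.
func Alloc(n int) *block {
	return &block{data: make([]byte, n)}
}

// AllocPinned is a hypothetical function that allocates garbage-collected
// memory the collector promises never to move.
func AllocPinned(n int) *block {
	return &block{data: make([]byte, n), pinned: true}
}

var errMovable = errors.New("movable memory must be copied before a foreign call")

// passToForeign models the check at the cgo boundary: pinned blocks are
// accepted as they are, movable blocks are rejected.
func passToForeign(b *block) error {
	if !b.pinned {
		return errMovable
	}
	return nil
}

func main() {
	fmt.Println("pinned: ", passToForeign(AllocPinned(64)))
	fmt.Println("movable:", passToForeign(Alloc(64)))
}
```

Because the decision is made by the allocation call rather than by the type, converting a value to another type cannot silently change whether it is pinned.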

Dan Kortschak

Aug 11, 2014, 8:09:58 PM8/11/14
to Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
So surely that means that the problem is more significant since there is
asm throughout the standard library. There are a number of cases where
the asm functions take things which are pointers or functionally
pointers (slices, interfaces, etc).

Ian Lance Taylor

Aug 11, 2014, 8:26:51 PM8/11/14
to Dan Kortschak, Luna Duclos, Vlad Didenko, golang-dev
On Mon, Aug 11, 2014 at 5:09 PM, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
>
> So surely that means that the problem is more significant since there is
> asm throughout the standard library. There are a number of cases where
> the asm functions take things which are pointers or functionally
> pointers (slices, interfaces, etc).

I'm not sure what you mean by "more significant" here.

You are of course correct that standard library functions written in
asm that change memory, like memmove, will have to add write barriers.

Ian

Dan Kortschak

Aug 11, 2014, 8:43:30 PM8/11/14
to Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Mon, 2014-08-11 at 17:26 -0700, Ian Lance Taylor wrote:
> I'm not sure what you mean by "more significant" here.

It's not just a client and cgo issue.

minux

Aug 11, 2014, 11:34:58 PM8/11/14
to Niklas Schnelle, golang-dev, Dmitry Vyukov, Dave Cheney, Ian Taylor, Russ Cox, m...@mxcrypt.com, Gustavo Niemeyer, Luna Duclos, Vlad Didenko
On Mon, Aug 11, 2014 at 6:22 PM, Niklas Schnelle <niklas....@gmail.com> wrote:
> Would it be possible to annotate types or even allocations for non-moveness. This would mean the pinning still works at compile or allocation time so that they can
> be placed in a separate heap area. I think most of the time developers will know which memory will be passed to libraries.
Most but not all cases.
Imagine you have an allocator function that returns objects, some of which are passed to C
and some of which are not (e.g. a regexp package where some Regexps are passed to the Go
regexp package, and others to a PCRE binding). How could you determine
movability at allocation time?

If you propose a solution, please make sure it can handle all cases.
Handling most cases is no better (and IMO even worse) than handling none at all, because it will
give the developer a false sense of security.

minux

Aug 11, 2014, 11:35:37 PM8/11/14
to Dan Kortschak, Ian Lance Taylor, Luna Duclos, Vlad Didenko, golang-dev
On Mon, Aug 11, 2014 at 8:09 PM, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:
> So surely that means that the problem is more significant since there is
> asm throughout the standard library. There are a number of cases where
> the asm functions take things which are pointers or functionally
> pointers (slices, interfaces, etc).
No. I think most of the assembly code is unaffected by the write barrier change
because it doesn't write pointers to memory. Taking a pointer is OK; writing a pointer
to the heap is not.

The memmove in the runtime does need to be modified though, but that's a runtime-internal
thing.
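
The distinction between taking a pointer and writing one can be sketched with a toy barrier. The code below is a simplified barrier in the style of a Dijkstra insertion barrier, with made-up types (`node`, `collector`); it is not the barrier the Go runtime uses or plans to use.

```go
package main

import "fmt"

// node is a toy heap object with one pointer field and a mark bit.
type node struct {
	next   *node
	marked bool
}

// collector holds the state a concurrent marker would share with the
// running program.
type collector struct {
	marking bool
	gray    []*node // objects the collector still has to scan
}

// writePointer stores val into *slot. While marking is in progress it first
// shades val, so the collector cannot miss an object that the program has
// just made reachable through an already-scanned object.
func (c *collector) writePointer(slot **node, val *node) {
	if c.marking && val != nil && !val.marked {
		val.marked = true
		c.gray = append(c.gray, val)
	}
	*slot = val
}

// length only reads pointers, so it needs no barrier. This corresponds to
// assembly routines that take pointer arguments but never store pointers
// into the heap.
func length(n *node) int {
	count := 0
	for ; n != nil; n = n.next {
		count++
	}
	return count
}

func main() {
	c := &collector{marking: true}
	a, b := &node{}, &node{}
	c.writePointer(&a.next, b)
	fmt.Println(length(a), b.marked, len(c.gray))
}
```

Only code on the `writePointer` path has to change; routines like `length` are unaffected, which matches the claim that most assembly needs no modification.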