Proposal: Garbage collector pacing


Austin Clements

Mar 10, 2015, 11:43:36 AM
to golang-dev
Go 1.5's concurrent garbage collector poses new challenges for how to schedule garbage collection work so that the collector finishes quickly and on time without sacrificing concurrency. Here's my proposal for the garbage collector "pacing" algorithm I'm planning to implement to address this in Go 1.5: https://golang.org/s/go15gcpacing

Comments welcome (please comment in this thread).

Antoine Grondin

Mar 11, 2015, 3:11:33 AM
to golan...@googlegroups.com
Hi!

Nice read, I've found the explanations very clear!

I was trying to follow the maths in appendix A and saw this (image with annotation):

[image: annotated excerpt of the appendix A derivation]

It seems the derivation should be:

[image: suggested corrected derivation]

tomwilde

Mar 11, 2015, 7:15:45 AM
to golan...@googlegroups.com
Chrome on Linux here, I can't see some symbols in your text. Example:

[image: screenshot of text with a missing symbol]

It's not entirely illegible, but it gets confusing further down.

Austin Clements

Mar 11, 2015, 11:30:30 AM
to Antoine Grondin, golang-dev
Good catch. I've now fixed this, as well as a few other typos in the derivation. I believe the final derivation was correct (plus it has held up in simulations), I just copied things off my whiteboard wrong. :)

Thanks!


Austin Clements

Mar 11, 2015, 11:33:22 AM
to tomwilde, golang-dev
The screenshot you sent is showing up correctly. The box is a placeholder saying that this relation applies to H_a/h_a, H_g/h_g, and H_T/h_T. Are there other places where the symbols are not showing up correctly?

Dmitry Vyukov

Mar 11, 2015, 12:07:17 PM
to Austin Clements, golang-dev
"This includes time in the background collector and assists from the
mutator, but not time in write barriers (simply because the accounting
would increase write barrier overhead) or secondary effects like
increased cache misses."

This assumes that write barriers and cache misses do not introduce
significant slowdown, right? Do we have any estimates? An enabled write
barrier can introduce some overhead, and the user generally does not care
whether the additional latency is introduced by assists or by write
barriers. Probably we need to shoot for something like 21% of measured
overhead and 4% of aux overheads.

"this idle utilization does not count against the GC CPU budget"

I am not sure I understand this correctly. Do you mean that GC will
eat 25% on top of the 50% that it eats from idle time? Idle utilization
should count towards the total CPU budget. For example, if a program
consumes 50% of CPU time, GC should just eat the idle 50% rather than
eat the idle 50% and then 25% more.
Similarly, "Goal 2. Minimize ug - ua(n)" should be "Minimize max(Ug,
Uidle) - Ua".

"We(n)=w(n)Hm(n-1)"

How is it used? I only see a reference in the "ideal assist work",
which you discard and replace with credit/debit system.

"This system of work credit is quite flexible."

How exactly does it work?
It is possible that the background scanner has created a full load of
credit, but we still don't want mutators to use the credit but rather
stop where they are and assist; otherwise they will over-allocate.

"Instead, the runtime will adjust hT based on an estimate of what the
heap growth would have been if GC CPU utilization was ug=0.25".

What if GC actually scans with Ug=0.9 because of idle time? If my math
is correct, this formula will emphatically schedule GC way ahead of
ideal time.

Austin Clements

Mar 11, 2015, 12:34:36 PM
to Dmitry Vyukov, golang-dev
On Wed, Mar 11, 2015 at 12:06 PM, Dmitry Vyukov <dvy...@google.com> wrote:
On Tue, Mar 10, 2015 at 6:43 PM, 'Austin Clements' via golang-dev
<golan...@googlegroups.com> wrote:
> Go 1.5's concurrent garbage collector poses new challenges for how to
> schedule garbage collection work so that the collector finishes quickly and
> on time without sacrificing concurrency. Here's my proposal for the garbage
> collector "pacing" algorithm I'm planning to implement to address this in Go
> 1.5: https://golang.org/s/go15gcpacing
>
> Comments welcome (please comment in this thread).


"This includes time in the background collector and assists from the
mutator, but not time in write barriers (simply because the accounting
would increase write barrier overhead) or secondary effects like
increased cache misses."

This assumes that write barriers and cache misses do not introduce
significant slowdown, right? Do we have any estimates? An enabled write
barrier can introduce some overhead, and the user generally does not care
whether the additional latency is introduced by assists or by write
barriers. Probably we need to shoot for something like 21% of measured
overhead and 4% of aux overheads.
 
We haven't seen significant slowdowns from write barriers yet, but that could simply be because other things are higher on the list. There is some low-hanging fruit for speeding up write barriers when the time comes. And if they still turn out to be non-trivial, then I think we can account for them by doing exactly what you suggested and lowering the goal utilization to give some margin.

"this idle utilization does not count against the GC CPU budget"

I am not sure I understand this correctly. Do you mean that GC will
eat 25% on top of the 50% that it eats from idle time? Idle utilization
should count towards the total CPU budget. For example, if a program
consumes 50% of CPU time, GC should just eat the idle 50% rather than
eat the idle 50% and then 25% more.
Similarly, "Goal 2. Minimize ug - ua(n)" should be "Minimize max(Ug,
Uidle) - Ua".

GC will consume its 25% and *then* whatever's left of idle time. That is, if the mutator is consuming 50% of the CPU (not counting assists), and assists are taking x%, then background GC will be scheduled for (50-x)% and "billed" for (25-x)% (assuming x<25).
 
"We(n)=w(n)Hm(n-1)"

How is it used? I only see a reference in the "ideal assist work",
which you discard and replace with credit/debit system.

The credit/debit system is still driven by the ideal assist work. The only change is that rather than performing A(x) work directly, the mutator first tries to debit A(x) work from the background GC's credit and if it can't debit all of the work, it does some itself.
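A minimal sketch of this debit-first behavior (names and structure are illustrative, not the runtime's actual identifiers):

```go
package main

import "fmt"

// pacer is a toy stand-in for the GC's pacing state.
type pacer struct {
	bgCredit int64 // scan work banked by the background collector
}

// assist returns how much scan work the mutator must perform itself
// after debiting as much as possible from the background GC's credit.
func (p *pacer) assist(assistWork int64) int64 {
	debit := assistWork
	if debit > p.bgCredit {
		debit = p.bgCredit
	}
	p.bgCredit -= debit
	return assistWork - debit // remainder the mutator scans directly
}

func main() {
	p := &pacer{bgCredit: 100}
	fmt.Println(p.assist(60)) // fully covered by credit: prints 0
	fmt.Println(p.assist(60)) // only 40 credit left: prints 20
}
```

The mutator's obligation A(x) is unchanged; the credit only changes who performs the work.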
 
"This system of work credit is quite flexible."

How exactly does it work?
It is possible that the background scanner has created a full load of
credit, but we still don't want mutators to use the credit but rather
stop where they are and assist; otherwise they will over-allocate.

Since the assist is still being driven by the ideal work, then (assuming the scan work estimate is accurate) mutators still won't be able to over-allocate.

Does that clarify?
 
"Instead, the runtime will adjust hT based on an estimate of what the
heap growth would have been if GC CPU utilization was ug=0.25".

What if GC actually scans with Ug=0.9 because of idle time? If my math
is correct, this formula will emphatically schedule GC way ahead of
ideal time.

I'm not quite sure what you're asking. u_g is a constant (0.25) and u_a excludes idle time slurped up by background GC. (You might be right that something isn't quite right with idle time handling. I haven't checked that part as carefully as I have the other parts and it's the one part I haven't modeled in my simulator.)

Dmitry Vyukov

Mar 11, 2015, 12:46:53 PM
to Austin Clements, golang-dev
Then what is Ug in the formula and how is it measured?

Dmitry Vyukov

Mar 11, 2015, 12:50:18 PM
to Austin Clements, golang-dev
On Wed, Mar 11, 2015 at 7:34 PM, Austin Clements <aus...@google.com> wrote:
Yes, that clarifies it. But the formula for the ideal assist work should
contain Wa (which we don't know) instead of We. Because of that we can
easily over-allocate, which is not acceptable.

Dmitry Vyukov

Mar 11, 2015, 12:52:10 PM
to Austin Clements, golang-dev
On Wed, Mar 11, 2015 at 7:34 PM, Austin Clements <aus...@google.com> wrote:
You schedule GC assuming that GC will consume only 25% of CPU. It can
easily happen that GC consumes more because of idle time. Then you
will consistently schedule GC too early.

Dmitry Vyukov

Mar 11, 2015, 12:54:07 PM
to Austin Clements, golang-dev
I mean, what is Ua and how is it measured?

Dmitry Vyukov

Mar 11, 2015, 1:16:53 PM
to Austin Clements, golang-dev
On Tue, Mar 10, 2015 at 6:43 PM, 'Austin Clements' via golang-dev
<golan...@googlegroups.com> wrote:
The proposal puts a strong mathematical foundation under the GC, but the
problem is that it operates with lots of unknown values (or at least I
don't see how to measure them). For example:
- we don't know the alloc rate during the coming GC cycle
- we don't know the actual scanning speed and heap configuration
during the coming GC cycle
- we don't know the live heap size during the coming GC cycle
- we don't know how much idle time we will have during the coming GC cycle
- we don't know how the OS will schedule threads during the coming GC cycle
- we can't precisely measure CPU time consumed by GC (how?)
And each of these values seems to be critical for the math.
So I suspect that this algorithm can introduce runtime overheads and
complexity into the runtime and at the same time fail to do the right
thing.
What is the hot-path overhead of the proposed scheme? As far as I see, it
adds work to both the scanner and the mutator.
It will work in steady state, but there are simpler ways to schedule
GC in steady state.

I would consider the following dumber approach.
Start GOMAXPROCS/4 background scanning threads.
Idle procs help GC. Procs don't ensure that they have drained the whole
system; instead they check the local workqueue, global workqueue and poller; if
all of that is empty and GC is in progress, they help GC before resorting
to stealing.
If we finish GC significantly earlier than the target heap size, we start the next
GC a bit later.
If we hit the target heap size during GC, we start the next GC a bit earlier.
To avoid over-allocation, mutators start helping GC when heap_alloc
approaches next_gc. The input data for the decision: live heap after
the previous GC, current heap size, target heap size, size of the
scanned objects during the current GC. We don't know the critical
piece here: the current live heap size. Without it we can aim only for:
(1) no assisting in steady state and (2) no over-allocation. These
goals should be easy to meet. Assisting happens in coarse-grained
pieces only when mallocgc allocates from the heap (we don't have the
critical information to make the right decision in the non-steady state
anyway).

Austin Clements

Mar 11, 2015, 1:29:40 PM
to Dmitry Vyukov, golang-dev
If the assist work is based on something we don't know, then we won't be able to compute how much assist work to do. That's why it's based on an estimate.

I'm not sure what you mean by "easily" over-allocate, but, yes, if the work estimate is wrong we can exceed GOGC in that cycle. That's why the system revises the work estimate as it runs so that future cycles won't exceed GOGC. As far as I can tell, the only way to ensure a hard limit is to stop the world if allocation exceeds that limit. Given how rough the GOGC limit is anyway, this seems like a bad trade-off.

Dmitry Vyukov

Mar 11, 2015, 2:02:06 PM
to Austin Clements, golang-dev
GOGC is not rough today. It is quite precise. Over-allocation by 10%
is bad. Over-allocation by 30% is simply unacceptable; it will cause
program crashes.
We cannot rely on a potentially wrong work estimate for heap size bounding.

Austin Clements

Mar 11, 2015, 2:43:15 PM
to Dmitry Vyukov, golang-dev
I meant that it's rough as a tuning parameter since it's relative to live heap size. Unless a program is somehow managing its live heap size to within 10%, it's not managing its maximum heap size to within 10%.

Can you give an example of something that's so sensitive to its maximum heap size that it can't exceed GOGC? Rick, Russ and I have been operating under the assumption that heap and CPU limits are essentially soft, and we need to know if there's a really compelling case for making them hard.

Austin Clements

Mar 11, 2015, 3:46:04 PM
to Dmitry Vyukov, golang-dev
On Wed, Mar 11, 2015 at 1:16 PM, Dmitry Vyukov <dvy...@google.com> wrote:
On Tue, Mar 10, 2015 at 6:43 PM, 'Austin Clements' via golang-dev
<golan...@googlegroups.com> wrote:
> Go 1.5's concurrent garbage collector poses new challenges for how to
> schedule garbage collection work so that the collector finishes quickly and
> on time without sacrificing concurrency. Here's my proposal for the garbage
> collector "pacing" algorithm I'm planning to implement to address this in Go
> 1.5: https://golang.org/s/go15gcpacing
>
> Comments welcome (please comment in this thread).


The proposal puts a strong mathematical foundation under the GC, but the
problem is that it operates with lots of unknown values (or at least I
don't see how to measure them). For example:
- we don't know the alloc rate during the coming GC cycle
- we don't know the actual scanning speed and heap configuration
during the coming GC cycle
- we don't know the live heap size during the coming GC cycle
- we don't know how much idle time we will have during the coming GC cycle
- we don't know how the OS will schedule threads during the coming GC cycle
- we can't precisely measure CPU time consumed by GC (how?)
And each of these values seems to be critical for the math.

Many of these are handled by the trigger controller. If we're in steady state, it will have figured them out (or, rather, figured out their effects). If the mutator changes phases, the controller will find the new optimum within a few GC cycles. If the mutator is continuously changing its behavior, then any scheduling algorithm is going to have a hard time figuring out what to do.

So I suspect that this algorithm can introduce runtime overheads and
complexity into the runtime and at the same time fail to do the right
thing.
What is the hot-path overhead of the proposed scheme? As far as I see, it
adds work to both the scanner and the mutator.

I believe it will be extremely small. All of the computations are simple. Updating the state variables only needs to be done once per GC cycle, so these computations can be effectively ignored. Things like tracking scan work will be done per P and aggregated at the end of a cycle. Assist work will be done in batches (probably when we fetch a new span) so the overhead of going into the assist path will be amortized. Deciding when to schedule background GC will be done on the basis of periodically aggregated per-P counters.

The win of a well-scheduled garbage collector, on the other hand, will be significant. For example, our current heuristic is to start the garbage collector at 7/8 of GOGC (not because we thought this was a good idea, just because we needed something until we had a real answer). A consequence of this is that speeding up the garbage collector often paradoxically slows down the overall application because we perform more GC cycles!

It will work in steady state, but there are simpler ways to schedule
GC in steady state.

I would consider the following dumber approach.
Start GOMAXPROCS/4 background scanning threads.
Idle procs help GC. Procs don't ensure that they have drained the whole
system; instead they check the local workqueue, global workqueue and poller; if
all of that is empty and GC is in progress, they help GC before resorting
to stealing.
If we finish GC significantly earlier than the target heap size, we start the next
GC a bit later.
If we hit the target heap size during GC, we start the next GC a bit earlier.
To avoid over-allocation, mutators start helping GC when heap_alloc
approaches next_gc. The input data for the decision: live heap after
the previous GC, current heap size, target heap size, size of the
scanned objects during the current GC. We don't know the critical
piece here: the current live heap size. Without it we can aim only for:
(1) no assisting in steady state and (2) no over-allocation. These
goals should be easy to meet. Assisting happens in coarse-grained
pieces only when mallocgc allocates from the heap (we don't have the
critical information to make the right decision in the non-steady state
anyway).

At this high level, this doesn't seem significantly different from what I'm proposing. One difference is that you're proposing assists only kick in if we're nearing the heap goal, but that poses the danger of significantly overloading the CPU toward the end of a cycle when there's little time left to correct for this overload.

One thing you may or may not have intended in your proposal (I may just be filling in details in my own way here) is the idea that how much work assists have to do is based on how much work has happened in the current GC cycle. This is possible because, in your proposal, assists don't kick in until well into the cycle, so this estimate would be pretty accurate. In contrast, in my current proposal, this estimate is based solely on past cycles. I like the idea of using more information from the current cycle. In the terms from my proposal, I think this could easily be incorporated by revising the scan work ratio w (and the things derived from it) as the cycle progresses, based on the scan work ratio observed so far in this cycle. I think this would give the stronger heap size guarantees you're going for as it converges towards the "truth" for this cycle, while also smoothing out the CPU utilization.
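A sketch of that in-cycle revision (hypothetical names; the linear blending rule is just one possibility, not the proposal's formula):

```go
package main

import "fmt"

// reviseWorkRatio blends the scan work ratio estimated from past cycles
// with the ratio observed so far in the current cycle, weighting the
// in-cycle observation by how far the cycle has progressed (0..1).
// All names here are illustrative, not the runtime's.
func reviseWorkRatio(wPast, scannedSoFar, allocatedSoFar, progress float64) float64 {
	if allocatedSoFar == 0 {
		return wPast // nothing observed yet; keep the past estimate
	}
	wNow := scannedSoFar / allocatedSoFar
	return (1-progress)*wPast + progress*wNow
}

func main() {
	// Early in the cycle the past estimate dominates...
	fmt.Println(reviseWorkRatio(0.5, 90, 100, 0.1)) // close to 0.5
	// ...late in the cycle the observed ratio dominates.
	fmt.Println(reviseWorkRatio(0.5, 90, 100, 0.9)) // close to 0.9
}
```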

Austin Clements

Mar 11, 2015, 6:01:30 PM
to Dmitry Vyukov, golang-dev
I assume you mean ua (or u_a), not Ua. If there's a capital U in the doc it's a typo.

u_a is the fraction of GOMAXPROCS used by background GC up to the 25% budget plus mutator assists. It does not include background GC run during idle time beyond this 25% budget.

For example, if the mutator (excluding assists) is using 10% of GOMAXPROCS, and assists are using 5% of GOMAXPROCS, non-idle background GC will use 20% of GOMAXPROCS, and idle background GC will use the remaining 65% of GOMAXPROCS. u_a will be 0.25 from assists and non-idle background GC.

In terms of the actual implementation of this, this will be based on wall-clock time (not raw CPU time as measured by, say, getrusage). I'm planning to use nanotime around assists and at scheduling quantums that consider running background GC. nanotime isn't quite as cheap as cputicks, but on most OSes it's close, it doesn't have problems with monotonicity, and I expect any overhead it does have to be amortized.
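As a worked version of the example above (a hypothetical helper showing just the arithmetic, not runtime code):

```go
package main

import "fmt"

// gcUtilization returns u_a: mutator assists plus background GC billed
// up to the budget, excluding background GC run on idle time beyond the
// budget. All inputs are fractions of GOMAXPROCS; names are illustrative.
func gcUtilization(assists, bgGC, budget float64) float64 {
	nonIdleBG := budget - assists // background GC billed against the budget
	if nonIdleBG < 0 {
		nonIdleBG = 0 // assists alone exceed the budget
	}
	if nonIdleBG > bgGC {
		nonIdleBG = bgGC // can't bill more background GC than ran
	}
	return assists + nonIdleBG
}

func main() {
	// Mutator 10%, assists 5%: background GC runs at 85%, of which 20%
	// is billed (non-idle) and 65% is idle time, so u_a = 0.25.
	fmt.Println(gcUtilization(0.05, 0.85, 0.25))
}
```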

Austin Clements

Mar 11, 2015, 6:15:23 PM
to Dmitry Vyukov, golang-dev
If GC consumes more CPU because of idle time (and does so consistently for a few GC cycles), then the trigger controller will adapt to this and move the trigger higher. It's true that if the mutator goes from very idle to very busy, then the trigger may be too high, so the next few cycles may miss their CPU utilization goals (assuming the mutator also ramps up its allocation rate), but as in other cases, the controller will pull the trigger lower and adapt after a few GC cycles.

BTW, I added idle time modeling to my simulator and confirmed that the formulas do the right thing, given that u_a excludes idle time background GC. (If u_a were to include idle time background GC, then the error term in the trigger controller would penalize background GC for using idle time and pull the trigger lower, just like it would if mutator assists were overloading the CPU. With mutator assists, this has the effect of spreading their utilization over a longer period in the next cycle. But, idle time background GC is different. Assuming the next cycle has the same idle time, the background GC will continue to make use of the idle time and now it will just finish early. So it's important that u_a *not* include idle time background GC.)

Dmitry Vyukov

Mar 12, 2015, 3:09:03 AM
to Austin Clements, golang-dev
On Wed, Mar 11, 2015 at 9:43 PM, Austin Clements <aus...@google.com> wrote:
>> GOGC is not rough today. It is quite precise. Over-allocation by 10%
>> is bad. Over-allocation by 30% is simply unacceptable, it will cause
>> program crashes.
>> We cannot rely on a potentially wrong work estimate for heap size
>> bounding.
>
>
> I meant that it's rough as a tuning parameter since it's relative to live
> heap size. Unless a program is somehow managing its live heap size to
> within 10%, then it's not managing its maximum heap size to within 10%.

Live heap size is fully transparent to the user. If memory consumption
becomes a problem, the user can measure and optimize the size of persistent
data structures, per-request memory consumption and the number of requests
in flight. And the live heap == persistent data + per-request memory *
number of concurrent requests.


> Can you give an example of something that's so sensitive to its maximum heap
> size that it can't exceed GOGC? Rick, Russ and I have been operating under
> the assumption that heap and CPU limits are essentially soft and we need to
> know if there's a really compelling case for making them hard.

I agree that everything related to CPU is soft. There is always the OS
scheduler and other processes; CPU is harder to measure and in some
sense it is harder to even define CPU consumption; also, you have an
infinite amount of CPU time (every second you have another second of
CPU time). So outside of hard-real-time systems, hard CPU limits do not
make a lot of sense.

Memory is a completely different story. It's easy to define and
measure; environments frequently have hard bounds on memory
consumption (hard in the sense that a process will be terminated
rather than temporarily throttled, as with CPU). For systems w/o swap (and to
some degree for systems with swap) any over-allocation becomes
persistent. So it is super important not to allow any noticeable
over-allocation. Consider that a program has a live set of 3GB, and
the container has a hard limit of 8GB. If we don't provide a hard limit on
heap size, it is not possible to operate Go programs in such an
environment at all.

Dmitry Vyukov

Mar 12, 2015, 3:19:49 AM
to Austin Clements, golang-dev
I still don't understand. You are basically saying that you want to
keep 25% as close to 25% as possible.
This looks like a very synthetic and non-interesting goal. I still
think that the real goal is: "Minimize max(u_g, u_idle) - u_a"

> For example, if the mutator (excluding assists) is using 10% of GOMAXPROCS,
> and assists are using 5% of GOMAXPROCS, non-idle background GC will use 20%
> of GOMAXPROCS, and idle background GC will use the remaining 65% of
> GOMAXPROCS. u_a will be 0.25 from assists and non-idle background GC.
>
> In terms of the actual implementation of this, this will be based on
> wall-clock time (not raw CPU time as measured by, say, getrusage). I'm
> planning to use nanotime around assists and at scheduling quantums that
> consider running background GC. nanotime isn't quite as cheap as cputicks,
> but on most OSes it's close, it doesn't have problems with monotonicity, and
> I expect any overhead it does have to be amortized.

How will you capture the concurrency aspect of utilization?
For a 4-second GC period, stopping the world for 1 second gives
exactly 25%. But that's obviously unacceptable.

Dmitry Vyukov

Mar 12, 2015, 3:47:05 AM
to Austin Clements, golang-dev
I am probably thinking in different terms.
Do I understand it correctly that:
- we start GOMAXPROCS/4 background GC threads, this gives us u_g of at
least 25%, so u_g cannot be lower than 25% (GOMAXPROCS%4 != 0 aside)
- mutator assists add to u_g, mutator assists are always above the cap of 25%
- idle time GC is not included in u_g
?

Dmitry Vyukov

Mar 12, 2015, 3:50:26 AM
to Austin Clements, golang-dev
On Thu, Mar 12, 2015 at 10:19 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>>> > Then what is Ug in the formula and how it is measured?
>>>
>>>
>>> I mean what is Ua and how it is measured?
>>
>>
>> I assume you mean ua (or u_a), not Ua. If there's a capital U in the doc
>> it's a typo.
>>
>> u_a is the fraction of GOMAXPROCS used by background GC up to the 25% budget
>> plus mutator assists. It does not include background GC run during idle time
>> beyond this 25% budget.
>
> I still don't understand. You are basically saying that you want to
> keep 25% as close to 25% as possible.
> This looks like a very synthetic and non-interesting goal. I still
> think that the real goal is: "Minimize max(u_g, u_idle) - u_a"

I am probably thinking in different terms.
What you are saying is that we want to minimize mutator assists, is it correct?
We always consume the 25% with GOMAXPROCS/4 background GC threads, so
we can just ignore them. What's left is assists, they are always above
the cap, and so we want to minimize them.

Dmitry Vyukov

Mar 12, 2015, 4:37:48 AM
to Austin Clements, golang-dev
Right.
And so my proposal below is to rely more on the trigger controller,
rather than on complex computations based on unknown values.
If we are in a steady state, the trigger controller will do its thing
eventually and we don't need anything else; if we are not in a steady
state, computations based on unknown values have little value, and we can
just as well wait for the trigger controller to autotune.



>> So I suspect that this algorithm can introduce runtime overheads and
>> complexity into runtime and at the same time fail to do the right
>> thing.
>> What is hot-path overhead of the proposed scheme? As far as I see it
>> adds work to both scanner and mutator.
>
>
> I believe it will be extremely small. All of the computations are simple.
> Updating the state variables only needs to be done once per GC cycle, so
> these computations can be effectively ignored. Things like tracking scan
> work will be done per P and aggregated at the end of a cycle. Assist work
> will be done in batches (probably when we fetch a new span) so the overhead
> of going into the assist path will be amortized. Deciding when to schedule
> background GC will be done on the basis of periodically aggregated per-P
> counters.

ack
Yes, but I propose to rely more on the trigger controller and not do the
computations of work, utilization and credits/debits. See above for why.


> One difference is that you're proposing assists only kick in if
> we're nearing the heap goal, but that poses the danger of significantly
> overloading the CPU toward the end of a cycle when there's little time left
> to correct for this overload.

Yes, and it is the only option that makes sense.
If you had enough information at the beginning of GC to figure out that
you need to start assists earlier, why didn't you just schedule GC
earlier? We already did our best, using all the information we have, to
trigger GC at such a point that assists are not required at all. Now,
even if we look at how GC progresses, we still don't know whether we
need to add assists or not (because we don't know the actual live heap
size). So the only point when assists make sense is very late during
GC.

So what I am thinking about is:
- tune trigger point based on feedback aiming at finishing marking at
0.9 of target heap size
- enable assists when we cross 0.9 point
- stop the world when we cross 1.0 point

The exact numbers are, of course, subject to tuning. For example, it
may be a good idea to aim for modest assisting at the end of marking
(just to slow down mutators and be able to set the expected marking
completion closer to 1.0). Also, assists can be more aggressive as we
approach 1.0.
Another simpler and useful lever that we have is the number of GC threads.
For example, when we cross the 0.95 point we can dedicate GOMAXPROCS/2
threads to GC. This both speeds up marking and slows down mutators in a
simple, coarse-grained, cache-friendly and _latency_-friendly way.
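The thresholds above can be sketched as a tiny policy function (all numbers and names are illustrative, just restating the scheme in code):

```go
package main

import "fmt"

// assistPolicy maps the current heap allocation, expressed as a fraction
// of the target heap size for this cycle, to a scheduling action,
// following the 0.9/0.95/1.0 thresholds sketched above. Illustrative only.
func assistPolicy(progress float64) string {
	switch {
	case progress >= 1.0:
		return "stop the world"
	case progress >= 0.95:
		return "dedicate GOMAXPROCS/2 threads to GC"
	case progress >= 0.9:
		return "enable mutator assists"
	default:
		return "background GC only"
	}
}

func main() {
	for _, p := range []float64{0.5, 0.92, 0.97, 1.0} {
		fmt.Println(p, "->", assistPolicy(p))
	}
}
```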


> One thing you may or may not have intended in your proposal (I may just be
> filling in details in my own way here) is the idea that how much work
> assists have to do is based on how much work has happened in the current GC
> cycle. This is possible because, in your proposal, assists don't kick in
> until well in to the cycle, so this estimate would be pretty accurate. In
> contrast, in my current proposal, this estimate is based solely on past
> cycles. I like the idea of using more information from the current cycle. In
> the terms from my proposal, I think this could easily be incorporated by
> revising the scan work ratio w (and the things derived from it) as the cycle
> progresses based on the scan work ratio observed so far in this cycle. I
> think this would give stronger heap size guarantees that you're going for as
> it converged towards the "truth" for this cycle, while also smoothing out
> the CPU utilization.

Difficult to say. This again boils down to computations based on
assumptions that are not necessarily true. For example, heap scanning
speed can be radically different for a large []*int and a []byte. So we
can observe that we've scanned only 0.25 of the planned memory in 0.75
of the time, and decide that we need to start assisting like crazy. Or we
can observe that we've scanned 0.9 in 0.8 of the time, so we are ahead of
schedule. Both of these decisions can be incorrect.

I would implement the mandatory part first (like the trigger point
controller), then see what problems happen on real programs, and then
investigate mechanisms to address them.

r...@google.com

Mar 12, 2015, 9:50:31 AM
to golan...@googlegroups.com, aus...@google.com
When the GC cycle is triggered is our best estimate of the GC threads
being able to keep ahead of the mutators. The mutator assist is required when
that estimate is wrong, and to help (softly) bound any overrun.
The earlier in the cycle we provide mutator assist, the less disruptive the assist
will be. Delaying the assist will adversely affect our MMU/MUT latency
numbers, since it will push the assists into the last portion of the cycle,
when we also have to do the STW mark termination phase, which dominates
the MMU/MUT numbers.

As to your last point about the time spent scanning []byte vs []*int, the assist
credits are in terms of pointers, not bytes.

Dmitry Vyukov

Mar 12, 2015, 10:14:27 AM
to Rick Hudson, golang-dev, Austin Clements
On Thu, Mar 12, 2015 at 4:50 PM, <r...@google.com> wrote:
> When the GC cycle is triggered is our best estimation of the GC thread
> being able to keep ahead of the mutators. The mutator assist is required
> when
> that estimation is wrong and to help (softly) bound any overrun.
> The earlier in the cycle we provide mutator assist the less disruptive the
> assist
> will be. Delaying the assist will adversely affect our MMU/MUT latency
> numbers since it will push the assists into the last portion of the cycle
> when we also have to do the STW mark termination phase which dominates
> the MMU/MUT numbers.

But we don't know when to start assists, or whether we need them at all.
If our best estimate is correct, then we don't need assists at all.
If it is wrong, then everything we know about work/idle procs/live
heap/etc is also wrong. One guess would be to always start assists
instantly, but this introduces unnecessary latency. Another guess
would be to always start assists very late (when we are sure that our
estimates are wrong; this is what I propose). We can't make a
well-founded decision to do something in between.



> As to your last point about the time spent scanning []byte vs []*int, the
> assists
> credits are in terms of pointers and not bytes.

This does introduce overhead onto the scanning hot path.

Rick Hudson

Mar 12, 2015, 10:53:17 AM
to Dmitry Vyukov, golang-dev, Austin Clements
We are repeating ourselves. The basic framework in the paper is complete and well enough articulated to implement, which is what I propose we do. If we see use cases (and we will) where our estimates are wrong, we can adjust them within the framework. If we can't, then the framework presented will need to be adjusted.

Dmitry Vyukov

Mar 12, 2015, 11:13:54 AM
to Rick Hudson, golang-dev, Austin Clements
The proposal lacks information on why this approach is better than
simpler approaches. You seem to have done some modelling. Can you share
the results of the modelling on how the framework reacts to: (1) different
amounts of idle time available, (2) different sizes of live heap, (3)
different heap structure/uneven scanning speed, (4) different
allocation rates?

Russ Cox

Mar 13, 2015, 9:58:19 AM
to Dmitry Vyukov, Rick Hudson, golang-dev, Austin Clements
On Thu, Mar 12, 2015 at 11:13 AM, 'Dmitry Vyukov' via golang-dev <golan...@googlegroups.com> wrote:
The proposal lacks information on why this approach is better than
simpler approaches. You seem to have done some modelling. Can you share
the results of the modelling on how the framework reacts to: (1) different
amounts of idle time available, (2) different sizes of live heap, (3)
different heap structure/uneven scanning speed, (4) different
allocation rates?

Austin has proposed something that seems completely reasonable (and much more principled than just about anything else in the runtime). I don't believe he needs to justify everything he _didn't_ choose to do in order to move forward, implement, see how it works on real workloads, and iterate from there.

Russ

2pau...@googlemail.com

Mar 17, 2015, 5:11:59 AM
to golan...@googlegroups.com, dvy...@google.com, r...@google.com, aus...@google.com
" see how it works on real workloads, and iterate from there." ... that's what I was thinking too. Maybe translated that means: monitor, profile, optimize. I also think there has to be a bias in the math depending on application type (realtime, batch, medium latency etc) 

Overall, fantastic work everybody. As always. 






Austin Clements

Sep 3, 2015, 7:08:35 PM
to golang-dev
Now that 1.5 is out I've gone back and added an appendix to the original GC pacer design proposal documenting what changed between the proposal and the final implementation. The overall feedback system didn't change between the proposal and the final implementation, but many of the details were revised in response to experimentation with the real implementation on real workloads.