Right.
And so my proposal below is to rely more on the trigger controller,
rather than on complex computations based on unknown values.
If we are in steady state, the trigger controller will do its thing
eventually and we don't need anything else; if we are not in steady
state, computations based on unknown values have little value, and we
may as well just wait for the trigger controller to autotune.
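To make the feedback idea concrete, here is a minimal sketch of the
kind of controller I mean (names and the gain constant are made up,
not real runtime code):

// Sketch of a proportional trigger controller (hypothetical names
// and gain). After each cycle, look at where marking actually
// finished relative to where we wanted it to finish, and nudge the
// next trigger accordingly.

// triggerRatio is the fraction of the target heap size at which
// the next GC cycle starts.
var triggerRatio = 0.7

const triggerGain = 0.5 // how strongly to correct; tunable

func tuneTrigger(markDoneRatio, targetRatio float64) {
	// Marking finished late (e.g. at 0.97 when we aimed for 0.9):
	// trigger earlier next time. Finished early: trigger later so
	// we waste less CPU on GC.
	triggerRatio += triggerGain * (targetRatio - markDoneRatio)
	if triggerRatio < 0.1 {
		triggerRatio = 0.1
	} else if triggerRatio > 0.95 {
		triggerRatio = 0.95
	}
}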
>> So I suspect that this algorithm can introduce overheads and
>> complexity into the runtime and at the same time fail to do the
>> right thing.
>> What is the hot-path overhead of the proposed scheme? As far as I
>> can see, it adds work to both the scanner and the mutator.
>
> I believe it will be extremely small. All of the computations are simple.
> Updating the state variables only needs to be done once per GC cycle, so
> these computations can be effectively ignored. Things like tracking scan
> work will be done per P and aggregated at the end of a cycle. Assist work
> will be done in batches (probably when we fetch a new span), so the overhead
> of going into the assist path will be amortized. Deciding when to schedule
> background GC will be done on the basis of periodically aggregated per-P
> counters.
ack
Yes, but I propose to rely more on the trigger controller and not do
computations of work, utilization, and credits/debits. See above for why.
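For concreteness, the cheap per-P accounting described above might
look something like this (invented names, not real runtime code):
the hot path is one P-local add with no atomics, and the totals are
only folded together once per cycle.

type p struct {
	scanWork int64 // scan work done by this P in the current cycle
}

// noteScanWork is called from the scanner hot path.
func (pp *p) noteScanWork(bytes int64) {
	pp.scanWork += bytes // P-local, no synchronization needed
}

// totalScanWork runs once at the end of a GC cycle.
func totalScanWork(allp []*p) int64 {
	var total int64
	for _, pp := range allp {
		total += pp.scanWork
		pp.scanWork = 0
	}
	return total
}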
> One difference is that you're proposing assists only kick in if
> we're nearing the heap goal, but that poses the danger of significantly
> overloading the CPU toward the end of a cycle when there's little time left
> to correct for this overload.
Yes, and it is the only option that makes sense.
If you had enough information at the beginning of GC to figure out
that you need to start assists earlier, why didn't you just schedule
GC earlier? We already did our best, using all the information we
have, to trigger GC at such a point that assists are not required at
all. Now, even if we look at how GC progresses, we still don't know
whether we need to add assists or not (because we don't know the
actual live heap size). So the only point at which assists make sense
is very late during GC.
So what I am thinking about is:
- tune the trigger point based on feedback, aiming to finish marking
at 0.9 of the target heap size
- enable assists when we cross the 0.9 point
- stop the world when we cross the 1.0 point
The exact numbers are, of course, subject to tuning. For example, it
may be a good idea to aim for modest assisting at the end of marking
(just to slow down mutators and be able to set the expected marking
completion closer to 1.0). Assists can also become more aggressive as
we approach 1.0.
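A minimal sketch of this staged policy (names and the exact
thresholds are illustrative, not a real implementation):

type gcAction int

const (
	actBackground   gcAction = iota // below 0.9: background marking only
	actAssist                       // 0.9..1.0: mutators assist marking
	actStopTheWorld                 // at 1.0: out of budget, finish marking stopped
)

// policy picks an action from the current heap size relative to the
// target heap size.
func policy(heapLive, heapGoal uint64) gcAction {
	switch ratio := float64(heapLive) / float64(heapGoal); {
	case ratio >= 1.0:
		return actStopTheWorld
	case ratio >= 0.9:
		return actAssist
	default:
		return actBackground
	}
}

// assistStrength ramps from 0 at the 0.9 point to 1 at the 1.0
// point, so assists get more aggressive as we approach 1.0.
func assistStrength(ratio float64) float64 {
	s := (ratio - 0.9) / 0.1
	if s < 0 {
		return 0
	}
	if s > 1 {
		return 1
	}
	return s
}

The ramp keeps assists modest right after the 0.9 point and
aggressive only near 1.0, matching the tuning note above.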
Another simpler and useful lever we have is the number of GC threads.
For example, when we cross the 0.95 point we can dedicate
GOMAXPROCS/2 threads to GC. This both speeds up marking and slows
down mutators in a simple, coarse-grained, cache-friendly, and
_latency_-friendly way.
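Sketched roughly (invented name, illustrative only):

import "runtime"

// markWorkers says how many Ps to hand over to marking. Past the
// 0.95 point we dedicate half of GOMAXPROCS, which both speeds up
// marking and slows down mutators in a coarse-grained way.
func markWorkers(ratio float64) int {
	if ratio >= 0.95 {
		return runtime.GOMAXPROCS(0) / 2
	}
	return 1 // steady state: whatever the normal background share is
}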
> One thing you may or may not have intended in your proposal (I may just be
> filling in details in my own way here) is the idea that how much work
> assists have to do is based on how much work has happened in the current GC
> cycle. This is possible because, in your proposal, assists don't kick in
> until well into the cycle, so this estimate would be pretty accurate. In
> contrast, in my current proposal, this estimate is based solely on past
> cycles. I like the idea of using more information from the current cycle. In
> the terms from my proposal, I think this could easily be incorporated by
> revising the scan work ratio w (and the things derived from it) as the cycle
> progresses based on the scan work ratio observed so far in this cycle. I
> think this would give the stronger heap size guarantees you're going for
> as it converges towards the "truth" for this cycle, while also smoothing out
> the CPU utilization.
Difficult to say. This again boils down to computations based on
assumptions that are not necessarily true. For example, heap scanning
speed can be radically different for a large []*int and a []byte. So
we can observe that we've scanned only 0.25 of the planned memory in
0.75 of the time, and decide that we need to start assisting like
crazy. Or we can observe that we've scanned 0.9 in 0.8 of the time,
and conclude that we are ahead of schedule. Both of these decisions
can be incorrect.
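To spell out the arithmetic with a hypothetical helper:

// A linear extrapolation from in-cycle progress assumes the rest of
// the heap scans at the same speed as what we've seen so far.
// projectedFinish estimates at what fraction of the time budget
// scanning completes under that assumption.
func projectedFinish(scannedFrac, elapsedFrac float64) float64 {
	return elapsedFrac / scannedFrac
}

// projectedFinish(0.25, 0.75) == 3.0: looks like a disaster, so we
// assist like crazy -- but if the remaining 0.75 is mostly
// pointer-free []byte that scans much faster, we were actually fine.
// projectedFinish(0.9, 0.8) ~= 0.89: looks ahead of schedule -- but
// if the remaining 0.1 is dense []*int, we are actually behind.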
I would implement the mandatory part first (like the trigger point
controller), then see what problems show up in real programs, and
then investigate mechanisms to address them.