Strange blocking in allocations during GC

Alexey Borzenkov

Apr 9, 2017, 11:28:05 PM
to golang-nuts
Hi,

I recently stumbled upon a very strange behaviour during garbage collection that can produce extremely long pauses in goroutines, which might not even be clear from the output of GODEBUG=gctrace=1. This is especially puzzling given the promise of sub-millisecond pauses since Go 1.8. First, the latest stressing program I cooked up is here:


This started from Shipilev's tweet about allocating 100M objects in Java, and I kind of tried doing the same in Go. The throughput is not great, but that's not the issue. When I tried measuring how much other goroutines get blocked, though, I was shocked to find many-second pauses. :(
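
Roughly, the program has the following shape (this is a simplified sketch, not the actual code behind the link above; the node struct, the 64-byte allocation and the iteration counts are just illustrative):

package main

import (
    "fmt"
    "sort"
    "time"
)

type node struct {
    next *node
    val  int
}

var sink []byte // global sink so the tiny allocation below isn't optimized away

func main() {
    done := make(chan []time.Duration)

    // An "innocent" goroutine: it mostly sleeps, allocates a tiny amount
    // per iteration, and records how long each iteration actually took.
    go func() {
        var samples []time.Duration
        for i := 0; i < 1500; i++ {
            start := time.Now()
            sink = make([]byte, 64) // remove this line and the pauses disappear
            time.Sleep(time.Millisecond)
            samples = append(samples, time.Since(start))
        }
        done <- samples
    }()

    // The main goroutine allocates like crazy: ~100M small linked objects,
    // all kept reachable, so there is a lot of live heap for the GC to mark.
    var head *node
    for i := 0; i < 100*1000*1000; i++ {
        head = &node{next: head, val: i}
    }
    _ = head

    samples := <-done
    sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
    fmt.Println("Collected", len(samples), "points")
    fmt.Println("Min:  ", samples[0])
    fmt.Println("p50:  ", samples[len(samples)/2])
    fmt.Println("p99:  ", samples[len(samples)*99/100])
    fmt.Println("Max:  ", samples[len(samples)-1])
}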

Later it boiled down to this:

* if the goroutine doesn't allocate, it's fine and the pauses it sees are minimal
* however, if it allocates even a little, there's a chance it will be stopped for many seconds during GC

When looking at the go tool trace output on my Mac (4 cpus), I found that this time is spent in runtime.gcAssistAlloc, which is not surprising in itself (what is extremely surprising is for how long an innocent goroutine may become subject to it).
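
In case anyone wants to look at the same thing: a minimal way to get such a trace (a generic runtime/trace sketch, not necessarily exactly what I did) is to wrap the program like this and then open the resulting file with go tool trace:

package main

import (
    "log"
    "os"
    "runtime/trace"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop()

    // ... the stressing code goes here ...
}

$ go tool trace trace.out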

On my Linux virtual server (2 cpus), however, I sometimes get output like this (if I read the gctrace format correctly, the three "ms clock" numbers on each line are stop-the-world sweep termination, concurrent mark, and stop-the-world mark termination):

$ go build && GODEBUG=gctrace=1 ./go-mem-stress
gc 1 @0.003s 4%: 0.10+4.3+0.085 ms clock, 0.20+0.083/0.25/3.8+0.17 ms cpu, 4->4->2 MB, 5 MB goal, 2 P
gc 2 @0.009s 6%: 0.025+13+0.049 ms clock, 0.050+0.28/1.8/11+0.098 ms cpu, 4->4->2 MB, 5 MB goal, 2 P
gc 3 @0.024s 5%: 0.015+19+0.055 ms clock, 0.030+0/2.0/17+0.11 ms cpu, 5->5->3 MB, 6 MB goal, 2 P
gc 4 @0.046s 7%: 0.012+20+0.058 ms clock, 0.025+0/4.4/16+0.11 ms cpu, 7->7->4 MB, 8 MB goal, 2 P
gc 5 @0.071s 6%: 0.015+30+0.10 ms clock, 0.030+0/3.6/26+0.21 ms cpu, 10->10->5 MB, 11 MB goal, 2 P
gc 6 @0.103s 7%: 0.93+36+0.13 ms clock, 1.8+0.073/4.6/0+0.27 ms cpu, 9->10->9 MB, 11 MB goal, 2 P
gc 7 @0.152s 6%: 0.021+87+0.20 ms clock, 0.042+0/10/77+0.40 ms cpu, 22->22->8 MB, 23 MB goal, 2 P
gc 8 @0.244s 6%: 1.3+71+0.13 ms clock, 2.6+0/9.7/1.3+0.26 ms cpu, 17->17->17 MB, 18 MB goal, 2 P
gc 9 @0.320s 6%: 0.019+90+0.11 ms clock, 0.039+0/9.1/81+0.22 ms cpu, 28->28->12 MB, 35 MB goal, 2 P
gc 10 @0.416s 6%: 0.013+112+0.092 ms clock, 0.027+0/12/100+0.18 ms cpu, 26->26->15 MB, 27 MB goal, 2 P
gc 11 @0.535s 6%: 0.018+124+0.089 ms clock, 0.036+0/16/108+0.17 ms cpu, 32->32->18 MB, 33 MB goal, 2 P
gc 12 @0.672s 5%: 0.023+199+0.056 ms clock, 0.046+0/18/180+0.11 ms cpu, 40->40->23 MB, 41 MB goal, 2 P
gc 13 @0.873s 7%: 16+205+0.093 ms clock, 32+0.085/28/0+0.18 ms cpu, 49->49->49 MB, 50 MB goal, 2 P
gc 14 @1.138s 7%: 0.020+336+3.0 ms clock, 0.040+0/43/292+6.0 ms cpu, 82->82->35 MB, 99 MB goal, 2 P
gc 15 @1.494s 6%: 0.019+402+0.20 ms clock, 0.038+0/29/373+0.40 ms cpu, 76->76->43 MB, 77 MB goal, 2 P
gc 16 @1.921s 6%: 0.019+371+0.095 ms clock, 0.039+0/42/329+0.19 ms cpu, 95->95->53 MB, 96 MB goal, 2 P
gc 17 @2.338s 4%: 0.017+1248+0.13 ms clock, 0.035+0/43/1204+0.26 ms cpu, 118->118->66 MB, 119 MB goal, 2 P
gc 18 @3.617s 10%: 0.040+641+0.11 ms clock, 0.081+0.083/571/69+0.22 ms cpu, 147->147->82 MB, 148 MB goal, 2 P
gc 19 @4.322s 9%: 0.027+958+0.086 ms clock, 0.054+0.11/67/890+0.17 ms cpu, 183->183->103 MB, 184 MB goal, 2 P
gc 20 @5.373s 8%: 0.016+1075+0.10 ms clock, 0.032+0.11/118/956+0.21 ms cpu, 229->229->128 MB, 230 MB goal, 2 P
gc 21 @6.516s 7%: 0.015+1311+0.058 ms clock, 0.030+0.13/146/1164+0.11 ms cpu, 286->286->159 MB, 287 MB goal, 2 P
gc 22 @7.914s 14%: 0.012+1610+0.28 ms clock, 0.024+0.26/1457/151+0.56 ms cpu, 357->357->199 MB, 358 MB goal, 2 P
gc 23 @9.617s 14%: 131+1607+0.20 ms clock, 262+0.24/403/5.9+0.40 ms cpu, 446->446->446 MB, 447 MB goal, 2 P
gc 24 @11.440s 15%: 252+2005+0.094 ms clock, 504+0.25/321/65+0.18 ms cpu, 754->754->557 MB, 892 MB goal, 2 P
gc 25 @13.877s 15%: 200+2183+0.15 ms clock, 401+0.20/357/81+0.31 ms cpu, 942->942->696 MB, 1114 MB goal, 2 P
gc 26 @16.405s 21%: 137+3805+0.073 ms clock, 274+0.082/3447/354+0.14 ms cpu, 1178->1178->483 MB, 1392 MB goal, 2 P
gc 27 @20.515s 19%: 130+3504+0.12 ms clock, 261+0.053/430/283+0.25 ms cpu, 1086->1086->1086 MB, 1087 MB goal, 2 P
gc 28 @24.362s 19%: 467+4245+0.12 ms clock, 935+0.18/940/0+0.25 ms cpu, 1839->1839->1357 MB, 2172 MB goal, 2 P
Collected 1512 points
Min:   1.073843ms
p50:   1.114427ms
p95:   20.159163ms
p99:   330.065386ms
p99.9: 3.584889808s
Max:   15.854872425s
gc 29 @29.200s 30%: 693+15553+0.095 ms clock, 1387+0.17/15160/701+0.19 ms cpu, 2299->2299->943 MB, 2715 MB goal, 2 P

And it's actually fortunate that the last line managed to be written out before the program exited, as otherwise it wouldn't even be clear why the maximum is almost 16 seconds (it just happened that the last cycle was that extreme). The trace shows two goroutines running: one is main.main, which keeps allocating like crazy, and the other is something called "GC (fractional)", which runs for the whole 15 seconds; only then do both stop and runtime.timerproc finally run.

I suspect it shouldn't be like this, and that there's something wrong with preemption related to timers. The "GC (fractional)" worker, if I read the documentation for gcMarkWorkerFractionalMode correctly, is supposed to get preempted (but clearly for those 15 seconds it was not). In the case of gcAssistAlloc it seems very strange that a goroutine making tiny allocations pays the price for another goroutine that allocated millions of objects.

Maybe this case is too extreme and this never happens in real applications, or maybe it's something else?

Ian Lance Taylor

Apr 10, 2017, 7:54:46 PM
to Alexey Borzenkov, Rick Hudson, Austin Clements, golan...@googlegroups.com
[ moving to golang-dev ]