Debugging long GC pauses

304 views
Skip to first unread message

Oliver Beattie

unread,
Feb 23, 2017, 2:15:08 PM2/23/17
to golang-nuts
I am looking for some advice about how I can debug some long GC pauses I am observing in our production workloads under go 1.8 (the problem is not specific to 1.8, though). This is a very simple network server – basically a HTTP ping endpoint – but I regularly see tail request latencies of >100ms. I have enabled GODEBUG=gctrace=1, and I can see some quite long STW pauses amid lots of much less worrying pauses:

gc 54 @348.007s 0%: 0.061+81+0.040 ms clock, 0.12+0.39/81/81+0.081 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 55 @358.007s 0%: 0.21+83+0.019 ms clock, 0.43+80/2.7/81+0.039 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 56 @367.507s 0%: 80+1.3+0.065 ms clock, 161+0.080/1.2/82+0.13 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 57 @377.726s 0%: 0.054+63+0.023 ms clock, 0.10+0.68/61/0.44+0.046 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 58 @388.007s 0%: 0.033+81+0.036 ms clock, 0.067+0.32/80/81+0.072 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 59 @398.007s 0%: 0.021+82+0.019 ms clock, 0.043+0.17/80/82+0.038 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 60 @407.630s 0%: 0.012+57+0.031 ms clock, 0.025+0.25/0.64/57+0.063 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 61 @418.007s 0%: 0.19+1.0+79 ms clock, 0.38+0.28/0.69/0.98+159 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 62 @427.507s 0%: 0.21+81+0.29 ms clock, 0.42+81/0.96/81+0.58 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 63 @437.507s 0%: 0.015+81+0.053 ms clock, 0.031+0.29/0.98/80+0.10 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
gc 64 @443.507s 0%: 81+1.2+0.032 ms clock, 162+0.040/1.2/0.44+0.065 ms cpu, 4->4->1 MB, 5 MB goal, 2 P
scvg2: inuse: 4, idle: 2, sys: 7, released: 0, consumed: 7 (MB)
gc 65 @453.507s 0%: 0.13+81+0.051 ms clock, 0.26+0.20/81/82+0.10 ms cpu, 4->4->1 MB, 5 MB goal, 2 P

If I am reading this correctly, some of these STW pauses are 80+ milliseconds, in order to scan a minuscule heap. I am not experienced with debugging the GC in Go, so I'd appreciate any pointers as to why this could happening and what I can do to get to the bottom of the behaviour. Many thanks :)

Caleb Spare

unread,
Feb 23, 2017, 2:26:48 PM2/23/17
to Oliver Beattie, golang-nuts, Austin Clements, Rick Hudson
+rlh, austin

Interesting behavior. I would recommend filing a bug. It would be most
helpful if you can create a complete repro case including either the
production code or some minimization of it that demonstrates the same
pauses.
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Austin Clements

unread,
Feb 23, 2017, 2:46:42 PM2/23/17
to Oliver Beattie, golang-nuts
AFAIK, the only thing that can cause this in Go 1.8 is a non-preemptible loop. It's not related to the heap size at all.

To test this theory, you can set GOEXPERIMENT=preemptibleloops and rebuild your Go tree (the compiler has to be built with this, so you can't just turn it on to build your project). I wouldn't recommend running in production with this, but if it eliminates the long pauses, we'll at least know that's the culprit.

Since these are quite long, the other thing you can do is run with the execution tracer (https://godoc.org/runtime/trace). You'll be able to see exactly what's happening at the beginning of each GC cycle. If you do have non-preemptible loops, you should also see goroutines executing for much longer than 10ms at a time, which is the default preemption bound.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.

Jesper Louis Andersen

unread,
Feb 23, 2017, 5:19:03 PM2/23/17
to Austin Clements, Oliver Beattie, golang-nuts
What would happen if you inserted a dummy function call inside the non-preemptible loop and made an appropriate sacrifice on the altar of the SSA dead-code-eliminator so it doesn't trigger?


To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

John Souvestre

unread,
Feb 23, 2017, 5:28:43 PM2/23/17
to golan...@googlegroups.com

Or…  Call runtime.Gosched().  I seem to recall benchmarking it a while back and found that it was very efficient.

 

John

    John Souvestre - New Orleans LA

Oliver Beattie

unread,
Feb 24, 2017, 6:54:18 AM2/24/17
to golang-nuts, oli...@obeattie.com
Thanks for getting back to me. I have posted the output from a 30-second runtime/trace session here, which appears to show three 80ms GC's:


I'm struggling a bit to interpret what's there though.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages