measuring/analyzing GC workload (number of pointers to process etc)


die...@raintank.io

Mar 11, 2016, 7:15:12 PM
to golang-nuts
Hello everyone,
I run a data server application in Go, and I'm seeing that every time the GC runs (typically every 2 minutes), CPU usage shoots up from a few percent to about 50%, and stays there for several seconds in a row.
Around the same time, every 2 minutes, I see elevated response latencies from the app. Since neither the incoming request rate nor the nature of the requests (amount of data requested, etc.) changes, I suspect the GC workload is slowing my app down.
I already went through the (fruitful) exercises of collecting profiles, reducing my allocations, using sync.Pool where appropriate, etc., saw GC stop-the-world time go down (now to ~1 ms on Go 1.6), and learned quite a bit in the process.
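(For reference, the sync.Pool change was along these lines; a minimal sketch with made-up names, not my actual code:)

    package server

    import (
    	"bytes"
    	"sync"
    )

    // bufPool recycles bytes.Buffer values across requests, so each
    // request no longer allocates (and the GC no longer has to trace)
    // a fresh buffer.
    var bufPool = sync.Pool{
    	New: func() interface{} { return new(bytes.Buffer) },
    }

    func handleRequest(payload []byte) {
    	buf := bufPool.Get().(*bytes.Buffer)
    	buf.Reset() // a recycled buffer may still hold old contents
    	defer bufPool.Put(buf)

    	buf.Write(payload)
    	// ... encode the response into buf ...
    }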
However, the elevated CPU usage for several seconds in a row is still a problem. My suspicion is that my application uses millions of pointers to track the data structures in use, and the GC simply has too much work to do on each run.
I already have a few ideas on how to reduce my use of pointers (one sketched below), but I'd like to measure my progress so I can validate each change I make.
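(The main idea, sketched with a hypothetical point type rather than my real structs: hold values instead of pointers, so a series becomes one pointer-free allocation the GC marks and skips, instead of millions of separate objects it must trace.)

    package series

    // Point holds no pointers, so the backing array of a []Point is
    // "noscan" memory: the GC marks the single allocation and never
    // looks inside it.
    type Point struct {
    	Ts  int64
    	Val float64
    }

    // Before: one heap object per point; the mark phase has to
    // follow every one of these pointers.
    type ByRef struct {
    	Points []*Point
    }

    // After: one contiguous allocation, no interior pointers.
    type ByVal struct {
    	Points []Point
    }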
I know CPU profiles tell me time spent in GC, and I know GODEBUG=gctrace=1 emits some useful information. For this scenario, though, the former seems quite coarse-grained and sensitive to noise; the latter seems more applicable, in particular the clock and CPU time measurements.
I was wondering, though, whether there's a tool I'm missing that can give me more insight into the workload I'm putting onto the GC: something like the number of pointers it has to scan, the number of references it follows, or maybe the number of "non-obvious" pointers (such as the ones inside string and slice headers). Having these numbers would make things crystal clear.
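(The closest in-process numbers I know of come from runtime.ReadMemStats; a sketch of the kind of periodic sampler I mean. GCCPUFraction and the pause ring are coarser than per-cycle scan work, but they should move with it:)

    package main

    import (
    	"log"
    	"runtime"
    	"time"
    )

    func main() {
    	var m runtime.MemStats
    	for range time.Tick(30 * time.Second) {
    		runtime.ReadMemStats(&m)
    		// PauseNs is a ring buffer; the most recent pause sits
    		// at index (NumGC+255)%256.
    		last := time.Duration(m.PauseNs[(m.NumGC+255)%256])
    		log.Printf("numGC=%d gcCPUFraction=%.4f lastPause=%v heapAlloc=%dMB",
    			m.NumGC, m.GCCPUFraction, last, m.HeapAlloc>>20)
    	}
    }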

For now I'm going to patch https://github.com/davecheney/gcvis to also show the CPU/clock times, but I hope there's another tool out there that I've missed.

thanks,
Dieter

Dave Cheney

Mar 11, 2016, 8:52:30 PM
to golang-nuts, die...@raintank.io
Are you sure these spikes are GC related? Can you post your gctrace data?

Tools like perf(1) are very useful for getting system-wide measurements. If the problem happens periodically, running perf(1) for a window when the event occurs will give you useful information.
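For example (assuming Linux, and that you know the pid), something like

    perf record -g -p <pid> -- sleep 30

while a spike is underway, followed by perf report, should show which functions, including the runtime's GC functions, are burning the CPU.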

die...@raintank.io

Mar 12, 2016, 2:20:39 AM
to golang-nuts, die...@raintank.io
gctrace confirms the multi-second mark-and-sweep spikes: http://sprunge.us/DDKF
Does it still make sense to use perf? If so, which particular commands would you recommend?
My gcvis changes are proving pretty useful already (https://github.com/davecheney/gcvis/pull/26), but I'm still curious whether there's a way to get more fine-grained details about the GC workload.

thanks,
Dieter

die...@raintank.io

Mar 12, 2016, 4:20:48 AM
to golang-nuts, die...@raintank.io
In runtime/mgc.go I'm seeing some interesting stats on the gcControllerState struct, such as scanWork and bgScanCredit.
Could these be the stats I'm looking for? It looks like they could easily be added to the gctrace output.
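(In the meantime, a crude proxy I can use to validate changes: time a forced, blocking collection. runtime.GC() waits for a complete cycle, so its wall time should roughly track the mark work the live heap generates. A sketch, not exact:)

    package main

    import (
    	"fmt"
    	"runtime"
    	"time"
    )

    // forcedGCTime times one full blocking collection; comparing it
    // before and after a pointer-reduction change gives a rough
    // measure of how much mark work was removed.
    func forcedGCTime() time.Duration {
    	start := time.Now()
    	runtime.GC()
    	return time.Since(start)
    }

    func main() {
    	fmt.Println("forced GC took", forcedGCTime())
    }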
thanks.

Dave Cheney

Mar 12, 2016, 4:27:24 AM
to golang-nuts
Can you please log a bug? 1.5 seconds to mark 5 GB is unexpected.

die...@raintank.io

Mar 14, 2016, 3:09:53 PM
to golang-nuts