Any idea what's going on here? What can I do to reduce the spikes?
$ go version
go version go1.8 linux/amd64
gc 347 @6564.164s 0%: 0.89+518+1.0 ms clock, 28+3839/4091/3959+33 ms cpu, 23813->23979->12265 MB, 24423 MB goal, 32 P

What I'm seeing here is that you have 32 HW threads and you spend 0.89+518+1, or roughly 520 ms, of wall clock in the GC. You also spend 28+3839+4091+3959+33, or 11950 ms, of CPU time out of the 520*32 = 16640 ms of CPU time available while the GC is running. The GC will reserve 25% of the CPU to do its work; that's 16640*0.25 or 4160 ms. If the GC finds any of the remaining 24 threads idle, it will aggressively enlist them to do GC work.

The graph only shows 8 of the HW threads and not the other 24, so it is hard to tell what is going on with them.
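In case it helps to see that arithmetic spelled out, here is a small sketch (the numbers are just the ones copied out of the gctrace line above, and the variable names are mine, not anything from your program):

package main

import "fmt"

func main() {
	// "0.89+518+1.0 ms clock": sweep termination + concurrent mark + mark termination, wall clock.
	wallMS := 0.89 + 518 + 1.0 // ~520 ms of wall clock in this GC cycle

	// "28+3839/4091/3959+33 ms cpu": CPU time per phase; the three middle
	// numbers are assist/background/idle worker time during concurrent mark.
	cpuMS := 28.0 + 3839 + 4091 + 3959 + 33 // ~11950 ms of GC CPU time

	const hwThreads = 32
	availableMS := wallMS * hwThreads // CPU ms available while the GC ran: ~16640
	reservedMS := availableMS * 0.25  // the GC's 25% reservation: ~4160

	fmt.Printf("wall=%.0fms gcCPU=%.0fms available=%.0fms reserved=%.0fms (GC used %.0f%% of available)\n",
		wallMS, cpuMS, availableMS, reservedMS, 100*cpuMS/availableMS)
}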
Yes, the GC seems to finish early, but that is the result of having to be very conservative about the amount of work left to do. Add to this a lot of Ps with nothing scheduled that get enlisted to do GC work. It seems counterintuitive to leave Ps idle, so that approach would need numbers to support it. There is no easy way to change the assist and idle settings.
>> 1. GOMAXPROCS=24: no latency change (they are the same as GOMAXPROCS=32, both when GC is running and not running)

This is an interesting clue. My guess: the application is memory bandwidth limited. If the GC's concurrent mark phase is memory bandwidth bound, then the application is also memory bandwidth bound. If one reduces the amount of bandwidth the GC is using, the application gets the bandwidth it needs. One easy way to reduce the GC's bandwidth requirement is to reduce the Ps the GC is given from 8 to 6, which is exactly what reducing GOMAXPROCS from 32 to 24 does. The remaining 18 Ps would no longer be memory bandwidth bound and could meet the latency SLO. The graph of latency vs. GOMAXPROCS might have an interesting shape.

Alternatively, if the GC flushes the cache at a high rate, it would cause the application to experience capacity misses, and this would also slow the application down. It could be both, since capacity misses tend to increase memory bandwidth requirements.

I'll discuss this at lunch and see if we can come up with a simple experiment to confirm or refute the guess. The good news is that the application has a workaround: just reduce GOMAXPROCS to 24.
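For the workaround, either run the binary with GOMAXPROCS=24 in the environment or set it at startup. A minimal sketch of the latter (the 24 here is just the value from the experiment above, not a recommendation for other machines):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Equivalent to starting the process with GOMAXPROCS=24.
	// With 24 Ps, the GC's 25% reservation is 6 dedicated workers instead of 8.
	prev := runtime.GOMAXPROCS(24)
	fmt.Printf("GOMAXPROCS changed from %d to %d\n", prev, runtime.GOMAXPROCS(0))

	// ... start the server / run the workload here ...
}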
On Wed, May 31, 2017 at 10:25 AM, Xun Liu <pas...@gmail.com> wrote:
Hi Xun. Thanks for the trace. Nothing jumps out at me, either. I would suggest trying with a reduced idleCheckThreshold in the runtime, but you said that disabling idle workers entirely didn't make a difference, so I wouldn't expect that to, either.

Would it be possible to run your workload with Go 1.9 (either current master or the beta, which should be out any day now)? 1.9 adds some extra information in the tracer that could be useful.

How are you correlating your latency spikes with GC? I'm wondering if, for example, you know exactly which phase it spikes during (early in GC or late?) or if it's possible that it actually spikes just after a GC cycle (which would implicate sweeping rather than marking).
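If it helps, one crude way to get that correlation (just a sketch, not tied to your setup; the function name and the 100 ms polling interval are arbitrary) is to log the end of each GC cycle from MemStats and line those timestamps up against your per-request latency logs:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// pollGC logs the end time and pause of each GC cycle so latency spikes can be
// compared against cycle boundaries (during mark vs. just after a cycle).
// Note: ReadMemStats briefly stops the world, so only do this in a diagnostic run.
func pollGC() {
	var last uint32
	for range time.Tick(100 * time.Millisecond) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		if ms.NumGC != last {
			last = ms.NumGC
			fmt.Printf("gc #%d finished at %s (pause %v)\n",
				ms.NumGC,
				time.Unix(0, int64(ms.LastGC)).Format(time.RFC3339Nano),
				time.Duration(ms.PauseNs[(ms.NumGC+255)%256]))
		}
	}
}

func main() {
	go pollGC()
	// ... serve traffic; log per-request latency with timestamps,
	// then line slow requests up against the gc lines printed above ...
	select {}
}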