Weird problem that CPU is nearly 100%


tokers

May 21, 2020, 11:55:19 PM
to golang-nuts
Hi!

We have a Go program (an API server) that has been running stably for a long time on a virtual machine with 8 cores.
Recently, however, it hit a weird problem: a single CPU reached 100%
usage while the others stayed very low. Meanwhile, the network bandwidth dropped to zero,
and there were a bunch of TCP connections in the CLOSE_WAIT state on the server side.
So it seems to me that the program was busy spinning on something and could not execute our code.

We sent a QUIT signal to it and got its goroutine stacks. There were 3000+ goroutines: only two
were running, 370 were runnable, and the rest were blocked on channel operations. Unfortunately, the stacks of the two running goroutines
were not available; the dump only showed "goroutine running on other thread".

We didn't adjust runtime.GOMAXPROCS, so the number of Ps should default to the number of processors, i.e. 8. In my
view, the number of running goroutines should have been larger, and the run queues seem rather long (even if all
8 Ms were running user goroutines, the average runq size would be about 46, i.e. 370 / 8, if we ignore the global runq).
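
The numbers discussed above can be checked from within a running program. This is a small sketch with a hypothetical helper name (`schedulerSnapshot`); the three `runtime` calls it uses are standard. For per-P run queue lengths, the runtime's `GODEBUG=schedtrace=1000` setting prints a scheduler summary every second.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerSnapshot returns the number of Ps, logical CPUs, and live goroutines.
// Hypothetical helper name; not part of the original post.
func schedulerSnapshot() (procs, cpus, goroutines int) {
	// GOMAXPROCS(0) queries the current value without changing it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU(), runtime.NumGoroutine()
}

func main() {
	procs, cpus, goroutines := schedulerSnapshot()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d goroutines=%d\n", procs, cpus, goroutines)
}
```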

I don't know what the other Ms were doing at that time. I know there is a mark assist mechanism in the garbage collector implementation,
but would it use a lot of Ms and get the scheduler into trouble?

Go version we use: go/1.12.13.
OS we use: CentOS/3.10.0.

Ian Lance Taylor

May 22, 2020, 12:31:44 AM
to tokers, golang-nuts
Without seeing the code it's impossible to know, but the most likely
cause is that the goroutines were running in an unpreemptible loop,
that the rest of the goroutines were stuck waiting for a garbage
collection phase change, and that the garbage collector was waiting
for those two goroutines to complete.

Fortunately this kind of problem was fixed in 1.14, so I recommend upgrading.

Ian

tokers

May 22, 2020, 3:04:44 AM
to golang-nuts
Thanks for your reply.

Yeah, we plan to upgrade our Go version to 1.13.10.

Jan Mercl

May 22, 2020, 3:25:39 AM
to tokers, golang-nuts
On Fri, May 22, 2020 at 9:05 AM tokers <zcha...@gmail.com> wrote:
>
> Thanks for your reply.
>
> Yeah, we plan to upgrade our Go version to 1.13.10.

Note that 1.13 does not have the goroutine preemption Ian was talking
about with respect to 1.14.
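
For code that has to stay on 1.13 or earlier for a while, one commonly used workaround is to add an explicit yield point to long-running loops that make no function calls. The following is only a sketch under that assumption, not code from this thread: `sumWithYield` and `yieldEvery` are illustrative names, and the right yield interval depends on the workload.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumWithYield adds up data, calling runtime.Gosched every yieldEvery
// elements so the scheduler (and a pending GC stop-the-world) gets a
// chance to run. Hypothetical example; not code from the thread.
func sumWithYield(data []int, yieldEvery int) int {
	total := 0
	for i, v := range data {
		total += v
		if yieldEvery > 0 && (i+1)%yieldEvery == 0 {
			runtime.Gosched() // explicit yield point
		}
	}
	return total
}

func main() {
	data := make([]int, 1000000)
	for i := range data {
		data[i] = 1
	}
	fmt.Println(sumWithYield(data, 4096))
}
```

On Go 1.14 and later the explicit yield should not be needed for this purpose, since the runtime can preempt such loops on its own.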

tokers

Jul 23, 2020, 10:42:13 PM
to golang-nuts
We detected this problem once again, and this time we observed the stacks.
