go tip: random high CPU usage on VirtualBox

398 views
Skip to first unread message

Mhd Shulhan

unread,
Mar 10, 2020, 5:59:25 AM3/10/20
to golan...@googlegroups.com
Hi all,

I have Go tip fetch and build every once or more in a week on my local
development (office environment).
For local testing, I have VirtualBox setup with all third parties.
The host machine is macOS Catalina version 10.15.3,

Model Identifier: MacBookPro14,1
Processor Name: Dual-Core Intel Core i5
Processor Speed: 2,3 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Hyper-Threading Technology: Enabled
Memory: 8 GB
Boot ROM Version: 204.0.0.0.0


The guest machine is Arch Linux on VirtualBox 6.1.4 r136177 (Qt5.6.3),

5.5.7-arch1-1 #1 SMP PREEMPT Sat, 29 Feb 2020 19:06:02 +0000 x86_64 GNU/Linux

HPET: disabled
CPUProfile: host
Chipset: piix3
Firmware: BIOS
Number of CPUs: 1
PAE: disabled
Long Mode: enabled
Triple Fault Reset: disabled
APIC: enabled
X2APIC: enabled
Nested VT-x/AMD-V: disabled
ACPI: enabled
IOAPIC: enabled
BIOS APIC mode: APIC
Time offset: 0ms
RTC: local time
Hardware Virtualisation: enabled
Nested Paging: enabled
Large Pages: enabled
VT-x VPID: enabled
VT-x Unrestricted Exec.: enabled
Paravirt. Provider: None
Effective Paravirt. Prov.: None


At the guest, we have five Go services that depends on Nats, Redis, and MySQL.

The problem is at random time one or two random service(s) will run with high
CPU usage.

There is no issue when building with Go 1.14.

I have add "net/http/pprof" to each of them and try to run the profiling tools
from host to service that run with high CPU, but it can't reach the service with the following error,

Fetching profile over HTTP from http://192.168.56.10:6079/debug/pprof/profile
192.168.56.10:6079/debug/pprof/profile: Get "http://192.168.56.10:6079/debug/pprof/profile": net/http: timeout awaiting response headers
failed to fetch any source profiles


My question is any one have idea how to debug this so I can provide more
detailed report?

Thank you in advance.

Mhd Shulhan

unread,
Mar 12, 2020, 2:14:02 AM3/12/20
to golan...@googlegroups.com

> My question is any one have idea how to debug this so I can provide more
> detailed report?
>
> Thank you in advance.

So, I try to debug with gdb 9.1. Here is the sample of stack when the CPU got high,

...
[New LWP 1885]
[New LWP 1886]
[New LWP 1887]
[New LWP 1888]
[New LWP 1889]
^C
Thread 1 "XYZ" received signal SIGINT, Interrupt.
runtime.futex () at /Users/XXX/share/go/src/runtime/sys_linux_amd64.s:568
568 /Users/X/share/go/src/runtime/sys_linux_amd64.s: No such file or directory.
(gdb) where
#0 runtime.futex () at /Users/XXX/share/go/src/runtime/sys_linux_amd64.s:568
#1 0x0000000000431666 in runtime.futexsleep (addr=0xd6a3e8 <runtime.m0+328>, val=0, ns=-1) at /Users/XXX/share/go/src/runtime/os_linux.go:44
#2 0x000000000040bd9f in runtime.notesleep (n=0xd6a3e8 <runtime.m0+328>) at /Users/XXX/share/go/src/runtime/lock_futex.go:151
#3 0x000000000043b858 in runtime.stoplockedm () at /Users/XXX/share/go/src/runtime/proc.go:1972
#4 0x000000000043d4c6 in runtime.schedule () at /Users/XXX/share/go/src/runtime/proc.go:2455
#5 0x000000000043d89d in runtime.park_m (gp=0xc000166c00) at /Users/XXX/share/go/src/runtime/proc.go:2691
#6 0x00000000004651fb in runtime.mcall () at /Users/XXX/share/go/src/runtime/asm_amd64.s:318
#7 0x0000000000465114 in runtime.rt0_go () at /Users/XXX/share/go/src/runtime/asm_amd64.s:220
#8 0x0000000000000000 in ?? ()
...

Rebuild with `-gcflags=all="-N -l"` and running it again result in the same stack trace.


Looking at git blame for each files does not shown any new commit introduced since after Go 1.14. Maybe others can look.

The next thing I will do is bissecting and rebuild and report again.

Mhd Shulhan

unread,
Mar 13, 2020, 4:06:27 AM3/13/20
to 'Benjamin' via golang-nuts, Ian Lance Taylor
According to my bisection, the following commit cause it,


14:18 ~/src/go
(af1f3b0082...)|BISECTING tokenomy 0 % git bisect good
98858c438016bbafd161b502a148558987aa44d5 is the first bad commit
commit 98858c438016bbafd161b502a148558987aa44d5
Author: Ian Lance Taylor <ia...@golang.org>
Date: Tue Feb 25 20:23:15 2020 -0800

runtime: don't panic on racy use of timers

If we see a racy use of timers, as in concurrent calls to Timer.Reset,
do the operations in an unpredictable order, rather than crashing.

Fixes #37400

Change-Id: Idbac295df2dfd551b6d762909d5040fc532c1b34
Reviewed-on: https://go-review.googlesource.com/c/go/+/221077
Run-TryBot: Ian Lance Taylor <ia...@golang.org>
TryBot-Result: Gobot Gobot <go...@golang.org>
Reviewed-by: Michael Knyszek <mkny...@google.com>

src/runtime/time.go | 216 ++++++++++++++++----------------------------------
src/time/time_test.go | 40 ++++++----
2 files changed, 92 insertions(+), 164 deletions(-)


Link to CL: https://go-review.googlesource.com/c/go/+/221077

If anyone have any idea the minimal test code to test it on my VM, I will test it.

Ian Lance Taylor

unread,
Mar 13, 2020, 7:16:11 PM3/13/20
to Mhd Shulhan, golang-nuts
That seems like a fairly unlikely culprit for significantly increased
CPU usage. Can you double check?

If you are sure that is the CL, please open a bug report at
https://golang.org/issue. Anything you can give us to recreate the
problem ourselves would be very helpful. Thanks.

Ian

Mhd Shulhan

unread,
Mar 14, 2020, 4:56:53 AM3/14/20
to Ian Lance Taylor, golang-nuts
Thanks for the response Ian.

I am actually run the bisect twice.

I have also test by reverting that commit on top of latest tip and rebuild and redeploy, the result is the random CPU spike doesn't happened anymore.

The weird thing is between all 5 services not all of them suddenly consume high CPU, sometimes only one service, sometimes two of them but not three, four, or five at the same time.

I will give it one more bisect next Monday.


If you are sure that is the CL, please open a bug report at
https://golang.org/issue.  Anything you can give us to recreate the
problem ourselves would be very helpful.  Thanks.

Its quite hard for me to create reproducible program since its depends on many third party modules.  I try to comprehend the bad CL but its beyond my knowledge.

Ian Lance Taylor

unread,
Mar 14, 2020, 8:02:54 PM3/14/20
to Mhd Shulhan, golang-nuts
Thanks for retrying.

The CL you are citing, https://golang.org/cl/221077, introduces one
additional compare-and-swap when stopping a timer and when resetting a
timer. It changes several cases that previously panicked to no longer
panic, but presumably your program was not panicking before the change
so those shouldn't make any difference. Otherwise there shouldn't be
any noticeable performance changes at all. So it's difficult for me
to understand how that change could be introducing a CPU spike into
any program. Unless your program is recovering and ignoring panics,
but that seems unlikely.

A CPU profile would likely help. I don't know why you are unable to fetch one.

When using gdb you are showing a backtrace for a single thread, but Go
programs have many threads. The best way to get a stack backtrace
would be to kill the program with SIGQUIT when it is running at high
CPU. That might possibly show something interesting. Or it might
not.

Ian

Personal

unread,
Mar 23, 2020, 12:52:54 PM3/23/20
to Ian Lance Taylor, golang-nuts
Someone has created the issue [1] before I finish the bisection and looking for more information. I add some information in the issue, I hope that help.

[1] https://github.com/golang/go/issues/38023


Reply all
Reply to author
Forward
0 new messages