Constant CPU usage on an idle HTTP server from Go 1.17


Santiago Corredoira

Apr 10, 2022, 6:02:40 PM
to golang-nuts
Hi,

On Linux, if I compile and run this simple program:

package main

import (
	"fmt"
	"net/http"
	"time"
)

type x struct{}

func (x) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("aaaa"))
}

func main() {
	sp := &http.Server{
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       5 * time.Second,
		WriteTimeout:      5 * time.Second,
		IdleTimeout:       30 * time.Second,
		Addr:              ":8080",
		Handler:           x{},
	}

	fmt.Println(sp.ListenAndServe())
}

After a while I start to see constant CPU activity (1% to 7%) even when the server is idle.

If I run "sudo strace -f -c -p <pid>" I get constant very fast calls like:

[pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 131277] 23:42:16 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
....

It never stops. Sometimes the CPU goes to 100% for a while and then goes back down to 1%. This happens on only one or two cores.

This doesn't happen with Go 1.16. I have checked all the versions and it starts in Go 1.17.

If I run the program with go run instead of compiling and running the binary, I can't reproduce it.

Sometimes it starts after a couple of HTTP calls, and sometimes I do a few "wrk -d2 http://localhost:8080" runs to trigger it (wrk is a web stress tool). But as soon as it starts making the nanosleep calls, it doesn't stop, no matter how long the server is idle.

If I remove the timeout values or set them to a short period, like one second for example, I am also not able to reproduce it. But with longer values it happens very quickly. I am surprised not to see it reported already.

Do you know what could be going on?

Thank you!

Brian Candler

Apr 11, 2022, 2:54:10 AM
to golang-nuts
Those nanosleep calls are at 10ms intervals - i.e. 100 calls per second.  I think this is normal behaviour of the Go scheduler, which signals itself every 10ms to force a preemptive context switch.  That feature was introduced in Go 1.14, at which point you were able to disable it at runtime by setting the environment variable GODEBUG=asyncpreemptoff=1 (I haven't tested whether you can still do this).
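
For example, here's a minimal sketch of my own (not from your program) that keeps one goroutine permanently runnable - attaching strace to it should show the same kind of periodic wakeups, since the runtime always has preemption work to do:

package main

// Sketch: one permanently runnable goroutine. Attaching strace to
// this process should show the runtime's periodic wakeups (nanosleep
// at ~10ms intervals) even though the program does no I/O at all.
// Note: the loop deliberately burns one CPU core.
func main() {
	go func() {
		for {
			// spin: keeps this goroutine permanently runnable
		}
	}()
	select {} // park main forever
}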

There were some problems when this feature was first introduced (see this and this) but improvements were made in 1.15 and 1.16.  I don't see anything in the 1.17 release notes which would affect this.

But in any case:
1. That won't cause CPU to go to 100%
2. I can't think why there should be any difference between "go run", versus "go build" and running the resulting executable.
3. I can't think why changing HTTP timeouts from 30 seconds to 1 second should make any difference.

Point (2) is going to make the problem hard to reproduce for other people.

To start with, I think you need to be more specific about:
- exactly what versions of go you're comparing ("go version"), and where you got them from (e.g. directly from go downloads page or from OS vendor packages)
- what exact Linux kernel you're running (uname -a)
- what exact version and distro of Linux you're using
- your CPU type (grep "model" /proc/cpuinfo | sort | uniq -c)
- anything that's potentially different about your environment - e.g. are you running Linux under an emulation layer like WSL (Windows)

Santiago Corredoira

Apr 11, 2022, 4:15:23 AM
to golang-nuts
Hi Brian, thanks for your help.

All my Go downloads are from go.dev.
For these tests I am using:

$ go version
go version go1.18 linux/amd64

I run the tests on a laptop with Linux as the main OS, no emulation.

$ grep "model" /proc/cpuinfo | sort | uniq -c
     12 model        : 165
     12 model name    : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz

$ uname -r
5.4.0-107-generic

$ cat /etc/os-release
NAME="Linux Mint"
VERSION="20.3 (Una)"

I just spent 5 hours obsessively trying to reproduce it, and I know it sounds very strange. The 100% CPU spikes are how I first noticed this, because the laptop fan would start spinning at full speed. Since then, I have been using go1.16 from go.dev ("go install golang.org/dl/go1.16@latest && go1.16 download") and I haven't experienced it again.

I checked the issue links and it seems similar to what I am seeing, but only from version 1.17 on. Also, they say you can disable it with GODEBUG=asyncpreemptoff=1, but I see the problem in the built version. Maybe the difference with go run is that it adds debug flags? I expected it to just compile the program in a temp location and run it, but this is a consistent difference I just tested:

$ go run test.go

And in another console:

$ ps aux | grep test.go
bill      143464  2.5  0.1 1897584 16480 pts/3   Sl+  09:54   0:00 go run test.go
bill      143604  0.0  0.0  11568  2764 pts/4    S+   09:54   0:00 grep --color=auto test.go
$ sudo strace -f -tt -p 143464
strace: Process 143464 attached with 17 threads
[pid 143544] 09:54:42.564230 futex(0xc000224148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143485] 09:54:42.564321 futex(0xc00056a148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143478] 09:54:42.564340 epoll_pwait(4,  <unfinished ...>
[pid 143477] 09:54:42.564366 futex(0xc000580548, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143476] 09:54:42.564386 futex(0xc000680148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143475] 09:54:42.564407 futex(0xc000580148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143474] 09:54:42.564430 futex(0xc000500148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143473] 09:54:42.564451 futex(0xdf3e80, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143472] 09:54:42.564471 futex(0xc000400148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143471] 09:54:42.564493 futex(0xc000380148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143470] 09:54:42.564515 futex(0xc000300148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143469] 09:54:42.564535 futex(0xdf3d38, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143468] 09:54:42.564556 futex(0xc000090148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143467] 09:54:42.564576 futex(0xc000023148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143466] 09:54:42.564596 futex(0xc000022d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143465] 09:54:42.564616 restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 143464] 09:54:42.564637 waitid(P_PID, 143586,  <unfinished ...>
[pid 143465] 09:55:14.806222 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 143465] 09:55:14.806608 nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0
[pid 143465] 09:55:14.807053 futex(0xdc55d8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}

In another console I would request multiple times:

$ curl http://localhost:8080
or
$ wrk -d2 http://localhost:8080

but strace doesn't show anything else. It just stays like that, and the CPUs show 0% activity when it is not getting requests.

On the other hand, if I compile it:

$ go build test.go
$ ./test

If I check strace in another console, it starts like this:

$ ps aux | grep ./test
bill      143817  0.0  0.0 1077296 4448 pts/3    Sl+  09:57   0:00 ./test
bill      143829  0.0  0.0  11568   712 pts/4    S+   09:58   0:00 grep --color=auto ./test
$ sudo strace -f -tt -p 143817
strace: Process 143817 attached with 6 threads
[pid 143818] 09:58:23.811515 restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 143822] 09:58:23.811538 futex(0x85f438, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143821] 09:58:23.811561 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143820] 09:58:23.811580 futex(0xc000015148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143819] 09:58:23.811585 futex(0xc000014d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143817] 09:58:23.811613 epoll_pwait(5,

But below is what I get with a single curl request:

$ sudo strace -f -tt -p 143817
strace: Process 143817 attached with 6 threads
[pid 143818] 09:58:23.811515 restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 143822] 09:58:23.811538 futex(0x85f438, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143821] 09:58:23.811561 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143820] 09:58:23.811580 futex(0xc000015148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143819] 09:58:23.811585 futex(0xc000014d48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143817] 09:58:23.811613 epoll_pwait(5,  <unfinished ...>
[pid 143818] 09:58:42.447774 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 143818] 09:58:42.447898 nanosleep({tv_sec=0, tv_nsec=20000}, NULL) = 0
[pid 143818] 09:58:42.448141 futex(0x830ed8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
[pid 143817] 09:59:30.164453 <... epoll_pwait resumed>[{EPOLLIN, {u32=1288671128, u64=140674352517016}}], 128, -1, NULL, 0) = 1
[pid 143817] 09:59:30.164531 futex(0x830ed8, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 143818] 09:59:30.164725 <... futex resumed>) = 0
[pid 143817] 09:59:30.164737 accept4(3,  <unfinished ...>
[pid 143818] 09:59:30.164859 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143817] 09:59:30.164889 <... accept4 resumed>{sa_family=AF_INET6, sin6_port=htons(40122), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [112->28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 4
[pid 143817] 09:59:30.164962 epoll_ctl(5, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1288670888, u64=140674352516776}} <unfinished ...>
[pid 143818] 09:59:30.164993 <... nanosleep resumed>NULL) = 0
[pid 143817] 09:59:30.165040 <... epoll_ctl resumed>) = 0
[pid 143818] 09:59:30.165072 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143817] 09:59:30.165097 getsockname(4, {sa_family=AF_INET6, sin6_port=htons(8080), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_scope_id=0}, [112->28]) = 0
[pid 143817] 09:59:30.165142 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 143817] 09:59:30.165162 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 143817] 09:59:30.165180 setsockopt(4, SOL_TCP, TCP_KEEPINTVL, [15], 4) = 0
[pid 143817] 09:59:30.165199 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [15], 4 <unfinished ...>
[pid 143818] 09:59:30.165209 <... nanosleep resumed>NULL) = 0
[pid 143817] 09:59:30.165215 <... setsockopt resumed>) = 0
[pid 143818] 09:59:30.165222 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143817] 09:59:30.165271 futex(0xc000015148, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 143820] 09:59:30.165311 <... futex resumed>) = 0
[pid 143817] 09:59:30.165317 accept4(3, 0xc00012db20, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 143818] 09:59:30.165373 <... nanosleep resumed>NULL) = 0
[pid 143820] 09:59:30.165380 epoll_pwait(5,  <unfinished ...>
[pid 143818] 09:59:30.165403 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143820] 09:59:30.165438 <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT, {u32=1288670888, u64=140674352516776}}], 128, 0, NULL, 0) = 1
[pid 143820] 09:59:30.165501 epoll_pwait(5,  <unfinished ...>
[pid 143818] 09:59:30.165509 <... nanosleep resumed>NULL) = 0
[pid 143817] 09:59:30.165515 read(4,  <unfinished ...>
[pid 143818] 09:59:30.165536 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143817] 09:59:30.165564 <... read resumed>"GET / HTTP/1.1\r\nHost: localhost:"..., 4096) = 78
[pid 143817] 09:59:30.165595 futex(0xc000080148, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 143821] 09:59:30.165610 <... futex resumed>) = 0
[pid 143821] 09:59:30.165624 nanosleep({tv_sec=0, tv_nsec=3000},  <unfinished ...>
[pid 143817] 09:59:30.165632 write(4, "HTTP/1.1 200 OK\r\nDate: Mon, 11 A"..., 120 <unfinished ...>
[pid 143818] 09:59:30.165642 <... nanosleep resumed>NULL) = 0
[pid 143817] 09:59:30.165669 <... write resumed>) = 120
[pid 143821] 09:59:30.165700 <... nanosleep resumed>NULL) = 0
[pid 143818] 09:59:30.165706 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143821] 09:59:30.165716 futex(0xc000080148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143817] 09:59:30.165739 read(4, 0xc000176000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 143817] 09:59:30.165785 futex(0x830108, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 143820] 09:59:30.165794 <... epoll_pwait resumed>[{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=1288670888, u64=140674352516776}}], 128, 4999, NULL, 34914803367208) = 1
[pid 143820] 09:59:30.165809 read(4, "", 4096) = 0
[pid 143820] 09:59:30.165832 epoll_ctl(5, EPOLL_CTL_DEL, 4, 0xc000123974) = 0
[pid 143820] 09:59:30.165850 close(4 <unfinished ...>
[pid 143818] 09:59:30.165858 <... nanosleep resumed>NULL) = 0
[pid 143818] 09:59:30.165870 nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid 143820] 09:59:30.165879 <... close resumed>) = 0
[pid 143820] 09:59:30.165892 epoll_pwait(5, [], 128, 0, NULL, 1) = 0
[pid 143820] 09:59:30.165909 epoll_pwait(5,  <unfinished ...>
[pid 143818] 09:59:30.165953 <... nanosleep resumed>NULL) = 0
[pid 143818] 09:59:30.165963 futex(0x830ed8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0}

and the output keeps growing like that with every new curl request until eventually it starts constantly printing output like this:

[pid 143818] 10:02:22.697120 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.707609 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.718125 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.728753 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.739195 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.749618 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.760160 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.770669 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.781137 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.791581 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.802060 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.812540 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.822981 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.833468 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0

In htop, without strace attached, CPU usage is constant at about 0.7% on the first one or two cores.




Santiago Corredoira

Apr 11, 2022, 4:26:28 AM
to golang-nuts
I reproduced this on another very similar PC, a slightly older Lenovo Legion laptop:

$ grep "model" /proc/cpuinfo | sort | uniq -c
     12 model        : 158
     12 model name    : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

$ uname -r
5.4.0-107-generic

$ cat /etc/*ease
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=19.3

The 100% CPU spikes would start at any moment and would usually stop after I made a new HTTP request to the server.

Brian Candler

Apr 11, 2022, 7:28:48 AM
to golang-nuts
On Monday, 11 April 2022 at 09:26:28 UTC+1 scorr...@gmail.com wrote:
and the output keeps growing like that with every new curl request until eventually it starts constantly printing output like this:

[pid 143818] 10:02:22.697120 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid 143818] 10:02:22.707609 nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[...]


That is normal, I believe, for a process with multiple runnable goroutines - one alarm/wakeup call every 10ms.

What is odd is the 100% CPU utilisation.  Can you attach an strace when that happens?

As for the binary produced by "go run" - use "ps" to see where it is.  For example:

==> sleepy.go <==
package main

import (
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		time.Sleep(30 * time.Second)
		wg.Done()
	}()
	wg.Wait()
}
==> end <==

$ go run sleepy.go

... in another window ...
$ ps auxwww | grep sleepy
ubuntu     17743  0.0  0.1 1237424 12428 pts/1   Sl+  11:23   0:00 go run sleepy.go
ubuntu     17782  0.0  0.0 702468  1052 pts/1    Sl+  11:23   0:00 /tmp/go-build2834561198/b001/exe/sleepy

You can 'cp' it somewhere to keep it safe.

If I compare this with the output of "go build sleepy.go", I see that the "go build" version is larger and unstripped:

$ cp /tmp/go-build2834561198/b001/exe/sleepy sleepy-run
$ file sleepy-run
sleepy-run: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
$ go build sleepy.go
$ file sleepy
sleepy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Then "strip sleepy" gives me a binary which is almost, but not quite, the same size as the one from "go run".  So that gives you another way to try to reproduce the problem - although if the difference is just between stripped and unstripped, then it's most likely some sort of edge timing case.

Santiago Corredoira

Apr 11, 2022, 11:08:40 AM
to golang-nuts
I tried with the stripped tmp version from "go run" and after a while I was able to reproduce it too. It definitely took me more time, but that is probably just random.

I tried "$ GODEBUG=asyncpreemptoff=1 ./test" and I kept seeing the the calls every 10ms. Is this expected to happen?

I compiled the test with "$ go1.16 build test.go" and I tried hard to reproduce it but I couldn't. It is very strange because this behaviour was introduced in Go 1.14 and I only see it in Go 1.17. Also asyncpreemptoff=1 seems to make no difference.

I am going to work again with Go 1.18 by default to see if I get the 100% CPU usage.

Santiago Corredoira

Apr 11, 2022, 5:59:55 PM
to golang-nuts
I tried with a more complex application and already got constant 100% CPU usage twice.

When I attach strace I see the same as with the minimal demo, but with much higher frequency:

[pid 42659] 23:36:49.035120 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035133 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035145 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035158 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035170 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035182 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035195 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035207 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035219 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035232 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035244 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035257 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035269 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035281 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0
[pid 42659] 23:36:49.035293 epoll_pwait(3, [], 128, 0, NULL, 140341803388040) = 0
[pid 42659] 23:36:49.035306 epoll_pwait(3, [], 128, 0, NULL, 18704162493558) = 0

The program is too large to post here. I have tested it with the race detector and it doesn't find any problems. Any idea what to look for?

Thank you!

Brian Candler

Apr 12, 2022, 2:59:06 AM
to golang-nuts
At the start of the thread, you said "if I compile and run this simple program ..."

Does the problem no longer reproduce with that?

Santiago Corredoira

Apr 12, 2022, 3:13:41 AM
to golang-nuts
Hi, yes, everything about the toy program is the same. It was an attempt to get to a minimal example that anyone could reproduce so I could ask for help. The real problem is with a larger program where I get this 100% CPU usage randomly when the server is idle.

Thinking about it more, with the basic example the nanosleep calls may not be the scheduler signaling itself, because:

 - GODEBUG=asyncpreemptoff=1 has no effect
 - It doesn't happen with Go 1.16
 - It is random. If it were the normal working of the scheduler, wouldn't it be predictable?


Santiago Corredoira

Apr 12, 2022, 3:19:38 AM
to golang-nuts
By the way, the difference that I thought I saw between go run and the compiled version was because the pid of the process to look at in the example below should be 63668 (the temp binary), not 63548 (the go run call).

$ ps aux | grep test
bill       63548  0.2  0.1 2044984 17588 pts/0   Sl+  09:15   0:00 go run test.go
bill       63668  155  0.0 1891292 12140 pts/0   Sl+  09:15   1:52 /tmp/go-build910547164/b001/exe/test
bill       63750  0.0  0.0  11564   652 pts/1    S+   09:16   0:00 grep --color=auto test

Brian Candler

Apr 12, 2022, 8:55:41 AM
to golang-nuts
Correct: "go run" builds a binary and then fork/execs the binary it has built, to run it.

Klaus Post

Apr 28, 2022, 12:23:25 PM
to golang-nuts
This "smells" a lot like https://github.com/golang/go/issues/51654 - though the repro is with TLS.

This is very disruptive for us (MinIO) as well; we are seeing this at several customers. Obviously, downgrading to Go 1.16.6 is less than optimal for us.

/Klaus

Santiago Corredoira

Apr 29, 2022, 4:06:35 AM
to golang-nuts
I was having the same problem without TLS, but the symptoms are exactly the same, even the part "The "loop" seems to be interruptable by triggering a new connection, or by itself after ~1m54s".

I spent 4 days trying to find a consistent way to reproduce it but couldn't, so I gave up and kept using Go 1.16. I am glad that it may be fixed in 1.19.

Thank you for pointing it out!

Santiago Corredoira

Aug 2, 2022, 5:14:43 PM
to golang-nuts
Go 1.19 has just been released and the problem is gone!
