Andrew Athan
unread,Jan 10, 2024, 6:51:16 PM1/10/24Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to golang-nuts
As you can see in the below output from pprof, the golang TLS implementation seems to be in some kind of tight loop in crypto/internal/bigmod.addMulVVW2048 causing CPU starvation, wherein the net.HTTP server stops calling my request handler. Eventually the TLS handshakes fail, and the connections are dropped before my handler ever sees them. I guess I'll look for an even newer version of golang, but wanted to post this here to see if you have any feedback. It seems unlikely to me that go's TLS package would exhibit this type of regression.
$ go tool pprof /tmp/cpu.prof2
File: sgs-server-dev
Type: cpu
Time: Jan 10, 2024 at 10:10pm (UTC)
Duration: 180.20s, Total samples = 892.25s (495.14%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10
Showing nodes accounting for 840.73s, 94.23% of 892.25s total
Dropped 1278 nodes (cum <= 4.46s)
Showing top 10 nodes out of 52
flat flat% sum% cum cum%
595.80s 66.78% 66.78% 595.80s 66.78% crypto/internal/bigmod.addMulVVW2048
83.10s 9.31% 76.09% 772.62s 86.59% crypto/internal/bigmod.(*Nat).montgomeryMul
53.66s 6.01% 82.10% 53.86s 6.04% crypto/internal/bigmod.(*Nat).assign (inline)
42.83s 4.80% 86.90% 42.93s 4.81% crypto/internal/bigmod.addMulVVW (inline)
23.23s 2.60% 89.51% 25.10s 2.81% crypto/internal/bigmod.(*Nat).shiftIn
14.93s 1.67% 91.18% 14.93s 1.67% crypto/internal/bigmod.(*Nat).sub (inline)
7.75s 0.87% 92.05% 7.75s 0.87% runtime.memmove
7.33s 0.82% 92.87% 7.33s 0.82% runtime.duffzero
7.12s 0.8% 93.67% 7.12s 0.8% runtime/internal/syscall.Syscall6
4.98s 0.56% 94.23% 4.98s 0.56% crypto/sha256.block
I am building an executable on this box:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.04
Release: 23.04
Codename: lunar
$ uname -a
Linux AASalad 6.2.0-36-generic #37-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 4 10:14:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ go version
go version go1.21.4 linux/amd64
Using this command:
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o a.out -buildvcs=true -a -ldflags '-extldflags "-static"'
This executable is then deployed to a Debian 11 host:
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
# uname -a
Linux sgs-dandelion 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux
This was done because without the -static build, the right glibc so's are not found on the target system. When running the executable under high incoming TLS connection load, I observe that the net.HTTP handler is eventually not called and lots of these errors are logged:
http: TLS handshake error from ...
with this stacktrace:
log.(*Logger).output
/usr/local/go/src/log/log.go:245
log.(*Logger).Printf
/usr/local/go/src/log/log.go:268
net/http.(*Server).logf
/usr/local/go/src/net/http/server.go:3212
net/http.(*conn).serve
/usr/local/go/src/net/http/server.go:1900
This exe handles high load and lots of incoming TLS connections. I have exhaustively looked for a source of CPU starvation such as excessive numbers of goroutines started by my code, or tight loops, etc. and have not found evidence for that.
Here is the output of pprof tree:
(pprof) tree
Showing nodes accounting for 853.85s, 95.70% of 892.25s total
Dropped 1278 nodes (cum <= 4.46s)
----------------------------------------------------------+-------------
flat flat% sum% cum cum% calls calls% + context
----------------------------------------------------------+-------------
595.80s 100% | crypto/internal/bigmod.(*Nat).montgomeryMul
595.80s 66.78% 66.78% 595.80s 66.78% | crypto/internal/bigmod.addMulVVW2048
----------------------------------------------------------+-------------
718.77s 93.03% | crypto/internal/bigmod.(*Nat).Exp
47.75s 6.18% | crypto/internal/bigmod.(*Nat).ExpShort
4.76s 0.62% | crypto/internal/bigmod.(*Nat).montgomeryRepresentation
1.34s 0.17% | crypto/rsa.decrypt
83.10s 9.31% 76.09% 772.62s 86.59% | crypto/internal/bigmod.(*Nat).montgomeryMul
595.80s 77.11% | crypto/internal/bigmod.addMulVVW2048
42.93s 5.56% | crypto/internal/bigmod.addMulVVW (inline)
36.25s 4.69% | crypto/internal/bigmod.(*Nat).maybeSubtractModulus
6.19s 0.8% | crypto/internal/bigmod.(*Nat).reset (inline)
4.19s 0.54% | runtime.duffzero
3.54s 0.46% | runtime.memmove