Amplifying effects of TLS connection flood

Ben Burkert

Oct 21, 2016, 7:14:45 PM
to golang-nuts
Hi folks,

This started out as a response to the 'http ssl "can't identify protocol"'
thread but encompasses more than the original discussion:

https://groups.google.com/d/msg/golang-nuts/FeI0f4TBhWk/hkWKrDNpAgAJ

I have also noticed similar problems with high numbers of TCP connections in
the CLOSE_WAIT state for a Go 1.7 service doing TLS termination that sits
behind an AWS ELB in TCP mode. This service periodically receives bursts of
requests from new clients without existing connections.

These new TLS connections are causing the service's CPU load to increase to
100% of all available cores. A dump of the goroutines during one of these
spikes shows that most goroutines are blocked while trying to perform a TLS
handshake, specifically the rsa.SignPKCS1v15 step:

https://gist.github.com/benburkert/ee0319839bfec4c740e081ab4738cba4

This large number of goroutines contending for an insufficient amount of CPU
capacity causes many of the TLS handshakes to stall for many seconds. The ELB
instances in front of this service are configured to time out connections when
no data is received for 30 seconds. When this timeout is triggered the ELB
sends a FIN packet to the server and the kernel moves the connection into the
CLOSE_WAIT state, where it stays until the server closes its end of the socket.

At this point the goroutine corresponding to the TCP connection is still
attempting to compute the TLS handshake, even though the connection for the
handshake is dead. This is causing an amplifying effect where handshakes for
dead connections are being scheduled instead of handshakes for live
connections.

I have not observed the same amplifying effect when using Nginx for TLS
termination in front of the service. I expect that this is because RSA
signatures are faster in OpenSSL than in crypto/tls and so do not introduce
the same CPU bottleneck.

Perhaps there is a way to cancel TLS handshakes when the underlying connection
is dead and avoid the amplification. A HandshakeContext method could be added
to tls.Conn that pushes the handshake attempt onto a per-config work queue,
then watches for a result or canceled context. If the context is canceled, it
marks the handshake as dead. The other side of the work queue could check for
dead handshakes before performing expensive work. I hope that this could tie in
nicely with the improvements to CloseNotifier in Go 1.8.

Thanks,
-Ben