How to improve the performance of io.Copy when copying data from one TCP connection to another


subrata das

Nov 29, 2016, 2:01:51 PM
to golang-dev
Hi All,


I am debugging an HTTP CONNECT proxy written in Go.
When an HTTP CONNECT request arrives from the client, the proxy hijacks the client connection, dials the server's host:port with net.Dial, and obtains the server connection.
The SSL connection is then established between the client and the server through the proxy.
When the 200Kb data transfer then runs in both directions (server to client and client to server) through the proxy using io.Copy(),
io.Copy() takes a long time.

I am load testing this HTTP CONNECT proxy with the Spirent tool.
The proxy is installed on a system with 32 cores and a 10GB network card.

Spirent acts as both the client and the server.
Without the proxy between them, the Spirent client and server can handle 5000 TPS (transactions per second).

With the proxy between the Spirent client and server, it handles only 500 TPS.
I checked CPU and memory usage at that time:
CPU = 10% and MEM = 8%

To check whether the Linux TCP tuning parameters are correct, I installed HAProxy on the same system as the proxy. Through HAProxy, the client and server handle around 4000 TPS.

I ran the proxy with both runtime.GOMAXPROCS(1) and runtime.GOMAXPROCS(runtime.NumCPU()), but saw no improvement.
I ran the proxy with logging turned off, but saw no improvement.

I then ran 4 instances of the proxy, load-balanced behind HAProxy. This configuration reached 600 TPS and no more.

Then I ran the Go profiler on the HTTP CONNECT proxy while sending 1000 TPS of 200Kb downloads.
I saw that the io.Copy function accounts for much of the time:

================================
(pprof) list copyAndClose
Total: 12.04s
ROUTINE ======================== thirdparty/goproxy.copyAndClose in /home/visp-verizon/repos/new/VISP/goobee/src/thirdparty/goproxy/https.go
         0      2.68s (flat, cum) 22.26% of Total
         .          .    273: }
         .          .    274:}
         .          .    275:
         .          .    276:func copyAndClose(ctx *ProxyCtx, w, r net.Conn) {
         .          .    277: connOk := true
         .      1.79s    278: if _, err := io.Copy(w, r); err != nil {
         .          .    279: connOk = false
         .      110ms    280: ctx.Warnf("Error copying to client: %s", err)
         .          .    281: }
         .      780ms    282: if err := r.Close(); err != nil && connOk {
         .          .    283: ctx.Warnf("Error closing: %s", err)
         .          .    284: }
         .          .    285:}

ROUTINE ======================== thirdparty/goproxy.(*ProxyHttpServer).dial in /home/visp-verizon/repos/new/VISP/goobee/src/thirdparty/goproxy/https.go
         0      910ms (flat, cum)  7.56% of Total
         .          .     48:
         .          .     49:func (proxy *ProxyHttpServer) dial(network, addr string) (c net.Conn, err error) {
         .          .     50: if proxy.Tr.Dial != nil {
         .          .     51: return proxy.Tr.Dial(network, addr)
         .          .     52: }
         .      910ms     53: return net.Dial(network, addr)
         .          .     54:}

====================================

Is it possible to improve the performance of net.Dial by creating a Dialer object?
Can we improve the performance of io.Copy() by setting some TCP connection parameters?



Here is the TCP-related tuning applied on the Linux system:
=======================================
kernel.core_pattern = /var/cores/core.%e.%p.%h.%t
net.core.rmem_default = 131072
net.core.rmem_max = 262144
net.core.wmem_max = 2097152
net.ipv4.tcp_rmem = 4096 131072 262144
net.ipv4.tcp_wmem = 10240 524288 2097152
net.ipv4.tcp_congestion_control = cubic
net.core.somaxconn = 65535
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_max_syn_backlog = 3240000
net.ipv4.tcp_window_scaling = 1
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_orphan_retries=1
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_max_orphans=400000
net.ipv4.tcp_mtu_probing=1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_low_latency = 1
net.core.netdev_max_backlog = 250000
net.core.wmem_default = 524288
net.core.optmem_max = 16777216
net.ipv6.conf.all.use_tempaddr = 0
net.ipv6.conf.default.use_tempaddr = 0 
fs.mqueue.msg_max = 8192
kernel.panic_on_oops = 1
kernel.panic = 1

net.ipv4.tcp_retrans_collapse = 0
net.ipv6.route.max_size = 8388608
net.ipv6.route.gc_thresh = 8388608
net.ipv6.route.gc_elasticity = 11
net.ipv4.tcp_retries2 = 8
net.ipv4.tcp_no_metrics_save = 1

# EndCustomSettingsForWASP

#conntrack max value
net.ipv4.netfilter.ip_conntrack_max = 10000000

fs.file-max = 10000000
fs.nr_open = 10000000

============================================================

We set the ulimit to 10000000



Thanks & Regards,
Subrata Das

Daniel Martí

Nov 29, 2016, 2:53:57 PM
to subrata das, golang-dev
Have you tried io.CopyBuffer? io.Copy uses a buffer of 32*1024 bytes.
Perhaps a bigger (or smaller?) buffer size makes a difference.

In any case, this should probably go in golang-nuts since it's probably
a program problem instead of an std problem.

--
Daniel Martí - mv...@mvdan.cc - https://mvdan.cc/

subrata das

Nov 30, 2016, 7:55:51 AM
to Daniel Martí, golang-dev
Hi Daniel,

Thanks for the quick response.
I tried io.CopyBuffer(), but the performance still does not improve.

I ran the Go profiler for 180 seconds.
It shows that io.Copy() and io.CopyBuffer() each account for 60.50 seconds out of the 180 seconds.
A single io.Copy() call takes 0.03 seconds.

It looks like the proxy is not able to use the bandwidth of the network interface card, which is a 10GB link.

Thanks,
Subrata

--
Regards,
Subrata Das

Russ Cox

Nov 30, 2016, 10:16:48 AM
to subrata das, golang-nuts
+golang-nuts, bcc golang-dev

How did you get the pprof profile? How long did your test run for? pprof has 12 seconds of CPU profile samples (the Total: line). If your test ran for a lot more than 12 seconds (typically a profile is fetched for 30 seconds), then the problem is not the CPU usage.

If the test really did run for 12 seconds (or less) then usually the thing to do is look at the whole profile, with commands like 'top10' or 'web'. 
 


Ramki...@hotmail.com

May 23, 2020, 4:25:03 PM
to golang-dev
Did you ever find the solution to this? If so what was it?