I've been running some performance tests with Flannel on AWS and noticed an interesting result when switching from m3.large to c4.xlarge with enhanced networking. I'm on CoreOS stable running Flannel 0.3.0.
I have two m3.large nodes running in the same AZ, in a VPC, in us-west-2. Between containers running with --net=host, iperf shows 713 Mbits/sec, with ping latency of round-trip min/avg/max = 0.269/0.317/0.393 ms.
```
/ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.252.128.209 port 5001 connected with 10.252.128.104 port 40821
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 853 MBytes 713 Mbits/sec
```
Running between containers on the flannel overlay, this drops to 614 Mbits/sec, with a latency of round-trip min/avg/max = 0.404/0.455/0.547 ms.
```
/ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.21.102.27 port 5001 connected with 172.21.6.13 port 43429
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.4 sec 758 MBytes 614 Mbits/sec
```
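For what it's worth, the drop above looks roughly consistent with what I'd expect from flannel's default udp backend, which (as I understand it) wraps each inner packet in an outer IPv4+UDP header and pushes it through a userspace flanneld/TUN hop. The header tax alone is easy to sketch (assuming IPv4 with no options and a standard 1500-byte outer MTU):

```shell
# Flannel's udp backend encapsulates each inner packet in an outer
# IPv4 + UDP header, so the tunnel MTU is smaller than the wire MTU.
OUTER_MTU=1500           # standard Ethernet MTU on the host interface
IPV4_HDR=20              # outer IPv4 header, no options
UDP_HDR=8                # outer UDP header
INNER_MTU=$((OUTER_MTU - IPV4_HDR - UDP_HDR))
echo "$INNER_MTU"        # prints 1472 -- roughly a 2% header tax per full-size packet
```

The ~2% header tax obviously doesn't explain a 14% throughput drop by itself, so most of the remaining cost is presumably the userspace encapsulation path.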
This seems to be in line with the flannel overhead I've been reading about. I also wanted to test EC2 enhanced networking to see whether the gap would shrink. The same --net=host setup on c4.xlarge instances yielded about 1 Gbit/sec, matching what I see on the Amazon Linux AMI.
```
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.252.129.146 port 5001 connected with 10.252.129.145 port 44632
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 1.19 GBytes 1.02 Gbits/sec
```
However, when running with flannel, I see a drastic reduction in throughput. I tried several runs and consistently came in around 130-150 Mbits/sec:
```
/ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 172.21.4.17 port 5001 connected with 172.21.71.2 port 46283
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.4 sec 171 MBytes 138 Mbits/sec
[ 5] local 172.21.4.17 port 5001 connected with 172.21.71.2 port 46319
[ 5] 0.0-10.1 sec 161 MBytes 134 Mbits/sec
[ 4] local 172.21.4.17 port 5001 connected with 172.21.71.3 port 58268
[ 4] 0.0-10.3 sec 188 MBytes 153 Mbits/sec
```
I see a few possible explanations:
1) I'm doing something drastically wrong.
2) There's odd behavior handling UDP packets on c4.xlarge instances (I should run an explicit UDP test; in previous tests I don't remember this being an issue).
3) I got some wonky instances, or something else was going on with the nodes.
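For the explicit UDP test, I'm thinking of something along these lines (server IP taken from the runs above; the offered rate is an assumption, adjust to taste):

```shell
# Server side (inside the container under test):
#   iperf -s -u
# Client side: -u switches iperf to UDP, -b sets the offered rate.
# Small helper so the command is easy to vary per run:
udp_iperf_cmd() {
  # $1 = server IP, $2 = offered bandwidth (defaults to 1000M)
  echo "iperf -c $1 -u -b ${2:-1000M} -t 10"
}
udp_iperf_cmd 172.21.4.17   # prints: iperf -c 172.21.4.17 -u -b 1000M -t 10
```

The interesting numbers there would be the reported loss and jitter, since the udp backend rides on UDP between hosts.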
I plan to try a few different backends, but I'm curious whether others have run into this, or whether anyone can offer advice on tuning the flannel overlay network.
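For the backend experiments, my understanding is that flannel reads its config from etcd under /coreos.com/network/config, so switching backends would look roughly like this (the Network range matches my 172.21.0.0/16 subnets above; whether vxlan is available depends on the flannel version, so treat this as a sketch):

```shell
# Point flannel at an alternate backend by rewriting its etcd config,
# then restart flanneld on each node so it picks up the change.
etcdctl set /coreos.com/network/config \
  '{ "Network": "172.21.0.0/16", "Backend": { "Type": "vxlan" } }'
```

If vxlan behaves differently from udp on the c4s, that would also help narrow down whether this is a userspace-encapsulation problem.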
Thanks,
Mike