Google Cloud uses an MTU of 1460. This is propagated via DHCP, so when you create an instance it has the correct MTU on eth0.
However, when setting up GKE, it sets an MTU of 1500 on the docker0 bridge.
This means that containers end up doing path MTU discovery (PMTUD) as they route back, and they get ICMP "would fragment" (fragmentation needed) messages from their own docker0 address.
In practice this leads to poor performance on e.g. TLS. We can see this in the image below.
Text version of the capture:
ClientIP -> ClusterIP SYN
ClusterIP -> ClientIP SYN/ACK
ClientIP -> ClusterIP ACK
ClientIP -> ClusterIP Client Hello
ClusterIP -> ClientIP ACK
ClusterIP -> ClientIP Server Hello [>1460 payload]
docker0 -> ClusterIP ICMP Would Fragment
ClusterIP -> ClientIP Server Hello Done [this is an error, the Server Hello is still in flight]
ClusterIP -> ClientIP [out of order; the Server Hello is now re-segmented and resent]
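The arithmetic behind the trace above can be sketched as follows. This is only an illustration, assuming plain IPv4 and TCP headers of 20 bytes each with no options (TCP timestamps, for example, would shrink the payload further); the helper names `tcp_mss` and `overshoot` are made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def overshoot(sender_mtu: int, path_mtu: int) -> int:
    """Bytes by which a full-size packet from the sender exceeds the path MTU."""
    return max(0, sender_mtu - path_mtu)


docker0_mtu = 1500  # what the container side believes it can send
eth0_mtu = 1460     # what the node can actually forward

print(tcp_mss(docker0_mtu))             # 1460
print(tcp_mss(eth0_mtu))                # 1420
print(overshoot(docker0_mtu, eth0_mtu))  # 40
```

Under these assumptions a container that sizes its segments for MTU 1500 sends 1500-byte packets, which are 40 bytes too large for eth0, so the first full-size segment (here the Server Hello) triggers the ICMP message and has to be re-segmented.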
Now, if I log in to each of the nodes that were created by the GKE template, I can set the MTU manually on docker0.
What is the right way to do this? It has a fairly big effect on connection setup performance.
Is this only an issue on GKE, or is it more general?
This is with Calico, if it matters (I'm not sure it does, since this is internal to the node).
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
link/ether 42:01:0a:a2:00:03 brd ff:ff:ff:ff:ff:ff
valid_lft 58015sec preferred_lft 58015sec
inet6 fe80::4001:aff:fea2:3/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:93:c1:14:a1 brd ff:ff:ff:ff:ff:ff
valid_lft forever preferred_lft forever
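For checking several nodes, a small script can flag the mismatch from output like the above. This is a rough sketch, not a supported tool: it assumes the usual `ip addr` / `ip link` header line format, treats eth0 as the uplink, and the names `parse_mtus` and `oversized` are hypothetical helpers written for this example.

```python
import re

# Matches interface header lines such as:
#   "3: docker0: <NO-CARRIER,...> mtu 1500 qdisc noqueue ..."
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<[^>]*>.*?\bmtu\s+(\d+)")

SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
link/ether 42:01:0a:a2:00:03 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:93:c1:14:a1 brd ff:ff:ff:ff:ff:ff
"""


def parse_mtus(ip_output: str) -> dict[str, int]:
    """Map interface name -> MTU from `ip addr`-style text."""
    mtus = {}
    for line in ip_output.splitlines():
        match = IFACE_RE.match(line.strip())
        if match:
            mtus[match.group(1)] = int(match.group(2))
    return mtus


def oversized(mtus: dict[str, int], uplink: str = "eth0") -> list[str]:
    """Interfaces whose MTU is larger than the uplink's MTU."""
    limit = mtus[uplink]
    return sorted(name for name, mtu in mtus.items() if mtu > limit)


mtus = parse_mtus(SAMPLE)
print(mtus)             # {'eth0': 1460, 'docker0': 1500}
print(oversized(mtus))  # ['docker0']
```

On a node, the same functions could be fed the real output of `ip addr` instead of `SAMPLE`.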
