Hairpin mode interferes with IPv6 duplicate address detection (DAD)


Dane LeBlanc

Apr 13, 2017, 7:37:22 PM
to kubernetes-...@googlegroups.com

I’m following up on a discussion we had on our conference call last Thursday involving Issue #32291, “Container IPv6 address is marked as duplicate (dadfailed)”, which occurs whenever virtual bridges are used (e.g. by the CNI bridge plugin). As I described on the call, the root cause of this problem is that kubelet enables hairpin mode on the bridge veth interfaces. Hairpin mode causes a pod to see echoes of its own IPv6 Neighbor Solicitation (NS) messages during duplicate address detection (DAD) for each IPv6 address being assigned (including link-local), so the pod marks those addresses as duplicates.
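For anyone reproducing this, the failure is easiest to spot from inside an affected pod, where the addresses stay stuck in the “tentative dadfailed” state. A minimal check, assuming the pod image has iproute2:

# inside the pod: a dadfailed address never leaves the tentative state and is unusable
ip -6 addr show dev eth0 | grep dadfailed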

 

This is a race condition: sometimes the pod manages to send its DAD NS packets before kubelet turns on hairpin mode on the bridge port, in which case the problem doesn’t occur.

 

Hairpin mode was introduced about a year and a half ago so that a pod can reach itself through its own service IP, i.e. use the service IP of a service that it is itself hosting/serving.

 

Here is a tshark capture on a bridge veth showing the echoed NS message. (Note: the echo is only visible on the veth; you won’t see it on the bridge interface.)

 

[root@kube-minion-1 mynet6]# tshark -i veth1462bed0

Running as user "root" and group "root". This could be dangerous.

Capturing on 'veth1462bed0'

  1 0.000000000           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  2 0.000047855           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  3 0.371273780           :: -> ff02::1:fff4:6502 ICMPv6 78 Neighbor Solicitation for fe80::858:aff:fef4:6502

  4 0.371300444           :: -> ff02::1:fff4:6502 ICMPv6 78 Neighbor Solicitation for fe80::858:aff:fef4:6502

  5 0.989371479           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  6 0.989412639           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

 

The NS echo causes the following “dadfailed” failures in the pod:

 

sh-4.2# ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

    inet 127.0.0.1/8 scope host lo

       valid_lft forever preferred_lft forever

    inet6 ::1/128 scope host

       valid_lft forever preferred_lft forever

3: eth0@if19936: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP

    link/ether 0a:58:0a:f4:65:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

    inet 10.244.101.2/24 scope global eth0

       valid_lft forever preferred_lft forever

    inet6 2001:101::2/64 scope global tentative dadfailed

       valid_lft forever preferred_lft forever

    inet6 fe80::858:aff:fef4:6502/64 scope link tentative dadfailed

       valid_lft forever preferred_lft forever

sh-4.2#

 

Here is a look at the hairpin settings on the bridge veths:

 

[root@kube-minion-1 mynet6]# cd /sys/devices/virtual/net/cbr0/brif

[root@kube-minion-1 brif]# for i in `ls`; do echo -n "$i: "; cat $i/hairpin_mode; done

veth1462bed0: 1

vethcfebc91e: 1

[root@kube-minion-1 brif]#

 

To prove that this is indeed the issue, manually turn off hairpin mode:

 

[root@kube-minion-1 brif]# echo 0 > veth1462bed0/hairpin_mode

[root@kube-minion-1 brif]# echo 0 > vethcfebc91e/hairpin_mode

[root@kube-minion-1 brif]#
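The eth0 toggle in the next step can be done from inside the pod’s network namespace; a rough sketch, either from a shell inside the pod (assuming iproute2 is available in the image) or from the node via nsenter with the container’s PID filled in:

# from a shell inside the pod:
ip link set eth0 down && ip link set eth0 up
# or from the node, given the container's PID:
nsenter -t <container-pid> -n sh -c 'ip link set eth0 down; ip link set eth0 up'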

 

Toggling the pod’s eth0 as sketched above re-triggers DAD. Now there is a single NS message, and DAD is successful:

 

[root@kube-minion-1 brif]# tshark -i veth1462bed0

Running as user "root" and group "root". This could be dangerous.

Capturing on 'veth1462bed0'

  1 0.000000000           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  2 0.813286276           :: -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  3 0.913861831           :: -> ff02::1:fff4:6502 ICMPv6 78 Neighbor Solicitation for fe80::858:aff:fef4:6502

  4 1.919123804 fe80::858:aff:fef4:6502 -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  5 1.919173117 fe80::858:aff:fef4:6502 -> ff02::2      ICMPv6 70 Router Solicitation from 0a:58:0a:f4:65:02

  6 2.809869645 fe80::858:aff:fef4:6502 -> ff02::16     ICMPv6 90 Multicast Listener Report Message v2

  7 5.933181709 fe80::858:aff:fef4:6502 -> ff02::2      ICMPv6 70 Router Solicitation from 0a:58:0a:f4:65:02

  8 9.943048880 fe80::858:aff:fef4:6502 -> ff02::2      ICMPv6 70 Router Solicitation from 0a:58:0a:f4:65:02

^C8 packets captured

[root@kube-minion-1 brif]#

 

I have also been able to eliminate the DAD failures by manually restarting kubelet with hairpin-mode set to ‘none’. (Curiously, setting hairpin-mode to ‘promiscuous-bridge’ didn’t seem to help).
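For reference, the flag in question is kubelet’s --hairpin-mode; where exactly it gets set depends on how kubelet is launched on the node (systemd unit, wrapper script, etc.):

# in whatever unit/wrapper starts kubelet on the node:
kubelet --hairpin-mode=none ...
# the other accepted values are 'promiscuous-bridge' and 'hairpin-veth'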

 

There is an Enhanced DAD feature (RFC 7527) that was designed to fix exactly this problem, but unfortunately it is not yet readily available in shipping kernels. It may provide a long-term solution.
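In kernels new enough to include it, RFC 7527 support shows up as a per-interface knob; a quick way to check (the entry is simply absent on older kernels):

# inside the pod (or on the node, for the node's own interfaces):
cat /proc/sys/net/ipv6/conf/eth0/enhanced_dad 2>/dev/null || echo "enhanced DAD not supported by this kernel"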

 

One short-term solution would be to disable IPv6 DAD in the pod (I’ve included this in CNI PR #416). The main issue with doing this in the CNI bridge plugin is that it would have to be done regardless of whether the plugin is being used in a Kubernetes environment or not: the plugin has no way to anticipate whether hairpin mode will be used on the bridge veths, because kubelet enables hairpin mode *after* the CNI plugin completes an addCmd operation.
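Concretely, “disable IPv6 DAD in the pod” amounts to setting the standard per-interface sysctls before addresses are configured. A sketch of the effect (the PR does the equivalent programmatically, so the exact mechanism there may differ):

# inside the pod's network namespace, before IPv6 addresses are assigned:
sysctl -w net.ipv6.conf.eth0.accept_dad=0     # disable DAD entirely on this interface
sysctl -w net.ipv6.conf.eth0.dad_transmits=0  # or: send zero DAD probes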

 

Also, disabling IPv6 DAD will work only if IPv6 addresses assigned to pods are guaranteed to be unique. This will be true if the host-local IPAM plugin is used (IPv6 addresses get assigned round-robin), but uniqueness may not be guaranteed for other IPAM plugins, and it won’t be guaranteed if/when we introduce SLAAC addressing.

 

I don’t know whether other solutions should be considered (e.g. solving the pod self-service-IP problem with something other than hairpin mode, or somehow filtering out the echoed NS messages?).

 

Any thoughts, suggestions?

 

Regards,

Dane

 

Tim Hockin

May 12, 2017, 12:36:30 AM
to Dane LeBlanc, kubernetes-sig-network
Sorry to let this sit. We're looking at other ways to get the same
results without hairpin mode, or maybe without a bridge at all. That
doesn't help right now, but I am hopeful that in the future we won't
need it, so maybe it's OK to just work around this for a while?

Maybe disable_dad should be a param passed in?
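A sketch of what that might look like in the bridge plugin’s netconf; the “disableDAD” field below is invented purely for illustration and is not an existing option:

# hypothetical /etc/cni/net.d/10-mynet6.conf (the "disableDAD" key is made up)
{
    "cniVersion": "0.3.0",
    "name": "mynet6",
    "type": "bridge",
    "bridge": "cbr0",
    "disableDAD": true,
    "ipam": { "type": "host-local", "subnet": "2001:101::/64" }
}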