Experiences with a simple CNI plugin that maps to Docker networking


Mike Spreitzer

unread,
Feb 25, 2016, 9:48:12 AM2/25/16
to kubernetes-sig-network
I wrote a simple CNI plugin that works by invoking `docker network` commands (you can see it in https://github.com/kubernetes/kubernetes/pull/21956).  I have tested it with k8s v1.1.7 and three libnetwork drivers/plugins (using the cluster/ubuntu install scripting as generalized by https://github.com/kubernetes/kubernetes/pull/21072 to support CNI plugins).  Below you will find some remarks on those experiences and some generalities about CNI plugins.
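For orientation, the plugin is essentially a thin shim from CNI's exec interface onto the Docker CLI.  A much-simplified sketch follows (the real code in the PR differs in details, and a conforming plugin must also emit the CNI result JSON describing the container's IP on ADD):

    #!/bin/sh
    # Sketch of a CNI plugin that delegates to `docker network`.
    # The CNI network config arrives on stdin; kubelet passes the infra
    # container's ID in CNI_CONTAINERID.
    set -e
    config=$(cat)
    netname=$(echo "$config" | jq -r .name)   # use the CNI network name as the Docker network name

    case "$CNI_COMMAND" in
      ADD)
        docker network connect "$netname" "$CNI_CONTAINERID"
        # A conforming plugin must also print the CNI result JSON
        # (assigned IP, routes, etc.) on stdout here.
        ;;
      DEL)
        docker network disconnect "$netname" "$CNI_CONTAINERID" || true
        ;;
    esac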

I first spent some time trying the "overlay" driver that ships with Docker.  Specifically, I tried Docker 1.10.1 on Ubuntu 14.04.3 on the 3.19 kernel.  Things mostly worked, but I was unable to enable hosts to open connections to containers in general.  That overlay network does not seem to be designed to support that: I found that a given host can open connections to containers on that host but not to containers on other hosts.  The overlay driver does not advertise any gateway (or any router at all) functionality; it is a pure isolated virtual ethernet.

In fact, I noticed that Docker was treating my network made with the overlay driver as if it did not have a gateway --- even though I did configure a gateway IP address in that Docker network.  When Docker creates a container on a network with no gateway, Docker conveniently gives the container a second network interface (on 172.18.0.0/16) and ensures the host has a network interface on that subnet too, so that the container can open connections to the outside world.  Hint: do not configure your Docker driver=overlay network to use a subnet in 172.18.0.0/16.

Having eventually created my Docker overlay network with subnet 172.19.0.0/16, I then got containers with two network interfaces: one on 172.18.0.0/16 and one on 172.19.0.0/16.  Aha!  Could such a container serve as a gateway?  I found that its IPv4 non-multicast forwarding flags are all turned on.  I tried adding a route to host A saying that it can reach 172.19.0.0/16 through the 172.18.0.0/16 address of one of its containers.  That enabled request packets to flow forward from host A through my gateway container to another container on host B.  But the container on host B did not send any replies (as far as I could see with `tcpdump -i any`).  And the client address was 172.18.0.1 --- which, it turns out, every host has.
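For reference, the setup and the failed gateway experiment amounted to roughly the following (values illustrative; as I understand it, the second interface comes from Docker's docker_gwbridge, whose default subnet is 172.18.0.0/16):

    # Create the overlay network on a subnet that does not collide with
    # docker_gwbridge's default (172.18.0.0/16):
    docker network create -d overlay --subnet 172.19.0.0/16 --gateway 172.19.0.1 my-overlay

    # The gateway experiment on host A: route the overlay subnet via the
    # 172.18.0.0/16 address of a dual-homed container (address illustrative):
    ip route add 172.19.0.0/16 via 172.18.0.2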

I then tried Calico's libnetwork plugin.  That mostly worked, once I got the Calico network policy adjusted to allow all inbound connections to containers.  However, all health checks failed.  I looked into the case of the k8s dashboard.  Manually doing a `curl` from the dashboard container's host, and from the one master machine, to the dashboard container's endpoint succeeded.  Right before logging each health check failure, the kubelet also logs a complaint about being unable to get status due to the lack of `eth0` (indeed, Calico's libnetwork plugin gives the name `cali0` to each container's network interface).  Here is an example couplet:

E0224 22:16:44.536571   18072 manager.go:377] NetworkPlugin cni failed on the status hook for pod 'kubernetes-dashboard-v0.1.0-zjjq7' - invalid CIDR address: Device "eth0" does not exist.
I0224 22:16:44.539020   18072 manager.go:1769] pod "kubernetes-dashboard-v0.1.0-zjjq7_kube-system" container "kubernetes-dashboard" is unhealthy (probe result: failure), it will be killed and re-created.
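One way to see the mismatch for yourself, from the node running the dashboard pod (the container name is whatever your infra container is called):

    # Look at the interface names inside the pod's network namespace:
    pid=$(docker inspect -f '{{.State.Pid}}' <dashboard-infra-container>)
    sudo nsenter -t "$pid" -n ip -o -4 addr show
    # With the Calico libnetwork driver this lists cali0 (plus lo), while the
    # kubelet status hook is looking for eth0.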

I then tried the Kuryr libnetwork plugin to link to OpenStack Neutron.  It worked, as far as I could tell.  More on that later.

In general, I noticed that the current (v1.1.7) support for CNI plugins is inflexible in this way: there is one "network configuration" (in CNI terms) that is chosen at kubelet startup time and applied to all pods --- unless I am missing something (and if I am, where is it documented?).
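Concretely, the kubelet reads a single network configuration file from its CNI configuration directory at startup; something like the following is all the choice there is (path and contents illustrative):

    mkdir -p /etc/cni/net.d
    cat >/etc/cni/net.d/10-mynet.conf <<'EOF'
    {
      "name": "mynet",
      "type": "to_docker"
    }
    EOF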

Regards,
Mike

Mike Spreitzer

unread,
Feb 25, 2016, 11:43:34 AM2/25/16
to kubernetes-sig-network
Regarding testing with the Kuryr libnetwork plugin to link to OpenStack Neutron, it worked as far as could be expected.  I have not yet got a Neutron network configured to allow connections between hosts and containers.  But the CNI plugin succeeds in invoking `docker network` correctly and the Kuryr libnetwork plugin works as expected.

Regards,
Mike

Dan Williams

unread,
Feb 25, 2016, 1:36:55 PM2/25/16
to Mike Spreitzer, kubernetes-sig-network
Kubernetes passes "eth0" as the CNI_IFNAME and expects plugins to honor
that.  If the Calico plugin is not honoring CNI_IFNAME, then it's not
conforming to the CNI specification.

That said, I think the only reason CNI_IFNAME exists is that CNI
doesn't have a way to pass the interface name back to the caller.
Honestly, I'd like to see that added; then plugins could name the
interface whatever they want and Kubernetes would know it.
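For reference, the handoff today is just an exec with a handful of CNI_* environment variables plus the network config on stdin, and the plugin's result JSON on stdout has no field for the interface name (values below illustrative):

    # Environment the kubelet sets when it execs the plugin:
    CNI_COMMAND=ADD
    CNI_CONTAINERID=<infra container id>
    CNI_NETNS=/proc/<pid>/ns/net
    CNI_IFNAME=eth0          # the name kubelet expects the plugin to use
    CNI_PATH=/opt/cni/bin
    CNI_ARGS=<extra key=value pairs, e.g. pod metadata>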

> I0224 22:16:44.539020   18072 manager.go:1769] pod "kubernetes-
> dashboard-v0.1.0-zjjq7_kube-system" container "kubernetes-dashboard"
> is unhealthy (probe result: failure), it will be killed and re-
> created.
>
>
> I then tried the Kuryr libnetwork plugin to link to OpenStack
> Neutron.  It 
> worked, as far as I could tell.  More on that later.
>
> In general, I noticed that the current (v1.1.7) support for CNI
> plugins is 
> inflexible in this way: there is one "network configuration" (in CNI
> terms) 
> that is chosen at kubelet startup time and applied to all pods ---
> unless I 
> am missing something (and if I am, where is it documented?).

Correct; that was something I argued against originally, but it was
done for two reasons:

1) simplicity: how do you specify different networks at kube startup or
during operations?  There was no agreement on how to do this, but more
importantly:

2) multiple networks meant different things to different people, and
that's how we started this whole Networking SIG conversation, to
resolve exactly that point.

At some point we moved towards policy instead of adding the concept of
"networks" to Kubernetes.  So the point here is that Kube will only
send one network down to the plugin, because it's up to *your* plugin to
figure out how to map the request to a specific network.  Kubernetes
doesn't care about "networks" because not all plugins use the concept
of a "network" in the same way; network != IP subnet for everyone.

Your CNI plugin probably needs to figure out how it's going to map
containers to networks, or NetworkPolicy objects to whatever concept of
networks your backend has.  When it's done that, it knows what "network"
to send to docker or whatever backend you have :)
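As a trivial example of that mapping, a docker-backed shim could key off the pod metadata that kubelet passes in CNI_ARGS (as I recall it includes K8S_POD_NAMESPACE and K8S_POD_NAME) and pick or create a Docker network from it:

    # CNI_ARGS arrives as "K1=V1;K2=V2;..."; pull out the pod's namespace
    # (key name from memory: K8S_POD_NAMESPACE).
    ns=$(echo "$CNI_ARGS" | tr ';' '\n' | sed -n 's/^K8S_POD_NAMESPACE=//p')
    netname="ns-${ns}"

    # Create the backing network on first use, then attach the container.
    docker network inspect "$netname" >/dev/null 2>&1 || \
      docker network create "$netname"
    docker network connect "$netname" "$CNI_CONTAINERID"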

Dan

Mike Spreitzer

unread,
Feb 25, 2016, 1:50:43 PM2/25/16
to kubernetes-sig-network, gmik...@gmail.com
Dan, regarding the Calico case: remember I used the Calico libnetwork plugin with my simple CNI plugin; I did not use the Calico CNI plugin.  It is my CNI plugin that gets the CNI_IFNAME parameter, but the CNI plugin can do nothing with that parameter --- `docker network connect` does not take an interface name parameter.

Regarding network configuration: I think there is still a problematic poverty of interface.  The CNI plugin needs sufficient parameters to decide how to connect the container.  Right now the k8s API user (e.g., composer of a pod spec) can convey very little information down to the CNI plugin.

My next step will probably be to use a distinct Docker network for each k8s Namespace.  My current main interest is using Neutron tenant network = Docker network via the Kuryr libnetwork plugin.  Equating Neutron network with k8s Namespace has one mildly obscure benefit.  In Neutron, security groups can isolate IP layer traffic but not other ethernet traffic.  But I am currently focused on the case where each Neutron network is a distinct virtual ethernet, so there will be no non-IP traffic between Neutron networks ... and thus Neutron security groups will suffice to isolate between k8s Namespaces.
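With Kuryr, that plan boils down to one Kuryr-backed Docker network per Namespace, something like the following (driver and IPAM-driver names as I recall them from the Kuryr docs; the subnet is illustrative):

    docker network create -d kuryr --ipam-driver kuryr \
      --subnet 172.19.1.0/24 k8s-ns-frontend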

Regards,
Mike

Casey Davenport

unread,
Feb 25, 2016, 1:58:46 PM2/25/16
to Dan Williams, Mike Spreitzer, kubernetes-sig-network
>Kubernetes pass "eth0" as the CNI_IFNAME and expects plugins to honor
>that. If the Calico plugin is not honoring CNI_IFNAME, then it's not
>conforming to the CNI specification.

Yep, the Calico CNI plugin does honor the CNI_IFNAME parameter. However,
I believe Mike was trying to use Calico's Docker libnetwork driver, which
doesn't use CNI, and so doesn't see the CNI_IFNAME parameter.




Dan Williams

unread,
Feb 25, 2016, 2:13:06 PM2/25/16
to Mike Spreitzer, kubernetes-sig-network
On Thu, 2016-02-25 at 10:50 -0800, Mike Spreitzer wrote:
> Dan, regarding the Calico case: remember I used the Calico
> *libnetwork* 
> plugin with my simple CNI plugin; I did not use the Calico CNI
> plugin.  It 
> is my CNI plugin that gets the CNI_IFNAME parameter, but the CNI
> plugin can 
> do nothing with that parameter --- `docker network connect` does not
> take 
> an interface name parameter.

Yeah, well, that's a mismatch between the two APIs and you're SOL :(
 However, as I suggested there is room to update/change the CNI spec
and I think that should probably be done.  I'll take an action item to
push that forward.

> Regarding network configuration: I think there is still a
> problematic 
> poverty of interface.  The CNI plugin needs sufficient parameters to
> decide 
> how to connect the container.  Right now the k8s API user (e.g.,
> composer 
> of a pod spec) can convey very little information down to the CNI
> plugin.

Yes.  But the problem is that plugins are different and they may need
many different types of information.  Most of us here will be creating
fairly complex networking backends that will need a lot of information.
 I don't think kubelet should pass down whole stacks of API objects to
the plugins since there's no end to that and it would be completely
Kubernetes specific.

Instead your plugin probably needs to get the information it needs out-of-band from the apiserver itself, which you can do by including some of the same code that kubelet itself uses and grabbing the objects that way.  We had discussed in the last meeting that plugins will likely need long-running processes to interface with, which keep this information anyway and mutate it into a form that they can consume internally.

That said, kubelet could hugely help by passing references to the
authentication methods and apiserver addresses it's using.  It seems
pointless to have to specify certificates/addresses in two places
(first in kubelet and second in some plugin-specific configuration).
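For example, the out-of-band fetch can be as small as hitting the stock REST API for the pod in question; the annoyance is exactly that the credentials below have to be configured for the plugin separately from the kubelet (paths and address illustrative):

    curl -s --cacert /etc/kubernetes/ca.crt \
         --cert /etc/kubernetes/plugin.crt --key /etc/kubernetes/plugin.key \
         "https://$APISERVER:6443/api/v1/namespaces/$POD_NAMESPACE/pods/$POD_NAME"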

I honestly don't think this is any different in libnetwork-land.  CNI
and libnetwork are just simple ways to configure container networking,
but they don't have anything to do with how the logical network is set
up.  For a libnetwork plugin you *still* need to map the docker network
to some construct that Neutron knows about, and that would be the same
thing in Kubernetes with CNI.  What you're missing in Kubernetes is a
convenient "network" object, but as explained earlier I'm not sure
that's really appropriate for everyone.

> My next step will probably be to use a distinct Docker network for
> each k8s 
> Namespace.  My current main interest is using Neutron tenant network
> = 
> Docker network via the Kuryr libnetwork plugin.  Equating Neutron
> network 
> with k8s Namespace has one mildly obscure benefit.  In Neutron,
> security 
> groups can isolate IP layer traffic but not other ethernet
> traffic.  But I 
> am currently focused on the case where each Neutron network is a
> distinct 
> virtual ethernet, so there will be no non-IP traffic between Neutron 
> networks ... and thus Neutron security groups will suffice to
> isolate 
> between k8s Namespaces.

Yes, that's one approach.  Unfortunately with OpenShift we found that
Namespace == Network was too limiting and something customers needed
flexibility on, so we moved away from that model.  If it works for you
that's great though :)

Dan

Mike Spreitzer

unread,
Feb 25, 2016, 2:24:03 PM2/25/16
to kubernetes-sig-network, gmik...@gmail.com
Dan, what were the pain points in OpenShift with Namespace ~ Network?

Thanks,
Mike

Salvatore Orlando

unread,
Feb 25, 2016, 3:04:28 PM2/25/16
to Dan Williams, Mike Spreitzer, kubernetes-sig-network
Some (possibly random) comments inline.

Salvatore

On 25 February 2016 at 20:13, Dan Williams <dc...@redhat.com> wrote:
On Thu, 2016-02-25 at 10:50 -0800, Mike Spreitzer wrote:
> Dan, regarding the Calico case: remember I used the Calico
> *libnetwork* 
> plugin with my simple CNI plugin; I did not use the Calico CNI
> plugin.  It 
> is my CNI plugin that gets the CNI_IFNAME parameter, but the CNI
> plugin can 
> do nothing with that parameter --- `docker network connect` does not
> take 
> an interface name parameter.

Yeah, well, that's a mismatch between the two APIs and you're SOL :(
 However, as I suggested there is room to update/change the CNI spec
and I think that should probably be done.  I'll take an action item to
push that forward.

That is unfortunate, but fairly common when you write adaptors between two different and only relatively stable APIs.


> Regarding network configuration: I think there is still a
> problematic 
> poverty of interface.  The CNI plugin needs sufficient parameters to
> decide 
> how to connect the container.  Right now the k8s API user (e.g.,
> composer 
> of a pod spec) can convey very little information down to the CNI
> plugin.

Yes.  But the problem is that plugins are different and they may need
many different types of information.  Most of us here will be creating
fairly complex networking backends that will need a lot of information.
 I don't think kubelet should pass down whole stacks of API objects to
the plugins since there's no end to that and it would be completely
Kubernetes specific.

Instead your plugin probably needs to get the information it needs out-of-band from the apiserver itself, which you can do by including some of the same code that kubelet itself uses and grabbing the objects that way.  We had discussed in the last meeting that plugins will likely need long-running processes to interface with, which keep this information anyway and mutate it into a form that they can consume internally.

This is fine for the initial iterations. Going forward this might become challenging from a scalability perspective.
For reference, this is how OpenStack Neutron networking worked initially, and it failed miserably as soon as people started running these components at a scale larger than a test lab.
While it is totally true that the plugin interface has to be kept as simple as possible, the plugin should be given some help in retrieving the info it needs.
Something like node-level data caches might help, but I don't know if something like this is already available, in the works, or completely bonkers.

 

That said, kubelet could hugely help by passing references to the
authentication methods and apiserver addresses it's using.  It seems
pointless to have to specify certificates/addresses in two places
(first in kubelet and second in some plugin-specific configuration).

My personal opinion here is that this is probably a problem that should be solved by configuration management and deployment automation tools.
Anyway, is your suggestion that the kubelet should shell out to the plugin, passing authentication credentials or certificates on the command line?
Security is not my main area of expertise, but I feel like there might be some implications there.
 

I honestly don't think this is any different in libnetwork-land.  CNI
and libnetwork are just simple ways to configure container networking,
but they don't have anything to do with how the logical network is set
up.  For a libnetwork plugin you *still* need to map the docker network
to some construct that Neutron knows about, and that would be the same
thing in Kubernetes with CNI.  What you're missing in Kubernetes is a
convenient "network" object, but as explained earlier I'm not sure
that's really appropriate for everyone.

I think there's been a bit of handwaving on whether this is something that needs to be exposed in Kubernetes.
I personally see pros and cons in doing that, so perhaps at some point we might want to start discussing why that's not appropriate for everyone!


> My next step will probably be to use a distinct Docker network for
> each k8s 
> Namespace.  My current main interest is using Neutron tenant network
> = 
> Docker network via the Kuryr libnetwork plugin.  Equating Neutron
> network 
> with k8s Namespace has one mildly obscure benefit.  In Neutron,
> security 
> groups can isolate IP layer traffic but not other ethernet
> traffic.  But I 
> am currently focused on the case where each Neutron network is a
> distinct 
> virtual ethernet, so there will be no non-IP traffic between Neutron 
> networks ... and thus Neutron security groups will suffice to
> isolate 
> between k8s Namespaces.

Yes, that's one approach.  Unfortunately with OpenShift we found that
Namespace == Network was too limiting and something customers needed
flexibility on, so we moved away from that model.  If it works for you
that's great though :)

Considering the network policy design and the Docker integration with Neutron via Kuryr, what Mike is doing appears very natural.
On one hand, this is too much of a semantic change in the definition of a Kubernetes namespace.
On the other hand, this network/namespace association could be something specific to your plugin; in the Neutron case you would then have a logical router connecting them all.
That would preserve the benefit of L2 isolation, still enforce network policies, and perhaps give you the best of both worlds.

Mike Spreitzer

unread,
Feb 25, 2016, 3:51:13 PM2/25/16
to kubernetes-sig-network, dc...@redhat.com, gmik...@gmail.com
Sorry I did not make it clear enough, but Salvatore, you filled in the blanks.  In fact, let me try to push the reset button on my complaint about interface poverty.  The discussion we have been having is exactly the answer: let the k8s user specify the desired connectivity, and let the network plugin implement it.  I suggested an implementation outline that has a Neutron network per k8s Namespace and a Neutron router that connects them all.  Remember, we are still not attempting multi-tenancy.  Use Neutron security groups to allow the desired connectivity.
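In Neutron CLI terms the outline is roughly this (names, CIDRs, and rules illustrative; repeat the net/subnet/interface steps per Namespace):

    # One Neutron network + subnet per k8s Namespace, all hung off one router:
    neutron net-create ns-frontend
    neutron subnet-create ns-frontend 172.19.1.0/24 --name ns-frontend-subnet
    neutron router-create k8s-router
    neutron router-interface-add k8s-router ns-frontend-subnet

    # Then open the desired connectivity with security group rules, e.g.:
    neutron security-group-rule-create --direction ingress \
      --remote-ip-prefix 172.19.0.0/16 default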

Regards,
Mike

Antoni Segura Puimedon

unread,
Feb 25, 2016, 4:31:53 PM2/25/16
to Salvatore Orlando, Dan Williams, Mike Spreitzer, kubernetes-sig-network
You hit my main concern with the proposed approach, Salvatore. With my
current understanding of k8s I also arrived at the conclusion that
getting the info from the api server out of band would have to be the
way. And that sounds just like the biggest scaling issue the Neutron
reference implementation had: retrieving things like metadata required
the hosts to go back to the api server.

I would like for another kind of synchronization to be possible. I haven't
read enough of the k8s controller code to know if this is possible, but if
it were, I think this would be a good option for the future:

the k8s scheduler could have a hook or plugin point (it could send things
over stdin/stdout to another process, a la neovim, for example), and that
process could take an action over the objects and return them for
re-insertion into the scheduler. For example, it could read the policy,
convert that policy into what it means for its backend network, and add
an annotation that says "you shall be plugged into my port with this
security mode".

With this, the CNI plugin's job would be very simple, and we would likely
be able to have non-trusted worker nodes, i.e., nodes with as little
access to backend API networks as possible.
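To make it concrete, the output of such a hook could be nothing more than an annotation on the pod that the node-side plugin then reads; the annotation keys here are entirely made up:

    kubectl annotate pod mypod-zjjq7 \
      example.net/backend-network=ns-frontend \
      example.net/security-mode=default-deny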

Such a hook/plugin point would also allow people to augment the k8s
scheduler to optimize node selection with monitoring data that the plugin
may have access to.

Dan Williams

unread,
Feb 25, 2016, 4:50:12 PM2/25/16
to Mike Spreitzer, kubernetes-sig-network
On Thu, 2016-02-25 at 11:24 -0800, Mike Spreitzer wrote:
> Dan, what were the pain points in OpenShift with Namespace ~ Network?

Namespaces are the current mechanism of admission/access control.  So
if you have two organizations and you assign them each a namespace, but
then later your business merges these and you'd like to allow these two
organizations' resources to talk to each other, you can't do that as
easily if namespace == network.

Anyway, what we ended up doing was just abstracting things a bit with
separate network objects that you can assign to Namespaces.  So
Namespace A and B can share the same network object, and if they no
longer need to talk to each other you can split B away from A by
assigning it a separate network.

I'm probably simplifying what you're doing too much, but I was just
trying to point out that assuming that a Kubernetes Namespace is the
same thing as some "network" object didn't work for us.  A level of
abstraction so you can mix & match works better, but it's still not as
flexible as we need to be for OpenShift.

Dan

Mike Spreitzer

unread,
Feb 25, 2016, 4:57:01 PM2/25/16
to kubernetes-sig-network, gmik...@gmail.com
I think it's important to remember the distinction between Namespace and tenant.  It sounds like your problem was with equating Namespace and tenant, as you have mentioned before.  Since we are not attempting multi-tenancy yet, I think it may be viable to have a Neutron network per Namespace and all the Neutron networks connected via a router, as I have outlined and motivated elsewhere in this conversation.

Avery Davis

unread,
Apr 17, 2017, 6:44:24 PM4/17/17
to kubernetes-sig-network
Mike Spreitzer:

I just stumbled across your to_docker CNI plugin while investigating options to run k8s using Docker's native network stack.  How did your test go using Kuryr/Neutron to subnet your cluster network in accordance with k8s container connectivity requirements?  I would like to avoid using etcd and flannel when the same features seem to be present in Docker's latest (v17.03) built-in network management interface.  Please let me know if you've posted any results / updates elsewhere.

Thanks!
Avery Davis

mspr...@us.ibm.com

unread,
Apr 21, 2017, 12:43:12 AM4/21/17
to kubernetes-sig-network
On Monday, April 17, 2017 at 6:44:24 PM UTC-4, Avery Davis wrote:
Mike Spreitzer:

I just stumbled across your to_docker CNI plugin while investigating options to run k8s using Docker's native network stack.  How did your test go using Kuryr/Neutron to subnet your cluster network in accordance with k8s container connectivity requirements?  I would like to avoid using etcd and flannel when the same features seem to be present in Docker's latest (v17.03) built-in network management interface.  Please let me know if you've posted any results / updates elsewhere.

Thanks!
Avery Davis

It's been a while, let me see if I can remember correctly.

I had two big problems.  One was that, by deliberate design, the k8s master and worker nodes' main network namespaces did not handle the pod-to-pod traffic.  That made `kube-proxy` useless.  A colleague developed an alternative that deploys an haproxy for each non-headless k8s Service to implement the implied load balancing.

I also wanted to allow "overlapping IPs".  That is, each tenant gets to use all of 172.19.0.0/16 for its pods and no communication is possible between one tenant's private addresses and another's.  The downside of this is that, because those addresses are not uniquely meaningful, the kubelet cannot use them to probe pods.  I extended the CNI plugin to also create a Neutron "floating IP" for each pod and sneak it back to the kubelet, which a colleague slightly modified to use the floating IPs for network-based probes of pods.
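The floating IP step was roughly the following per pod (neutron CLI of that era; names and addresses illustrative):

    # Find the Neutron port for the pod's (non-unique) fixed IP, then give it
    # a floating IP on the external network that the kubelet can reach:
    port_id=$(neutron port-list --fixed-ips ip_address=172.19.0.7 -f value -c id)
    neutron floatingip-create ext-net --port-id "$port_id"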

Regards,
Mike