It's hard to talk about network plugins without also getting into how
we can better align with docker- and rocket-native plugins, but given
the immaturity of the whole space, let's try to set them aside and
think about the overall behavior we really want.
Forking from the kickoff thread.
On Sun, Aug 16, 2015 at 10:00 AM, Gurucharan Shetty <she...@nicira.com> wrote:
> From what I understand, this hairpin flag is only needed if one uses
> kube-proxy to do L4 load balancing. If there are people who would be
> doing L7 load balancing, or a more feature-rich L4 load balancing
> (based not only on the L4 headers of the packet but also on the
> traffic load on a particular destination pod), and they don't want to
> use iptables, then I guess mandating the hairpin is not needed. So
> would it be correct to say that if a network solution intends to use
> kube-proxy to do L4 load balancing, then its network plugin should
> enable hairpin mode? This also raises a more general question: is
> kube-proxy replaceable by something else?
kube-proxy is supposed to be one implementation of Services, but not
the only one. In fact, I have seen at least two demos of Services
without kube-proxy and know of two or three others that I have not
seen directly.
Given that, it seems to make more sense to describe the behavior that
we expect from a network plugin (ideally in the form of a test) and
let people conform to that.
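To make "enable hairpin" concrete: on a Linux bridge it boils down to
flipping hairpin_mode on each bridge port, so that a packet a pod sends
to a Service VIP can be DNATed right back to that same pod and
forwarded out the port it came in on. A rough sketch of the idea (the
bridge name is just the GCE-style example):

    // enable_hairpin.go: a sketch of turning on hairpin mode for every
    // port of a bridge, so traffic DNATed back to its sender can be
    // forwarded out the port it arrived on.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func enableHairpin(bridge string) error {
        // Each entry under /sys/class/net/<bridge>/brif/ is a port on the bridge.
        ports, err := filepath.Glob(filepath.Join("/sys/class/net", bridge, "brif", "*"))
        if err != nil {
            return err
        }
        for _, port := range ports {
            // Writing "1" to <port>/hairpin_mode enables hairpin on that port.
            file := filepath.Join(port, "hairpin_mode")
            if err := os.WriteFile(file, []byte("1"), 0644); err != nil {
                return fmt.Errorf("enabling hairpin on %s: %v", port, err)
            }
        }
        return nil
    }

    func main() {
        // "cbr0" is just the GCE-style example bridge name.
        if err := enableHairpin("cbr0"); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }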
> 1. Right now, there is a requirement that all the pods should also be
> able to speak to every host that hosts pods. This was clearly needed
> for the previous kube-proxy implementation, where the source IP and
> MAC would change to those of the source host. With the new
> implementation of kube-proxy, do we still need the host to be able to
> speak to the pods via IP? Maybe there were other reasons for this
> requirement? Thoughts?
This question touches on both directions: pod-to-node and node-to-pod.
It's a fair question - we jump through a lot of hoops to ensure that
access to Services from nodes works. Is that really necessary? It's
certainly useful for debugging. If we want things like docker being
able to pull from a registry that is running as a Service, or nodes
being able to reach DNS in the cluster, then we need this access;
otherwise we start doing per-node proxies and host-ports and localhost
for everything. See the recent work by one of our interns on making an
in-cluster docker registry work for a taste of what's involved.
Those are the "obvious" ones. I am interested to hear what other
things people might be doing where a process on the node needs to
access a Pod or Service, or vice-versa. Simplifying this connectivity
would be a win.
> 2. The current network plugin architecture prevents the network plugin
> from changing the IP address of the pod (from the Docker generated
> one). Well, it does not prevent you from changing the IP, but things
> like 'kubectl get endpoints' would only see the docker generated IP
> address. Kubernetes currently is a single-tenant system, so it
> probably is not very important to be able to change the IP address of
> the pod. But in the future, if there are plans for multi-tenancy (dev,
> qa, prod, staging, etc. in a single environment), then overlapping IP
> addresses and logical networks (SDN, network virtualization) may be
> needed, in which case the ability to change the IP address will
> become important. Any thoughts?
As of recently, plugins can return a status, which can include a
different IP.
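For reference, the shape I have in mind is roughly this - the plugin
prints a small JSON blob and kubelet records that IP for the pod
instead of asking Docker. This is a sketch of the shape, not the exact
schema:

    // status_sketch.go: a sketch of the kind of status a plugin might
    // return - the only field kubelet really needs is the pod's IP.
    package main

    import (
        "encoding/json"
        "net"
        "os"
    )

    type PodNetworkStatus struct {
        IP net.IP `json:"ip"`
    }

    func main() {
        // A plugin that assigned 10.1.2.3 to the pod would print:
        //   {"ip":"10.1.2.3"}
        status := PodNetworkStatus{IP: net.ParseIP("10.1.2.3")}
        json.NewEncoder(os.Stdout).Encode(status)
    }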
On Mon, Aug 17, 2015 at 11:01 AM, Casey Davenport <casey.d...@metaswitch.com> wrote:
> I don't think the hairpin flag will come into play for Calico. We don't
> build on the Docker bridge; instead we create a new veth pair for each
> new pod, with one end in the pod's namespace and one end in the host's.
> I won't know for sure if this is true until I test it, but I plan on
> doing so this week.
So you're assuming (rightly, so far) that network plugins are
monolithic and not composable. Is it valuable or interesting to have
network plugins be composable? For example, should it be possible to
write a plugin that handles things like installing special iptables
rules and use that plugin alongside a Calico plugin? For a more
concrete example, let's look at what we do in GCE in the default
Docker bridge mode. All of this is done in kubelet today, but (IMO) it
should be done by plugins.
1) set a broad MASQUERADE rule in iptables (required for traffic from
containers to egress, because of GCE's edge NAT).
2) configure cbr0 (our docker bridge) with the per-node CIDR
3) tweak Docker config (I think?) to use cbr0
4) soon: install hairpin config on each cbr0 interface
At least two things stand out as pretty distinct - the MASQUERADE
rules and the cbr0 stuff.
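To make (2) and (3) concrete, the cbr0 piece is roughly "create a
bridge, give it the gateway address of this node's pod CIDR, bring it
up, and point the Docker daemon at it with --bridge=cbr0". A rough
sketch (the CIDR below is made up, and a real implementation has to
handle idempotency, Docker restarts, etc.):

    // cbr0_sketch.go: a sketch of the cbr0 steps - create the bridge,
    // give it the gateway address of this node's pod CIDR, and bring
    // it up. The Docker daemon is then pointed at it with --bridge=cbr0
    // (plus --iptables=false and --ip-masq=false so Docker doesn't
    // install its own NAT rules).
    package main

    import (
        "fmt"
        "os/exec"
    )

    func setupBridge(gatewayCIDR string) error {
        cmds := [][]string{
            {"ip", "link", "add", "cbr0", "type", "bridge"},
            {"ip", "addr", "add", gatewayCIDR, "dev", "cbr0"},
            {"ip", "link", "set", "cbr0", "up"},
        }
        for _, args := range cmds {
            if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
                return fmt.Errorf("%v: %v (%s)", args, err, out)
            }
        }
        return nil
    }

    func main() {
        // 10.244.1.1/24 is a made-up per-node pod CIDR gateway.
        if err := setupBridge("10.244.1.1/24"); err != nil {
            fmt.Println(err)
        }
    }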
The MASQUERADE stuff is needed regardless of whether you use a docker
bridge or Calico or Weave or Flannel, but it's actually pretty
site-specific. In GCE we should basically say "anything that is not
destined for 10.0.0.0/8 needs masquerade", but that's not right. It
should be "anything not destined for an RFC1918 address". But there
are probably cases where we would want the masquerade even within
RFC1918 space (I'm thinking VPNs, maybe?). Outside of GCE, the rules
are obviously totally different. Should this be something that
kubernetes understands or handles at all? Or can we punt this down to
the node-management layer (in as much as we have one)?
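Concretely, the kind of rule I mean looks roughly like this - carve out
RFC1918 destinations and MASQUERADE everything else. This is a sketch
of the idea (appending straight to POSTROUTING; a real implementation
would use its own chain), not the exact rule we ship:

    // masq_sketch.go: SNAT traffic leaving the cluster - RETURN for
    // RFC1918 destinations, MASQUERADE for everything else.
    package main

    import (
        "fmt"
        "os/exec"
    )

    func installMasquerade() error {
        rules := [][]string{
            {"-t", "nat", "-A", "POSTROUTING", "-d", "10.0.0.0/8", "-j", "RETURN"},
            {"-t", "nat", "-A", "POSTROUTING", "-d", "172.16.0.0/12", "-j", "RETURN"},
            {"-t", "nat", "-A", "POSTROUTING", "-d", "192.168.0.0/16", "-j", "RETURN"},
            {"-t", "nat", "-A", "POSTROUTING", "-j", "MASQUERADE"},
        }
        for _, args := range rules {
            if out, err := exec.Command("iptables", args...).CombinedOutput(); err != nil {
                return fmt.Errorf("iptables %v: %v (%s)", args, err, out)
            }
        }
        return nil
    }

    func main() {
        if err := installMasquerade(); err != nil {
            fmt.Println(err)
        }
    }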
The bridge management seems like a pretty clear case for a plugin. We
could/should move all of the cbr0/docker0 management into a plugin and
probably make that the default. This touches on another sore point -
docker itself has flags around iptables and masquerade that we have
historically suggested people set (or not set). Those flags sometimes
do things that conflict with what we want to do, but we don't actively
check that they are set the way we expect.
Is there any use for composable plugins?
> From my perspective, this should be handled by each individual network
> plugin - each plugin might want to handle this differently (or not at all).
> Perhaps there are other cases to be made for chaining of network plugins,
> but the hairpin case alone doesn't convince me.
I think I am coming to the same conclusion.
>> 1. Right now, there is a requirement that all the pods should also be
>> able to speak to every host that hosts pods.
>
> We've been talking about locking down host-pod communication in the general
> case as part of our thoughts on security. There are still cases where
> host->pod communication is needed (e.g. NodePort), but at the moment our
> thinking is to treat infrastructure as "outside the cluster". As far as
> security is concerned, we think the default behavior should be "allow from
> pods within my namespace". Anything beyond that can be configured using
> security policy.
See above - what about cases where the node needs to legitimately
access cluster services (the canonical case being a docker registry)?
On Tue, Aug 18, 2015 at 5:45 AM, Michael Bridgen <mic...@weave.works> wrote:
> I have been working on adding an API library to CNI[1], which is used for
> rocket's networking, but was intended as a widely-applicable plugin
> mechanism. To date, CNI consists of a specification for invoking plugins[2],
> some plugins themselves, and a handful of example scripts that drive the
> plugins. With an API in a go library, it'd be much easier to use as common
> networking plugin infrastructure for kubernetes, rocket, runc and other
> things that come along.
>
> I like CNI because it does just what is needed, while giving plugins and
> applications a fair bit of freedom of implementation. It's pretty close, and
> at the same level of abstraction, to the networking hooks added to
> Kubernetes recently.
I'm fine with folding things together - that would be great, in fact.
I have not paid attention to CNI in the last 2 months, but I had some
concerns with it the last time I looked. I was one of the people arguing that
CNI and CNM were too close to not fold together. I still feel that
there is not really room for more than one or MAYBE two plugin models
in this space. I don't have any particular attachment to owning one
of those, personally, but I am VERY concerned that:
a) implementors like Weave/Calico/... have to implement and maintain
Docker plugins and CNI/k8s plugins with slightly different semantics
b) users experience confusion and/or complexity about how to configure
a solution
> So far, with Eugene @ CoreOS's help, I have pulled together enough of a
> library that Rocket's networking can be ported to it[3] -- not too
> surprising, since much of the code was adapted from there -- and I've
> written a tiny command-line tool that can be used with e.g., runc.
> Meanwhile, Paul @ Calico is getting good results trying an integration with
> Kubernetes.
I'd like to see this expanded on. If we can reduce the space from
three plugin models to two, that's a win.
> I am aware that I'm late to the party, and that Kubernetes + CNI and various
> other combinations have been discussed before. But I think things have moved
> on a bit[4], so if people don't mind some recapitulation, it'd be useful to
> hear objections and (unmet) requirements and so on. Perhaps it is needless
> to say that I would like this to become a "full community effort", if we
> find that it is a broadly acceptable approach.
I'll have to look at CNI again.
On Tue, Aug 18, 2015 at 9:40 AM, Paul Tiplady <paul.t...@metaswitch.com> wrote:
> I like this model because it would allow Calico to provide a single CNI
> plugin for Kubernetes, and have it run for any containerizer (docker, rkt,
> runc, ...). As k8s support for different runtimes grows, this will become an
> increasingly significant issue. (Right now we can just target docker and be
> done with it).
Does CNI work with Docker?
> Of the plugin hooks, CNI maps cleanly to ADD and DELETE. It doesn't have a
> concept of daemons, so the k8s INIT action isn't covered (we don't currently
> use INIT, though we think we will eventually). To handle the functionality
> currently provided by INIT, CNI could potentially be extended to add the
> concept of a daemon, or we could leave the INIT hook as a kick to an
> arbitrary binary that is independent from the CNI mechanism. The latter is
> probably the short-term pragmatic solution, since current plugins' INIT
> hooks will remain unchanged.
k8s plugins were intended to be exec-and-terminate, but docker plugins
are assumed to be daemons. In both cases we have open issues with
startup, containerization vs. not, etc.
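For concreteness, here is roughly what exec-and-terminate looks like
from the plugin's side under the early CNI spec (as I understand it):
the runtime sets a few CNI_* environment variables, pipes the network
config in on stdin, and expects a JSON result (including the IP) on
stdout. This is a sketch, not a complete plugin:

    // cni_sketch.go: the exec-and-terminate shape - read CNI_* env vars
    // and the network config on stdin, do the work, print a JSON result.
    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // NetConf is a sketch of the network configuration piped in on stdin.
    type NetConf struct {
        Name string `json:"name"`
        Type string `json:"type"`
    }

    // Result is a sketch of what the plugin prints on stdout for ADD:
    // the address it assigned to the container's interface.
    type Result struct {
        IP4 struct {
            IP string `json:"ip"`
        } `json:"ip4"`
    }

    func main() {
        var conf NetConf
        if err := json.NewDecoder(os.Stdin).Decode(&conf); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }

        switch os.Getenv("CNI_COMMAND") {
        case "ADD":
            // CNI_NETNS is the path to the container's network namespace
            // and CNI_IFNAME the interface to create inside it; a real
            // plugin would plumb a veth/routes there and allocate an IP.
            fmt.Fprintf(os.Stderr, "would plumb %s in %s for network %q\n",
                os.Getenv("CNI_IFNAME"), os.Getenv("CNI_NETNS"), conf.Name)
            var res Result
            res.IP4.IP = "10.1.2.3/24" // placeholder address for the sketch
            json.NewEncoder(os.Stdout).Encode(res)
        case "DEL":
            // tear down whatever ADD created
        }
    }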
> As for the new STATUS plugin action, I'm not sure if that's needed if we use
> CNI; the plugin returns the IP from the ADD call so we can just update the
> pod status after it's created. Was another motivation of STATUS the idea
> that a pod's IP could change? If we don't need to support that use case then
> things integrate very cleanly.
I asked the same question. I think it was "following the existing
pattern of calling docker for status". I think simply returning it
once might be OK.
On Tue, Aug 18, 2015 at 10:09 AM, Paul Tiplady <paul.t...@metaswitch.com> wrote:
> For the rate-limiting case, I can't see how you can implement this outside
> the plugin in a generic way; after the plugin has done its thing, how do you
> determine which interface is connected to the pod? For example Calico
> deletes the veth pair that docker creates, so we'd have to duplicate any tc
> config that was set anyway. IMO better to have all that logic in one place,
> where a plugin implementor can see what needs to be implemented.
If I recall correctly, the tc logic is applied at the host interface
per-CIDR, not per veth.
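In other words, the shaping hangs off the bridge/host interface and
matches the pod's address with a u32 filter, rather than being attached
to each veth. A rough sketch of the idea (the device, rate, and CIDR
are illustrative, and these are not the exact commands kubelet runs):

    // shaper_sketch.go: per-CIDR shaping on the host-side interface -
    // an HTB qdisc on the bridge, a class with the desired rate, and a
    // u32 filter steering traffic for the pod's address into that class.
    package main

    import (
        "fmt"
        "os/exec"
    )

    func shapePodTraffic(dev, podCIDR, rate string) error {
        cmds := [][]string{
            {"qdisc", "add", "dev", dev, "root", "handle", "1:", "htb"},
            {"class", "add", "dev", dev, "parent", "1:", "classid", "1:1",
                "htb", "rate", rate},
            {"filter", "add", "dev", dev, "protocol", "ip", "parent", "1:0",
                "prio", "1", "u32", "match", "ip", "dst", podCIDR,
                "flowid", "1:1"},
        }
        for _, args := range cmds {
            if out, err := exec.Command("tc", args...).CombinedOutput(); err != nil {
                return fmt.Errorf("tc %v: %v (%s)", args, err, out)
            }
        }
        return nil
    }

    func main() {
        // cbr0, 10.244.1.5/32, and 10mbit are all illustrative values.
        if err := shapePodTraffic("cbr0", "10.244.1.5/32", "10mbit"); err != nil {
            fmt.Println(err)
        }
    }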
> Also, currently kubelet has code to setup cbr0. While that's a great
> pragmatic simplification, I don't think it quite fits with the concept of
> pluggable networking modules -- could that be handled in the plugin INIT
> step? That would make the docker-bridge plugin an equal peer to other
> plugins, which would help flush out issues with the API; if we can't
> implement docker-bridge entirely as a k8s plugin, then maybe the API isn't
> complete enough.
Yeah, I should have read the thread before responding - this is my
conclusion, too.