NetworkPolicy vs Services


Tim Hockin

Jan 10, 2017, 6:23:19 PM
to kubernetes-sig-network
Hi all,

I know we discussed the difference between expressing NetPolicy's
subject in terms of Services vs a pod selector (specifically the
receiving end of the NP, not the `from`), but I recently had a
conversation with some folks who are looking at Services via a
different implementation, and it's not clear whether NP will work for
them.

Background: we chose to express the subject of a policy as a Pod
selector because there was some push-back -- we may want to apply
policy to pods that were not in a service. This works because the
implementations of Services (in particular kube-proxy) do NAT and
deliver packets [from:client to:pod], so we could filter on
destination IP == pod IP.
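
To make that concrete, here is a rough sketch of how the NAT path
composes with a dest-IP filter (illustrative rules only, with made-up
addresses, not any particular implementation's actual chains):

    # kube-proxy-style DNAT: traffic to the service VIP is rewritten
    # to a chosen pod IP before filtering happens
    iptables -t nat -A PREROUTING -d 10.96.0.10/32 -p tcp --dport 80 \
      -j DNAT --to-destination 10.244.1.5:8080

    # NetworkPolicy-style filter: by filtering time dst == pod IP, so
    # matching on destination pod IPs just works
    iptables -A FORWARD -d 10.244.1.5/32 -p tcp --dport 8080 \
      -s 10.244.2.0/24 -j ACCEPT
    iptables -A FORWARD -d 10.244.1.5/32 -j DROP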

This other implementation of Services would arrange for traffic to
service VIPs to arrive at the pod with the destination IP intact
[from:client to:vip], allowing for direct-return. It seems to me that
the NetworkPolicy implementations might totally miss this case.

Those of you that have implementations - would your implementation
work? I know we never said that all NP implementations had to work
with all Service implementations, but several out there are very
network-agnostic. Expressing a pod selector and forcing lookups of
"which services are fronting this pod" for each pod seems really
clumsy.

What do you all think?

Tim

dim...@aporeto.com

Jan 10, 2017, 9:04:03 PM
to kubernetes-sig-network
Hi Tim:

For Trireme at least it should work either way. We have pushed some updates and we don't 
need to make any decisions based on source/destination IPs. What we care
about is the identity (i.e. labels) of the transmitter and the receiver and that's all.

Cheers,

Dimitri

Stas Kraev

Jan 10, 2017, 11:59:22 PM
to kubernetes-sig-network
Hi Tim, 

Romana's NetworkPolicy implementation makes decisions based on the
data encoded in the IP address. We would have to take over Service IP
address assignment to accommodate such a design.

Regards

Salvatore Orlando

Jan 11, 2017, 5:05:26 AM
to Stas Kraev, kubernetes-sig-network
Hello Tim.

As with Romana, OVN ACLs are implemented taking into account only Pod IP addresses.
OVN will likely need changes - albeit not major - to accommodate the scenario where NPs can police traffic for services, and Pods could receive traffic with src_ip == svc_cluster_ip

To this aim it's worth noting that in K8S/OVN integration - which does not use kube-proxy - traffic to/from service cluster IP is always 'translated' to a Pod IP.
If NP were to apply also to services, and include services in the from clause, I would expect this to be done via selectors (NOTE: I am just speculating here, and assuming no mix & match of backend technologies - ie: k8s networking managed exclusively by OVN).
In this case the K8S/OVN integration might look at endpoints for those services, and use their IP addresses to program OVN ACLs, and apply them to the appropriate OVN logical ports.

Salvatore



Thomas Graf

Jan 11, 2017, 9:31:38 AM
to Tim Hockin, kubernetes-sig-network
I assume the VIP would be known by network implementations through
the service spec, so the link can be made regardless of whether an
implementation uses its own tagging mechanism to attach identity to
the packet or not.

Will there be an indication in the service spec that a service is to
deliver packets to the pod with the destination IP left intact? As of
now we would do the service DNAT translation on the node just before
delivering into the pod, which would be undesirable.



Guru Shetty

Jan 11, 2017, 11:06:24 AM
to Tim Hockin, kubernetes-sig-network
Tim,
  To make sure that I understand what you are saying, let me try to rephrase. You are saying that there is an implementation where a pod will receive traffic with the destination IP address of a service (a VIP). Is that right? If so, where does the translation from service VIP to destination pod IP happen? How does the destination pod's network stack even accept such a packet?

Thanks



Alex Pollitt

Jan 11, 2017, 11:30:02 AM
to Guru Shetty, Tim Hockin, kubernetes-sig-network
Echoing Guru's comments, I would find it helpful to understand how this service implementation interacts with basic networking (independent of network policy). How is the packet getting to the pod? Does this service implementation come tightly coupled to a particular network implementation?


Tim Hockin

Jan 11, 2017, 11:55:38 AM
to Salvatore Orlando, Stas Kraev, kubernetes-sig-network
On Wed, Jan 11, 2017 at 2:05 AM, Salvatore Orlando
<salv.o...@gmail.com> wrote:
> Hello Tim.
>
> As with Romana, OVN ACLs are implemented taking into account only Pod IP
> addresses.
> OVN will likely need changes - albeit not major - to accommodate the scenario
> where NPs can police traffic for services, and Pods could receive traffic
> with src_ip == svc_cluster_ip
>
> To this aim it's worth noting that in K8S/OVN integration - which does not
> use kube-proxy - traffic to/from service cluster IP is always 'translated'
> to a Pod IP.
> If NP were to apply also to services, and include services in the from
> clause, I would expect this to be done via selectors (NOTE: I am just

I am not proposing that change, at this time, though it may be
something we want to reopen. I just wanted everyone to consider the
implications of this particular form of Services vs their current
implementations of NP. I expect many/most will break.

Tim Hockin

Jan 11, 2017, 11:58:40 AM
to Thomas Graf, kubernetes-sig-network
On Wed, Jan 11, 2017 at 6:31 AM, Thomas Graf <tg...@suug.ch> wrote:
> I assume the VIP would be known by network implementations through
> the service spec so the link can be made regardless of whether an
> implementation uses its own tagging mechanism to attach identity to
> the packet or not.

The VIP would certainly be knowable, but it would involve the Service
and Endpoints resources, which most NP implementations do not need to
examine today.

> Will there be an indication in the service spec that a service is to
> deliver packets to the pod with destination IP left intact? As of now
> we would do the service DNAT translation on the node just before
> delivering into the pod which would be undesirable.

I would argue against any such indication. The Service abstraction is
that we LB traffic to your pod. Whether we do that via NAT or VIP
should not matter to your pod, as long as your pod doesn't have to be
particularly aware of the difference. I think.

Tim Hockin

Jan 11, 2017, 12:00:25 PM
to Guru Shetty, kubernetes-sig-network
On Wed, Jan 11, 2017 at 8:06 AM, Guru Shetty <guru...@gmail.com> wrote:
> Tim,
> To make sure that I understand what you are saying, let me try to
> rephrase. You are saying that there is an implementation where a pod will
> receive traffic with the destination IP address of a service (a VIP). Is
> that right? If so, where does the translation from service VIP to
> destination pod IP happen? How does the destination pod's network stack even
> accept such a packet?

The VIP is added as a local address inside the pod's netns, and some
routing trickery is used to deliver it to that netns.
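
Roughly like this, I'd guess (purely illustrative; the VIP, netns, and
device names are made up):

    # inside the pod's netns: accept packets addressed to the VIP
    ip netns exec pod-netns ip addr add 10.96.0.10/32 dev lo

    # on the node: route VIP traffic to the pod's veth instead of NATing
    ip route add 10.96.0.10/32 dev veth-pod

The pod's stack then sees dst == VIP and can reply directly (DSR).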

Tim Hockin

Jan 11, 2017, 12:04:16 PM
to Alex Pollitt, Guru Shetty, kubernetes-sig-network
On Wed, Jan 11, 2017 at 8:29 AM, Alex Pollitt <al...@tigera.io> wrote:
> Echoing Guru's comments, I would find it helpful to understand how this
> service implementation interacts with basic networking (independent of
> network policy). How is the packet getting to the pod? Does this service
> implementation come tightly coupled to a particular network implementation?

I'm not sure it is "tightly" coupled but it does make some
assumptions, such as the ability for DSR to work (so no NAT). It's a
PoC for now, but the issue raised with NP was one that I could not
answer satisfactorily, and in fact my fears have been mostly
confirmed.

Tim

Tim Hockin

Jan 11, 2017, 12:26:16 PM
to Guru Shetty, kubernetes-sig-network
I don't have all the details, but it doesn't sound that tricky to me.
Having a VIP as a local address is a pretty well-understood technique,
and static(ish) load-balancer routes are not rocket science.

Sure, it's a particular implementation, but I don't think what they
are doing is out of bounds - maybe I am too lax? Assuming NAT in all
cases seems too restrictive.

On Wed, Jan 11, 2017 at 9:15 AM, Guru Shetty <gu...@ovn.org> wrote:
>> The VIP is added as a local address inside the pod's netns, and some
>> routing trickery is used to deliver it to that netns.
>
>
> IMHO, that looks like a very specific implementation to me with a lot of
> trickery which breaks some fundamental networking rules.

Guru Shetty

Jan 11, 2017, 12:42:13 PM
to Tim Hockin, kubernetes-sig-network
On 11 January 2017 at 09:25, Tim Hockin <tho...@google.com> wrote:
> I don't have all the details, but it doesn't sound that tricky to me.
> Having a VIP as a local address is a pretty well-understood technique,
> and static(ish) load-balancer routes are not rocket science.

Now I see how this works and it suddenly looks reasonable.
 

> Sure, it's a particular implementation, but I don't think what they
> are doing is out of bounds - maybe I am too lax?  Assuming NAT in all
> cases seems too restrictive.

A NetworkPolicy's podSelector is a label selector that picks the pods to which we need to apply ingress network policies. So even in this case, that does not break anything

A NetworkPolicy's from clause is pods and namespaces. Since the client IP remains as-is, it should not break anything either.
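
For concreteness, the beta-era object we are talking about looks roughly
like this (syntax from memory; names and ports made up):

    cat <<EOF | kubectl create -f -
    apiVersion: extensions/v1beta1
    kind: NetworkPolicy
    metadata:
      name: backend-allow-frontend
    spec:
      podSelector:          # the subject: pods this policy applies to
        matchLabels:
          app: backend
      ingress:
      - from:
        - podSelector:      # the allowed clients
            matchLabels:
              app: frontend
        ports:
        - protocol: TCP
          port: 8080
    EOF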

Am I missing cases?

Tim Hockin

Jan 11, 2017, 1:07:00 PM
to Guru Shetty, kubernetes-sig-network
On Wed, Jan 11, 2017 at 9:42 AM, Guru Shetty <guru...@gmail.com> wrote:
>
>
> On 11 January 2017 at 09:25, Tim Hockin <tho...@google.com> wrote:
>>
>> I don't have all the details, but it doesn't sound that tricky to me.
>> Having a VIP as a local address is a pretty well-understood technique,
>> and static(ish) load-balancer routes are not rocket science.
>
>
> Now I see how this works and it suddenly looks reasonable.
>
>>
>>
>> Sure, it's a particular implementation, but I don't think what they
>> are doing is out of bounds - maybe I am too lax? Assuming NAT in all
>> cases seems too restrictive.
>
>
> A NetworkPolicy's podSelector is a label selector that picks the pods to which
> we need to apply ingress network policies. So even in this case, that does not
> break anything

It doesn't break at an abstraction level, but the "obvious"
implementation of NP would select pods, extract the pod IPs, and
establish firewalls based on packet destination IPs == pod IPs. That
is obvious and works with the NAT-based kube-proxy, but would NOT work
with a VIP-based proxy. Hence this thread. I predicted that most
implementations would have done the obvious thing (as they should!),
and do not account for this style of Service VIP.

IF we want to support this style of VIP, and I do feel we probably
should, we should consider the implications of it.

a) Is it obvious enough as is? I think that is clearly no.

b) If we simply document that implementations should *also* handle
service VIPs and map them to pods, is that sufficient?

c) Can the known implementations actually accommodate this in a reasonable way?

d) Is the API actually expressing what we want, or should we consider
changes to the API?

Since we are not yet at v1, NOW is the last chance we have to really
change the API (and honestly, it may be hard given the rules around
deprecation and compat).


Thomas Graf

Jan 11, 2017, 4:24:00 PM
to Tim Hockin, kubernetes-sig-network
On 11 January 2017 at 08:58, Tim Hockin <tho...@google.com> wrote:
> On Wed, Jan 11, 2017 at 6:31 AM, Thomas Graf <tg...@suug.ch> wrote:
>> I assume the VIP would be known by network implementations through
>> the service spec so the link can be made regardless of whether an
>> implementation uses its own tagging mechanism to attach identity to
>> the packet or not.
>
> The VIP would certainly be knowable, but it would involve the Service
> and Endpoints resources, which most NP implementations do not need to
> examine today.

Agreed. Cilium examines the service resources to perform DSR back from
nodes, east-west LB, and intra-node LB.

>> Will there be an indication in the service spec that a service is to
>> deliver packets to the pod with destination IP left intact? As of now
>> we would do the service DNAT translation on the node just before
>> delivering into the pod which would be undesirable.
>
> I would argue against any such indication. The Service abstraction is
> that we LB traffic to your pod. Whether we do that via NAT or VIP
> should not matter to your pod, as long as your pod doesn't have to be
> particularly aware of the difference. I think.

OK. If there is a service resource for the VIP then Cilium would
currently do DNAT to the pod IP on the node itself, as it sees the VIP
as the destination for a packet routed in the node hosting the pod.
Unless the IP in the pod resource matches the VIP, in which case the
DNAT becomes a no-op. I wasn't sure whether your initial statement
regarding the VIP being assigned to the pod implies that the VIP would
become the primary address of all pods for the service or whether that
would be a secondary address not visible through k8s APIs.

On this particular note: I've attempted to turn a GCE node into a
load balancer and preserve the source IP to allow for DSR back from
workers. I have not managed to convince the GCE firewall not to drop
packets from the LB to the worker nodes despite explicit allow rules.
Given that we don't seem to be the only ones trying to get this to
work, any hints on that?

Tim Hockin

Jan 11, 2017, 4:58:36 PM
to Thomas Graf, kubernetes-sig-network
Bring up your VMs with --can-ip-forward
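
e.g. (instance name made up; IIRC the flag can only be set at creation
time):

    gcloud compute instances create node-1 --can-ip-forward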


Tim Hockin

Jan 12, 2017, 5:30:07 PM
to kubernetes-sig-network
Sharing with the group


---------- Forwarded message ----------
From: Tim Hockin <tho...@google.com>
Date: Wed, Jan 11, 2017 at 4:12 PM
Subject: Re: [k8s-sig-net] NetworkPolicy vs Services
To: Salvatore Orlando <salv.o...@gmail.com>


On Wed, Jan 11, 2017 at 2:08 PM, Salvatore Orlando
<salv.o...@gmail.com> wrote:
> Replies Inline.
>
> Cheers,
> Salvatore
>
> On 11 January 2017 at 19:06, 'Tim Hockin' via kubernetes-sig-network
> <kubernetes-...@googlegroups.com> wrote:
>>
>> On Wed, Jan 11, 2017 at 9:42 AM, Guru Shetty <guru...@gmail.com> wrote:
>> >
>> >
>> > On 11 January 2017 at 09:25, Tim Hockin <tho...@google.com> wrote:
>> >>
>> >> I don't have all the details, but it doesn't sound that tricky to me.
>> >> Having a VIP as a local address is a pretty well-understood technique,
>> >> and static(ish) load-balancer routes are not rocket science.
>> >
>> >
>> > Now I see how this works and it suddenly looks reasonable.
>> >
>> >>
>> >>
>> >> Sure, it's a particular implementation, but I don't think what they
>> >> are doing is out of bounds - maybe I am too lax? Assuming NAT in all
>> >> cases seems too restrictive.
>> >
>> >
>> > A NetworkPolicy's podSelector is a label selector that picks the pods
>> > to which we need to
>> > apply ingress network policies. So even in this case, that does not break
>> > anything
>>
>> It doesn't break at an abstraction level, but the "obvious"
>> implementation of NP would select pods, extract the pod IPs, and
>> establish firewalls based on packet destination IPs == pod IPs. That
>> is obvious and works with the NAT-based kube-proxy, but would NOT work
>> with a VIP-based proxy. Hence this thread. I predicted that most
>> implementations would have done the obvious thing (as they should!),
>> and do not account for this style of Service VIP.
>>
>> IF we want to support this style of VIP, and I do feel we probably
>> should, we should consider the implications of it.
>>
>> a) Is it obvious enough as is? I think that is clearly no.
>
>
> It is not obvious because while the API semantics are well-defined, there is
> no guideline for the implementation.
> I think most of us have been implicitly assuming that kube-proxy (or some
> replacement of it) would translate the VIP into a Pod IP.
> And this assumption is probably incorrect because if I'm not mistaken the
> proxy is an entirely optional component in a k8s cluster.
>
> On a side note, the fact that you used several times the term "obvious" is
> probably an indication that the community could perhaps use a sort of
> "reference" implementation. Perhaps something which assumes kube-proxy and
> builds on iptables and/or conntrack
>
>>
>>
>> b) If we simply document that implementations should *also* handle
>> service VIPs and map them to pods, is that sufficient?
>
>
> Quite. An implementation that does not support service VIPs in a k8s cluster
> where VIPs are always NAT'd is a valid implementation imho.
> Can we say that handling service VIPs is a condition sufficient but not
> necessary to be a valid k8s NP implementation?

Truthfully, I am less concerned with debating whether it is "valid" or
not than I am with contemplating the API. It's not *at all* clear
from this API that traffic could arrive at a pod with any IP other
than the Pod's IP. It's not even a case of "duh, look up the
alternate IPs" - it's actually a fairly tricky transform to run. The
fact that many of the implementers here, who helped form the API,
missed this case -- MYSELF INCLUDED -- is worrisome.

>> c) Can the known implementations actually accommodate this in a reasonable
>> way?
>
> At first glance, if I wanted to do that in OVN, I'd say that for each pod
> selected in the 'from' clauses, I'd have to lookup which services select it
> and then use their cluster IPs. But that would be for policing return
> traffic. For ingress traffic getting into the pod backing the DSR LB I see
> no difference as the source IP will be a pod IP.

It's not about the `from`, it's about the `podSelector`, which says:
"Selects the pods to which this NetworkPolicy object applies". Now,
to be an IP-based implementation with max compat (acknowledging that
not every network config is composable with every services config or
NP config, and also that some implementations are not IP-based) you
have to also index every Service and Endpoints, and do a back-lookup
of the Pod IP and translate to Service IPs and *also* handle those.
Even though the API is devoid of the word "service".
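
To illustrate the transform (just a sketch, assuming kubectl and jq are
available; the address is made up):

    # for a pod IP selected by the policy, find every Endpoints object
    # (and hence every fronting Service) that lists it
    POD_IP=10.244.1.5
    kubectl get endpoints --all-namespaces -o json \
      | jq -r --arg ip "$POD_IP" \
          '.items[]
           | select(any(.subsets[]?.addresses[]?.ip; . == $ip))
           | "\(.metadata.namespace)/\(.metadata.name)"'
    # ...then resolve each matching Service's clusterIP and add it to
    # the set of destination addresses the policy must also match

And that mapping has to be kept up to date as Endpoints churn.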

>> d) Is the API actually expressing what we want, or should we consider
>> changes to the API?
>
> My only question is whether there could be a requirement for
> 'service selectors' in from clauses; I reckon the folks that are trying to
> implement VIPs via DSR can provide some inputs on this point.

Again, not the `from`. Imagine we had full freedom to change without
compat concerns - I might propose to replace `PodSelector` with
`ServiceName` or `ServiceSelector`. This was discussed early on, it
was argued and more or less downvoted. I feel like I am resurrecting
that. I hate hate *hate* re-adjudicating stuff like this, but this
use-case seems valid to me, and I am embarrassed to not have caught
this before.

>> Since we are not yet at v1, NOW is the last chance we have to really
>> change the API (and honestly, it may be hard given the rules around
>> deprecation and compat).
>
>
> At least k8s has an API evolution strategy from day 1. That is great. My
> biggest regret for OpenStack Neutron has been the failure to define and
> implement an API evolution strategy in its early days.

Well, we'll see how useful it is, in the end.

Salvatore Orlando

Jan 12, 2017, 5:30:53 PM
to Tim Hockin, kubernetes-sig-network
Sorry I did not realize I did reply to you alone ;)

Salvatore


Casey Davenport

Jan 12, 2017, 8:50:47 PM
to Salvatore Orlando, Tim Hockin, kubernetes-sig-network
Chiming in a bit late, but here are a few of my thoughts:

First, to answer the original question: obviously I'd want to do integration testing with specific Service implementations, but Calico's NetworkPolicy implementation should work today as-is with such a Service implementation.

As I mentioned on the call today, the abstractions this SIG has developed for NetworkPolicy are compatible with the Service abstraction, and as we've said before not all NetworkPolicy, Service, and networking implementations need to be compatible with each other.  Enforcing this is both impossible and overly limiting (we'll end up with a bunch of clones!), and so I don't think that should drive our decision making.

> It's not *at all* clear from this API that traffic could arrive at a pod with any IP other than the Pod's IP

A lot of the problem comes from the Service spec not being clear about what the behavior should be in terms of modifying the destination address. Neither the Service spec nor the NetworkPolicy spec makes any claims about the transform that is performed on the traffic, and I think that's the way it should be.  It gives the necessary room for lots of diverse implementations that meet different needs.


Tim Hockin

Jan 12, 2017, 11:37:03 PM
to Casey Davenport, Salvatore Orlando, kubernetes-sig-network
I agree that Service does not spec the transform - that is as
intended. I don't think the NP API is intent-oriented enough, though.
After staring at this for a few days, I am really starting to feel
that we should either a) change PodSelector to a Service reference
(LocalObjectReference), or b) allow either one.

(a) is simpler
(b) is more compatible, but more complicated and emulatable entirely
in (a). Additionally, there's a push to make all such oneof blocks
carry a discriminator field, which makes this API uglier and
non-compatible. @pwittrock
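
To make (b) concrete, a purely hypothetical sketch -- serviceName is NOT
a real field, and the shape is just for illustration:

    cat <<EOF
    kind: NetworkPolicy
    spec:
      # exactly one of the next two fields would be allowed:
      # podSelector:
      #   matchLabels: {app: backend}
      serviceName: backend  # hypothetical LocalObjectReference to a Service
      ingress:
      - from:
        - podSelector: {matchLabels: {app: frontend}}
    EOF

With (a), only the serviceName form would exist.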

Alex Pollitt

Jan 13, 2017, 1:40:21 AM
to Tim Hockin, Casey Davenport, Salvatore Orlando, kubernetes-sig-network
Can you expand on what you mean by the "NP API is not intent-oriented enough" idea? (And would the same idea apply to ReplicaSets, for example?)

Alex Pollitt

Jan 13, 2017, 1:33:28 PM
to Tim Hockin, Casey Davenport, Salvatore Orlando, kubernetes-sig-network

I still need to give this important topic more thought, but here’s my thinking so far.  I would appreciate it if people could flag any points that I might have got wrong - because I’m sure I’ll have got some of them wrong!


Firstly, some observations about the context of the issue we are trying to solve:


  • We have a broad range of network and NP implementations for Kubernetes.  Some network and NP implementations are tightly coupled.  Some are less tightly coupled - for example, there are some implementations of NP that can be deployed on top of several different network implementations from different vendors.  (I don’t believe anyone has built an NP implementation that is compatible with every network implementation though.)

  • Some network and NP implementations are compatible with kube-proxy’s implementation of services.  Some aren’t and have implemented their own service implementations (kube-proxy equivalents).

  • A new implementation of services is being considered.   (I’ll call it SX in the rest of this email for short.)  

  • Some network and NP implementations are compatible with SX; some aren’t.  

  • Even without NP, some network implementations will be compatible with SX and some won’t - because it fundamentally changes what the packet looks like both when traversing the network and when being delivered to the pod.

  • I’m not clear how SX gets packets to traverse the network given the above.  My guess is it tunnels them in some way, which effectively means in one sense that it is a network implementation.  Choosing to use SX changes how traffic gets from pod-to-pod when services are being used.  If services are being used for the majority of pod-to-pod traffic then most traffic within the cluster will be via these tunnels rather than via the base network implementation that is configured via CNI.


Within the above context, we are considering whether we want to change the NP definition to make it easier for (some) NP implementations to work with SX by replacing the podSelector field in NP objects with a Service reference (LocalObjectReference).


Some thoughts on potential complications of this:


  • For an NP implementation to be compatible with both kube-proxy and SX, it would need to not care whether traffic arrives on a pod’s IP or on a service VIP that is mapped to that pod - because it could be either depending on whether kube-proxy or SX is being used.  If the NP implementation uses dest IP to determine what traffic is allowed then it means a doubling of the number of match rules that need to be rendered by that implementation.  Probably not a big deal in terms of implementation complexity, but a bit tedious and potentially confusing.

  • Services support port mapping (mapping from the service ports to target ports on the pods).  Target ports can be named ports, which means a service may map to different target ports on different pods.  This is problematic when using SX with an NP implementation that uses dest IP alone to determine which ingress policy to apply, since it cannot determine the correct port to allow because it just sees the VIP, not the pod IP.  (This assumes that SX supports service port mapping, and makes some guesses on where that port mapping happens.)  See the sketch after this list.

  • Does the Service type imply anything different for the NP?  Eg ClusterIP vs NodePort Services typically have different reachability from inside/outside the cluster.  I think this is probably something we can live with as “it will depend on your network, NP, and services implementations”, though it might be a little confusing in some cases.

  • The cost across the networking community of this change could be some number of man years.  I’m guessing some implementations may be very easy to adapt, others not so much.  If we replace podSelector with Service then that work is effectively mandatory for everyone who wants to support any NP.  We need to consider the user community too and how long we need to support the current beta object and how people migrate from that to Service based.

  • It might be significantly less impactful (on implementors and users) to introduce a new serviceSelector field alongside the existing podSelector field, with the constraint that you can only specify one of these for each NP object.  “One of” field constraints are used elsewhere in k8s objects (e.g. Probe objects) so this wouldn’t be a first.  We could decouple the introduction of this new feature from graduating the current beta object, giving us more time to work through all the consequences and decide whether this should be a mandatory or optional feature of NP implementations.

  • Looking ahead to the future and egress policy, should rules also use Services instead of label selectors?  In an SX environment egress rules could well have the same issue as we are considering now for ingress rules where VIP hides the destination pod IP.

  • There was some discussion of “non-obviousness” around the interactions between NP implementations and SX.  I’m not sure that changing the NP definition significantly reduces “non-obviousness” given all these subtle interactions.  And there’s probably a whole bunch of things we haven’t thought of yet.
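
To illustrate the named-port point above (hypothetical manifests; names
and ports made up):

    # one Service port fanning out to different pod ports via a named port
    cat <<EOF | kubectl create -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector:
        app: web
      ports:
      - port: 80
        targetPort: http  # resolved per pod via the container port named "http"
    EOF
    # pod A might declare:  ports: [{name: http, containerPort: 8080}]
    # pod B might declare:  ports: [{name: http, containerPort: 9090}]
    # an NP engine that sees only the VIP (no DNAT) cannot tell from the
    # destination address alone which of 8080/9090 to allow on which backend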


I’m sure there will be a lot more discussion on this topic, but I’m leaning towards not trying to solve this all now, and instead leaving open the option of adding serviceSelector alongside podSelector (as a “one of” field) in the future if we determine that SX needs to work with more network and NP implementations than it does today and we think this will help.


Casey Davenport

Jan 13, 2017, 3:50:17 PM
to Alex Pollitt, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
Wanted to add a complication to the list Alex provided above.

NetworkPolicy currently can only apply to Kubernetes Pods (through the podSelector field).  Services are a bit different, in that they don't always select Pods, and can sometimes have a manually provided set of endpoints that may be Pods, resources external to Kubernetes, other Services, or even some combination.

This means that changing from podSelector to serviceSelector is not just a change of "where do I get my IPs from", but it's also a fundamental change in the set of endpoints/IPs that can possibly be selected by a NetworkPolicy (one that seems fairly hard to implement).



Thomas Graf

Jan 13, 2017, 4:57:58 PM
to Casey Davenport, Alex Pollitt, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
Thanks for the excellent write up Alex and Casey. I agree that a
change from podSelector to serviceSelector is invasive and complicated
but I also feel somewhat unsatisfied with the current abstraction. I
always wondered who is responsible for keeping the selector of a
service synchronised with the podSelector of a NP resource. Tying a NP
to all pods of a service would seem very natural from that
perspective.

OTOH, I agree with Casey that this may result in user expectations
which a NP may not be able to meet. Obvious examples as Casey points
out are external endpoints or a service with a service endpoint in
another namespace. I guess the redirection to another service case is
doable for a service aware NP implementation but the external endpoint
case is very difficult to meet for any NP implementation unless in
full control of all networking.

Rejection of NP resources pointing to service resources which are not
using selectors is not an option either as a service might be updated
after the NP has been inserted. Would it be an option instead to have
the service resource point to the NP resource by name, i.e. a service
defines which NP applies? In that case the API server could reject
any service with non-pod endpoints if the service resource also
selects a NP resource. Initially NP selection via service would be
restricted to services selecting pods and can later be extended into
supporting service to service redirection and finally external
endpoints if that makes sense.

On the subject of the flexibility of a NP implementation to support
the existing service model and what Alex referred to as SX. I think
this should be left up to NP implementations. If a NP implementation
can't support it, it is incompatible with the SX service
implementation. I doubt that every NP implementation will be capable
of supporting every service implementation. Personally I think this is
a strong example that decoupling identity and addressing is essential
going forward. Several implementations already recognise this.

Thomas

Casey Davenport

Jan 13, 2017, 7:33:10 PM
to Thomas Graf, Alex Pollitt, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
> always wondered who is responsible for keeping the selector of a service synchronised with the podSelector of a NP resource

It's always felt clear to me that this is the same entity that is responsible for keeping say, a ReplicaSet selector synchronized with a Service selector.

Over time it is likely that higher order abstractions will be built that, for example, manage a Deployment, a Service, and a NetworkPolicy in unison. To facilitate a wide range of use-cases and these higher abstractions, Kubernetes needs to provide the correct "atoms" if you will (in this case ReplicaSet, Service, NetworkPolicy) with which to compose the various higher-order functions.

Thomas Graf

Jan 13, 2017, 7:57:36 PM
to Casey Davenport, Alex Pollitt, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
I'm not sure I fully agree with that. The loose coupling of RS vs
service makes total sense, as the role deploying an RS may differ from
the one exposing it as a service. OTOH, allowing a service to be tied
to an NP seems natural to me. You absolutely don't want to expose a
service publicly
if you can't control the NP that is being applied to any traffic from
the first packet that is being received for that service. If the NP is
out of sync on some nodes, packets will be dropped randomly. A higher
level resource could provide this but what is the benefit of that
versus just allowing a service to select the NP?

Bernard Van De Walle

Jan 13, 2017, 8:56:17 PM
to Casey Davenport, Thomas Graf, Alex Pollitt, Tim Hockin, kubernetes-sig-network, Salvatore Orlando
The main thing I take away from this discussion is that a PodSelector is fundamentally different from a ServiceSelector.

- With a PodSelector, the NP is defined "end to end" (As it selects the source pods and the destination pods based on labels).
- With a ServiceSelector, the endpoints pod information is mostly lost (Or multiple queries on the APIs would be needed to reconstruct that information). Also as Casey highlighted, a service is not limited to pods, it could go to other endpoints. 

There is definitely value in being able to have an extra ServiceSelector, but I think they are made for different use-cases. PodSelectors also allow you to implement NetworkPolicies without the L3-L4 network information (as in our Trireme implementation), which is way more difficult with only a ServiceSelector.

I would also believe that introducing a “one of” option later on between a PodSelector and ServiceSelector would be a good way to resolve this and be able to support multiple use-cases. It opens the door to confusion around what the policy will exactly select though.

Bernard

Alex Pollitt

Jan 14, 2017, 4:22:05 PM
to Thomas Graf, Casey Davenport, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
The topic of roles is always a fun discussion with no obviously right answer I think.

You could argue that if someone has permissions to deploy a pod then them having permission to restrict what traffic that pod is expecting to receive is quite natural.  "This is my new microservice I just finished coding; it should only receive connections on port 443".  Deploying a pod without a Service that maps to that pod does not eliminate those security requirements.  

In either approach I think it is more a property of the implementation that controls what happens to that first packet than it is a property of the objects exposed to the user.  Both approaches have the ability to create objects in an order that is logically clear in terms of isolation expected by the user.

When it comes to imagining which user role will want to perform which operations, I think flexibility is valuable.  Different organizations will have different role and permission requirements.  That makes me generally lean towards separated atoms being a good thing.

Thomas Graf

Jan 14, 2017, 7:11:31 PM
to Alex Pollitt, Casey Davenport, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
Just to be clear, I'm not advocating to replace the podSelector with a
serviceSelector. The use case of an NP on an RS without a service is
obviously valid. The NP selection from a service would be in addition,
to allow making the link between service and NP so as to accommodate
the SX model which initiated this discussion.

Do you have an alternative suggestion?

Alex Pollitt

Jan 15, 2017, 1:50:10 PM
to Thomas Graf, Casey Davenport, Tim Hockin, Salvatore Orlando, kubernetes-sig-network
Thanks for clarifying Thomas.

There are a bunch of complexities we've been discussing in this thread, but the two that were on my mind this morning when I woke up were:
  • Assuming SX is aimed at being a full Services implementation then I think there is a constraint that the NP implementation has to be using the interface or something else other than the dest IP to determine which policy to apply because a single service VIP may map through to target pods with different ingress port requirements.  
  • I think SX will undoubtedly also put constraints on the network implementation in order to get packets via service VIPs across the fabric to the target pod. (Or in some ways SX could be regarded as a network implementation itself if it is tunneling packets to get them to the desired pod.)
I doubt there is a change to the NP object definition that we could make to resolve the above constraints without introducing other constraints that are equally problematic for a range of existing network and NP implementations.  Therefore I think I would advocate for leaving things as is in terms of NP object definitions, at least until the constraints the SX implementation puts on network and NP implementations are better understood.  I would not hold up the graduation of the current NP object from beta behind this, so long as we have a fallback position which is "we can add serviceSelector as an optional additional alternative to podSelector in the future if we want to".

The folks working on SX can work with the existing network and NP implementor community to help build some clarity on the constraints, hopefully minimize the constraints, and identify enough compatible network and NP implementations that the SX ideas still fly.  My expectation is that there is no amount of NP object definition tweaking that will make SX compatible with all network and all NP implementations.

As a final idea, the ability to give a pod a specific IP or an additional specific IP is something that we’re hearing as a requirement from users reasonably often now.  Often to support the ability to ingress to a cluster without going through NAT or other shenanigans.  e.g. To put a public IP on a pod.  Or to put a service VIP directly on one or more pods.  Perhaps this is an alternative direction to explore to allow some of the things SX is trying to do to be more explicitly represented in the K8s APIs?

Tim Hockin

Jan 26, 2017, 4:56:44 PM
to Alex Pollitt, Thomas Graf, Casey Davenport, Salvatore Orlando, kubernetes-sig-network
Sorry for the delinquent replies. Sigh.

> * Some network and NP implementations are compatible with SX; some aren’t.

The problem for me is less about "isn't compatible" and more about
"isn't expressible".

> * I’m not clear how SX gets packets to traverse the network

I'm assuming it is just routing. I find an analog in Google's
internal load-balancer: even though it is not an appropriate
replacement for kube-proxy, it operates in a similar way. This whole
discussion has me thinking about ways to do kube-proxy with less NAT
on GCE, too. I feel like it is just outside my grasp...

> Within the above context, we are considering whether we want to change NP definition to make it easier for
> (some) NP implementations to work with SX

IMO s/easier/possible

> Services support port mapping

Yeah, I thought about this, too. Pretty clearly it's more complex to
port map in this way, if it is even possible. There is still the
option to say "if you don't remap you get fast-path routing".

> The cost across the networking community of this change could be some number of man years.

To the earlier statement that "Some network and NP implementations are
compatible with SX; some aren’t", it seems legitimate for the change in
(e.g.) Calico to be "resolve Service IP/port to Endpoint IPs/ports and
do exactly what happens today". You wouldn't support SX, but NP for SX
would at least be implementable.

>> I always wondered who is responsible for keeping the selector of a service synchronised with the podSelector of a NP resource

> It's always felt clear to me that this is the same entity that is responsible for keeping say, a ReplicaSet selector synchronized
> with a Service selector.

Except very often a Deployment or ReplicaSet selector is intentionally
DIFFERENT from a Service selector. A Service chooses "app = foo", but
multiple deployments might cover "app = foo; color = blue" vs "app =
foo; color = green". I don't expect that split wrt NP and Services.

> The use case of an NP on a RS without service is obviously valid

It's possible to emulate this by adding a Service to cover the RS.
It's not possible to go from a selected pod up to the Service IP and
not also have other pods selected.


On reading all of this, and given the general unavailability of an
ACTUAL implementation of "the thing Alex dubbed SX", we should NOT
block or fundamentally change the NP API in this regard.

We can consider as a separate point whether we want to *add* support
for ServiceName as a one-of with PodSelector. As one of the people who
wanted that in the first place, it is appealing to me because it makes
more sense to me anyway. This would not mean every NP implementation
has to be compatible with every Services implementation - we just need
to specify the guaranteed semantics, and everything else is a compat
matrix.

Given that we are considering this as a one-of, we may yet need to
tweak the API in accordance with
https://github.com/kubernetes/community/pull/278#pullrequestreview-18291618
but we can consider that as a followup to this discussion.


Thoughts?