Promoting the Service Node Exclusion Feature

Andrew Kim

Jan 24, 2019, 8:50:03 PM
to kubernetes-sig-network
Hi SIG Network! 

Wanted to follow up on the discussion we had in today's SIG meeting regarding the Service Node Exclusion feature (excluding nodes from LB backends via labels).

Here are some potential next steps I see for this feature:
1) Continue to support the feature as-is. This would mean switching the label key from alpha.service-controller.kubernetes.io/exclude-balancer to service-controller.kubernetes.io/exclude-balancer (we would support the alpha label for X number of releases before removing it). 
2) Add a method to the cloud provider LB interface that lets the provider filter the set of nodes passed on to the load balancer. The existing label could be a common label checked by all cloud providers but not strictly required. This might even remove the need for this to be a feature at all?

Another thing to consider is that service controller, in addition to checking for alpha.service-controller.kubernetes.io/exclude-balancer, also checks for the label node-role.kubernetes.io/master. By promoting this feature to beta/GA we can remove the need for checking the master label and expect users to set the exclude-balancer label on control plane nodes themselves if desired (related issue: https://github.com/kubernetes/kubernetes/issues/65618). 
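
For reference, the exclusion logic in today's service controller looks roughly like the sketch below (simplified for illustration; the real predicate also checks node readiness, and the exclude label only takes effect behind the ServiceNodeExclusion feature gate):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// Simplified sketch of the service controller's current node filtering.
// The label keys are the real ones; the surrounding code is illustrative only.
const (
	labelAlphaExcludeBalancer = "alpha.service-controller.kubernetes.io/exclude-balancer"
	labelNodeRoleMaster       = "node-role.kubernetes.io/master"
)

func includeNodeInLoadBalancer(node *v1.Node) bool {
	// Unschedulable nodes are skipped (historically how masters were filtered).
	if node.Spec.Unschedulable {
		return false
	}
	// Nodes labeled as masters are skipped unconditionally.
	if _, isMaster := node.Labels[labelNodeRoleMaster]; isMaster {
		return false
	}
	// The alpha exclude label is honored when the feature gate is on
	// (gate check omitted here).
	if _, excluded := node.Labels[labelAlphaExcludeBalancer]; excluded {
		return false
	}
	return true
}
```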

Thoughts? 

- Andrew Sy Kim

Tim Hockin

Feb 14, 2019, 6:28:09 PM
to Andrew Kim, Brendan Burns, kubernetes-sig-network
I have a bunch of thoughts.

For historical context: a bunch of logic exists in ServiceController that really should have been cloud-specific. We figured that all KNOWN providers were the same, so we put stuff like this in the common service controller rather than in each cloud provider, because DRY, amiright?

Now things have changed.  Cloud Providers are all moving out of tree, and there exist a number of fully out-of-tree (never were in-tree) providers.  They are free to implement LBs however they want.  

If we promote this to `service-controller.kubernetes.io/exclude-balancer` we're asserting that something called "service-controller" exists and implying that all clouds should or must support this. But some providers don't bounce LB traffic through nodes at all!

So in terms of paths forward:

1) deprecate this and let each cloud-provider define a specific one, if they want to support it

OR

2) rename it to something a little saner (e.g. networking.kubernetes.io/exclude-service-load-balancer) and...
  a) ...document it as optional for cloud-providers
  b) ...document it as required for cloud-providers that use the node for load-balancing

OR 

3)  ???

Thoughts?  Brendan, you added this, so your voice in particular would be nice.



Andrew Kim

Feb 15, 2019, 1:00:01 AM
to kubernetes-sig-network
Would a new `LoadBalancer` kind (in the networking group or as a CRD) + new controller be a viable option for 3)? 

Some benefits I see to this approach:
* decouple cloud-provider-specific details from Services (even though we would have to support Type=LoadBalancer forever)
* rename service-controller -> loadbalancer-controller (service-controller is not a great name for this given it only does LB provisioning for Services with Type=LoadBalancer). 
* embed more complex logic into LBs beyond node exclusion (with versioning)
* only support it out-of-tree so it can be used as a forcing function to move cloud providers out-of-tree.

Definitely more work/time involved in this but I feel it makes more sense long term. 
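
To make the idea concrete, here is a purely illustrative sketch of what such a resource could look like (every type and field name here is hypothetical, not a proposal in itself):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical LoadBalancer resource; nothing here is a real API, it just
// illustrates how LB concerns could be pulled out of Service.
type LoadBalancer struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   LoadBalancerSpec   `json:"spec"`
	Status LoadBalancerStatus `json:"status,omitempty"`
}

type LoadBalancerSpec struct {
	// ServiceRef points at the Service whose traffic this LB fronts.
	ServiceRef v1.ObjectReference `json:"serviceRef"`
	// NodeSelector could express per-LB node inclusion instead of a
	// cluster-wide exclusion label.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	// ProviderParameters keeps cloud-specific knobs out of the Service API.
	ProviderParameters map[string]string `json:"providerParameters,omitempty"`
}

type LoadBalancerStatus struct {
	Ingress []v1.LoadBalancerIngress `json:"ingress,omitempty"`
}
```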

Tim Hockin

Feb 15, 2019, 1:36:23 AM
to Andrew Kim, kubernetes-sig-network
That may be the longer-term plan. We're noodling around what a
revised Service API would look like. We're starting with some
concrete scalability problems in Endpoints. I don't know that this is
a 2019 thing though, or at least not "done" in 2019.

Andrew Kim

Feb 15, 2019, 4:35:49 PM
to Tim Hockin, kubernetes-sig-network
Noted. In that case, I would be willing to do the work to move forward with 1 or 2 for now (personally I prefer 2) and at least start brainstorming about "LBs as its own resource" as a longer-term goal. I'm assuming that the revised Service API would not leak provider-specific information like LBs, so we would likely want to manage this in a separate resource anyway.

Either way, I would like to explore the "LoadBalancer as its own resource" idea sooner rather than later because it would allow us to address existing problems with LBs that would otherwise require handling complicated cases in production clusters such as:
* better LB naming (https://github.com/kubernetes/kubernetes/issues/69293 & https://github.com/kubernetes/kubernetes/issues/29789)
* not excluding the control plane by default (https://github.com/kubernetes/kubernetes/issues/65618) - this one can actually be done today but I'm hesitant for reasons
* topology aware load balancers / excluding nodes from LBs on a per-service basis (right now if we label a node, it gets excluded from all LBs which can be undesirable).

Do we think the issues above are worth addressing with the current state of Services or should we delay addressing these until we have new APIs to work with?

P.S. if there's a doc circulating around with some initial thoughts/ideas around the revised Service API I would love to see it :) 

Sandor Szuecs

Feb 16, 2019, 3:01:50 AM
to Andrew Kim, Tim Hockin, kubernetes-sig-network
Hi!

On Fri 15. Feb 2019 at 22:35 Andrew Kim <kim.an...@gmail.com> wrote:

> Either way, I would like to explore the "LoadBalancer as its own resource" idea

There was already a document by alibaba, IIRC, that tried to add a loadbalancer resource.

> sooner rather than later because it would allow us to address existing problems with LBs that would otherwise require handling complicated cases in production clusters such as:
> * better LB naming (https://github.com/kubernetes/kubernetes/issues/69293 & https://github.com/kubernetes/kubernetes/issues/29789)
> * not excluding the control plane by default (https://github.com/kubernetes/kubernetes/issues/65618) - this one can actually be done today but I'm hesitant for reasons
> * topology aware load balancers / excluding nodes from LBs on a per-service basis (right now if we label a node, it gets excluded from all LBs which can be undesirable).

What do you mean by topology in this case?

Best, Sandor 
--
Sandor Szücs | 418 I'm a teapot

Tim Hockin

Feb 19, 2019, 4:54:12 PM
to Andrew Kim, kubernetes-sig-network
> Either way, I would like to explore the "LoadBalancer as its own resource" idea sooner rather than later because it would allow us to address existing problems with LBs that would otherwise require handling complicated cases in production clusters such as:
> * better LB naming (https://github.com/kubernetes/kubernetes/issues/69293 & https://github.com/kubernetes/kubernetes/issues/29789)
> * not excluding the control plane by default (https://github.com/kubernetes/kubernetes/issues/65618) - this one can actually be done today but I'm hesitant for reasons
> * topology aware load balancers / excluding nodes from LBs on a per-service basis (right now if we label a node, it gets excluded from all LBs which can be undesirable).
>
> Do we think the issues above are worth addressing with the current state of Services or should we delay addressing these until we have new APIs to work with?
>
> P.S. if there's a doc circulating around with some initial thoughts/ideas around the revised Service API I would love to see it :)

There's not a doc yet. I have some notes but not anything that is
comprehensible. You know what, that sucks. I'll write up my notes.
The real problem here is that I am SWAMPED right now and getting 100
comments on the ideas is not going to free up time to address them.
Still, I will share.

https://docs.google.com/document/d/1Uyriyu2LgXPyJInEs0emuJo162IwJhzPlUewKCa5caQ/edit

Andrew Kim

Feb 19, 2019, 7:36:48 PM
to Tim Hockin, kubernetes-sig-network
Thanks for sharing Tim, really appreciate it! 

Andrew Kim

Feb 19, 2019, 7:41:44 PM
to Sandor Szuecs, Tim Hockin, kubernetes-sig-network
By "topology" I was thinking something like "add only pods/nodes with label X to this load balancer's node pool". I guess we could do this today with `externalTrafficPolicy: Local` and use node selectors but I think there's potential value in being able to specify this on the LB resource. 

Tim Hockin

Feb 19, 2019, 8:03:17 PM
to Andrew Kim, Sandor Szuecs, kubernetes-sig-network
On Tue, Feb 19, 2019 at 4:41 PM Andrew Kim <kim.an...@gmail.com> wrote:
>
> By "topology" I was thinking something like "add only pods/nodes with label X to this load balancer's node pool". I guess we could do this today with `externalTrafficPolicy: Local` and use node selectors but I think there's potential value in being able to specify this on the LB resource.

I'm not super convinced about the utility of this. LB mechanisms
should be invisible to users. externalTrafficPolicy is a horror show,
but only because we simply CAN'T make a "correct" decision without
user input.

Sandor Szuecs

Feb 20, 2019, 2:34:31 AM
to Tim Hockin, Andrew Kim, kubernetes-sig-network
Tim,
Do you have some links to the mentioned problems that were identified? Without a clear problem statement and identified goals, I don't think it will be possible to enhance something without creating other problems.

Best, Sandor 

Tim Hockin

Feb 20, 2019, 4:52:16 PM
to Sandor Szuecs, Andrew Kim, kubernetes-sig-network
On Tue, Feb 19, 2019 at 11:34 PM Sandor Szuecs <sandor...@zalando.de> wrote:
>
> Tim,
> Do you have some links to the mentioned problems that were identified? Without a clear problem statement and identified goals, I don't think it will be possible to enhance something without creating other problems.

There are a few issues open on Services. I don't have time to dig
right now (sorry) but here are a few off the top of my head:

Service conflates a frontend-spec with a backend grouping system.

There are many fields that are about load balancers, but LBs are
really a distinct concept from Services.

The mish-mash of LB `type` intersected with selector vs no-selector
intersected with cluster IP vs "none" is a mess.

Endpoints don't scale well as a plural resource.


Please take a look at my doc -- it's a little rambling, but I think
you'll see what I am thinking about.


Tim

Andrew Kim

Feb 22, 2019, 7:04:39 PM
to kubernetes-sig-network
Hi SIG Network,

I wanted to follow up on yesterday's discussion around Service Node Exclusion. There were a few concerns raised about moving forward with this feature:

* Promoting Service Node Exclusion also implies that master nodes will be automatically added to LBs (this is desired). Should we also automatically include nodes that are marked unschedulable? I think the consensus was yes. 
* If we do automatically allow masters, does that put master nodes at risk of self DoS in production because we are suddenly adding load from LB traffic there? 
* How much of this logic do we want to standardize and how much should we delegate to providers? I think longer term we want to delegate more to the providers but working with what we have today we may still need to support something standardized. 
* With v1.14 code freeze coming up, we won't have adequate time to test this change.

The high-level solution that we had discussed which accounts for the concerns above was:
* Provide a standardized label that is read by service controller (i.e. common label for any cluster that uses core controllers for LBs). I think Tim's earlier suggestion of something like networking.kubernetes.io/exclude-service-load-balancer is good. This would also mean supporting the alpha label for X number of releases.
* To preserve backwards compatibility with the existing behavior of excluding master/unschedulable nodes, we can add a `FilterNodes` method to the LoadBalancer interface and have existing providers call out to a common `DefaultFilterNodes` function that resembles today's logic (a rough sketch follows below). This shifts the responsibility for when to change the filter logic to each provider (giving them time to add the exclude node label to master nodes if they want to keep this behavior) while also allowing new providers to filter nodes however they wish.
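
A rough sketch of what that interface change could look like (the `FilterNodes`/`DefaultFilterNodes` names are from the proposal above, but the signatures are hypothetical and would be settled in the KEP):

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// Hypothetical addition to the cloud provider LoadBalancer interface.
type LoadBalancer interface {
	// ... existing methods (EnsureLoadBalancer, UpdateLoadBalancer, ...) elided ...

	// FilterNodes returns the subset of candidate nodes that should be
	// registered as backends for the given Service's load balancer.
	FilterNodes(ctx context.Context, service *v1.Service, nodes []*v1.Node) ([]*v1.Node, error)
}

// DefaultFilterNodes mirrors today's behavior so existing providers can keep
// it while they transition: drop unschedulable nodes, master-labeled nodes,
// and nodes carrying the exclude label.
func DefaultFilterNodes(service *v1.Service, nodes []*v1.Node) []*v1.Node {
	const (
		labelExclude    = "networking.kubernetes.io/exclude-service-load-balancer" // proposed name
		labelMasterRole = "node-role.kubernetes.io/master"
	)
	var out []*v1.Node
	for _, n := range nodes {
		if n.Spec.Unschedulable {
			continue
		}
		if _, ok := n.Labels[labelMasterRole]; ok {
			continue
		}
		if _, ok := n.Labels[labelExclude]; ok {
			continue
		}
		out = append(out, n)
	}
	return out
}
```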

If we're happy with the solution proposed above, I can get started on the KEP and aim to have something ready before code freeze. I understand that timing is a bit tight, so I have no objections to holding this off until v1.15.

Thanks,

Andrew Sy Kim 

Bowei Du

Feb 23, 2019, 2:18:57 AM
to Andrew Kim, kubernetes-sig-network
Some of my thoughts inline below:

On Fri, Feb 22, 2019 at 4:04 PM Andrew Kim <kim.an...@gmail.com> wrote:
> Hi SIG Network,
>
> I wanted to follow up on yesterday's discussion around Service Node Exclusion. There were a few concerns raised about moving forward with this feature:
>
> * Promoting Service Node Exclusion also implies that master nodes will be automatically added to LBs (this is desired). Should we also automatically include nodes that are marked unschedulable? I think the consensus was yes.

Isn't that filtering behavior for the master separate from the annotation?

The code for exclusion includes role master, unschedulable, and the annotation. We would have to change more than service exclusion -- we will also have to introduce the label on existing masters to avoid a change of behavior for existing installs.

> * If we do automatically allow masters, does that put master nodes at risk of self DoS in production because we are suddenly adding load from LB traffic there?

Yes, this can have unexpected consequences, for example, security issues due to exposing the master to traffic.

> * How much of this logic do we want to standardize and how much should we delegate to providers? I think longer term we want to delegate more to the providers but working with what we have today we may still need to support something standardized.

If our final aim is to delegate to cloud providers, then wouldn't we want to avoid standardizing this? Standardization will require a deprecation cycle to remove.

> * With v1.14 code freeze coming up, we won't have adequate time to test this change.

IMHO, this is definitely worth getting a KEP and holding until 1.15, as it has the potential for a non-trivial semantic change (VMs added to LBs). It would be good to gather feedback on a KEP before implementing.

Thanks,
Bowei

Andrew Kim

Feb 23, 2019, 3:26:16 PM
to Bowei Du, kubernetes-sig-network
> Isn't that filtering behavior for the master separate from the annotation?

See https://sourcegraph.com/github.com/kubernetes/kubernetes@master/-/blob/pkg/controller/service/service_controller.go#L588.

> The code for exclusion includes role master, unschedulable, and the annotation. We would have to change more than service exclusion -- we will also have to introduce the label on existing masters to avoid a change of behavior for existing installs.

The filtering behavior is separate, but ideally we would couple the changes together (promote service node exclusion & include masters by default) because excluding masters by default has proven to be undesirable. The only reason we can't change this behavior today is because we don't have a "stable" option to exclude masters. Promoting service node exclusion would provide that option. 

> Yes, this can have unexpected consequences, for example, security issues due to exposing the master to traffic.

Agreed, this is where the `FilterNodes` call would come in handy, to ensure existing providers that relied on this behavior for legacy purposes can still opt into excluding masters for X number of releases until they apply the service node exclusion logic to masters (if that is what they want).

> If our final aim is to delegate to cloud providers, then wouldn't we want to avoid standardizing this? Standardization will require a deprecation cycle to remove.

Ideally yes, but I don't think we're at a point where we can fully rely on providers to own this yet. Providers rely heavily on upstream to "do the right thing" and I think we need to take incremental steps in delegating more decisions to providers. This, in addition to users already expecting a standard label across providers (via today's alpha label), is why I think a standardized label would be a good transitional step forward (even if it means future deprecation, though I think this would be many releases away).

> IMHO, this is definitely worth getting a KEP and holding until 1.15, as it has the potential for a non-trivial semantic change (VMs added to LBs). It would be good to gather feedback on a KEP before implementing.

Sounds good to me! I'll start drafting the initial KEP and have it ready for review after v1.14 and to be implemented for v1.15 if there are no objections. Thanks Bowei! 

Tim Hockin

Feb 26, 2019, 12:35:47 PM
to Andrew Kim, Bowei Du, kubernetes-sig-network
Any change in default behavior here is significant. If we suddenly
allow master nodes and unschedulable nodes into LB pools, that is a
MAJOR change and needs to be opt-in before it is opt-out (if ever).

The existing feature is about excluding non-master nodes from LB.
You're really asking about INCLUDING master nodes. Those are very
different things.

The problem is that we don't really know the implications of changing
this. Even on a single provider, there isn't one true answer (e.g.
GCP and GKE use the same provider).

Even if we push this onto providers (which seems correct from a
who-owns-it perspective, but maybe obnoxious from a who-uses-it
perspective) the inclusion of masters is markedly different than the
exclusion of nodes. Concretely, we can't just suddenly announce
"masters are now included" without some transition. During that
transition, we need to make sure users get to choose. In other words,
we need a flag or something that says
"default-filter-nodes-for-service-lb", which defaults to true. We can
announce intention to change that default in 6 months, but in the
interim we have to respect existing configs.

I hate to make this hard, but we really have 2 questions in play, and
one is harder than the other. Maybe we should handle them as 2 KEPs.

Andrew Kim

Feb 26, 2019, 1:26:37 PM
to kubernetes-sig-network
> Even if we push this onto providers (which seems correct from a
> who-owns-it perspective, but maybe obnoxious from a who-uses-it
> perspective) the inclusion of masters is markedly different than the
> exclusion of nodes. Concretely, we can't just suddenly announce
> "masters are now included" without some transition. During that
> transition, we need to make sure users get to choose. In other words,
> we need a flag or something that says
> "default-filter-nodes-for-service-lb", which defaults to true. We can
> announce intention to change that default in 6 months, but in the
> interim we have to respect existing configs.

I have more thoughts/alternatives on this (e.g. maybe we need a "filter mode" label/field with
the options "allowAll", "excludeMasters", etc.?) but I'll defer that discussion to the KEP; a rough illustration of the idea is below.

> I hate to make this hard, but we really have 2 questions in play, and
> one is harder than the other. Maybe we should handle them as 2 KEPs.

This is fair, will break this up into 2 KEPs :)


Thanks Tim! 