Intersection between TrafficPolicy and Topology

Rob Scott

Mar 2, 2021, 6:58:10 PM

to kubernetes-sig-network

Hey Everyone,

I've been trying to figure out the best way to provide an opt-in for topology aware routing. As we went through the KEP, we explored a number of options before eventually linking it to TrafficPolicy at the last minute. As I've thought about it more, I'm not completely sure that was the right decision. There's been some discussion on Slack and GitHub about what makes the most sense, but I'd like to try to reach a final decision soon since code freeze is approaching awfully quickly. Here are a couple ways we could model this on a Service:

1. New Topology field on Service

externalTrafficPolicy: Local | Cluster

internalTrafficPolicy: Local | Cluster

topology: Auto | Disabled

Downsides

- May not be obvious that topology field only has an effect when trafficPolicy == Cluster

- Requires a new API field
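
To make option 1 concrete, here is a minimal Python sketch of how a proxy might resolve one traffic policy field together with the new topology field. The function name and the returned labels are hypothetical; only the field names and values come from the proposal above, and the rule that topology only modifies Cluster is the assumption stated in the first downside.

```python
from enum import Enum


class TrafficPolicy(Enum):
    LOCAL = "Local"
    CLUSTER = "Cluster"


class Topology(Enum):
    AUTO = "Auto"
    DISABLED = "Disabled"


def effective_routing(policy: TrafficPolicy, topology: Topology) -> str:
    """Return a hypothetical label for the endpoints a proxy would consider.

    Assumption: the topology field only changes the meaning of Cluster.
    """
    if policy is TrafficPolicy.LOCAL:
        # Local wins; the topology field has no effect in this case.
        return "node-local endpoints"
    if topology is Topology.AUTO:
        return "topology-selected subset"
    return "all cluster endpoints"
```

Note that `effective_routing(TrafficPolicy.LOCAL, Topology.AUTO)` and `effective_routing(TrafficPolicy.LOCAL, Topology.DISABLED)` give the same answer, which is exactly the "may not be obvious" downside.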

2. New Topology values for Traffic Policy fields

externalTrafficPolicy: Local | Cluster | Topology

internalTrafficPolicy: Local | Cluster | Topology

Downsides

- A new PreferLocal TrafficPolicy option would be ambiguous if no local endpoints were available. Should it fall back to topology or cluster?

- If something like externalTrafficPolicy=Cluster combined with internalTrafficPolicy=Topology became common, it could significantly increase the number of iptables rules.

- This feature becomes closely linked to the InternalTrafficPolicy feature.
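
The PreferLocal ambiguity can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: `PREFER_LOCAL` is the hypothetical future value from the first downside, and `select_endpoints` and its `prefer_local_fallback` parameter are made-up names. The point is that the fallback has to be supplied from somewhere outside the enum.

```python
from enum import Enum
from typing import List


class TrafficPolicy(Enum):
    LOCAL = "Local"
    CLUSTER = "Cluster"
    TOPOLOGY = "Topology"
    PREFER_LOCAL = "PreferLocal"  # hypothetical future value


def select_endpoints(
    policy: TrafficPolicy,
    local: List[str],
    hinted: List[str],
    everything: List[str],
    prefer_local_fallback: TrafficPolicy = TrafficPolicy.CLUSTER,
) -> List[str]:
    """Pick candidate endpoints for one policy value (illustrative only)."""
    if policy is TrafficPolicy.LOCAL:
        return local
    if policy is TrafficPolicy.TOPOLOGY:
        return hinted
    if policy is TrafficPolicy.PREFER_LOCAL:
        if local:
            return local
        # The single enum value cannot say which fallback is intended,
        # so this sketch needs an extra, out-of-band parameter.
        if prefer_local_fallback not in (TrafficPolicy.CLUSTER, TrafficPolicy.TOPOLOGY):
            raise ValueError("fallback must be Cluster or Topology")
        return select_endpoints(prefer_local_fallback, local, hinted, everything)
    return everything
```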

3. Punt on config until 1.22; if the feature gate is on, the feature is enabled for all Services

Not ideal, but possible.

Downsides

- Pushes back when this feature could realistically graduate to beta by 1 release cycle.

I think I'm leaning towards option 1, but I wanted to check in with the rest of the SIG to see if anyone had a preference.

Thanks!

Rob

antonio.o...@gmail.com

Mar 3, 2021, 11:55:48 AM

to kubernetes-sig-network

Hi Rob,

let me try to summarise my understanding so you can correct me and, hopefully, I can provide more context to the audience:

The current intersection is about the following KEPs:

* KEP Service Traffic Policy

https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2086-service-internal-traffic-policy

* KEP Topology Aware Hints

https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2433-topology-aware-hints

This is a link to one of the discussions on how to "handle the intersection" between both KEPs:

https://github.com/kubernetes/enhancements/pull/2441#discussion_r571938384

It is worth reading Tim Hockin's comment here:

https://github.com/kubernetes/enhancements/pull/2441#discussion_r572294066

These are some related and important events in the "Kubernetes Topology history":

* Topology Aware Subsetting was the initial proposal to address topology, but it was superseded by the current Service Traffic Policy proposal

https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2004-topology-aware-subsetting

https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2030-endpointslice-subsetting

* sig-network mailing list Topology discussion

https://groups.google.com/g/kubernetes-sig-network/c/wXd1D_fKjqU

* The deprecation of the EndpointSlice Topology field was announced in 1.20, and a new NodeName field was added

https://github.com/kubernetes/kubernetes/pull/96440

https://github.com/kubernetes/enhancements/pull/2367

The Service object already has a related field that is "strongly" tied to NodePort and LoadBalancer types:

ExternalTrafficPolicy: Cluster | Local

The current KEP proposal is to add a new field TrafficPolicy to Service.Spec with the following values (the name was not decided, so I assume this is what you are referring to as internalTrafficPolicy):

TrafficPolicy: Cluster | Topology | PreferLocal | Local

After all your comments, all of the options seem to have a lot of downsides, and the combinations are complex to understand.

Thinking as a user, do I ever need to use both at the same time?

Both seem to address very specific use cases, and maybe trying to generalise is not a good thing.

Is it an option to make topology awareness and traffic policy mutually exclusive?

An additional question I have is about the KEP "Tracking Terminating Endpoints": is it affected by this decision? The use case described in that KEP is about handling terminating endpoints that use "externalTrafficPolicy=Local".

https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1672-tracking-terminating-endpoints

Rob Scott

Mar 3, 2021, 1:04:19 PM

to antonio.o...@gmail.com, kubernetes-sig-network

Thanks for the great summary Antonio! You've added a lot of great context here, especially the references to earlier GitHub discussions about this. Each of these fields and KEPs makes sense in isolation; I'm just trying to make sure the combination of them actually results in something sensible.

> After all your comments, all of the options seem to have a lot of downsides, and the combinations are complex to understand.

Completely agree.

> Thinking as a user, do I ever need to use both at the same time?

No one would ever need to use both at the same time, but I can imagine the combination of externalTrafficPolicy=Local and topology=Auto being fairly popular. (That could alternatively be configured as externalTrafficPolicy=Local and internalTrafficPolicy=Topology).

> Both seem to address very specific use cases, and maybe trying to generalise is not a good thing.
> Is it an option to make topology awareness and traffic policy mutually exclusive?

I think there are going to be compelling combinations like the example above that use both. Within the same level (external or internal) I think topology and traffic policy are relatively exclusive. For example, externalTrafficPolicy=Local makes any topology config irrelevant for external traffic. On the other hand, if we were to add support for a PreferLocal traffic policy, a separate topology field would be useful in determining what happens when no local endpoints are available (fall back to topology or cluster).

> An additional question I have is about the KEP "Tracking Terminating Endpoints": is it affected by this decision? The use case described in that KEP is about handling terminating endpoints that use "externalTrafficPolicy=Local".

Good question. I don't think this is affected, but Andrew could confirm.

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-network" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-ne...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-network/857f8118-2711-401b-b953-b6f966b7bd04n%40googlegroups.com.

Tim Hockin

Mar 3, 2021, 1:32:32 PM

to Rob Scott, antonio.o...@gmail.com, kubernetes-sig-network

On Wed, Mar 3, 2021 at 10:04 AM 'Rob Scott' via kubernetes-sig-network <kubernetes-...@googlegroups.com> wrote:

> Thanks for the great summary Antonio! You've added a lot of great context here, especially the references to earlier GitHub discussions about this. Each of these fields and KEPs makes sense in isolation; I'm just trying to make sure the combination of them actually results in something sensible.

> > After all your comments, all of the options seem to have a lot of downsides, and the combinations are complex to understand.
> Completely agree.

> > Thinking as a user, do I ever need to use both at the same time?
> No one would ever need to use both at the same time, but I can imagine the combination of externalTrafficPolicy=Local and topology=Auto being fairly popular. (That could alternatively be configured as externalTrafficPolicy=Local and internalTrafficPolicy=Topology).

To emphasize: `externalTrafficPolicy=Local` generally has more to do with "I need the client IPs preserved" than "I want good performance". This is very different from `internalTrafficPolicy=Local` which seems to mean "I care about performance" or even "the same-node is semantically meaningful". So I *do* expect to see cases where `eTP` is "Local" and `iTP` is not.

The question I see in this discussion is how do we express what "not local" means? Historically that was "random across the whole cluster". With topology it can mean "pick a reasonable subset for me".

One option is to set `xTP` to "Topology", which is one less field, but makes things like `iTP: Topology; eTP: Cluster` expressible and makes future policies (PreferLocal has some merit) more complex (the "or else" clause is ambiguous).

The other option is to add an orthogonal `topology` field which is a modifier of the meaning of "Cluster" for both iTP and eTP. The main downside is that it's YET ANOTHER field and it makes `iTP: Local; eTP: Local; topology: Auto` expressible, which is a little weird, though not semantically wrong.

If we ever end up with more params for topology, we end up with the same ambiguity.
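
As a rough illustration of the "a little weird, though not semantically wrong" case, the sketch below enumerates the combinations that the orthogonal-field option makes expressible and flags those where `topology: Auto` changes nothing. The function names are made up for this example, and the rule that topology only matters where a policy is "Cluster" is the assumption described above.

```python
from itertools import product

POLICIES = ("Local", "Cluster")
TOPOLOGIES = ("Auto", "Disabled")


def topology_matters(itp: str, etp: str) -> bool:
    """Assumption: topology only changes behaviour where a policy is Cluster."""
    return "Cluster" in (itp, etp)


def inert_auto_combinations():
    """List (iTP, eTP, topology) combinations where topology=Auto is a no-op."""
    return [
        (itp, etp, topo)
        for itp, etp, topo in product(POLICIES, POLICIES, TOPOLOGIES)
        if topo == "Auto" and not topology_matters(itp, etp)
    ]
```

Under these assumptions the only flagged combination is `iTP: Local; eTP: Local; topology: Auto`, so the orthogonal field adds a single inert configuration rather than a conflicting one.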


Antonio Ojea

Mar 3, 2021, 5:34:14 PM

to Tim Hockin, Rob Scott, kubernetes-sig-network

I had an interesting chat with Tim and there is an interesting angle I'd like to share based on that conversation.

ExternalTrafficPolicy and InternalTrafficPolicy have clear scopes and semantics, but we are struggling with Topology because it is complicated and has so much uncertainty that it is hard to model: do we need a switch to enable it, should it be opt-in or opt-out, should it be an enum, how does it intersect with traffic policies, ...

It is important to mention that feature gates are not switches: they graduate and are then enabled by default forever. Do we really need to define how Topology should work in the API? Is it really an API, or an implementation detail of the "service-proxy"?

We do have to provide an API that people can use to implement different Service topologies; that API is the new Slices fields: EndpointHints, NodeName, ...

We can also implement some "models" in kube-proxy, just as kube-proxy has different backends for Services: iptables, ipvs.

If we accept this, the question is: how do I use topology for my traffic? Quoting Tim:

> If we have an annotation that says "generate topology hints for this service" , people can opt-in now. ...

And you can decide in your "service-proxy" whether you want to make use of those hints, and also expose different methods for using them.
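
A minimal sketch of that split, in Python and under loud assumptions: the annotation key, the `hinted_zones` field name, and both function names are hypothetical stand-ins, not the real API. One function models the opt-in check that would trigger hint generation; the other models a proxy that chooses whether to honour the hints.

```python
from typing import Dict, List

# Hypothetical annotation key, used only for this illustration.
HINTS_ANNOTATION = "example.com/generate-topology-hints"


def wants_hints(service_annotations: Dict[str, str]) -> bool:
    """Opt-in check: should hints be generated for this Service?"""
    return service_annotations.get(HINTS_ANNOTATION, "").lower() == "true"


def endpoints_for_zone(endpoints: List[dict], zone: str, use_hints: bool) -> List[str]:
    """Proxy-side choice: honour hints for `zone`, or ignore them entirely."""
    everything = [e["address"] for e in endpoints]
    if not use_hints:
        return everything
    hinted = [e["address"] for e in endpoints if zone in e.get("hinted_zones", [])]
    # Assumption: fall back to all endpoints when no hint matches this zone.
    return hinted or everything
```

With this shape, the Service API only carries the opt-in, and each proxy implementation remains free to decide how (or whether) the hints influence routing.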

Sandor Szuecs

Mar 10, 2021, 5:06:59 AM

to Antonio Ojea, Tim Hockin, Rob Scott, kubernetes-sig-network

I wonder if we need to model it into a service or if we could have an object describing how to create slices out of endpoints with a topology requested by the object.

For our purpose it would be enough that a cluster admin defines the topology and the ingress router compiles its routing tree based on topology.

To be honest, I also have only ~10 applications out of 3000 that could benefit from this. I think you need >100 endpoints before it makes sense to slice.

For me "too complicated" always means you are dealing with the wrong abstraction; maybe you can hook this into Service differently.

Just my 2 cents, best regards, sandor
