[k8s-sig-net] Stateful Network Applications


Antonio Ojea

Sep 17, 2021, 11:36:18 AM
to kubernetes-sig-network
Hi,

I've started to see a pattern in some new issues opened about Services:

"Session affinity for same service and different ports is forwarded to different endpoints for each destination port #103000" [1]
"kube-proxy failed to clear conntrack for SCTP for the pod which is removed from service #101968" [2]
"conntrack entries not cleared when switching service endpoints #100698" [3]

Digging a bit more into the issues, I think we can classify them into two different but related problems:

- active-passive deployments, like databases. Doing a bit of research, I've found that Stack Overflow and other places recommend using "pet" pods for the database/application, exposing them with a Service, but setting only the "active" pod in the Endpoints object. Failover is done by switching the endpoint IP, but since the old Pod is still alive, the existing TCP connections are not closed and they never fail over automatically (a minimal sketch of this pattern follows below).

- application protocols, like FTP or SIP, that don't work well through NAT because, for example, they need to dynamically open new ports, and they need to always send traffic to the same pod (the session affinity is embedded in the payload). Typically these are singleton applications that use the Service to expose the Pod, and they also tend to be deployed as active-passive deployments. We touched on this in the KEP about all-ports Services; this is the typical problem solved by an ALG [4].

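To make the first case concrete, here is a minimal Go sketch (client-go) of the pattern people describe: a selector-less Service plus a manually managed Endpoints object that an external failover controller rewrites so that only the current primary receives new connections. Everything here is illustrative; the names ("db", "postgres"), the port and the pod IP are made up, and the leader election / health checking around it is left out. Note that updating the Endpoints object does nothing to connections that are already established, which is exactly the problem:

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// pointServiceAtPrimary rewrites the Endpoints of a selector-less Service so
// that only the elected primary pod IP is listed. kube-proxy will send *new*
// connections there, but existing TCP connections to the old primary stay up.
func pointServiceAtPrimary(ctx context.Context, cs kubernetes.Interface, primaryIP string) error {
	ep := &corev1.Endpoints{
		ObjectMeta: metav1.ObjectMeta{Name: "postgres", Namespace: "db"},
		Subsets: []corev1.EndpointSubset{{
			Addresses: []corev1.EndpointAddress{{IP: primaryIP}},
			Ports:     []corev1.EndpointPort{{Port: 5432, Protocol: corev1.ProtocolTCP}},
		}},
	}
	// Assumes the Endpoints object already exists next to the Service.
	_, err := cs.CoreV1().Endpoints("db").Update(ctx, ep, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// On failover, an external controller would call this with the new primary's IP.
	if err := pointServiceAtPrimary(context.Background(), kubernetes.NewForConfigOrDie(cfg), "10.0.0.5"); err != nil {
		panic(err)
	}
}
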
Kubernetes Services are a nice abstraction: you have a virtual IP and port that is forwarded to the corresponding backends (Pods). But I like to think about Services as a distributed load balancer at the transport level.

These problems are more easily solved at a higher level than Services, but I'm not familiar enough with the Ingress and Gateway API implementations. Can you deploy a singleton Ingress/Gateway (since you have to funnel the traffic through one point) to solve them? Is there something wrong with that?

It would be good to have a blog post so we can offer users solutions (I'm volunteering here) and avoid people getting frustrated because their setup doesn't work and won't be fixed.

I was thinking about these examples:
- active-passive database
- FTP server

Antonio Ojea

Sep 24, 2021, 3:03:19 AM
to kubernetes-sig-network
... or someone can create a new KPNG backend ;)

Bowei Du

Sep 24, 2021, 1:20:35 PM
to Antonio Ojea, kubernetes-sig-network
Response inline:

- active-passive deployments, like databases. Doing a bit of research, I've found that Stack Overflow and other places recommend using "pet" pods for the database/application, exposing them with a Service, but setting only the "active" pod in the Endpoints object. Failover is done by switching the endpoint IP, but since the old Pod is still alive, the existing TCP connections are not closed and they never fail over automatically.

The assumption that kube makes in this case is that Pods prefer a graceful shutdown (keep existing flows) vs fencing (cut all connections).

Couple of things here:

- What signal does the Pod give to indicate that the primary changed?
- How to configure it (hypothetically, not that we would necessarily do so): a Service field to tell the cluster LB to fence as opposed to gracefully handling connections (sketched below).
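
To make that hypothetical concrete (and to be clear, nothing like this exists today and it is not a proposal for a specific API shape), a sketch of what such a knob could look like:

// Purely hypothetical sketch; no such field exists in the Service API.
package v1sketch

// EndpointChangePolicy tells the cluster load balancer what to do with
// connections that are already established when the endpoint set changes.
type EndpointChangePolicy string

const (
	// Graceful keeps existing flows alive (today's behavior).
	EndpointChangePolicyGraceful EndpointChangePolicy = "Graceful"
	// Fence cuts existing flows to removed endpoints (e.g. by flushing their
	// conntrack entries) so clients are forced to reconnect to the new one.
	EndpointChangePolicyFence EndpointChangePolicy = "Fence"
)

// ServiceSpecSketch shows only the hypothetical field; all real ServiceSpec
// fields are elided.
type ServiceSpecSketch struct {
	// +optional
	EndpointChangePolicy EndpointChangePolicy `json:"endpointChangePolicy,omitempty"`
}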

- application protocols, like FTP or SIP, that don't work well through NAT because, for example, they need to dynamically open new ports, and they need to always send traffic to the same pod (the session affinity is embedded in the payload). Typically these are singleton applications that use the Service to expose the Pod, and they also tend to be deployed as active-passive deployments. We touched on this in the KEP about all-ports Services; this is the typical problem solved by an ALG [4].

This sounds like it should use an AllPorts Service to manage the ports.

Thanks,
Bowei

khn...@gmail.com

Sep 25, 2021, 10:31:07 AM
to kubernetes-sig-network
adding my 2c (or 2p depending on where you are :-) )

- For the allPorts Service, I do believe the next step is to find a solution to affinity-ize the thing across all ports (something under discussion right now for normal Services). I am guessing a SIP client will want to open another port to the same replica, not to another replica. FWIW I don't have a solution for that yet, but eventually will. A problem for another day.

- For DB-like workloads (single active replica), I have always thought of that as a Kubernetes Service shortcoming. Today, if you want to run any workload that is active/passive style (that includes all non-multi-leader-write DBs, e.g. MySQL, MariaDB, SQL Server, etc.; those may implement multi-leader, but it is usually advised against for perf reasons), you have to write the failover yourself (a rough sketch of what that looks like today is below). And I find that sucky and redundant.

There are two scenarios. The first is failover, and that is pretty mechanical: EndpointSlice should be able to select one replica that matches the selector instead of all of them. This could be via another Service.Spec field (sigh). Some thinking will have to go into ready/not-ready state while selecting (among other things). The other scenario is blue/green style deployment, which needs to stay out of the Service logic and be implemented as it is today via an external tool chain.
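
For what it's worth, the "write it yourself" part today looks roughly like the sketch below: an external controller publishing an EndpointSlice that lists only the elected primary, attached to a selector-less Service. The names ("db", "mysql"), port and IP are illustrative, and leader election / readiness handling is elided.

package failover

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/ptr"
)

// publishPrimary rewrites a hand-managed EndpointSlice so that it lists only
// the current primary for the selector-less Service "mysql" in namespace "db".
func publishPrimary(ctx context.Context, cs kubernetes.Interface, primaryIP string) error {
	slice := &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "mysql-primary",
			Namespace: "db",
			// This label ties the slice to the Service.
			Labels: map[string]string{discoveryv1.LabelServiceName: "mysql"},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
		Endpoints: []discoveryv1.Endpoint{{
			Addresses:  []string{primaryIP},
			Conditions: discoveryv1.EndpointConditions{Ready: ptr.To(true)},
		}},
		Ports: []discoveryv1.EndpointPort{{
			Name:     ptr.To(""),
			Port:     ptr.To[int32](3306),
			Protocol: ptr.To(corev1.ProtocolTCP),
		}},
	}
	// Assumes the slice already exists and this controller owns it.
	_, err := cs.DiscoveryV1().EndpointSlices("db").Update(ctx, slice, metav1.UpdateOptions{})
	return err
}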

Kal

antonio.o...@gmail.com

Aug 25, 2023, 11:49:29 AM
to kubernetes-sig-network
Bumping this thread again; I think we can revisit this now that we have a new state on the EndpointSlices.


The tricky part is how to technically break connections, but assuming we can, do we want to do it?

This would solve the problem of active-passive deployments, which do not work today with kube-proxy in iptables mode.
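
As for the "how", one possible building block (a sketch, not a worked-out design) is to flush the conntrack entries that point at the endpoint being removed, much like kube-proxy already does for UDP, so that established flows through the Service VIP get re-evaluated against the new endpoint. It assumes the conntrack binary is available on the node, and for TCP the client will still see its connection reset rather than closed gracefully:

package conntracksketch

import (
	"fmt"
	"os/exec"
)

// clearEntriesForEndpoint deletes the conntrack entries whose original
// destination is the removed endpoint IP. Without the entry, the next packet
// is re-evaluated by the Service NAT rules and lands on the new endpoint
// (for TCP this effectively resets the old connection).
func clearEntriesForEndpoint(endpointIP string) error {
	// Equivalent to running: conntrack -D --orig-dst <endpointIP>
	out, err := exec.Command("conntrack", "-D", "--orig-dst", endpointIP).CombinedOutput()
	if err != nil {
		// Note: conntrack also exits non-zero when no entries matched; a real
		// implementation would distinguish that case from actual failures.
		return fmt.Errorf("conntrack -D --orig-dst %s failed: %v: %s", endpointIP, err, out)
	}
	return nil
}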
