Hi,
I've started to see a pattern in some of the new issues opened about Services:
"Session affinity for same service and different ports is forwarded to different endpoints for each destination port #103000" [1]
"kube-proxy failed to clear conntrack for SCTP for the pod which is removed from service #101968" [2]
"conntrack entries not cleared when switching service endpoints #100698" [3]
Digging a bit more into the issues, I think we can classify them into two different but related problems:
- active-passive deployments, like databases. Doing a bit of research, I've found that Stack Overflow and other places recommend using "pet" pods for the database/application and exposing them with a Service, but listing only the "active" pod in the Endpoints. The failover is done by switching the endpoint IP, but since the old Pod is still alive, the existing TCP connections are not closed and they never fail over automatically.
- application protocols, like FTP or SIP, that don't work well through NAT because, for example, they need to open new ports dynamically, and they need the traffic to always go to the same pod (the session affinity is embedded in the payload). Typically they are singleton applications that use the Service to expose the Pod, and they also tend to be deployed as active-passive. We touched on this in the KEP about all-ports Services; this is the typical problem solved by an ALG [4]
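To make the first problem concrete, here is a minimal sketch of why established connections stay pinned to the old pod. It is a toy model, not kube-proxy code: the class `ToyServiceNAT`, its methods, and the IPs are hypothetical names I made up for illustration. It assumes the backend is chosen only for the first packet of a flow and then remembered, which is roughly how conntrack-based NAT behaves.

```python
# Toy model (hypothetical, not kube-proxy code): a NAT table that picks a
# backend only for the first packet of a flow and remembers that choice.

class ToyServiceNAT:
    def __init__(self, endpoint):
        self.endpoint = endpoint   # the single "active" Pod IP in the Endpoints
        self.conntrack = {}        # (client_ip, client_port) -> backend IP

    def set_endpoint(self, endpoint):
        # Failover: only the choice for *new* flows changes.
        self.endpoint = endpoint

    def forward(self, client_ip, client_port):
        flow = (client_ip, client_port)
        # Established flows reuse the translation recorded earlier.
        if flow not in self.conntrack:
            self.conntrack[flow] = self.endpoint
        return self.conntrack[flow]

    def flush(self, backend):
        # Roughly what an explicit conntrack cleanup for a removed backend does.
        self.conntrack = {f: b for f, b in self.conntrack.items() if b != backend}


nat = ToyServiceNAT("10.0.0.1")
before = nat.forward("192.168.1.5", 40000)       # flow opened before failover
nat.set_endpoint("10.0.0.2")                     # switch the active endpoint
stuck = nat.forward("192.168.1.5", 40000)        # same flow, still the old pod
fresh = nat.forward("192.168.1.5", 40001)        # new flow, new pod
nat.flush("10.0.0.1")
after_flush = nat.forward("192.168.1.5", 40000)  # re-resolved after cleanup
```

In this model the existing flow only moves once its entry is flushed. With real TCP the new pod has no state for that connection, so I'd expect the client to see a reset and reconnect rather than continue transparently.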
Kubernetes Services are a nice abstraction: you have a virtual IP and port that is forwarded to the corresponding backends (Pods). But I like to think about Services as a distributed load balancer at the transport level.
These problems are easily solved at a higher level than Services, but I'm not familiar enough with the Ingress and Gateway API implementations. Can you deploy a singleton Ingress/Gateway (since you have to funnel the traffic through one point) to solve them? Is there something wrong with that?
It would be good to have a blog post so we can offer users solutions (I'm volunteering here) and avoid people getting frustrated because their approach doesn't work and won't be fixed.
I was thinking about these examples:
- active-passive database
- ftp server
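For the FTP example, a small sketch of what "embedded in the payload" means in practice. In passive mode the server announces the data address and port inside the control-channel reply, in the form `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)`, with the port being `p1 * 256 + p2`. The helper name `parse_pasv_reply` and the sample reply are mine, only for illustration.

```python
import re

def parse_pasv_reply(reply):
    """Extract (host, port) from an FTP 227 passive-mode reply."""
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if m is None:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    # The data port is split into a high byte and a low byte.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


# Hypothetical reply from a pod with IP 10.0.0.1
host, port = parse_pasv_reply("227 Entering Passive Mode (10,0,0,1,195,80)")
```

The client then opens a second connection to that host and port. A Service that only forwards the control port and knows nothing about the payload cannot know that this data port exists, or that it has to reach the same pod, which is the job an ALG normally does.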