closing dup
Closed #56584.
Is there a manual fix for this for those who are running an older version? Seems like any manual IP changes to the kubernetes Endpoints object are getting reversed.
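For context, the object being reverted is the Endpoints named `kubernetes` in the `default` namespace, which the apiserver's endpoint reconciler rewrites on every sync, so manual edits don't stick. A minimal client-go sketch for inspecting what the reconciler is writing (the kubeconfig path is an assumption; adjust as needed):

```go
// Minimal sketch (not from this thread): inspect the Endpoints object that
// the apiserver's endpoint reconciler keeps rewriting. Assumes a kubeconfig
// at the default location.
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The "kubernetes" service in "default" is the one whose Endpoints the
	// apiserver reconciler manages; manual edits are overwritten on its next pass.
	ep, err := client.CoreV1().Endpoints("default").Get(context.TODO(), "kubernetes", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			fmt.Println("apiserver endpoint:", addr.IP)
		}
	}
}
```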
Did you ever find a resolution to this? We don't see proper "load balancing" happening between the 3 API servers listed in the endpoints. One of our masters is pegged super high while the others just sit barely idling. We have a load balancer in front of the control-plane for externally originated API requests, but that doesn't appear to cover the internal comms.
I need to get to the bottom of the 1 master being pinned, or figure out how to adjust those endpoints in the Kubernetes service to point to the load balancer IPs instead of the direct master IPs.
One of our masters is pegged super high while the others just sit barely idling.
Causes:
1. Co-located controller manager / scheduler which have the lease and are not going through the load balancer.
2. Rolling restart of apiservers leaves most clients connected to a single apiserver.
3. Clients are using HTTP2 and route their many requests over a single connection, exacerbating 2.
I think in 1.20 it should be safe to turn back on the probabilistic GOAWAY, which will fix 2 / 3 over time. To fix 1 you also have to send those components' traffic through the load balancer.
There is probably a better bug to attach this info to. @liggitt @deads2k
I think #94532 (comment) is the http/2 issue we need a new Go version for in order to enable probabilistic GOAWAY.
Yeah, I've conflated this with the http2 disconnection detection change that made it into 1.20 (which, to be fair, is going to be needed by clients in the scenario that triggers my 2 above).
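For reference, the disconnection detection being referred to is the HTTP/2 connection health check in golang.org/x/net/http2 (ping frames driven by ReadIdleTimeout / PingTimeout). A rough sketch of enabling it on a plain Go transport, not the exact client-go wiring, with placeholder timeout values:

```go
// Rough sketch of HTTP/2 disconnection detection on a plain Go transport.
// client-go does the equivalent wiring internally as of roughly 1.20; the
// timeout values here are placeholders, not Kubernetes defaults.
package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	t1 := &http.Transport{}
	t2, err := http2.ConfigureTransports(t1)
	if err != nil {
		panic(err)
	}
	// If no frame arrives for ReadIdleTimeout, send a ping; if the ping is not
	// answered within PingTimeout, the connection is closed so the client
	// re-dials (and may land on a different apiserver behind the load balancer).
	t2.ReadIdleTimeout = 30 * time.Second
	t2.PingTimeout = 15 * time.Second

	client := &http.Client{Transport: t1}
	_ = client // use for long-lived, watch-style requests
}
```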
@lavalamp thanks for the reply! We are running a pretty old cluster (1.12, long story, working toward upgrades), so not sure if that changes anything.
Going through your list:
- Co-located controller manager / scheduler which have the lease and are not going through the load balancer
Isn't this a common pattern for HA masters? I've never seen an arch where the controller-manager and scheduler pods for each master were not co-located on the same host.
Maybe I'm misunderstanding. Regardless, are you aware of anything I can check to see if this is the case?
- Rolling restart of apiservers leaves most clients connected to a single apiserver.
Could this happen by patching your master hosts one by one and rebooting them in a rolling fashion, causing clients to switch apiservers over and over and eventually coalesce on one master? If that is the case, how would you recommend avoiding that issue? Reboot simultaneously??
- Clients are using HTTP2 and route their many requests over a single connection, exacerbating 2
I'm generally unfamiliar with HTTP2 and Kubernetes clients' use of it internally, so I'm unsure if this is a culprit in 1.12.
I think in 1.20 it should be safe to turn back on the probabilistic GOAWAY, which will fix 2 / 3 over time.
1.12, womp womp.
To fix 1 you also have to send those components' traffic through the load balancer.
How is this achieved? Everything I've found points to the internal kubernetes service being used to comm back to the control-plane, but I can't change the endpoints without the reconciler setting it back.
Truly appreciate your time and willingness to provide insight.
I don't know that we've specifically done anything about this issue since 1.12, but I really recommend you upgrade before spending much more time on this!
Co-located controller manager / scheduler which have the lease and are not going through the load balancer
I think you're talking about the lease? How would one prevent them from landing on the same master? Or how could you split them once the leases are co-located?
The lease is one aspect; there's no built-in prevention (unless you count resource limits causing one of the components to OOM if both leases end up on the same node). Just delete one of the leases and wait.
A bigger deal is that these components are likely connecting directly to apiserver rather than through the load balancer. See below.
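To act on the "just delete one of the leases and wait" advice: on recent clusters the leader-election record is a coordination.k8s.io Lease in kube-system (on 1.12 it's an annotation on an Endpoints object instead, so this sketch doesn't apply there verbatim). A minimal client-go sketch, assuming the default resource name and a default kubeconfig path:

```go
// Minimal sketch: delete the kube-controller-manager leader-election Lease so
// the controller managers re-run the election. Assumes a Lease-based cluster
// and the default resource name in kube-system.
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Whichever controller-manager wins the next election becomes the active one.
	if err := client.CoordinationV1().Leases("kube-system").Delete(
		context.TODO(), "kube-controller-manager", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("deleted kube-controller-manager lease; a re-election will follow")
}
```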
Rolling restart of apiservers leaves most clients connected to a single apiserver.
Could this happen by patching your master hosts one by one and rebooting them in a rolling fashion, causing clients to switch apiservers over and over and eventually coalesce on one master?
Right.
If that is the case, how would you recommend avoiding that issue? Reboot simultaneously??
No, then you have a (short) outage... There's really not a good fix for this. The probabilistic GOAWAY is the best we've got. Or will have, anyway. If this is really the problem (which I doubt) you should be able to even out the load a bit by restarting the most heavily loaded apiserver. That only helps if that apiserver is taking more than 50% of the load. But I doubt this is your problem.
To fix 1 you also have to send those components' traffic through the load balancer.
How is this achieved? Everything I've found points to the internal kubernetes service being used to comm back to the control-plane,
I'd be surprised if your controller manager / scheduler are connecting that way. Take a look at their flags and/or kubeconfig files. This is probably your biggest contributor.
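A quick way to check is to look at the server: field of the kubeconfig each component runs with. A small sketch that prints it; the path below is an assumption (kubeadm-style layout), so substitute whatever --kubeconfig the component actually uses:

```go
// Small sketch: print the cluster server URL(s) from a component kubeconfig
// to see whether the controller manager / scheduler talks to the load
// balancer or directly to one master. Path is a kubeadm-style assumption.
package main

import (
	"fmt"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.LoadFromFile("/etc/kubernetes/controller-manager.conf")
	if err != nil {
		panic(err)
	}
	for name, cluster := range cfg.Clusters {
		fmt.Printf("cluster %q -> server %s\n", name, cluster.Server)
	}
}
```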
In a really extreme case you could run a couple controller managers with different leases and sets of controllers. But this isn't a good experiment to do if prod is on fire, and I think you should spend your time upgrading... :)
Thanks again @lavalamp, total boss.
Good luck with your upcoming 8 consecutive upgrades! You'll be the boss after that...