Healthchecking of Tunneler is part of healthz of apiserver. However, that seems to be a bit broken:
when we start, this is always ok:
https://github.com/kubernetes/kubernetes/blob/master/pkg/master/tunneler/ssh.go#L52
https://github.com/kubernetes/kubernetes/blob/master/pkg/master/tunneler/ssh.go#L131
If (for some reason, e.g. the user disabled the compute API in GCE) we are not able to update SSHKeys, then after 10 minutes the health check will always turn into "failed" - this seems pretty bad to me.
If the call fails for some reason, there is a big chance that we will fail the health check too. The problem is that:
@gmarek @kubernetes/sig-api-machinery-bugs
If this is configured to be on but not functioning, then proxy, port forward, exec, attach, some admission webhooks and some aggregated apiservers are all broken. The latter two are concerning as they are part of the control plane. I think calling the control plane unhealthy for this state is fair.
We don't really have a concept of "live but in degraded mode", but maybe we should invent one.
Let's be clear - I'm not saying this shouldn't be part of healthcheck.
I guess what I'm saying is:
Does that make sense?
@wojtek-t they both make sense, but we need to be careful that the apiserver pod spec's liveness intervals will play well with this since the apiserver will start out unhealthy and these can be somewhat high-latency operations.
Actually, I think that:
Related: #55453
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale.
/lifecycle frozen
I'm currently having issues with the second point you mentioned in your initial message. I use an external CCM which doesn't implement the required function. I would propose to fix this by simply skipping the check if the interface doesn't support it.
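One way the proposed skip could look, as a rough sketch only: the health check runs the key-freshness test just when the provider actually offers key installation, detected with a Go type assertion. All names here (`provider`, `sshKeyInstaller`, `tunnelHealthCheck`) are hypothetical and do not correspond to the real cloud provider interfaces.

```go
package main

import (
	"errors"
	"fmt"
)

// provider is a hypothetical minimal cloud provider interface.
type provider interface {
	Name() string
}

// sshKeyInstaller is a hypothetical optional capability; providers that
// cannot install SSH keys simply do not implement it.
type sshKeyInstaller interface {
	InstallSSHKey(user string, key []byte) error
}

// tunnelHealthCheck skips the key-freshness check when the provider does
// not support key installation at all.
func tunnelHealthCheck(p provider, keysFresh func() bool) error {
	if _, ok := p.(sshKeyInstaller); !ok {
		return nil // nothing to check for this provider
	}
	if !keysFresh() {
		return errors.New("SSH keys are stale")
	}
	return nil
}

// externalCCM stands in for a provider without key installation support.
type externalCCM struct{}

func (externalCCM) Name() string { return "external" }

// keyCapable stands in for a provider that can install SSH keys.
type keyCapable struct{}

func (keyCapable) Name() string                             { return "key-capable" }
func (keyCapable) InstallSSHKey(user string, key []byte) error { return nil }

func main() {
	stale := func() bool { return false }
	fmt.Println("external CCM:", tunnelHealthCheck(externalCCM{}, stale))
	fmt.Println("key-capable provider:", tunnelHealthCheck(keyCapable{}, stale))
}
```

A provider that reports "not implemented" through an error value rather than by omitting the method would need a different detection step, for example comparing against a sentinel error.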
The SSH tunnel was removed in 1.22.
Closed #59347.