[kubernetes/kubernetes] Healthchecking of SSH Tunneler seem to be broken (#59347)


Wojciech Tyczynski

Feb 5, 2018, 8:04:24 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Healthchecking of the Tunneler is part of the apiserver's healthz. However, it seems to be a bit broken:

  1. when we start, this is always ok:
    https://github.com/kubernetes/kubernetes/blob/master/pkg/master/tunneler/ssh.go#L52
    https://github.com/kubernetes/kubernetes/blob/master/pkg/master/tunneler/ssh.go#L131

  2. If, for some reason (e.g. the user disabled the compute API in GCE), we are not able to update SSHKeys, then after 10 minutes the check will always turn into "failed". This seems pretty bad to me.

  3. If the call fails for some reason, there is a good chance that we will fail the health check too. The problem is that:

@gmarek @kubernetes/sig-api-machinery-bugs


You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

Daniel Smith

Feb 5, 2018, 4:00:09 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

If this is configured to be on but not functioning, then proxy, port forward, exec, attach, some admission webhooks and some aggregated apiservers are all broken. The latter two are concerning as they are part of the control plane. I think calling the control plane unhealthy for this state is fair.

We don't really have a concept of "live but in degraded mode", but maybe we should invent one.

Wojciech Tyczynski

Feb 5, 2018, 4:18:51 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Let's be clear - I'm not saying this shouldn't be part of healthcheck.
I guess what I'm saying is:

  1. the installSSHKey should either be called more often or the timeout should be longer than 10m (a single failure shouldn't cause a transition into the unhealthy state)
  2. I think we should check both things at start, instead of entering the "healthy" state without setting up tunnelers.

Does that make sense?

Matt Liggett

Feb 6, 2018, 5:20:06 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

@wojtek-t they both make sense, but we need to be careful that the apiserver pod spec's liveness intervals will play well with this since the apiserver will start out unhealthy and these can be somewhat high-latency operations.

Wojciech Tyczynski

Feb 7, 2018, 2:31:13 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Actually, I think that:

  • (1) isn't very controversial
  • (2) - agree that it's fairly non-trivial, and I had the same concern that you have (otherwise I would have already done that). I'm not entirely sure how to solve it, but where we are now definitely seems like a bad state to me.

Matt Liggett

Feb 12, 2018, 6:29:58 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Related: #55453

fejta-bot

May 15, 2018, 1:06:57 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Wojciech Tyczynski

May 15, 2018, 2:15:31 PM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

/remove-lifecycle stale.
/lifecycle frozen

Mario Valderrama

Sep 20, 2019, 7:30:19 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

I'm currently having issues with point 2 from your initial message. I use an external CCM which doesn't implement the required function. I would propose fixing this by simply skipping the check if the interface doesn't support it.

Jordan Liggitt

Jan 25, 2022, 9:16:44 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

ssh tunnel was removed in 1.22


Message ID: <kubernetes/kubernetes/issues/59347/1021226149@github.com>

Jordan Liggitt

Jan 25, 2022, 9:16:47 AM
to kubernetes/kubernetes, k8s-mirror-api-machinery-bugs, Team mention

Closed #59347.


Message ID: <kubernetes/kubernetes/issue/59347/issue_event/5950184375@github.com>
