Version skew support for kubelet (n-2)


Elana Hashman

Feb 23, 2021, 4:44:08 PM
to kubernetes-sig-architecture, kubernete...@googlegroups.com
Currently, we document a support policy of n-2 version skew for the kubelet. This means that we expect, given a 1.20 API Server, that 1.18, 1.19, and 1.20 kubelets will be compatible.
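
As a rough illustration of the policy stated above (a sketch, not an official tool): the check below treats a kubelet as in range when it is at most two minor versions older than the API server and, per the published skew policy, not newer. The names `parse_minor` and `kubelet_skew_ok` are made up for this example.

```python
def parse_minor(version):
    """Split a 'major.minor' string such as '1.20' into two integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def kubelet_skew_ok(apiserver, kubelet, max_skew=2):
    """Return True if the kubelet falls inside the documented n-2 window."""
    api_major, api_minor = parse_minor(apiserver)
    kubelet_major, kubelet_minor = parse_minor(kubelet)
    if api_major != kubelet_major:
        return False
    # The kubelet may trail the API server by up to max_skew minor versions,
    # but it may not be newer than the API server.
    return 0 <= api_minor - kubelet_minor <= max_skew


# Check a range of kubelet versions against a 1.20 API server.
for kubelet in ("1.17", "1.18", "1.19", "1.20", "1.21"):
    print(kubelet, kubelet_skew_ok("1.20", kubelet))
```

With a 1.20 API server, only the 1.18, 1.19, and 1.20 kubelets pass, matching the example in the paragraph above.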

However, it seems that we do not actually test for this compatibility. The last release for which we ran n-2 version upgrade tests appears to be 1.14. I understand there might have been some sort of API change at the time that caused this verification to break. I raised this on Slack and at both the SIG Node and Conformance testing meetings over the past week, where we discussed it and came to this conclusion.

I would like to propose we make a decision re: n-2 version kubelet support. Either we should choose to verify support, and begin work to put the necessary test infrastructure in place, or we should choose not to support this, and update our published support policies.

I have added this to the SIG Architecture agenda for the Thursday meeting.

Cheers,

- e

Davanum Srinivas

Feb 23, 2021, 4:52:51 PM
to Elana Hashman, kubernetes-sig-architecture, kubernete...@googlegroups.com
Elana,

has this already been discussed at a sig-node meeting? what was the consensus there?

Thanks,
Dims

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-architecture/CAH1uJ6U4qftRnmWthjYKtBKAzRFFnX7XWLMkSbyVWHUsX8%3DBBg%40mail.gmail.com.


--
Davanum Srinivas :: https://twitter.com/dims

Jordan Liggitt

Feb 23, 2021, 4:55:59 PM
to Elana Hashman, kubernetes-sig-architecture, kubernetes-sig-node
Thanks for raising this. Given the number of downstream processes built on top of our published support policies, I think the skew test should be restored.

> I understand there might have been some sort of API change at the time that caused this verification to break

Do you have more details or a pointer to where this was raised? I don't remember that happening and would like to dig into it if there was one.





Lubomir I. Ivanov

Feb 23, 2021, 5:21:15 PM
to Elana Hashman, kubernetes-sig-architecture, kubernete...@googlegroups.com
testing this would be a matter of adding a couple of new test jobs to
the existing list of kubeadm based jobs:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm

when we designed the interface for the tool being used we thought
about the kubelet skew, but we just don't have any jobs for it
currently.

given:
- this skew is officially supported.
- SIG Cluster Lifecycle cares about the skew since some users are
doing it, intentionally or unintentionally.

we can just add and maintain these test jobs under the kubeadm
dashboard, but we can also add them to a SIG Node dashboard.
if you have a SIG Node dashboard in mind, do tell.

NOTE: the test tooling is kind based so it's containerd-in-docker, so
if there are objections to this setup i guess another setup has to be
used (i.e. we cannot help).

lubomir

Derek Carr

Feb 23, 2021, 5:26:20 PM
to Lubomir I. Ivanov, Elana Hashman, kubernetes-sig-architecture, kubernetes-sig-node
Since the test primarily verifies the communication channels between kube-apiserver and kubelet, and less the kubelet's ability to manage the host, I think the setup described makes sense as an improvement over the present state.


Elana Hashman

Feb 23, 2021, 5:28:14 PM
to Davanum Srinivas, kubernetes-sig-architecture, kubernete...@googlegroups.com
(replying to all emails on thread in one reply)

@dims: We discussed this at SIG Node last week; the Slack thread linked below is my written summary of that discussion. Dawn gave some historical context on the introduction of the tests around 1.6, and we discussed the history up to their removal after 1.14.

@liggitt: I don't have any more details on what changes might have been made that led us to stop supporting the tests. The best people to ask would probably be the 1.15 release team?

@neolit123: I think SIG Cluster Lifecycle is the right group to be responsible for these tests. I don't have enough context to comment on whether kind tests will be sufficient.

- e

Lubomir I. Ivanov

Feb 24, 2021, 7:25:03 PM
to kubernetes-sig-architecture, kubernete...@googlegroups.com
here are the PRs to add test jobs for kubelet N-1 and N-2 skew against
api-server N:
https://github.com/kubernetes/kubeadm/pull/2396
https://github.com/kubernetes/test-infra/pull/21016

what these jobs do:
- create a 3 CP x 2 W cluster using kubeadm
- run some kubeadm e2e and smoke tests
- run the conformance suite (parallel test)
- report logs from all components / containers in a log folder (same as kind).

list of new jobs:
kubelet 1.17 against apiserver 1.18
kubelet 1.17 against apiserver 1.19
kubelet 1.18 against apiserver 1.19
kubelet 1.18 against apiserver 1.20
kubelet 1.19 against apiserver 1.20
kubelet 1.19 against apiserver from master
kubelet 1.20 against apiserver from master

notes:
- all artifacts are built from the HEAD of the respective branches
- older than 1.17-against-1.18 is outside of the main k8s support window.
- N-against-N are already covered in regular kubeadm jobs
- kubeadm N doesn't really support kubelet N-2, but this should work
mostly fine as long as KubeletConfiguration v1beta1 is not removed
within fewer than 3 releases.
- kubernetes-sig...@googlegroups.com is added on CC for
test failures (this is relatively low noise)
- the test jobs will be visible / mirrored in sig-node-kubelet dashboard
- when testing N-1 or N-2 kubelet against N kube-apiserver, the
conformance suite is built from N of k/k.
- we rotate these jobs every release - e.g. add 1.21 once it's released.
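
The job list above follows a simple rule (kubelet N-1 and N-2 against apiserver N, dropping anything older than the support window), which can be sketched as below. This is only an illustration of the rule, not the tooling the jobs actually use; `skew_jobs` and its parameters are hypothetical names, and minor versions are passed as plain integers (18 means 1.18).

```python
def skew_jobs(apiserver_minors, oldest_supported, master_minor=None, max_skew=2):
    """List (kubelet, apiserver) pairs for kubelet N-1..N-max_skew against apiserver N.

    N-against-N pairs are left out, since regular jobs already cover them.
    """
    jobs = []
    for api in apiserver_minors:
        # Walk from the oldest allowed kubelet (N-max_skew) up to N-1.
        for skew in range(max_skew, 0, -1):
            kubelet = api - skew
            if kubelet < oldest_supported:
                continue  # outside the main support window
            api_label = "master" if api == master_minor else "1.%d" % api
            jobs.append(("1.%d" % kubelet, api_label))
    return jobs


# 1.21 is the release being developed on master at the time of this thread.
for kubelet, apiserver in skew_jobs([18, 19, 20, 21], oldest_supported=17, master_minor=21):
    print("kubelet %s against apiserver %s" % (kubelet, apiserver))
```

With these inputs the sketch yields the same seven pairs as the list above; rotating the jobs for a new release amounts to shifting the input versions by one.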

lubomir

Lubomir I. Ivanov

Feb 25, 2021, 3:57:29 PM
to kubernetes-sig-architecture, kubernete...@googlegroups.com
as discussed in today's call there is a problem area around the
kube-proxy against kubelet skew for the new tests we are adding:
https://kubernetes.io/docs/setup/release/version-skew-policy/#kube-proxy

quote:
> kube-proxy must be the same minor version as kubelet on the node.
> kube-proxy must not be newer than kube-apiserver.
> kube-proxy must be at most two minor versions older than kube-apiserver.

kubeadm today always deploys kube-proxy's version equal to the
kube-apiserver version.
which violates the first point above if we have a setup where only the
kubelet version is skewed (N-1 or N-2).
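
To make the conflict concrete, here is a small sketch that encodes the three quoted rules and applies them to a kubeadm-style node where kube-proxy follows the API server while the kubelet is N-2. The function name `kube_proxy_violations` is invented for this example, and minor versions are passed as integers (20 means 1.20).

```python
def kube_proxy_violations(apiserver, kubelet, kube_proxy):
    """Return the quoted kube-proxy skew rules that a node's versions violate."""
    violations = []
    if kube_proxy != kubelet:
        violations.append(
            "kube-proxy must be the same minor version as kubelet on the node")
    if kube_proxy > apiserver:
        violations.append(
            "kube-proxy must not be newer than kube-apiserver")
    if apiserver - kube_proxy > 2:
        violations.append(
            "kube-proxy must be at most two minor versions older than kube-apiserver")
    return violations


# kube-proxy deployed at the API server version (1.20), kubelet at N-2 (1.18).
print(kube_proxy_violations(apiserver=20, kubelet=18, kube_proxy=20))

# kube-proxy matched to the skewed kubelet instead.
print(kube_proxy_violations(apiserver=20, kubelet=18, kube_proxy=18))
```

The first call reports only the "same minor version as kubelet" rule; the second reports nothing, which is the matching setup discussed next.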

in practice i have not seen this break a cluster, but if we see
problems around it we could update the proposed test setup to include
a matching kube-proxy == kubelet versions.
...which would be hacky and can become complicated if
KubeProxyConfiguration v1alpha1 is removed in fewer than 3 releases (i
think).

lubomir

Jordan Liggitt

Feb 25, 2021, 5:54:57 PM
to Lubomir I. Ivanov, kubernetes-sig-architecture, kubernetes-sig-node
The kube-proxy skew section was added in https://github.com/kubernetes/website/pull/22034, prompted by https://github.com/kubernetes/website/issues/12322 tracking open questions remaining from the initial skew policy PR.

Lubomir I. Ivanov

Feb 25, 2021, 7:23:49 PM
to Fox, Kevin M, kubernetes-sig-architecture, kubernete...@googlegroups.com
On Fri, 26 Feb 2021 at 00:17, Fox, Kevin M <Kevi...@pnnl.gov> wrote:
>
> I've seen it break things on a cluster, but not 100% sure it had to do specifically with the node/proxy setup, or more with proxy / calico.
>

i guess we will find out.

> To fix this, could we get nodes automatically labeled with the minor version:
> kubernetes.io/minor: 1.20
>
> and then have kubeadm deploy 3 x kube-proxy daemonsets, one targeted for n, n-1 and n-2 with the right versions?
>

per test job, it could skip the built-in `addon/kube-proxy` phase and
then apply a single kube-proxy DS (+side resources) with a matching
version to the kubelet version.
alternatively it could patch the kube-proxy DS image, but that's one restart.
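
For reference, the quoted three-DaemonSet idea could be sketched roughly as follows. Everything here is hypothetical: the `kubernetes.io/minor` label is only the one proposed in the quoted message, the dicts are simplified stand-ins rather than real DaemonSet manifests, and the image value is a placeholder that records only the minor version.

```python
def kube_proxy_daemonsets(apiserver_minor, max_skew=2):
    """Build one simplified DaemonSet description per kubelet minor (n, n-1, n-2)."""
    daemonsets = []
    for skew in range(max_skew + 1):
        minor = "1.%d" % (apiserver_minor - skew)
        daemonsets.append({
            "name": "kube-proxy-" + minor.replace(".", "-"),
            # Hypothetical node label taken from the proposal in this thread.
            "nodeSelector": {"kubernetes.io/minor": minor},
            # Placeholder only; a real manifest needs a full image reference.
            "image": "kube-proxy:v" + minor,
        })
    return daemonsets


for ds in kube_proxy_daemonsets(20):
    print(ds["name"], ds["nodeSelector"], ds["image"])
```

Each entry targets only the nodes whose label matches its minor version, so every node would run a kube-proxy matching its kubelet. The single-DaemonSet approach described above is simpler for a test job, where all skewed kubelets share one version.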

lubomir

Antonio Ojea

Feb 26, 2021, 3:26:01 PM
to Lubomir I. Ivanov, Fox, Kevin M, kubernetes-sig-architecture, kubernete...@googlegroups.com
I agree with Dan Winship here that the only relation between kube-proxy and kubelet is some iptables rules. I may be missing something, but I don't think the kube-proxy skew policy should depend on kubelet; kube-proxy consumes Services and EndpointSlices.

Lubomir I. Ivanov

Mar 4, 2021, 12:00:27 PM
to kubernetes-sig-architecture, kubernete...@googlegroups.com
hello,

the e2e test job PRs are merging and the skew testing will start later today:
https://github.com/kubernetes/test-infra/pull/21016

if we see setup related failures we will address them until the tests
are running properly.

@cynepco3hahue asked here:
https://github.com/kubernetes/test-infra/pull/21016#issuecomment-790750888

> I do not see any problems with jobs, the single question who will be responsible to analyze failures(in the case when we will have any)

my proposal:

> - if the failures are deployer related (kubeadm) - the kubeadm team will triage and fix.
> - if the kubeadm team sees potential problems in the kubelet skew we will ping sig-node, but sig-node will see the failures on their mailing list too.
> - if kubelet skew failures are not resolved by the sig-node team after a long period of time, the kubeadm team will pause / remove affected jobs.
> in such a case we must claim a particular skew as unsupported.

the same applies if we see skew problems right after today's initial
jobs launch.

lubomir

Lubomir I. Ivanov

Mar 6, 2021, 8:00:42 PM
to kubernetes-sig-architecture, kubernete...@googlegroups.com
hello,

after a few runs all the new kubeadm based kubelet skew jobs are
green, except one:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-kubelet-1-19-on-latest

notes:
- kubelet is at the HEAD of release-1.19
- apiserver / test suite are at HEAD of the master branch (to be
released as 1.21)

this is failing on running the test:
Kubernetes e2e suite.[sig-node] Probing container should be restarted with an exec liveness probe with timeout [NodeConformance] [Conformance]

the test is here:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/common/node/container_probe.go#L212-L226

i've logged an issue in k/k and tagged with 1.21 milestone for
sig-node to triage:
https://github.com/kubernetes/kubernetes/issues/99909

lubomir

Jordan Liggitt

Mar 6, 2021, 8:40:55 PM
to Lubomir I. Ivanov, kubernetes-sig-architecture, kubernete...@googlegroups.com
Thanks for driving this, that's a great result.

Lubomir I. Ivanov

Apr 6, 2021, 7:55:39 PM
to kubernete...@googlegroups.com, kubernetes-sig-architecture, Jordan Liggitt
hello,

after the addition of the kubelet skew tests in the kubeadm e2e test
setup, we faced some difficulties explaining to the kubeadm
contributors how to properly update these tests as part of our "e2e
housekeeping" for 1.21. this was expected as the kubelet skew jobs
further increased the complexity of the manual edits one has to
perform for such an update.

thus, i've created a small tool to automate the process:
https://github.com/kubernetes/kubeadm/pull/2434

the lack of such tooling was briefly discussed with @liggitt in a SIG
Arch meeting.
i'm sending this email to notify the relevant groups about the improvement.

lubomir