[URGENT] Concerns about the flakiness of jobs

Nabarun Pal

Apr 7, 2021, 2:34:02 PM
to Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-...@googlegroups.com, kubernetes-sig-release, release-team

Hello everyone!


As you might know, we plan to release Kubernetes 1.21 tomorrow, Thursday, April 8th.


From the test grid, we noticed that some flaky tests fail more often than others. The following is a list of them:


We have marked the above flake issues in the current milestone. Based on input from the community, we have to decide on the following:


Can any of the above flakes block the 1.21 release?

  • If no,

    • We are good to go with the release tomorrow, Thursday, April 8th.

  • If yes, can they be resolved by this Friday, April 9th?

    • If yes, we can delay the release to Tuesday, April 13th.

    • If not, can we reduce the severity of those flakes by the end of the week and release on Tuesday, April 13th?


The Release Team would really appreciate thoughts from the community on this, as early as possible. On the basis of the inputs, we will figure out whether the release needs to be delayed. We are timeboxing the window for raising concerns to Wednesday, April 7th 12AM UTC // 5PM Pacific Time. What we are looking for by the mentioned timebox is a signal and not the exact resolution of those flakes.


If you have any thoughts/questions/concerns, please reply to this email or respond to the above GitHub issues or ping us on #sig-release.


Thank you

Nabarun Pal

(on behalf of the Kubernetes 1.21 Release Team)

Antonio Ojea

Apr 7, 2021, 4:31:49 PM
to Nabarun Pal, Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-sig-network, kubernetes-sig-release, release-team
They don't seem to be a problem for the release:

* The 2 tests "[sig-network] Networking Granular Checks:*hostNetwork" seem to be a problem specific to that job. They fail because of a conflict when deploying the test pods, and they are passing in other jobs: https://testgrid.k8s.io/sig-network-gce#gci-gce&include-filter-by-regex=hostNet

* [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:Ingress] should conform to Ingress spec: this is one of the cases where Kubernetes only has the API in the repository, and the implementation is not part of Kubernetes itself. Since the API didn't change, the problem is most likely environmental: the job, ingress implementation, ...

Regards,
A.Ojea

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-release" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-re...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-release/CAB_Fwd_nNm5RZq3QMc314hiVwn2Kk9a4uEeXMCu2XmaN_v6bew%40mail.gmail.com.

Bowei Du

Apr 7, 2021, 4:34:07 PM
to Antonio Ojea, Swetha Repakula, Nabarun Pal, Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-sig-network, kubernetes-sig-release, release-team
+Swetha Repakula is actively looking at this one:

[Flaky Test][sig-network] Loadbalancing: L7 GCE [Slow] [Feature:Ingress] should conform to Ingress spec #93740

Thanks,
Bowei

Antonio Ojea

Apr 7, 2021, 4:55:14 PM
to Bowei Du, Swetha Repakula, Nabarun Pal, Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-sig-network, kubernetes-sig-release, release-team
Great,
I'm looking at the hostNetwork ones. I have a PR to fix one of the issues with both tests,
but I want to check why it is only flaking in that specific job.

Swetha Repakula

Apr 7, 2021, 8:12:28 PM
to Antonio Ojea, Bowei Du, Nabarun Pal, Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-sig-network, kubernetes-sig-release, release-team
As an update for [Flaky Test][sig-network] Loadbalancing: L7 GCE [Slow] [Feature:Ingress] should conform to Ingress spec #93740: the root cause of the failure is not yet clear. However, this should not be a release-blocking issue. Looking at the logs, the error is coming from something outside of the Kubernetes release. The issue seems to be more about the load balancer not being ready in time.

I have posted a similar comment on the issue.

Nabarun Pal

Apr 7, 2021, 9:31:15 PM
to Swetha Repakula, Antonio Ojea, Bowei Du, pavi...@google.com, Kubernetes developer/contributor discussion, le...@kubernetes.io, kubernetes-sig-network, kubernetes-sig-release, release-team
Thank you Antonio, Bowei, Pavithra, Swetha, and everyone else for looking at the flakes and helping us understand the nature of these flakes in the context of the v1.21 release.

We appreciate all of your insights!