CI Signal Report for Week Starting 6/10


Jim Angel

Jun 14, 2019, 12:36:48 PM
to kuberne...@googlegroups.com

Questions or feedback? Just reply to this email or come over to #sig-release.


Hello and welcome to the CI signal report for the Kubernetes 1.15 release cycle. This is week 10 of the release cycle.


We have finally entered code thaw! This week we took some time to investigate performance regressions that showed up in scalability tests #78734, and finally reverted our e2e tests to use coreDNS v1.3.1 #78123.


Now that the master branch is again open to PRs, we will focus on the 1.15 Testgrid dashboards for the remainder of the release cycle.


For more details and the latest on all topics, please visit the 1.15 CI Signal project board.

🎉 Resolved

SIG Cluster Lifecycle

  • #78123 (Milestone v1.15) Multiple job failures on the BeforeSuite step while waiting for CoreDNS to be ready. The failures were due to the coreDNS version being bumped to 1.5.0 in #78030 (Merged). The bump was originally meant to solve an issue in which coreDNS kept OOMing during scalability tests with 5K+ node clusters, see #78691#issuecomment-498673666.

    • The two types of jobs that were affected were kinder-based jobs (which use kubeadm) and GCE-based jobs that use kube-up to stand up a cluster.

    • The kinder jobs were fixed by reverting coreDNS to v1.3.1 in #78545.

    • #78691 reverts to coreDNS v1.3.1 in kube-up. An issue, #78919, was later found in the e2e tests: the coreDNS config needed to be readapted for v1.3, see #78920.

    • #78302 (Milestone v1.16) proposes a solution for GCE jobs using kube-up to use coreDNS v1.5.0, but deletes its config for the upgrade.

  • #78907 Kinder cluster upgrades flaked for a couple runs but stabilized after #78915.

SIG Scalability

  • #78734 (Milestone v1.15) Timeouts in ci-kubernetes-e2e-gce-scale-performance. A bump in klog version from v0.3.1 to v0.3.2 in #78465 introduced a performance regression, see kubernetes/test-infra#12940 for more details.

    • The bump in klog version (#78465) was intended to solve an issue, kubernetes/klog#53, in which klog was writing the same content multiple times when configured to log to a file.

    • klog log file issues were discovered when klog started being used in such a way for #76396 (convert GCE manifests for master containers to remove shell dependencies in order to move forward with distroless images).

✈️ In flight

SIG Scalability

  • #73884 (Milestone v1.16) pull-kubernetes-e2e-gce-100-performance is another presubmit test which has been known to flake often. The issue seems to be due to high fluentd-gcp memory consumption. There is a fix open in #524 (Open), but the reason for the memory increase is being investigated in #77492 (Open).

SIG Testing

  • #78982 (Milestone v1.16) Some unit tests do not pass locally, but pass in bazel CI. This is due to some unit tests not having Go source code available and the ciphersuites test deciding to skip along with envelope grpc tests being linux specific.


🤔 New/Not Yet Started

SIG Auth

  • #75563 (Milestone v1.16) Cleanup Advanced Audit testing

SIG Cluster Lifecycle

  • #78901 (Milestone v1.16) multiple reboot tests flaking

SIG GCP

SIG Network

  • #76721 (Milestone v1.16) diffResources test has been flaking in ci-kubernetes-e2e-gci-gce-ingress but we are still observing.

Failures in Master-Blocking:

  • 14 jobs total

  • 9 are passing

  • 5 are flaking

  • 0 are failing

  • 0 are stale

Failures in Master-Informing:

  • 18 jobs total

  • 6 are passing

  • 7 are flaking

  • 4 are failing

  • 1 is stale

Failures in 1.15-blocking:

  • 13 jobs total

  • 8 are passing

  • 5 are flaking

  • 0 are failing

  • 0 are stale

Failures in 1.15-informing:

  • 1 job total

  • 1 is passing
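
For anyone who wants to sanity-check the dashboard tallies above, here is a minimal sketch that verifies each dashboard's status counts add up to its job total and prints the share of passing jobs. The dictionary layout and function names are made up for illustration, and the zero counts for 1.15-informing are an assumption (the report only lists the total and the passing count for that dashboard).

```python
# Job counts copied from the tallies in this report.
# The data layout and helper names are illustrative, not part of any tool.
DASHBOARDS = {
    "master-blocking": {"total": 14, "passing": 9, "flaking": 5, "failing": 0, "stale": 0},
    "master-informing": {"total": 18, "passing": 6, "flaking": 7, "failing": 4, "stale": 1},
    "1.15-blocking": {"total": 13, "passing": 8, "flaking": 5, "failing": 0, "stale": 0},
    # Assumption: the report lists only "1 job total" and one passing job,
    # so the remaining statuses are taken to be zero.
    "1.15-informing": {"total": 1, "passing": 1, "flaking": 0, "failing": 0, "stale": 0},
}

STATUSES = ("passing", "flaking", "failing", "stale")


def counts_are_consistent(counts):
    """Return True if the per-status counts sum to the reported total."""
    return sum(counts[status] for status in STATUSES) == counts["total"]


def passing_share(counts):
    """Return the fraction of jobs on a dashboard that are passing."""
    return counts["passing"] / counts["total"]


if __name__ == "__main__":
    for name, counts in DASHBOARDS.items():
        status = "ok" if counts_are_consistent(counts) else "MISMATCH"
        print(f"{name}: {passing_share(counts):.0%} passing (counts {status})")
```

The share of passing jobs is only a rough health indicator; flaking jobs still need individual triage on Testgrid.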