Questions or feedback? Just reply to this email or come over to #sig-release.
Hello and welcome to the CI signal report for the Kubernetes 1.15 release cycle. This is week 10 of the release cycle.
We have finally entered code thaw! This week we took some time to investigate performance regressions that showed up in scalability tests (#78734), and finally reverted our e2e tests to use CoreDNS v1.3.1 (#78123).
Now that the master branch is again open to PRs, we will focus on the 1.15 Testgrid dashboards for the remainder of the release cycle.
For more details and the latest on all topics, please visit the 1.15 CI Signal project board.
SIG Cluster Lifecycle
#78123 (Milestone v1.15) Multiple job failures on the BeforeSuite step while waiting for CoreDNS to be ready. The failures were due to the CoreDNS version being bumped to 1.5.0 in #78030 (Merged). The bump was originally meant to fix an issue in which CoreDNS kept running out of memory (OOMing) during scalability tests on clusters with 5K+ nodes; see #78691#issuecomment-498673666.
The two types of jobs that were affected were kinder-based jobs (which use kubeadm) and GCE-based jobs that use kube-up to stand up a cluster.
The kinder jobs were fixed by reverting CoreDNS to v1.3.1 in #78545.
#78691 reverts to CoreDNS v1.3.1 in kube-up. A follow-up issue, #78919, was later found in the e2e tests: the CoreDNS config needed to be adapted back to v1.3 (#78920).
#78302 (Milestone v1.16) proposes a solution for GCE jobs using kube-up to use coreDNS v1.5.0, but deletes its config for the upgrade.
#78907 Kinder cluster upgrades flaked for a couple of runs but stabilized after #78915.
SIG Scalability
#78734 (Milestone v1.15) Timeouts in ci-kubernetes-e2e-gce-scale-performance. A bump in the klog version from v0.3.1 to v0.3.2 in #78465 introduced a performance regression; see kubernetes/test-infra#12940 for more details.
The klog bump (#78465) was intended to solve an issue, kubernetes/klog#53, in which klog wrote the same content multiple times when configured to log to a file.
The klog log file issues were discovered when klog started being used to log to a file for #76396 (convert GCE manifests for master containers to remove shell dependencies in order to move forward with distroless images).
SIG Scalability
#73884 (Milestone v1.16) pull-kubernetes-e2e-gce-100-performance is another presubmit test which has been known to flake often. The issue seems to be due to high memory consumption by fluentd-gcp. A fix is open in #524 (Open), but the reason for the memory increase is being investigated in #77492 (Open).
SIG Testing
#78982 (Milestone v1.16) Some unit tests do not pass locally, but pass in bazel CI. This is due to some unit tests not having Go source code available and the ciphersuites test deciding to skip, along with the envelope gRPC tests being Linux-specific.
SIG Auth
#75563 (Milestone v1.16) Cleanup Advanced Audit testing
SIG Cluster Lifecycle
#78901 (Milestone v1.16) multiple reboot tests flaking
SIG GCP
#74893 (Milestone v1.16) Cluster upgrade jobs for Kubernetes v1.14 have been failing in https://testgrid.k8s.io/sig-release-1.14-all
SIG Network
#76721 (Milestone v1.16) The diffResources test has been flaking in ci-kubernetes-e2e-gci-gce-ingress; we are still observing it.
14 jobs total
9 are passing
5 are flaking
0 are failing
0 are stale
18 jobs total
6 are passing
7 are flaking
4 are failing
1 is stale
13 jobs total
8 are passing
5 are flaking
0 are failing
0 are stale
1 job total
1 is passing
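The per-dashboard tallies above can be sanity-checked with a few lines of Python. This is only a sketch: the `DashboardSummary` class and its field names are hypothetical, the dashboard names are not given in this report so entries are identified by position only, and the last entry lists only a passing count, so its other counts are assumed to be zero.

```python
from dataclasses import dataclass


@dataclass
class DashboardSummary:
    """Job counts for one dashboard (hypothetical helper, not a Testgrid API)."""
    total: int
    passing: int = 0
    flaking: int = 0
    failing: int = 0
    stale: int = 0

    def is_consistent(self) -> bool:
        # The four status buckets should add up to the reported total.
        return self.passing + self.flaking + self.failing + self.stale == self.total


# Counts copied from the report, in the order they appear.
summaries = [
    DashboardSummary(total=14, passing=9, flaking=5),
    DashboardSummary(total=18, passing=6, flaking=7, failing=4, stale=1),
    DashboardSummary(total=13, passing=8, flaking=5),
    DashboardSummary(total=1, passing=1),  # other counts assumed to be zero
]

for position, summary in enumerate(summaries, start=1):
    status = "ok" if summary.is_consistent() else "MISMATCH"
    print(f"dashboard {position}: {summary.passing}/{summary.total} passing ({status})")
```

A mismatch here would suggest a job was miscounted or dropped when the summary was compiled.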