When: Weekly on Wed, 9:45 – 10:15am
Attendees: dhiller, brianmcarey
Topics:
[urgent]
revisit previous action items
Daniel Hiller check network clustered failures issue state - https://github.com/kubevirt/kubevirt/issues/12898
Ammar: will look into whether there's a correlation with the kci bump
it seems the cluster is more heavily loaded now
bc: will take another look at the cluster load
talked with Howard from the ARM side - it seems the VMIs are taking longer to come up
edy: we might need some more profiling (of k8s?)
looks like something is taking more resources, but we can't confirm whether this is just a side effect; overall it takes more time to reconcile, and tests then fail due to timeouts
we are not sure why the load has increased
proposal: run profiling on a regular basis, in a holistic sense
question: should we ignore intermittent errors caused by the slowness? take this to the community meeting
Daniel Hiller question re VSOCK test PR: https://github.com/kubevirt/kubevirt/pull/12901
the failure rate seems to have decreased - decision: close the PR or not?
Daniel Hiller check: Storage lanes are now running with etcd in memory so we should no longer see etcd timeouts there.
we don’t see the timeouts on the presubmit lanes any more: https://search.ci.kubevirt.io/?search=etcdserver%3A+request+timed+out&maxAge=48h&context=1&type=build-log&name=&excludeName=periodic.*&maxMatches=5&maxBytes=20971520&groupBy=job
discuss imminent topics
Daniel Hiller ARM lane still failing
[bc] Howard might get to look at it this Friday
it runs on their hardware, so failing runs are just wasted resources
we could switch the job to always_run: false, so that it can still be triggered on demand
we might do this for release-1.3 and release-1.2 also, since those lanes are failing too
we could check whether a backport caused this, to help locate the issue
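The switch to always_run: false would be a small change in the prow job config; a hedged sketch (the job name and file layout here are illustrative, the real entries live in kubevirt/project-infra):

```yaml
# Illustrative presubmit entry; only always_run changes.
presubmits:
  kubevirt/kubevirt:
  - name: pull-kubevirt-e2e-arm64
    always_run: false
    # With always_run: false the job no longer runs on every PR,
    # but can still be triggered on demand with:
    #   /test pull-kubevirt-e2e-arm64
```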
Look at flakes
flake stats - create issues accordingly
we still saw rather high numbers of failures on Monday and Tuesday this week
sig-network: major clustered failure - 31 tests: https://prow.ci.kubevirt.io/view/gcs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirt/12667/pull-kubevirt-e2e-k8s-1.29-sig-network/1838121657995104256
sig-compute-migrations: three clustered failures
the VSOCK test is also slightly flaky
will create issues for both tests, since both show up in the flake stats
dequarantine tests:
look at list of quarantined tests
check status, i.e. who is working on those
look at PRs that want to fix flakes
see whether we can dequarantine tests
Action items
Daniel Hiller create a tracker issue - evaluate holistic profiling
Daniel Hiller convert https://github.com/kubevirt/kubevirt/pull/12901 into issue, close PR
Brian Carey increase the range of stored metrics on the workloads cluster
Brian Carey arm lane - switch to always_run: false for main, release-1.3 and release-1.2
Daniel Hiller what commitment do we have w.r.t. the ARM architecture?
Daniel Hiller get back to the terminationGracePeriod test question - who is working on it?
https://storage.googleapis.com/kubevirt-prow/reports/quarantined-tests/kubevirt/kubevirt/index.html
Daniel Hiller create flake issues
Kind regards,
Daniel Hiller
He / Him / His
Senior Software Engineer, KubeVirt CI, OpenShift Virtualization