When: Weekly on Wed, 9:45 – 10:15am
Notes: KubeVirt CI SIG meeting notes
Attendees: dhiller, dollierp
Reminders:
we will create GitHub issues for tracking
GitHub issues and PRs (labelled via Prow commands, see the example below):
  should be marked with /sig ci and /kind flake if applicable
  should be marked with the target sig
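For example, commenting the following Prow commands on an issue or PR applies the labels:
  /sig ci
  /kind flake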
Topics:
[urgent]
[dollierp] KubeVirt Prow Control Plane cluster will be updated to OpenShift 4.20 on Friday.
we might see some aborted test runs there
[ycui] hold PRs: https://github.com/kubevirt/kubevirt/pulls?q=is%3Apr+is%3Aopen+label%3Aapproved+label%3Algtm+-label%3Ado-not-merge%2Fhold
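(for reference, an equivalent query with the GitHub CLI, assuming gh is available:
gh pr list --repo kubevirt/kubevirt --search "label:approved label:lgtm -label:do-not-merge/hold")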
[dhiller] sig-compute quarantine flaky test “test_id:832” “cpu pinning with fedora images”
compute, network and storage are all above the 5% flake rate
previous action items
state of existing issues: https://github.com/kubevirt/kubevirt/issues?q=is%3Aissue+is%3Aopen+label%3Akind%2Fflake+sort%3Aupdated-asc+label%3Asig%2Fci
n/a
[non-urgent]
Priority PR
[dhiller] will keep an eye on this
Look at flakes
flake stats - create issues accordingly (percentage shares are derived as in the sketch after the list)
[dhiller] all sig-compute periodics are still suffering from clustered failures
overall ( ∑=3427, 100.00% )
periodic-kubevirt-e2e-k8s-1.34-sig-compute ( ∑=894, 26.09% )
clustered failures:
periodic-kubevirt-e2e-k8s-1.35-sig-compute ( ∑=811, 23.67% )
clustered failures:
pull-kubevirt-e2e-k8s-1.35-sig-compute-serial ( ∑=595, 17.36% )
is getting more stable - failures decreased from 990 last week to 595 - however there are quite a few interrupted builds, which seems suspicious
periodic-kubevirt-e2e-k8s-1.33-sig-compute ( ∑=592, 17.27% )
clustered failures:
periodic-kubevirt-e2e-test-S390X ( ∑=109, 3.18% )
overall flake rate seems to be decreasing in recent runs, however there is still one test that fails on every run of this lane:
[sig-compute]VM Affinity Updating VMs node affinity [test_id:11208]should successfully update node selector
the remainder here is the usual flake rate
periodic-kubevirt-e2e-k8s-1.35-sig-storage ( ∑=57, 1.66% )
periodic-kubevirt-e2e-k8s-1.34-sig-storage ( ∑=54, 1.58% )
periodic-kubevirt-e2e-k8s-1.33-sig-storage ( ∑=53, 1.55% )
periodic-kubevirt-e2e-k8s-1.34-sig-monitoring ( ∑=50, 1.46% )
pull-kubevirt-e2e-k8s-1.34-sig-compute-serial ( ∑=41, 1.20% )
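For reference, the percentage shares above are each lane's failure count divided by the overall total. A minimal, illustrative Python sketch (not part of our tooling) that reproduces the numbers:

    # Recompute each lane's share of the overall failure count.
    # Counts copied from this week's stats; "overall" also covers lanes not listed here.
    failures = {
        "periodic-kubevirt-e2e-k8s-1.34-sig-compute": 894,
        "periodic-kubevirt-e2e-k8s-1.35-sig-compute": 811,
        "pull-kubevirt-e2e-k8s-1.35-sig-compute-serial": 595,
        "periodic-kubevirt-e2e-k8s-1.33-sig-compute": 592,
        "periodic-kubevirt-e2e-test-S390X": 109,
        "periodic-kubevirt-e2e-k8s-1.35-sig-storage": 57,
        "periodic-kubevirt-e2e-k8s-1.34-sig-storage": 54,
        "periodic-kubevirt-e2e-k8s-1.33-sig-storage": 53,
        "periodic-kubevirt-e2e-k8s-1.34-sig-monitoring": 50,
        "pull-kubevirt-e2e-k8s-1.34-sig-compute-serial": 41,
    }
    overall = 3427
    for job, count in sorted(failures.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{job} ( sum={count}, {100 * count / overall:.2f}% )")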
dequarantine tests:
look at list of quarantined tests
count: 18 tests currently in quarantine - no change from last week
[chart: kubevirt/kubevirt quarantined tests over time, by SIG - Total, Compute, Storage, Network, Monitoring]
check status, i.e. who is working on those
see whether we can dequarantine tests
misc topics
Action items
update/create issues with latest flakes spotted
communication
send meeting notes to kubevirt-dev, bcc sig people for spotted flakes (include meeting changes for upcoming instances)
Kind regards,
Daniel Hiller
He / Him / His
Principal Software Engineer, KubeVirt CI, OpenShift Virtualization
Hey Lee,

thanks for raising that!

Interesting - I was under the impression that another PR which was merged last week should have brought improvement. I'll take another look and share my findings here then.

Best,
Daniel

On Wed, Mar 11, 2026 at 10:40 AM Lee Yarwood <lyar...@redhat.com> wrote:
On Wed, 11 Mar 2026 at 09:10, 'Daniel Hiller' via kubevirt-dev
<kubevi...@googlegroups.com> wrote:
>
> [..]
>
> flake stats - create issues accordingly
>
> [dhiller] all sig-compute periodics are still suffering from clustered failures
Hey Daniel,
Apologies for not being on the call, I spoke briefly to Lubo about
this while dropping my kids off at school and it looks like these
clustered failures stopped once
https://github.com/kubevirt/project-infra/pull/4784 landed on Monday
afternoon.
I often find the full 7 days of CI data can be very misleading once
fixes have landed. Can I suggest that we also consider sharing 24, 48
and 72 hour trends when summarising the state of CI? That way fixes
like this should show up and reflect the true current state of things.
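(Assuming the e2e Grafana dashboards accept relative time ranges, that
could be as simple as swapping the absolute from/to URL parameters for
e.g. from=now-24h&to=now or from=now-72h&to=now.)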
Cheers,
Lee
On Wed, 11 Mar 2026 at 10:40, Daniel Hiller <dhi...@redhat.com> wrote:
>
> Hey again,
>
> On Wed, Mar 11, 2026 at 11:20 AM Daniel Hiller <dhi...@redhat.com> wrote:
>>
>> Hey Lee,
>>
>> thanks for raising that!
>>
>> Interesting - I was under the impression that another PR which was merged last week should have brought improvement. I'll take another look and share my findings here then.
>>
>
> indeed I can confirm that no clustered failure occurred after the fix had landed 🎉
Excellent, thanks!
> Also, we are seeing an overall decrease of failures in the periodics from 80% to around 60% - this might not sound like much, but since the periodics are running the quarantined tests, this is indeed significant.
>
> https://grafana.ci.kubevirt.io/d/efpTS3t4z/e2e-jobs-overview-v2?orgId=1&from=1773054000000&to=1773223199000&var-job_name=periodic-kubevirt-e2e-k8s-.%2Asig-compute&viewPanel=15
Stupid question, but what is the benefit of running quarantined tests as
part of our periodics? If the intention is to confirm whether they are
still flaky, shouldn't we run them in their own dedicated, quarantined
periodic jobs to avoid polluting the unquarantined tests?
I question the need for the unquarantined periodic jobs on main right
now anyway, given the sheer amount of incoming changes. They make
sense on less active branches but could this job ever reveal something
we wouldn't already be seeing on main in PR and merge related runs?


Cheers,
Lee