When: Weekly on Mon, 10:00 – 10:30am
Notes: KubeVirt CI SIG meeting notes
Attendees: dhiller, ycui, dollierp
Reminders:
we will create GitHub issues for tracking
GitHub issues and PRs
should be marked with /sig ci and /kind flake if applicable
should be marked with the target sig
Topics:
[urgent]
[ycui] 8x: 2026-03-20 13:04:14 +0000 UTC: make: *** [Makefile:176: cluster-down] Error 125 build-log
Nir helped: both logs show that Podman cannot write to /var/lib/shared-images/overlay/ (1) when pulling the gocli container image and (2) when trying to remove incomplete/pre-existing layer directories; (3) the filesystem is mounted read-only.
This appears to be a widespread infrastructure issue that hit multiple test jobs in PR #16885 at the same time, not a problem with the code changes. All jobs that use this shared image storage are hitting the same filesystem permission problem.
Thread: https://redhat-internal.slack.com/archives/C01EX3K1FGE/p1774434905534859
[dhiller] looks like a shared-images-controller issue: the controller periodically removes old images. Probably a job had just been kicked off when the underlying image was removed
https://github.com/kubevirt/project-infra/blob/f756706fd1e0ad47fe6a29af04a8e7cea3b13555/images/shared-images-controller/main.go#L113
What is next?
[dhiller] increase the waiting time before cleanup so that, most likely, no job is still using an image when it is removed
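The idea of a longer wait before cleanup can be sketched as follows. This is only an illustration in Python, not the code of the shared-images-controller (which is written in Go); the function names, the six-hour grace period, and the notion of a per-image "last used" timestamp are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value: the real grace period would have to exceed the
# longest expected job runtime.
CLEANUP_GRACE_PERIOD = timedelta(hours=6)


def is_safe_to_remove(last_used: datetime, now: datetime,
                      grace_period: timedelta = CLEANUP_GRACE_PERIOD) -> bool:
    """Return True only if the image has been unused for at least the grace period."""
    return now - last_used >= grace_period


def images_to_remove(last_used_by_image: dict[str, datetime], now: datetime,
                     grace_period: timedelta = CLEANUP_GRACE_PERIOD) -> list[str]:
    """Return the names of images that are old enough to be cleaned up, sorted."""
    return sorted(
        name
        for name, last_used in last_used_by_image.items()
        if is_safe_to_remove(last_used, now, grace_period)
    )


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    # Hypothetical image names, for illustration only.
    example = {
        "image-a": current - timedelta(hours=8),
        "image-b": current - timedelta(minutes=10),
    }
    print(images_to_remove(example, current))
```

A job that started shortly before a cleanup run keeps its image, because the image's last use falls inside the grace period.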
quick look at the e2e job failures:
https://grafana.ci.kubevirt.io/d/efpTS3t4z/e2e-jobs-overview-v2
[timeboxed: 5 mins] revisit previous action items
existing issues opened in last 7 days: https://github.com/search?q=repo%3Akubevirt%2Fproject-infra+is%3Aissue++-label%3Akind%2Fenhancement&type=issues&state=open
[non-urgent]
[dhiller] discuss closing RH internal channels
discussing topics in the community channels instead makes them available to the whole community
[nir] suggest syncing with Itamar Holder, he was trying to establish a separate channel
[dhiller] discuss communicating issues with CI
proposal: github issues with special labels, ci-issue, priority/critical-urgent
[dhiller] investigate how to create a meaningful initial issue around CI issues (labels, queryable, etc)
[ycui] https://github.com/kubevirt/kubevirt/pull/17307#issuecomment-4151651754
pull-kubevirt-check-tests-for-flakes: how useful is this flake test? Do we still need it? Does anyone use it often?
[misc]
Look at held tests:
https://grafana.ci.kubevirt.io/d/uAoSeksSk/referee-retests?orgId=1&refresh=15m&from=now-1h&to=now
is:pr is:open label:approved label:lgtm -label:do-not-merge/hold -label:needs-rebase
is:pr is:open quarantine -label:do-not-merge/work-in-progress label:kind/flake -label:needs-rebase
recently merged PRs authored by SIG CI
8 recently merged PRs authored by SIG CI (query: is:pr is:merged merged:>=2026-03-23 author:dhiller author:dollierp author:whitedyl org:kubevirt)
kubevirt/project-infra#4879: fix: add coverage manifests to base kustomization (by @dhiller)
kubevirt/project-infra#4874: Add Whitedyl to prow job taskforce (by @Whitedyl)
kubevirt/project-infra#4873: s390x: mark lanes as required after the cluster issue is fixed (by @dollierp)
kubevirt/project-infra#4872: SEV: mark lanes as optional due to persistent failures (by @dollierp)
kubevirt/project-infra#4870: s390x: mark lanes as optional due to Pod scheduling issue (by @dollierp)
kubevirt/kubevirt#17285: Revert "bazel server: reload on network changes" (by @dhiller)
kubevirt/project-infra#4865: fix: make check-provision-k8s-1.35 lane required (by @dhiller)
kubevirt/project-infra#4864: chore(cleanup): remove redundant yq installation (by @dollierp)
kubevirt/ci-health#114: feat: regex-based categorization of CI build failures (by @dhiller)
kubevirt/project-infra#4701: External plugin - coverage (by @Whitedyl)
Action items
Daniel Hiller install alertmanager and add rules for overall failures crossing 20% and for specific failures crossing 40%: https://redhat.atlassian.net/browse/CNV-83235
Daniel Hiller increase the waiting time before cleanup so that, most likely, no job is still using an image when it is removed: https://redhat.atlassian.net/browse/CNV-83249
Daniel Hiller investigate how to create a meaningful initial issue around CI issues (labels, queryable, etc): https://redhat.atlassian.net/browse/CNV-83251
proposal: create issue, send mail with tracker issue to kubevirt-dev with meaningful advice on what NOT to do
Daniel Hiller either make pull-kubevirt-check-tests-for-flakes required or remove it: https://groups.google.com/g/kubevirt-dev/c/6jZ-Umjm_RA
how useful is this flake test? Do we still need it? Does anyone use it often?
Daniel Hiller quarantine PR: https://search.ci.kubevirt.io/?search=%5C%5Bsig-monitoring%5DMonitoring+Deprecation+Alerts+KubeVirtDeprecatedAPIRequested+should+be+triggered+when+a+deprecated+API+is+requested&maxAge=72h&context=1&type=junit&name=&excludeName=periodic-.*&maxMatches=1&maxBytes=20971520&groupBy=job
Daniel Hiller think about how we can add labels to a re-run to determine the reason of rerunning: https://redhat.atlassian.net/browse/CNV-83261
communication
send meeting notes to kubevirt-dev (include meeting changes for upcoming instances)
Kind regards,
Daniel Hiller
He / Him / His
Principal Software Engineer, KubeVirt CI, OpenShift Virtualization