State of voting lanes for Kubernetes 1.35 providers


Daniel Hiller

Feb 9, 2026, 6:00:27 AM
to kubevirt-dev
Hey all!

The PR for making 1.35 voting [1] is not yet merged. Two tests are still failing on the periodic sig-compute lane for 1.35 [2]; they are being tracked [3].
As of Friday the failures were still under investigation [4]; a PR to fix them is now on the way [5].

Thanks to everyone involved!


--

Kind regards,


Daniel Hiller

He / Him / His

Principal Software Engineer, KubeVirt CI, OpenShift Virtualization

Red Hat

dhi...@redhat.com   


Federico Fossemo

Feb 17, 2026, 10:48:04 AM
to Daniel Hiller, kubevirt-dev
Hey Daniel,

Now that https://github.com/kubevirt/kubevirt/pull/16758 has been merged, we can already see good results:
https://testgrid.k8s.io/kubevirt-periodics#periodic-kubevirt-e2e-k8s-1.35-sig-compute&width=20

Thank you to everyone involved. We can start moving forward.

Regards,
-FF


Dan Kenigsberg

Feb 17, 2026, 10:58:57 AM
to Federico Fossemo, Daniel Hiller, kubevirt-dev
Thank you very much for this progress with 1.35!

Do we have a clue why other lanes are failing so much?
https://kubevirt.io/ci-health/#kubevirtkubevirt
For example, pull-kubevirt-e2e-k8s-1.34-sig-compute-serial has a 30% failure rate on merged PRs.

Regards,
Dan.

Federico Fossemo

Feb 17, 2026, 11:16:46 AM
to Dan Kenigsberg, Daniel Hiller, kubevirt-dev
Hey Dan,

We discussed this in the last sig-ci meeting [1].
Generally speaking, we observed an increase in clustered failures.
We found a correlation between this increase in serial failures and setting the 1.35 lanes to `always_run: true`.
The root cause is still under investigation, but I think that switching to the mandatory 1.35 provider and
deprecating the 1.32 provider lanes will help a lot here.
Right now, we are running 4 provider jobs for each PR.

Hope this helps,

Daniel Hiller

Feb 17, 2026, 12:35:22 PM
to Federico Fossemo, Dan Kenigsberg, kubevirt-dev
Hey all,

On Tue, Feb 17, 2026 at 5:16 PM Federico Fossemo <ffos...@redhat.com> wrote:
> Hey Dan,
>
> We discussed this in the last sig-ci meeting [1].
> Generally speaking, we observed an increase in clustered failures.

To expand on that: what we are seeing is that all sig-compute periodic lanes show several clustered failures (e.g. 1.35 [2]), while the presubmit lanes do not show this. I believe the clustered failures on periodics are correlated with adding the additional 1.35 always_run lanes on top; however, there is no strong indication of this in the graphs.

image.png

Blue lines from left to right:
* 1st marks `always_run: true` being set for 1.35
* 2nd marks the 1st occurrence of a clustered failure on 1.35 sig-compute
* 3rd marks the KubeVirtCI bump of 1.35 to stable

> We found a correlation between this increase in serial failures and setting the 1.35 lanes to `always_run: true`.
> The root cause is still under investigation, but I think that switching to the mandatory 1.35 provider and
> deprecating the 1.32 provider lanes will help a lot here.

I agree. Removing the old lanes will also give more capacity overall.



--
-- 
Best,
Daniel

Daniel Hiller

Feb 17, 2026, 12:38:27 PM
to Federico Fossemo, Dan Kenigsberg, kubevirt-dev
On Tue, Feb 17, 2026 at 6:35 PM Daniel Hiller <dhi...@redhat.com> wrote:
> To expand on that: what we are seeing is that all sig-compute periodic lanes show several clustered failures (e.g. 1.35 [2]), while the presubmit lanes do not show this. I believe the clustered failures on periodics are correlated with adding the additional 1.35 always_run lanes on top; however, there is no strong indication of this in the graphs.

Correction: 1.34 and 1.35 serial presubmit lanes both show clustered failures [3]

--
-- 
Best,
Daniel

Dan Kenigsberg

Feb 17, 2026, 3:11:13 PM
to Daniel Hiller, Federico Fossemo, kubevirt-dev
On Tue, Feb 17, 2026 at 7:38 PM Daniel Hiller <dhi...@redhat.com> wrote:
> I agree. Removing the old lanes will also give more capacity overall.

What holds us from stopping 1.32 on main right now?

Luboslav Pivarc

Feb 17, 2026, 3:28:05 PM
to Dan Kenigsberg, Daniel Hiller, Federico Fossemo, kubevirt-dev
On Tue, Feb 17, 2026 at 9:11 PM 'Dan Kenigsberg' via kubevirt-dev <kubevi...@googlegroups.com> wrote:
> What holds us from stopping 1.32 on main right now?

That's a very good question...  https://github.com/kubevirt/project-infra/pull/4726

Daniel Hiller

Feb 18, 2026, 6:12:34 AM
to Luboslav Pivarc, Dan Kenigsberg, Federico Fossemo, kubevirt-dev
--
-- 
Best,
Daniel