[SIG-ci] Rethinking the /hold policy by kubevirt-commenter-bot


Felix Matouschek

Mar 18, 2026, 4:18:47 AM
to kubevirt-dev
Hi all,

I would like to discuss the kubevirt-commenter-bot rule that places a /hold on a PR after five /retest commands on the same commit. While I understand this is meant to discourage blind retesting and save CI resources, I feel it is actually causing PRs to get stuck and increasing our overall CI churn.

It often takes more than five retest commands, and by the time CI finally passes, PRs are stuck under the bot's /hold. This causes them to miss their merge window, and by the time someone notices and comments /hold cancel, the PR has often gone stale or fallen behind the base branch. That forces the PR to run the entire test suite again, which defeats the purpose of saving resources.
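
For readers unfamiliar with the rule, here is a minimal sketch of the counting behavior described above. It is not the bot's actual code: the `(commit_sha, comment_body)` event shape and the function name are assumptions made for illustration, as is treating "after five" as "once the count reaches five".

```python
from collections import Counter

RETEST_LIMIT = 5  # threshold mentioned in this thread


def commits_to_hold(events, limit=RETEST_LIMIT):
    """Return the commit SHAs that have reached `limit` /retest comments.

    `events` is an iterable of (commit_sha, comment_body) pairs. This shape
    is hypothetical and only serves the illustration.
    """
    counts = Counter(
        sha for sha, body in events if body.strip().startswith("/retest")
    )
    return {sha for sha, n in counts.items() if n >= limit}


events = [("abc", "/retest")] * 5 + [("def", "/retest")] * 4 + [("def", "/lgtm")]
held = commits_to_hold(events)
```

Because the count is keyed by commit, pushing a new commit starts from zero again, which matches the "on the same commit" wording.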

We should rethink and adjust this behavior. I would love to hear your thoughts on this.

Thanks,
Felix


Felix Matouschek

Mar 18, 2026, 4:25:33 AM
to kubevirt-dev
Small addition, this is the worst offender I can think of at the moment:


This PR literally took months to get merged

Dan Kenigsberg

Mar 18, 2026, 4:39:05 AM
to Felix Matouschek, kubevirt-dev
On Wednesday, March 18, 2026 at 9:18:47 AM UTC+1 Felix Matouschek wrote:
It often takes more than five retest commands and by the time CI finally passes

This is the crux of the problem. We must fix this point by quarantining unstable tests, removing redundant expensive tests, converting e2e tests to unit tests, reusing expensive resources between multiple tests, running e2e tests only after basic `make generate` passes, focusing retry executions only on failing tests, etc.
 
--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/kubevirt-dev/188589fd-8d2c-4639-a3ce-a6faa6c75201n%40googlegroups.com.

Luboslav Pivarc

Mar 18, 2026, 5:09:25 AM
to Dan Kenigsberg, Felix Matouschek, kubevirt-dev
Hey,

Most importantly, fix the tests and production code in a timely manner!

-Lubo
 
 

Or Shoval

Mar 18, 2026, 5:14:09 AM
to Luboslav Pivarc, Dan Kenigsberg, Felix Matouschek, kubevirt-dev
Hi, I agree about both the benefit of the hold bot and the problem with it.

To create a win-win situation, we could consider introducing a maintainer-approved label (e.g., /wave-hold-bot) that maintainers can apply when a PR is known to be safe and unrelated to the flakes that caused the bot to hold it.
This would stop the repeated re-holds, saving precious developer and CI resources.
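
As a rough sketch of how such an exemption could interact with the retest limit: the label string below is the example name from this message, and the function and its parameters are hypothetical, not an existing bot API.

```python
HOLD_EXEMPT_LABEL = "/wave-hold-bot"  # example name from this thread, not an existing label


def should_hold(retest_count, labels, limit=5):
    """Decide whether the bot would place a /hold.

    `labels` is the set of labels currently on the PR. A PR carrying the
    maintainer-applied exemption label is never held by this rule.
    """
    if HOLD_EXEMPT_LABEL in labels:
        return False
    return retest_count >= limit


exempt = should_hold(7, {"/wave-hold-bot", "lgtm"})
held = should_hold(5, {"lgtm"})
```

The exemption is checked first, so a maintainer's decision always wins over the counter.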

 
 

Daniel Hiller

Mar 18, 2026, 6:36:38 AM
to Or Shoval, Luboslav Pivarc, Dan Kenigsberg, Felix Matouschek, kubevirt-dev
Hey all,

I find it very interesting that no one is talking about the fact that everyone is quietly accepting flaky tests and instead blaming the CI system for PRs not getting in, when fixing the flakes should be the highest priority.

Probably an unpopular opinion, but how about we entirely remove the automated retest trigger and leave it up to contributors to trigger retesting manually? Then there would be no need for the bot to place a hold on a PR at all. This would also increase the pressure to work on test stabilization.

FTR, Kubernetes removed automated retries at the end of 2019: https://groups.google.com/a/kubernetes.io/g/dev/c/Twq7YWHrFOE
 

 
 