attempting to finish PR with flaky CI tests


Jackson Walters

Jan 30, 2025, 12:35:07 PM
to sage-...@googlegroups.com
Hello,

I'm attempting to finish two pull requests:


They do depend on one another in that 38455 requires extend=true from 39200. However, right now both PRs seem to be passing all the major "build and test" CI checks, but a few (three for each) appear to be "flaky" in that they don't seem to actually be indicating anything is wrong with the code, nor that there is anything I can change to get them to pass.

Is this just a matter of triggering the checks repeatedly with empty commits until such time that the issues surrounding the offending checks are resolved?

Any help would be appreciated.

Thanks,
Jackson

Dima Pasechnik

Jan 30, 2025, 12:44:39 PM
to sage-...@googlegroups.com
You can just check whether CI failures are relevant to your branch,
and if not, write so in comments.

Kwankyu Lee

Jan 30, 2025, 6:29:33 PM
to sage-devel
On Friday, January 31, 2025 at 2:44:39 AM UTC+9 dim...@gmail.com wrote:
> You can just check whether CI failures are relevant to your branch,
> and if not, write so in comments.
 
I think we should adopt a policy of promptly (no excuses) disabling any CI check that fails for reasons unrelated to a PR. Disabling and enabling a workflow (CI check) may be done through PRs as well.

The current practice of ignoring erroneous CI checks is degrading the quality of developer life.
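
As a rough illustration of disabling a check through a PR: a minimal sketch, assuming the check is a GitHub Actions job and that a job-level `if: false` is an acceptable way to switch it off. The job name and the step are hypothetical and not taken from the Sage workflows.

```yaml
jobs:
  flaky-check:                  # hypothetical job name
    if: false                   # temporarily disabled; delete this line to re-enable
    runs-on: ubuntu-latest
    steps:
      - run: echo "not run while the job is disabled"
```

A later PR that removes the `if: false` line would enable the check again once the underlying problem is fixed.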

Tobia...@gmx.de

Jan 31, 2025, 5:38:06 AM
to sage-devel
+1 on disabling all doctests that randomly fail. https://github.com/sagemath/sage/pull/39100 might be helpful in discovering these flaky tests.

Do you also want to disable CI workflows that sometimes fail for intrinsic reasons? For example, the Build & Test workflow often fails to push the temporary Docker image that it builds (eg https://github.com/sagemath/sage/actions/runs/13036220999/job/36367766912).

Kwankyu Lee

Jan 31, 2025, 9:51:28 AM
to sage-devel
> Do you also want to disable CI workflows that sometimes fail for intrinsic reasons? For example, the Build & Test workflow often fails to push the temporary Docker image that it builds (eg https://github.com/sagemath/sage/actions/runs/13036220999/job/36367766912).

No. Just CI workflows that almost constantly fail. The "test-mod" check (recently removed) would be an example. By the way, it should have just been disabled, so that it could be re-enabled once fixed.

Kwankyu Lee

Feb 3, 2025, 5:35:12 PM
to sage-devel
The basic idea is not to fail a CI check if the PR branch is not the cause.

Currently,

(1) "Build & Test / test-long" sometimes fails.
(2) One of the "Build & Test using Conda(Meson) / Conda" checks frequently fails.

(1) is unfortunate, but this is our main engine for checking the PR branch. We should live with it, hoping that someone fixes it.

For (2), I suggest making the checks informational only: the check reports the failed step, but the check itself always passes.

It seems that this can be implemented by adding

- name: Force job success 
  if: always() 
  run: exit 0 # Ensure job always passes

as the last step of the job for the check.
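
As far as I understand GitHub Actions semantics, a job is still reported as failed when an earlier step fails, even if a later `if: always()` step exits with 0. A minimal sketch of an alternative, assuming the flaky step can instead be marked with `continue-on-error: true`; the job name, step name, and command below are hypothetical placeholders, not the actual Sage workflow:

```yaml
jobs:
  conda:                          # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Test                # hypothetical flaky step
        run: ./run-tests.sh       # placeholder command
        continue-on-error: true   # a failure of this step does not fail the job
```

With this setting the step's failure stays visible in the log, while the job as a whole is still reported as passing.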



Tobia...@gmx.de

Feb 4, 2025, 5:51:19 AM
to sage-devel
> I suggest to make the checks information only; the check report failed step, but the check itself passes always.

I don't think this is a good idea. If some tests fail randomly, then that's an issue with the test and not with the "engine". Just fix the test or, if that's too much work for now, disable it.

One simple improvement to the actual stability of the "engine" would be to stop the "Build & Test" workflow from pushing the Docker image to the cache. This often fails and is completely unnecessary.

Kwankyu Lee

Feb 4, 2025, 5:55:21 PM
to sage-devel
On Tuesday, February 4, 2025 at 7:51:19 PM UTC+9 Tobia...@gmx.de wrote:
> > I suggest to make the checks information only; the check report failed step, but the check itself passes always.
>
> I don't think this is a good idea. If some tests fail randomly, then that's an issue with the test and not with the "engine". Just fix the test or, if that's too much work for now, disable it.

Certainly there is an issue somewhere in the code, but the doctest is not to blame. The engine ("Build & Test using Conda(Meson) / Conda") itself is not faulty either. Fixing the root cause of this unfortunate situation seems to be too much work for now.

PR authors and reviewers are bearing the harm caused by this unfortunate situation. My suggestion is meant to minimize it.