Windows Tests failing on Timeout


Bryan Stopp

Jun 4, 2021, 12:24:48 PM
to Jenkins Developers
Hello all,

Looking for some advice/help here. I've added a bunch of tests for my plugin, but it seems that on Windows I get random failures of my restart test cases. It doesn't happen during the Linux runs.

Essentially I'm validating that my custom steps come up clean from a reboot and then handle expected interactions correctly, and continue the pipeline.

The failures are intermittent and hit different reboot tests on each run; I have a number of them in the plugin. At first I was getting errors because I wasn't waiting for the test Jenkins to finish booting. Now that I've guarded against that, I'm getting timeouts instead.

Two of the latest runs which show the randomness of the failures are here and here.

Does anyone have any idea what could cause this, and is there a way for me to adjust my tests to allow more time? Is this something that I can adjust on a Jenkins CI build, or will it be ignored?

Thanks in advance everyone!

-Bryan

Bryan Stopp

Jun 4, 2021, 2:05:33 PM
to Jenkins Developers
Follow-up: I found a way to extend the timeout for a test (the WithTimeout annotation) and bumped it up to 5 minutes. However, that did not solve the issue.

Any help is much appreciated.

-Stopp

Jonathan Mackenzie

Jun 4, 2021, 3:06:59 PM
to jenkin...@googlegroups.com
The timeout is set in the Builder object.
You may be running into slow process creation on Windows. I assume you are running this on the Jenkins server itself, so it has to restart, then spin up the agent and connect to it. What error do you see after adding the WithTimeout annotation?


Jonathan Mackenzie

Jun 4, 2021, 3:22:34 PM
to jenkin...@googlegroups.com
NVM, found the log. Saw that you increased the test timeout, but this was still the same error as before:

hudson.model.Queue$WaitingItem:ExecutorStepExecution.PlaceholderTask{runId=test#1,label=,context=CpsStepContext[3:node]:Owner[test/1:test #1],cookie=a2fda7d9-72d9-4431-9b08-fd737441c0e5,auth=null}:38 after waiting for 15,000 ms because we assume unknown Node master is never going to appear!


Jonathan Mackenzie

Jun 4, 2021, 4:24:08 PM
to jenkin...@googlegroups.com
Turns out this timeout is configurable via a system property but not if it is a unit test, unfortunately. I looked at https://github.com/jenkinsci/workflow-durable-task-step-plugin/blame/master/src/main/java/org/jenkinsci/plugins/workflow/support/pickles/ExecutorPickle.java
and on line 71 there is
public static long TIMEOUT_WAITING_FOR_NODE_MILLIS = Main.isUnitTest ? /* fail faster */ TimeUnit.SECONDS.toMillis(15) : Long.getLong(ExecutorPickle.class.getName()+".timeoutForNodeMillis", TimeUnit.MINUTES.toMillis(5));
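
To make the behaviour of that initializer concrete, here is a minimal standalone sketch of the same ternary pattern. It does not use any Jenkins classes: the class name, the `isUnitTest` parameter, and the property name are all hypothetical stand-ins for illustration. It only shows why setting the system property has no effect when the unit-test flag is true.

```java
import java.util.concurrent.TimeUnit;

public class NodeTimeoutSketch {
    // Hypothetical property name, used only in this sketch (not the real Jenkins property).
    public static final String PROPERTY = "sketch.timeoutForNodeMillis";

    // Same shape as the quoted initializer: the unit-test branch returns a fixed
    // 15 seconds and never consults the system property; otherwise the property
    // is read, with a 5 minute default.
    public static long resolveTimeoutMillis(boolean isUnitTest) {
        return isUnitTest
                ? TimeUnit.SECONDS.toMillis(15)
                : Long.getLong(PROPERTY, TimeUnit.MINUTES.toMillis(5));
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "120000");
        System.out.println(resolveTimeoutMillis(true));   // 15000: property ignored
        System.out.println(resolveTimeoutMillis(false));  // 120000: property honoured
        System.clearProperty(PROPERTY);
        System.out.println(resolveTimeoutMillis(false));  // 300000: 5 minute default
    }
}
```

Since the quoted field is declared `public static` without `final`, a test could presumably assign a larger value to it directly before the restart, but I have not verified that this is enough to avoid the failure seen here.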

Bryan Stopp

Jun 4, 2021, 5:07:39 PM
to Jenkins Developers
Yeah, I'd hate to have to disable all Windows validation because my restart validation tests can't get a test runner within 15 seconds of a reboot. Seems like a limitation of the testing framework...

-Stopp

Basil Crow

Jun 7, 2021, 11:44:47 AM
to jenkin...@googlegroups.com
On Fri, Jun 4, 2021 at 2:07 PM Bryan Stopp <bryan...@gmail.com> wrote:
>
> Yeah, I'd hate to have to disable all windows validation because my restart validation tests can't get a test runner in 15 seconds after a reboot. Seems like a limitation of the testing framework...

I've similarly given up and disabled Windows testing. There _is_ a
problem with multiple JVMs consuming all the memory on the
system/container, as described in a previous thread [1]. Solving that
problem requires tuning the JVM -Xmx and -Xms settings for the agent
Java process running the build, the main Maven Java process, the Maven
Surefire Java process, and/or the agent Java process(es) spawned by
tests themselves to all fit within the total memory constraint of the
system/container (or spending more money on larger instances). To my
knowledge nobody has done this yet. I don't have the time to become a
Jenkins infrastructure developer, so I just disabled Windows testing
for Email Extension. (Windows testing seems to be disabled for Jenkins
core as well.)

[1] https://groups.google.com/g/jenkinsci-dev/c/qF96iWjIwjw/m/iND6ONZtAgAJ

Jesse Glick

Jun 7, 2021, 12:08:46 PM
to Jenkins Dev
On Fri, Jun 4, 2021 at 12:24 PM Bryan Stopp <bryan...@gmail.com> wrote:
> Two of the latest runs which show the randomness of the failures are here and here.

Now 404s; Jenkins build records get discarded after the PR is merged. A permalink which I guess captures your error:


Does not look familiar to me. The timestamps in the test log do not match the error message, which claims that it waited 15s. Nor does the error make sense: if your test has not called `Jenkins.setNumExecutors`, the rehydration task should not be waiting at all for the `master` label, which is always present (the code in question is designed to wait for an agent to reconnect).