[JIRA] (JENKINS-55356) Some workflow jobs fail after restart on Java 11 server


mark.earl.waite@gmail.com (JIRA)

Dec 28, 2018, 4:21:02 PM
to jenkinsc...@googlegroups.com
Mark Waite created an issue
 
Jenkins / Bug JENKINS-55356
Some workflow jobs fail after restart on Java 11 server
Issue Type: Bug Bug
Assignee: Unassigned
Components: workflow-job-plugin
Created: 2018-12-28 21:20
Environment: Jenkins JDK 11 docker image and current plugins
Multibranch Pipeline builds for git plugin, git client plugin, and platformlabeler plugin
Priority: Minor Minor
Reporter: Mark Waite

While running Java 11-based Jenkins in a Docker container with a pre-release of the workflow-support plugin that includes the fix for the null pointer exception, a

jenkins-url/safeRestart

causes several of the running Pipeline jobs to fail when Jenkins tries to resume them.

Build log output from the failed builds has included messages like this (seems to be Windows-specific):

07:40:50 [INFO] Running org.jenkinsci.plugins.gitclient.PushTest
Resuming build at Fri Dec 28 07:48:06 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #6: Waiting for next available executor on ‘coleen-pc2-ssh’
Ready to run at Fri Dec 28 07:48:49 MST 2018
07:48:49 Timeout set to expire in 17 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
07:48:50 Failed in branch windows-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh

and this (seems to fail on Windows and on Linux):

08:48:33 [INFO] --------------------------------[ hpi ]---------------------------------
Resuming build at Fri Dec 28 08:51:46 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
Ready to run at Fri Dec 28 08:52:07 MST 2018
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] // stage
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // stage
[Pipeline] }
[Pipeline] }
[Pipeline] // node
[Pipeline] // timeout
[Pipeline] }
08:52:08 Failed in branch windows-8
[Pipeline] }
[Pipeline] // node
[Pipeline] }
08:52:08 Failed in branch windows-8-2.150.1
08:57:14 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/der_git-client-pipeline_beta-3.0@tmp/durable-3af9d572
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
08:57:14 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/der_git-client-pipeline_beta-3.0@tmp/durable-1ba13955
[Pipeline] // node
[Pipeline] }
08:57:14 Failed in branch linux-8
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
08:57:14 Failed in branch linux-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh

and this (Windows and Linux):

09:12:13 [INFO] --- maven-help-plugin:3.1.1:evaluate (default-cli) @ git-client ---
Resuming build at Fri Dec 28 09:13:35 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Ready to run at Fri Dec 28 09:13:51 MST 2018
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // stage
[Pipeline] // withEnv
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // stage
[Pipeline] // stage
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // node
[Pipeline] // timeout
[Pipeline] }
09:13:52 Failed in branch windows-8-2.150.1
[Pipeline] }
[Pipeline] }
[Pipeline] // node
[Pipeline] // node
[Pipeline] }
09:13:52 Failed in branch linux-8
[Pipeline] }
09:13:52 Failed in branch windows-8
09:18:58 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/r_git-client-pipeline_beta-3.0_2@tmp/durable-9d0a69fe
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
09:18:58 Failed in branch linux-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on mark-pc4-ssh

The "process never started" message and the "missing workspace" message appear in both the failed git client plugin builds and the failed git plugin builds.
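A rough way to see which builds show these two signatures is to scan the saved console logs for the message text. The following is a hypothetical Java helper, not part of Jenkins or any plugin; it assumes the console output has been saved as plain text, and the matched substrings are copied from the log excerpts above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper (not part of Jenkins): tags a saved console log with the
// two failure signatures reported in JENKINS-55356.
public class RestartFailureSignatures {

    public enum Signature { MISSING_WORKSPACE, PROCESS_NEVER_STARTED }

    // Substrings copied from the build logs quoted in this issue.
    static final String MISSING_WORKSPACE_TEXT = "missing workspace ";
    static final String NEVER_STARTED_TEXT = "process apparently never started in ";

    // Returns every signature that appears on at least one line of the log.
    public static Set<Signature> classify(String consoleLog) {
        Set<Signature> found = EnumSet.noneOf(Signature.class);
        for (String line : consoleLog.split("\\R")) {
            if (line.contains(MISSING_WORKSPACE_TEXT)) {
                found.add(Signature.MISSING_WORKSPACE);
            }
            if (line.contains(NEVER_STARTED_TEXT)) {
                found.add(Signature.PROCESS_NEVER_STARTED);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "08:57:14 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/der_git-client-pipeline_beta-3.0@tmp/durable-3af9d572",
                "ERROR: missing workspace C:\\J\\S\\workspace\\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh");
        System.out.println(classify(log));
    }
}
```

Matching on substrings rather than whole lines keeps the check working whether the message appears as a console line ("ERROR: missing workspace ...") or inside a stack trace ("hudson.AbortException: missing workspace ...").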

I don't know whether the problem can be reproduced in a Java 8 environment. I will try that soon.

 
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)

mark.earl.waite@gmail.com (JIRA)

Dec 28, 2018, 4:22:02 PM
Mark Waite updated an issue
Change By: Mark Waite
Labels: java11-compatibility

mark.earl.waite@gmail.com (JIRA)

Dec 28, 2018, 6:06:02 PM
Mark Waite updated an issue
While running Java 11-based Jenkins in a Docker container with a pre-release of the workflow-support plugin that includes the fix for the null pointer exception, a
{code}
jenkins-url/safeRestart
{code}

causes several of the running Pipeline jobs to fail when Jenkins tries to resume them.

Build log output from the failed builds has included messages like this (seems to be Windows-specific):

{noformat}

07:40:50 [INFO] Running org.jenkinsci.plugins.gitclient.PushTest
Resuming build at Fri Dec 28 07:48:06 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #6: Waiting for next available executor on ‘coleen-pc2-ssh’
Ready to run at Fri Dec 28 07:48:49 MST 2018
07:48:49 Timeout set to expire in 17 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
07:48:50 Failed in branch windows-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh
{noformat}


and this (seems to fail on Windows and on Linux):

{noformat}
{noformat}


and this (Windows and Linux):

{noformat}
{noformat}


The {{process never started}} message and the {{missing workspace}} message are visible in both the failed git client plugin builds and in the failed git plugin builds.

The problem does not seem to repeat in a Java 8 environment, just in a Java 11 environment. More experiments to come.

batmat@batmat.net (JIRA)

Dec 31, 2018, 2:45:02 AM
Baptiste Mathus commented on Bug JENKINS-55356
 
Re: Some workflow jobs fail after restart on Java 11 server

Mark Waite, could you please also check the Jenkins logs to see if there is more there? Specifically, I'm curious whether this could actually be some kind of duplicate of JENKINS-55174.

Also: could you test again with the fix (not yet merged) provided by Devin Nusbaum in https://github.com/jenkinsci/workflow-support-plugin/pull/86?
You can use the Incremental release of this PR, available at https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/plugins/workflow/workflow-support/3.0-java11-alpha-2-rc689.90369f68d9e9/workflow-support-3.0-java11-alpha-2-rc689.90369f68d9e9.hpi

Thank you!

mark.earl.waite@gmail.com (JIRA)

Dec 31, 2018, 8:33:02 AM

Baptiste Mathus, thanks for asking!

The details in this report come from the incremental build of the pull request that Devin Nusbaum provided. The results with that pull request are much better than without it. Without that pull request, I see null pointer exception messages in the Jenkins logs on restart. The null pointer exceptions cause the jobs to fail and are described in JENKINS-55338, which is a duplicate of JENKINS-55174.

There are no additional entries in the Jenkins log related to the problem as far as I can tell.

I think PR-86 should be merged into the workflow support plugin for Java 11. It improves the results by removing the null pointer exception and moves the code onto a newer JBoss marshalling release.

dnusbaum@cloudbees.com (JIRA)

Jan 2, 2019, 9:17:02 AM

It looks like the root failure in all cases is related to the "ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh" error message. I think that error message comes from workflow-durable-task-step and happens if the workspace for a job is not a directory. Maybe this is a bug in the workspace-related changes in recent versions of branch-api?

Mark Waite, can you add the full list of plugins installed in Jenkins and their versions to the description? It would also be great to see the Pipeline jobs that these log snippets come from.

batmat@batmat.net (JIRA)

Jan 2, 2019, 11:01:02 AM

@vivek, can you please plan to analyze this issue? Thanks!

batmat@batmat.net (JIRA)

Jan 2, 2019, 11:01:03 AM
Baptiste Mathus edited a comment on Bug JENKINS-55356
[~vivek], can you please plan to analyze this issue? Thanks!

mark.earl.waite@gmail.com (JIRA)

Jan 2, 2019, 2:14:02 PM
Mark Waite updated an issue
Change By: Mark Waite
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh
The problem does *not* seem to repeat in a Java 8 environment, just in a Java 11 environment.
The problem does *not* seem to repeat in a Java 11 environment running on a larger computer. The failing computer has 8 GB RAM with an older Intel i5 processor, while the passing computer has 32 GB RAM and a newer Intel i5 processor.

More research. The [Docker image|https://github.com/MarkEWaite/docker-lfs/tree/30517c315d6dc052e3f88a749834891bdc7c5725/ref/plugins] includes all the plugins that were used in the failure case. However, I haven't yet been able to duplicate the problem on any other machine. More investigation soon.

mark.earl.waite@gmail.com (JIRA)

Jan 2, 2019, 2:28:03 PM
Mark Waite commented on Bug JENKINS-55356
 
Re: Some workflow jobs fail after restart on Java 11 server

Devin Nusbaum, the plugin list linked in the description is the set of plugin files committed to the git repository at the time I ran the test. The Pipeline job definitions are

Both of those Jenkinsfile examples call buildPlugin with a few options. The second example also includes a long definition of some PCT / ATH code that is not executed.

vivek.pandey@gmail.com (JIRA)

Jan 2, 2019, 3:49:01 PM

batmat@batmat.net (JIRA)

Jan 2, 2019, 4:15:02 PM

batmat@batmat.net (JIRA)

Jan 2, 2019, 4:15:02 PM
Baptiste Mathus started work on Bug JENKINS-55356
 
Change By: Baptiste Mathus
Status: Open In Progress

batmat@batmat.net (JIRA)

Jan 2, 2019, 4:15:02 PM

mark.earl.waite@gmail.com (JIRA)

Jan 2, 2019, 4:51:03 PM
Mark Waite updated an issue
Change By: Mark Waite
The problem does *not* seem to repeat on a Java 8 environment, just on a Java 11 environment.  
The problem does seem to repeat less frequently in a Java 11 environment running on a larger computer. The failing computer has 8 GB RAM with an older Intel i5 processor, while the less frequently failing computer has 32 GB RAM and a newer Intel i5 processor. The 32 GB machine has shown the failure only once. That failure was during a restart while the agents and the server were very busy.

More research.   The [Docker image|https://github.com/MarkEWaite/docker-lfs/tree/30517c315d6dc052e3f88a749834891bdc7c5725/ref/plugins] includes all the plugins that were used in the failure case.  However, I haven't yet been able to duplicate the problem on any other machine.  More investigation soon.

mark.earl.waite@gmail.com (JIRA)

Jan 2, 2019, 4:52:02 PM
Mark Waite commented on Bug JENKINS-55356
 
Re: Some workflow jobs fail after restart on Java 11 server

When the 32 GB i5 machine failed, the message in the failing build log was similar to other Windows "workspace not found" messages:

14:31:36 [INFO] ------------------------------------------------------------------------
Resuming build at Wed Jan 02 14:33:19 MST 2019 after Jenkins restart
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: ???
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: ???
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: ???
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: ???
Ready to run at Wed Jan 02 14:33:53 MST 2019
14:33:53 Timeout set to expire in 45 min
14:33:53 Timeout set to expire in 45 min
14:33:53 Timeout set to expire in 45 min
14:33:53 Timeout set to expire in 45 min
[Pipeline] }
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] // stage
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // withEnv
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Archive (linux-8-2.150.1))
[Pipeline] }
[Pipeline] }
[Pipeline] junit
[Pipeline] // node
14:33:54 Recording test results
[Pipeline] // stage
[Pipeline] }
14:33:54 Failed in branch linux-8
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
14:33:55 Failed in branch windows-8-2.150.1
14:33:55 No test report files were found. Configuration error?
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
14:33:55 Failed in branch linux-8-2.150.1
Resuming build at Wed Jan 02 14:44:08 MST 2019 after Jenkins restart
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: ???
Waiting to resume part of Git Plugin Folder » Git Branches - Jenkinsfile (GitHub) » add-tag-action-test #10: Waiting for next available executor on ‘mark-pc4-ssh’
Ready to run at Wed Jan 02 14:44:29 MST 2019
14:44:29 Timeout set to expire in 35 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
14:44:29 Failed in branch windows-8
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline

GitHub has been notified of this commit’s build result

Command Close created at
	at hudson.remoting.Command.<init>(Command.java:68)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1267)
	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1265)
	at hudson.remoting.Channel.close(Channel.java:1438)
	at hudson.remoting.Channel.close(Channel.java:1405)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1272)
Caused: hudson.remoting.Channel$OrderlyShutdown
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to aws-ubuntu-18-a
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
		at hudson.remoting.Request.call(Request.java:202)
		at hudson.remoting.Channel.call(Channel.java:956)
		at hudson.FilePath.act(FilePath.java:1072)
		at hudson.FilePath.act(FilePath.java:1061)
		at hudson.plugins.findbugs.FindBugsPublisher.perform(FindBugsPublisher.java:144)
		at hudson.plugins.analysis.core.HealthAwarePublisher.perform(HealthAwarePublisher.java:69)
		at hudson.plugins.analysis.core.HealthAwareRecorder.perform(HealthAwareRecorder.java:298)
		at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:80)
		at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)
		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:51)
		at hudson.security.ACL.impersonate(ACL.java:290)
		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:48)
		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		at java.base/java.lang.Thread.run(Thread.java:834)
Also:   hudson.AbortException: missing workspace C:\J\T\workspace\eline-github_add-tag-action-test on mark-pc3-ssh
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getWorkspace(DurableTaskStep.java:321)
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:443)
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:426)
		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		at java.base/java.lang.Thread.run(Thread.java:834)
Also:   	Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to testing-a-mwaite
			at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
			at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
			at hudson.remoting.Channel.call(Channel.java:957)
			at hudson.FilePath.act(FilePath.java:1072)
			at hudson.FilePath.act(FilePath.java:1061)
			at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:114)
			at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:137)
			at hudson.tasks.junit.JUnitResultArchiver.parseAndAttach(JUnitResultArchiver.java:167)
			at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:50)
			at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:23)
			at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:51)
			at hudson.security.ACL.impersonate(ACL.java:290)
			at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:48)
			at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
hudson.AbortException: No test report files were found. Configuration error?
		at hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:154)
		at hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:118)
		at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3086)
		at hudson.remoting.UserRequest.perform(UserRequest.java:212)
		at hudson.remoting.UserRequest.perform(UserRequest.java:54)
		at hudson.remoting.Request$2.run(Request.java:369)
		at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		at java.base/java.lang.Thread.run(Thread.java:834)
Also:   hudson.AbortException: missing workspace C:\J\T\workspace\eline-github_add-tag-action-test on mark-pc4-ssh
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getWorkspace(DurableTaskStep.java:321)
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:443)
		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:426)
		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		at java.base/java.lang.Thread.run(Thread.java:834)
Caused: hudson.remoting.RequestAbortedException
	at hudson.remoting.Request.abort(Request.java:340)
	at hudson.remoting.Channel.terminate(Channel.java:1040)
	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1273)
	at hudson.remoting.Channel$1.handle(Channel.java:565)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:85)
Finished: FAILURE

mark.earl.waite@gmail.com (JIRA)

Jan 2, 2019, 5:32:02 PM
Mark Waite edited a comment on Bug JENKINS-55356
Failures seem to be more frequent when the master is heavily loaded.  I've seen multiple failures on my 32 GB machine now.  Will continue exploring to see if I can identify common failure patterns.

When the 32 GB i5 machine failed, the message in the failing build log was similar to other Windows "workspace not found" messages:

{noformat}
{noformat}

mark.earl.waite@gmail.com (JIRA)

Jan 17, 2019, 5:00:02 PM

It seems that workflow-support 3.0 and 3.1 did not address this. I still see the following output from the Windows agent on my Java 11 server when I run safeRestart on the Jenkins server during Pipeline execution:

[INFO] -------------------------------------------------------
[INFO] Running InjectedTest
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.181 s - in InjectedTest
[INFO] Running hudson.plugins.git.BranchTest
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.374 s - in hudson.plugins.git.BranchTest
[INFO] Running hudson.plugins.git.GitAPIBadInitTest
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.113 s - in hudson.plugins.git.GitAPIBadInitTest
[INFO] Running hudson.plugins.git.GitExceptionTest
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.534 s - in hudson.plugins.git.GitExceptionTest
[INFO] Running hudson.plugins.git.GitLockFailedExceptionTest
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 s - in hudson.plugins.git.GitLockFailedExceptionTest
[INFO] Running hudson.plugins.git.GitObjectTest
[INFO] Tests run: 48, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.297 s - in hudson.plugins.git.GitObjectTest
[INFO] Running hudson.plugins.git.GitToolResolverTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.028 s - in hudson.plugins.git.GitToolResolverTest
[INFO] Running hudson.plugins.git.GitToolTest
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.005 s - in hudson.plugins.git.GitToolTest
[INFO] Running hudson.plugins.git.IndexEntryTest
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 s - in hudson.plugins.git.IndexEntryTest
[INFO] Running hudson.plugins.git.RevisionTest
[INFO] Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 s - in hudson.plugins.git.RevisionTest
[INFO] Running hudson.plugins.git.TagTest
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 s - in hudson.plugins.git.TagTest
[INFO] Running org.jenkinsci.plugins.gitclient.CliGitAPIImplTest
missing workspace C:\J\T\workspace\it-client-pipeline-github_master on mark-pc3-ssh

dnusbaum@cloudbees.com (JIRA)

Jan 17, 2019, 5:16:02 PM

Yes, there are no changes in any recent Pipeline plugin release I am aware of that would explain or fix this issue. I investigated a few weeks ago but was not able to create a reproduction case. The error message comes from workflow-durable-task-step. It would be interesting to check whether the directory mentioned in the error actually exists on the agent, or if it is missing, or exists but has a different name (perhaps a randomized suffix or `-` instead of `_` or something, which could be related to recent branch-api changes).
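The check suggested above (does the directory exist, is it missing, or does a similarly named one exist?) can be scripted on the agent. The following is a hypothetical Java sketch that uses only java.nio.file; it does not reproduce how workflow-durable-task-step performs its own check, and the "similar name" rule (treating `-` and `_` as equal, ignoring case, and allowing a suffix) is an assumption based on the possibilities listed in the comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical diagnostic (not the workflow-durable-task-step check): run it on the
// agent with the path from the "missing workspace" message as the only argument.
public class WorkspaceCheck {

    // Treat "-" and "_" as equal and ignore case, so near-miss names still match.
    static String normalize(String name) {
        return name.replace('-', '_').toLowerCase(Locale.ROOT);
    }

    // Lists directories next to the expected workspace whose normalized name starts
    // with the expected normalized name, e.g. a "-"/"_" swap or a "_2" suffix.
    public static List<Path> similarSiblings(Path expected) throws IOException {
        List<Path> matches = new ArrayList<>();
        Path absolute = expected.toAbsolutePath();
        Path parent = absolute.getParent();
        Path name = absolute.getFileName();
        if (parent == null || name == null || !Files.isDirectory(parent)) {
            return matches;
        }
        String want = normalize(name.toString());
        try (Stream<Path> children = Files.list(parent)) {
            children.filter(Files::isDirectory)
                    .filter(p -> !p.getFileName().equals(name))
                    .filter(p -> normalize(p.getFileName().toString()).startsWith(want))
                    .forEach(matches::add);
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path workspace = Paths.get(args[0]);
        System.out.println(workspace + (Files.isDirectory(workspace)
                ? " exists and is a directory"
                : " is missing or is not a directory"));
        for (Path p : similarSiblings(workspace)) {
            System.out.println("similar directory: " + p);
        }
    }
}
```

For example, running it on mark-pc3-ssh with C:\J\T\workspace\it-client-pipeline-github_master as the argument would report whether that path is a directory and list any near-miss siblings next to it.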

mark.earl.waite@gmail.com (JIRA)

Jan 17, 2019, 5:48:02 PM

That directory exists and looks like a typical workspace.

It does contain a directory named .@tmp, which surprised me. That directory contains 3 files with names that start with 'jenkins-gitclient-permission'. I assume they are some form of temporary file, but I don't recognize them as anything specific.

batmat@batmat.net (JIRA)

Jan 21, 2019, 12:07:02 PM
Baptiste Mathus updated an issue
 
Change By: Baptiste Mathus
Labels: java11-compatibility scrub triaged

mark.earl.waite@gmail.com (JIRA)

Jan 21, 2019, 12:22:03 PM
Mark Waite updated an issue
Change By: Mark Waite
While running Java 11-based Jenkins in a Docker container with a pre-release of the workflow-support plugin that includes the fix for the null pointer exception, a
{code}
jenkins-url/safeRestart
{code}
causes several of the running Pipeline jobs to fail when Jenkins tries to resume them.

Build log output from the failed builds has included messages like this (seems to be more common on Windows, but visible on Linux agents as well):

{noformat}
07:40:50 [INFO] Running org.jenkinsci.plugins.gitclient.PushTest
Resuming build at Fri Dec 28 07:48:06 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #6: Waiting for next available executor on ‘coleen-pc2-ssh’
Ready to run at Fri Dec 28 07:48:49 MST 2018
07:48:49 Timeout set to expire in 17 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
07:48:50 Failed in branch windows-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh
{noformat}

and this (seems to fail on Windows and on Linux):

{noformat}
08:48:33 [INFO] --------------------------------[ hpi ]---------------------------------
Resuming build at Fri Dec 28 08:51:46 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #7: ???
Ready to run at Fri Dec 28 08:52:07 MST 2018
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
08:52:07 Timeout set to expire in 54 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] // stage
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // stage
[Pipeline] }
[Pipeline] }
[Pipeline] // node
[Pipeline] // timeout
[Pipeline] }
08:52:08 Failed in branch windows-8
[Pipeline] }
[Pipeline] // node
[Pipeline] }
08:52:08 Failed in branch windows-8-2.150.1
08:57:14 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/der_git-client-pipeline_beta-3.0@tmp/durable-3af9d572
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
08:57:14 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/der_git-client-pipeline_beta-3.0@tmp/durable-1ba13955
[Pipeline] // node
[Pipeline] }
08:57:14 Failed in branch linux-8
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
08:57:14 Failed in branch linux-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on coleen-pc2-ssh
{noformat}

and this (windows and linux):

{noformat}
09:12:13 [INFO] --- maven-help-plugin:3.1.1:evaluate (default-cli) @ git-client ---
Resuming build at Fri Dec 28 09:13:35 MST 2018 after Jenkins restart
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Waiting to resume part of Git Client Plugin Folder » Git Client Branches - Jenkinsfile » beta-3.0 #8: ???
Ready to run at Fri Dec 28 09:13:51 MST 2018
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
09:13:51 Timeout set to expire in 55 min
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // stage
[Pipeline] // withEnv
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // stage
[Pipeline] // stage
[Pipeline] }
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // node
[Pipeline] // timeout
[Pipeline] }
09:13:52 Failed in branch windows-8-2.150.1
[Pipeline] }
[Pipeline] }
[Pipeline] // node
[Pipeline] // node
[Pipeline] }
09:13:52 Failed in branch linux-8
[Pipeline] }
09:13:52 Failed in branch windows-8
09:18:58 process apparently never started in /home/mwaite/testing-a.markwaite.net-agent/workspace/r_git-client-pipeline_beta-3.0_2@tmp/durable-9d0a69fe
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // node
[Pipeline] }
09:18:58 Failed in branch linux-8-2.150.1
[Pipeline] // parallel
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] End of Pipeline
ERROR: missing workspace C:\J\S\workspace\der_git-client-pipeline_beta-3.0 on mark-pc4-ssh
{noformat}

The {{process apparently never started}} message and the {{missing workspace}} message are visible in both the failed git client plugin builds and the failed git plugin builds.

The problem does *not* seem to repeat on a Java 8 environment, only on a Java 11 environment.
The problem does seem to repeat *less frequently* on a Java 11 environment running on a larger computer. The failing computer has 8 GB RAM with an older Intel i5 processor, while the less frequently failing computer has 32 GB RAM and a newer Intel i5 processor. The 32 GB machine has shown the failure only once. That failure was during a restart while the agents and the server were very busy.


The [Docker image|https://github.com/MarkEWaite/docker-lfs/tree/30517c315d6dc052e3f88a749834891bdc7c5725/ref/plugins] includes all the plugins that were used in the failure case.  However, I haven't yet been able to duplicate the problem on any other machine.  More investigation soon.

mark.earl.waite@gmail.com (JIRA)

Jan 21, 2019, 12:23:02 PM1/21/19
to jenkinsc...@googlegroups.com
Mark Waite updated an issue
While running Java 11 based Jenkins in a docker container using a pre-release of the workflow support plugin which includes the fix for the null pointer exception, a
{code}
jenkins-url/safeRestart
{code}
will cause several of the Pipeline jobs that were running to fail when Jenkins tries to resume the jobs.

Build log output from the failed builds has included messages like these (this seems to be more common on Windows, but is visible on Linux agents as well):
The problem does seem to repeat *less frequently* on a Java 11 environment running on a larger computer. The failing computer has 8 GB RAM with an older Intel i5 processor, while the less frequently failing computer has 32 GB RAM and a newer Intel i5 processor. The 32 GB machine has shown the failure multiple times, as has the smaller computer. That failure was during a restart while the agents and the server were very busy.


The [Docker image|https://github.com/MarkEWaite/docker-lfs/tree/30517c315d6dc052e3f88a749834891bdc7c5725/ref/plugins] includes all the plugins that were used in the failure case.  However, I haven't yet been able to duplicate the problem on any other machine.  More investigation soon.

mark.earl.waite@gmail.com (JIRA)

Jan 21, 2019, 12:24:03 PM1/21/19
to jenkinsc...@googlegroups.com
The problem does seem to repeat *less frequently* on a Java 11 environment running on a larger computer. The failing computer has 8 GB RAM with an older Intel i5 processor, while the less frequently failing computer has 32 GB RAM and a newer Intel i5 processor. The 32 GB machine has shown the failure multiple times, as has the smaller computer. That failure was during a restart while the agents and the server were very busy.

The [Docker image|https://github.com/MarkEWaite/docker-lfs/tree/30517c315d6dc052e3f88a749834891bdc7c5725/ref/plugins] includes all the plugins that were used in the failure case. I've duplicated the failures on at least two different machines. More investigation soon.

batmat@batmat.net (JIRA)

Jan 21, 2019, 3:14:01 PM1/21/19
to jenkinsc...@googlegroups.com
Baptiste Mathus commented on Bug JENKINS-55356
 
Re: Some workflow jobs fail after restart on Java 11 server

> repeat less frequently on a Java 11 environment running on a larger computer

This triggered a thought, Mark Waite: would you be able to run that test inside a Docker container with memory constraints? I can help with that if it's something you've never played with before. Basically, passing something like {{-m 1G}} to {{docker run}} will limit the memory available inside the container to 1 GB of RAM, which may trigger the behavior more quickly if it is actually related to memory settings.
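
A minimal way to check what memory limit the JVM actually sees inside such a container is sketched below. It is only an illustration and not part of Jenkins; the class name {{ContainerMemoryReport}} is hypothetical. On JDK 10 and later the JVM is container-aware by default ({{-XX:+UseContainerSupport}}), and unless {{-Xmx}} is given, the maximum heap defaults to about 25% of the available memory ({{-XX:MaxRAMPercentage}}), so with {{-m 1G}} the reported heap ceiling should be in the neighborhood of a quarter of a gigabyte.

```java
// Hypothetical diagnostic class, not part of Jenkins.
public class ContainerMemoryReport {

    // Convert a byte count to whole mebibytes (integer division truncates).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the heap ceiling the JVM will try to use. On a
        // container-aware JVM it is derived from the container memory limit
        // unless -Xmx or -XX:MaxRAMPercentage overrides it.
        System.out.println("Max heap (MiB): " + toMiB(rt.maxMemory()));
        // availableProcessors() also reflects container CPU limits on recent JDKs.
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println("java.version: " + System.getProperty("java.version"));
    }
}
```

For example, from the directory containing the file: {{docker run --rm -m 1G -v "$PWD":/src -w /src <jdk-11-image> java ContainerMemoryReport.java}}, where {{<jdk-11-image>}} is a placeholder for any JDK 11 image (Java 11 can launch a single source file directly).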

Cheers!

dnusbaum@cloudbees.com (JIRA)

Jan 25, 2019, 11:36:03 AM1/25/19
to jenkinsc...@googlegroups.com

The error message comes from workflow-durable-task-step. It would be interesting to check whether the directory mentioned in the error actually exists on the agent, or if it is missing, or exists but has a different name (perhaps a randomized suffix or `-` instead of `_` or something, which could be related to recent branch-api changes).

Another thing to point out is that FilePath#isDirectory will return false in some failure cases (see the Javadoc for File#isDirectory). Perhaps we should update this line to use NIO methods so it throws exceptions instead of returning false in some cases (another case of JENKINS-47324).
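
To illustrate the difference described above, here is a small local sketch (class and method names are hypothetical). {{java.io.File#isDirectory}} collapses every failure into {{false}}, and the NIO convenience method {{Files.isDirectory}} does the same; {{Files.readAttributes}}, by contrast, throws an {{IOException}} subtype such as {{NoSuchFileException}} that says why the check failed. The sketch runs on the local file system only and does not go through the agent channel the way {{FilePath}} does.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical comparison class, not Jenkins code.
public class IsDirectoryComparison {

    // java.io style: any failure becomes false, so the caller cannot tell
    // "the directory is missing" apart from "the check itself failed".
    static boolean legacyIsDirectory(String path) {
        return new File(path).isDirectory();
    }

    // NIO style: readAttributes throws an IOException describing the failure
    // (NoSuchFileException, AccessDeniedException, ...).
    static boolean nioIsDirectory(String path) throws IOException {
        Path p = Paths.get(path);
        return Files.readAttributes(p, BasicFileAttributes.class).isDirectory();
    }

    // Returns the simple name of the exception NIO raises, or "none".
    static String nioFailure(String path) {
        try {
            nioIsDirectory(path);
            return "none";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("ws");
        String missing = tmp.resolve("does-not-exist").toString();
        System.out.println(legacyIsDirectory(tmp.toString())); // true
        System.out.println(legacyIsDirectory(missing));        // false, no reason given
        System.out.println(nioFailure(missing));               // NoSuchFileException
        Files.delete(tmp);
    }
}
```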

dnusbaum@cloudbees.com (JIRA)

Jan 25, 2019, 11:37:02 AM1/25/19
to jenkinsc...@googlegroups.com
Devin Nusbaum edited a comment on Bug JENKINS-55356
{quote}
The error message comes from [workflow-durable-task-step|https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/20dc6a9bc70e8b6ec598d4836b5c20cf57abd8ea/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L345]. It would be interesting to check whether the directory mentioned in the error actually exists on the agent, or if it is missing, or exists but has a different name (perhaps a randomized suffix or `-` instead of `_` or something, which could be related to recent branch-api changes).
{quote}

Another thing to point out is that {{FilePath#isDirectory}} will return false in some failure cases (see the [Javadoc for {{File#isDirectory}}|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#isDirectory()]). Perhaps we should update [this line|https://github.com/jenkinsci/jenkins/blob/226f7b4c2bedb14b70f12da90db15574e18364d0/core/src/main/java/hudson/FilePath.java#L1653] to use NIO methods so it throws exceptions instead of returning false in some cases (another case of JENKINS-47324).

dnusbaum@cloudbees.com (JIRA)

Jan 25, 2019, 11:48:02 AM1/25/19
to jenkinsc...@googlegroups.com
Devin Nusbaum edited a comment on Bug JENKINS-55356
{quote}
The error message comes from [workflow-durable-task-step|https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/20dc6a9bc70e8b6ec598d4836b5c20cf57abd8ea/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L345]. It would be interesting to check whether the directory mentioned in the error actually exists on the agent, or if it is missing, or exists but has a different name (perhaps a randomized suffix or `-` instead of `_` or something, which could be related to recent branch-api changes).
{quote}

Another thing to point out is that {{FilePath#isDirectory}} will return false in some failure cases (see the [Javadoc for {{File#isDirectory}}|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#isDirectory()]). Perhaps we should update [this line|https://github.com/jenkinsci/jenkins/blob/226f7b4c2bedb14b70f12da90db15574e18364d0/core/src/main/java/hudson/FilePath.java#L1653] to use NIO methods so it throws exceptions instead of returning false in some cases (another case of JENKINS-47324).

 

Edit: I filed [https://github.com/jenkinsci/jenkins/pull/3864] for that issue.

dnusbaum@cloudbees.com (JIRA)

Jan 25, 2019, 11:57:02 AM1/25/19
to jenkinsc...@googlegroups.com


Double edit: Also, to clarify: if {{FilePath#isDirectory}} _had_ thrown an exception, then [this code|https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/20dc6a9bc70e8b6ec598d4836b5c20cf57abd8ea/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L341] would have been called, which would have caused Pipeline to attempt to connect again rather than aborting the build immediately. That issue wouldn't really explain why we are seeing this on Java 11 and not Java 8, but it seems worth investigating.
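
A simplified model of the two outcomes described above follows. It is not the actual {{DurableTaskStep}} code; {{ResumeDecisionSketch}}, {{DirectoryCheck}}, and {{Outcome}} are hypothetical names. It shows why a transient failure that is reported as {{false}} ends the build with "missing workspace", while the same failure reported as an exception would lead to another attempt.

```java
import java.io.IOException;

// Hypothetical model of the resume decision; not workflow-durable-task-step code.
public class ResumeDecisionSketch {

    enum Outcome { CONTINUE, RETRY_LATER, ABORT_MISSING_WORKSPACE }

    // Hypothetical stand-in for a remote directory check such as FilePath#isDirectory.
    interface DirectoryCheck {
        boolean isDirectory() throws IOException;
    }

    static Outcome decide(DirectoryCheck workspace) {
        try {
            if (workspace.isDirectory()) {
                return Outcome.CONTINUE;
            }
            // A plain "false" is taken at face value: the build is aborted,
            // even if the false came from a transient I/O problem.
            return Outcome.ABORT_MISSING_WORKSPACE;
        } catch (IOException e) {
            // An exception is treated as possibly transient: try to reconnect later.
            return Outcome.RETRY_LATER;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(() -> true));
        System.out.println(decide(() -> false));
        System.out.println(decide(() -> { throw new IOException("agent not reachable yet"); }));
    }
}
```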

me@basilcrow.com (JIRA)

Jan 25, 2019, 1:01:04 PM1/25/19
to jenkinsc...@googlegroups.com

I experienced this failure mode twice on January 16, two weeks after upgrading Jenkins from 2.138.1 LTS (with workflow-job 2.25, workflow-cps 2.54, and workflow-durable-task-step 2.21) to 2.150.1 LTS (with workflow-job 2.31, workflow-cps 2.61, and workflow-durable-task-step 2.27). I am not running Java 11. The job has been running daily and has only failed twice with this failure mode, so the error is transient.

ERROR: missing workspace /var/tmp/jenkins_slaves/jenkins-ops/workspace/devops-gate/master/sync-ova-into-dcod on scale-dc2
ERROR: missing workspace /var/tmp/jenkins_slaves/jenkins-ops/workspace/devops-gate/master/sync-ova-into-dcod@2 on dc3

When this error occurred on the 16th, I logged into these machines and checked the given directories on the command line. In both cases the directories existed. So I suspect there may have been some transient I/O error at the time.

o.v.nenashev@gmail.com (JIRA)

Jan 28, 2019, 4:17:01 AM1/28/19
to jenkinsc...@googlegroups.com
Oleg Nenashev updated an issue
 
Change By: Oleg Nenashev
Labels: java11-compatibility scrub triaged

o.v.nenashev@gmail.com (JIRA)

Feb 28, 2019, 9:22:51 AM2/28/19
to jenkinsc...@googlegroups.com
Oleg Nenashev commented on Bug JENKINS-55356
 
Re: Some workflow jobs fail after restart on Java 11 server

Devin Nusbaum, what is the status here? We are about to proceed with Java 11 GA in Jenkins. I do not think it is a blocker, but it would be nice to get your feedback.

dnusbaum@cloudbees.com (JIRA)

Feb 28, 2019, 9:31:05 AM2/28/19
to jenkinsc...@googlegroups.com

Oleg Nenashev, unchanged from my perspective. Given that Basil Crow mentioned that they have seen the issue on Java 8, it seems like this is not something specific to Java 11. I closed the PR I mentioned in this comment because I was not able to perform the testing necessary to feel confident about the change, and it wasn't clear to me what kinds of failures would have been exposed by switching to NIO. It would probably be safe to reopen it and change the behavior to return false if the directory does not exist, which would be much closer to the original behavior, but I'm not sure what the benefit would be. If anyone is able to come up with a self-contained and consistent reproduction case, then I would be more than happy to take a look, but without any other ideas I am just grasping at straws for now.
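
The compromise mentioned above (switch to NIO, but return {{false}} when the directory does not exist) could look roughly like this. It is a sketch under the assumption that only {{NoSuchFileException}} should map to {{false}}; the class name {{StrictIsDirectory}} is hypothetical and this is not the code from the closed PR. Other I/O failures, such as a permission error, propagate to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not the Jenkins FilePath implementation.
public class StrictIsDirectory {

    // Returns false when the path is absent or is not a directory; any other
    // I/O failure is rethrown so the caller can decide to retry instead of abort.
    static boolean isDirectory(Path path) throws IOException {
        try {
            return Files.readAttributes(path, BasicFileAttributes.class).isDirectory();
        } catch (NoSuchFileException e) {
            return false; // same answer java.io.File gives for a missing path
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(isDirectory(tmp));
        System.out.println(isDirectory(tmp.resolve("no-such-dir-55356")));
    }
}
```

Like {{File#isDirectory}}, {{Files.readAttributes}} follows symbolic links by default, so a link to a directory still reports {{true}}.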

mark.earl.waite@gmail.com (JIRA)

Mar 14, 2019, 6:20:03 PM3/14/19
to jenkinsc...@googlegroups.com
Mark Waite stopped work on Bug JENKINS-55356
 
Change By: Mark Waite
Status: In Progress → Open

o.v.nenashev@gmail.com (JIRA)

Mar 28, 2019, 10:29:04 AM3/28/19
to jenkinsc...@googlegroups.com
Oleg Nenashev updated Bug JENKINS-55356
 

Platform SIG meeting: As far as Mark Waite is concerned, it is resolved.

Change By: Oleg Nenashev
Status: Open → Fixed but Unreleased
Resolution: Fixed

o.v.nenashev@gmail.com (JIRA)

Mar 28, 2019, 10:30:02 AM3/28/19
to jenkinsc...@googlegroups.com

mark.earl.waite@gmail.com (JIRA)

May 28, 2019, 3:13:02 PM5/28/19
to jenkinsc...@googlegroups.com
Mark Waite closed an issue as Fixed
Change By: Mark Waite
Status: Resolved → Closed