Today I added a new job that runs a test suite. On build completion it has a few publishers:
- Archive the artifacts ( logs/* ). Note that the build produces no logs, but the archiver is set to not fail.
- PostBuild, to trigger another project (named castor-save).
The archiver fails because the node went offline while it was executing:
{{
✓ retrieve en.wp main page via mobile-sections (364ms)
✓ retrieve lead section of en.wp main page via mobile-sections-lead (306ms)
FATAL: no longer a configured node for ci-jessie-wikimedia-33866
java.lang.IllegalStateException: no longer a configured node for ci-jessie-wikimedia-33866
    at hudson.model.AbstractBuild$AbstractBuildExecution.getCurrentNode(AbstractBuild.java:456)
    at hudson.model.AbstractBuild$AbstractBuildExecution.reportBrokenChannel(AbstractBuild.java:813)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:788)
    at hudson.model.Build$BuildExecution.build(Build.java:205)
    at hudson.model.Build$BuildExecution.doRun(Build.java:162)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
    at hudson.model.Run.execute(Run.java:1741)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:408)
ERROR: Step ‘Archive the artifacts’ failed: no workspace for mobileapps-deploy-npm-node-4.3 #1
[PostBuildScript] - Execution post build scripts.
[PostBuildScript] Build is not success : do not execute script
Finished: FAILURE
}}
I am apparently not the only one affected. From a recent IRC log at http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-01-18.log.html:
> Thelo greghaynes: once in a while I get this error : FATAL: no longer a configured node for d-p-c-local_01-769 in my job's console
JENKINS-26665 "Complete lack of correct synchronization or concern for thread safety in mansion cloud plugin" has a similar stack trace.
Job page: https://integration.wikimedia.org/ci/job/mobileapps-deploy-npm-node-4.3/1/ (hopefully Jenkins will keep it). I have attached the XML configuration. It ran on node ci-jessie-wikimedia-33866.
The job failure occurred on Feb 15th 2016 at 17:39:02.
In my case two different jobs ran on the same node, one after the other. The Nodepool log goes something like:
{{
2016-02-15 17:31:37,287 INFO nodepool.NodeLauncher: Node id: 33866 is ready
2016-02-15 17:31:41,056 INFO nodepool.NodeLauncher: Node id: 33866 added to jenkins
2016-02-15 17:37:21,325 DEBUG nodepool.NodeUpdateListener: Received: onStarted {"name":"integration-config-tox-py27-jessie" ... "node_name":"ci-jessie-wikimedia-33866"
2016-02-15 17:38:01,808 DEBUG nodepool.NodeUpdateListener: Received: onFinalized {"name":"integration-config-tox-py27-jessie" ... "node_name":"ci-jessie-wikimedia-33866"
}}
About half a minute later, a different job is assigned to the same node:
{{
2016-02-15 17:38:33,867 DEBUG nodepool.NodeUpdateListener: Received: onStarted {"name":"mobileapps-deploy-npm-node-4.3" ... "node_name":"ci-jessie-wikimedia-33866"
2016-02-15 17:38:33,871 INFO nodepool.NodeUpdateListener: Setting node id: 33866 to USED
2016-02-15 17:39:01,875 DEBUG nodepool.NodePool: Deleting node id: 33866 which has been in used state for 0.00802109248108 hours
2016-02-15 17:39:02,942 DEBUG nodepool.NodeUpdateListener: Received: onCompleted {"name":"mobileapps-deploy-npm-node-4.3" ... "node_name":"ci-jessie-wikimedia-33866" FAILURE a0ab290726d747608dcac63b1f1a33b5","ZUUL_VOTING":"1"},"node_name":"ci-jessie-wikimedia-33866" FAILURE
2016-02-15 17:39:06,763 INFO nodepool.NodePool: Deleted jenkins node id: 33866
}}
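To make the ordering in that last excerpt explicit, here is a small Python sketch that parses the timestamps and checks whether the deletion started before the job reported completion. It is only an illustration: the helper names (`parse_events`, `first_time`) are made up for this example and are not part of Jenkins or Nodepool, and the JSON payloads are dropped from the copied log lines.

```python
from datetime import datetime

# Log lines copied from the Nodepool excerpt above (JSON payloads dropped).
LOG = """\
2016-02-15 17:38:33,871 INFO nodepool.NodeUpdateListener: Setting node id: 33866 to USED
2016-02-15 17:39:01,875 DEBUG nodepool.NodePool: Deleting node id: 33866 which has been in used state for 0.00802109248108 hours
2016-02-15 17:39:02,942 DEBUG nodepool.NodeUpdateListener: Received: onCompleted
2016-02-15 17:39:06,763 INFO nodepool.NodePool: Deleted jenkins node id: 33866
"""


def parse_events(text):
    """Return (timestamp, message) pairs, one per log line (hypothetical helper)."""
    events = []
    for line in text.splitlines():
        # The timestamp is the first 23 characters: "YYYY-MM-DD HH:MM:SS,mmm".
        stamp, message = line[:23], line[24:]
        events.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), message))
    return events


def first_time(events, needle):
    """Timestamp of the first event whose message contains `needle`."""
    return next(ts for ts, msg in events if needle in msg)


events = parse_events(LOG)
deleting = first_time(events, "Deleting node id")
completed = first_time(events, "onCompleted")

# Convert the "used state" duration reported by Nodepool from hours to seconds.
used_seconds = 0.00802109248108 * 3600

# True when the deletion started before the job reported completion.
deleted_while_running = deleting < completed
gap_seconds = (completed - deleting).total_seconds()

print("used state lasted about %.1f s" % used_seconds)
print("deletion started before onCompleted:", deleted_while_running)
print("gap between the two events: %.3f s" % gap_seconds)
```

With these lines, the deletion is logged roughly a second before onCompleted, which matches the console output where the node disappears while the build is still running.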