Our Jenkins instance entered a state where the UI was still responsive, but the queue would not populate: jobs and pipelines that were supposed to run were never scheduled, and workers reported running builds that had in fact already finished. Looking through the logs, we noticed two particular errors.

The first occurred intermittently; the last occurrence was an hour before the failure:
SEVERE: Failed Inspecting plugin /var/lib/jenkins/plugins/pipeline-model-declarative-agent.hpi
java.io.IOException: Failed to expand /var/lib/jenkins/plugins/pipeline-model-declarative-agent.hpi
at hudson.ClassicPluginStrategy.explode(ClassicPluginStrategy.java:633)
at hudson.ClassicPluginStrategy.createPluginWrapper(ClassicPluginStrategy.java:183)
at hudson.PluginManager$1$3$1.run(PluginManager.java:404)
at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:169)
at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
at jenkins.model.Jenkins$5.runTask(Jenkins.java:1069)
at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: Error while expanding /var/lib/jenkins/plugins/pipeline-model-declarative-agent.hpi
java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.tools.ant.taskdefs.Expand.expandFile(Expand.java:190)
at org.apache.tools.ant.taskdefs.Expand.execute(Expand.java:132)
at hudson.ClassicPluginStrategy.unzipExceptClasses(ClassicPluginStrategy.java:705)
at hudson.ClassicPluginStrategy.explode(ClassicPluginStrategy.java:630)
... 10 more
Caused by: java.util.zip.ZipException: archive is not a ZIP archive
at org.apache.tools.zip.ZipFile.positionAtEndOfCentralDirectoryRecord(ZipFile.java:771)
at org.apache.tools.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:707)
at org.apache.tools.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:452)
at org.apache.tools.zip.ZipFile.<init>(ZipFile.java:214)
at org.apache.tools.ant.taskdefs.Expand.expandFile(Expand.java:168)
... 13 more
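The trace above fails while locating the ZIP central directory, which is what a truncated or partially written .hpi file would typically produce. A minimal sketch for checking the plugin archives on disk is below. It assumes the plugin directory from the log (/var/lib/jenkins/plugins) and uses the JDK's java.util.zip.ZipFile as a stand-in for the Ant org.apache.tools.zip.ZipFile that Jenkins uses in the trace. The class name HpiCheck is hypothetical, and the check only verifies that each archive's central directory can be read, not the integrity of every entry.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of Jenkins.
public class HpiCheck {

    /** Returns true if the file opens as a ZIP archive (central directory readable). */
    static boolean isReadableZip(File f) {
        // Opening the archive parses the end-of-central-directory record;
        // a truncated or non-ZIP file fails here with a ZipException.
        try (ZipFile zip = new ZipFile(f)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Lists plugin archives (.hpi/.jpi) in dir that cannot be opened as ZIP files. */
    static List<File> findBrokenPlugins(File dir) {
        List<File> broken = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".hpi") || name.endsWith(".jpi"));
        if (files == null) {
            return broken; // dir does not exist or is not a directory
        }
        for (File f : files) {
            if (!isReadableZip(f)) {
                broken.add(f);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Assumed default: the plugin directory shown in the log above.
        File dir = new File(args.length > 0 ? args[0] : "/var/lib/jenkins/plugins");
        for (File f : findBrokenPlugins(dir)) {
            System.out.println("Not a readable ZIP archive: " + f.getPath());
        }
    }
}
```

Running it against the plugin directory shortly after a failure like this one would show whether pipeline-model-declarative-agent.hpi (or any other plugin archive) was actually corrupt on disk, or whether the expansion failure was transient.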
The second error appeared repeatedly on every trigger of one particular pipeline. Other pipelines triggered after we noticed the freeze did not show up in the queue or in the logs.
Nov 16, 2018 3:30:07 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
WARNING: Error initializing storage and loading nodes, will try to create placeholders for: CpsFlowExecution[Owner[cosmos/docker-node-base/PR-13/6:cosmos/docker-node-base/PR-13 #6]]
java.io.IOException: Tried to load head FlowNodes for execution Owner[cosmos/docker-node-base/PR-13/6:cosmos/docker-node-base/PR-13 #6] but FlowNode was not found in storage for head id:FlowNodeId 1:38
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.initializeStorage(CpsFlowExecution.java:678)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:715)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:660)
at io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor.<init>(PipelineNodeGraphVisitor.java:107)
at io.jenkins.blueocean.rest.impl.pipeline.NodeGraphBuilder$NodeGraphBuilderFactory.getInstance(NodeGraphBuilder.java:39)
at io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeContainerImpl.<init>(PipelineNodeContainerImpl.java:32)
at io.jenkins.blueocean.rest.impl.pipeline.PipelineRunImpl.getNodes(PipelineRunImpl.java:185)
at io.jenkins.blueocean.rest.impl.pipeline.PipelineRunImpl.getStateObj(PipelineRunImpl.java:121)
at io.jenkins.blueocean.service.embedded.rest.AbstractRunImpl.getResult(AbstractRunImpl.java:149)
at io.jenkins.blueocean.commons.stapler.export.MethodProperty.getValue(MethodProperty.java:72)
at io.jenkins.blueocean.commons.stapler.export.ExportInterceptor$1.getValue(ExportInterceptor.java:46)
at io.jenkins.blueocean.commons.stapler.Export$BlueOceanExportInterceptor.getValue(Export.java:167)
at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:136)
at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228)
at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
at io.jenkins.blueocean.commons.stapler.export.Model.writeTo(Model.java:199)
at io.jenkins.blueocean.commons.stapler.Export.writeOne(Export.java:148)
at io.jenkins.blueocean.commons.stapler.Export.serveExposedBean(Export.java:136)
at io.jenkins.blueocean.commons.stapler.Export.doJson(Export.java:79)
at io.jenkins.blueocean.rest.pageable.PagedResponse$Processor$1.generateResponse(PagedResponse.java:70)
at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124)
at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69)
at org.kohsuke.stapler.Function.renderResponse(Function.java:136)
at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:119)
at org.kohsuke.stapler.IndexDispatcher.dispatch(IndexDispatcher.java:27)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$3.doDispatch(MetaClass.java:212)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$3.doDispatch(MetaClass.java:212)
at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:709)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.MetaClass$10.dispatch(MetaClass.java:384)
at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:739)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:870)
at org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at com.smartcodeltd.jenkinsci.plugin.assetbundler.filters.LessCSS.doFilter(LessCSS.java:47)
at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
Both errors went away after we restarted the Jenkins master Docker container. My theory is that something went wrong at startup so that FlowNodes could not be initialized, which in turn stopped pipelines from executing and left the queue in a bad state. I tried to reproduce this by marking all nodes unavailable, but could not get the errors to happen again. I did notice that commit https://github.com/jenkinsci/workflow-cps-plugin/commit/85e99f1519f545c8c96cad183a6be8c53affe727, which was included in the 2.6.0 release, directly affects how FlowNodes/FlowStorage are initialized, so it might also be related.