[JIRA] (JENKINS-62146) Cache does not work with Slaves nodes

nfalco79@hotmail.com (JIRA)

May 2, 2020, 8:02:03 AM
to jenkinsc...@googlegroups.com
Nikolas Falco created an issue
 
Jenkins / Bug JENKINS-62146
Cache does not work with Slaves nodes
Issue Type: Bug
Assignee: Mads Mohr Christensen
Components: adoptopenjdk-plugin
Created: 2020-05-02 12:01
Environment: Jenkins 2.222.3
Multiple Jenkins slaves on Windows
Multiple Jenkins slaves on Linux
Priority: Critical
Reporter: Nikolas Falco

We are testing our code on JDK 11. I set up a Tool to use JDK 11.

We run a build on Jenkins nodeA for the first time. It installs JDK 11 as expected.

We then run a build on Jenkins nodeB and get this error:

Installing AdoptOpenJDK to /var/lib/jenkins/tools/hudson.model.JDK/JDK_11
ERROR: Failed to download file:/var/lib/jenkins/caches/adoptopenjdk/LINUX/amd64/jdk-11.0.7+10.zip from agent; will retry from master
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from prd-cm-as-06.fx.lan/10.1.3.105:58658
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1788)
		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
		at hudson.remoting.Channel.call(Channel.java:998)
		at hudson.FilePath.act(FilePath.java:1069)
		at hudson.FilePath.act(FilePath.java:1058)
		at hudson.FilePath.installIfNecessaryFrom(FilePath.java:914)
		at hudson.FilePath.installIfNecessaryFrom(FilePath.java:850)
		at io.jenkins.plugins.adoptopenjdk.AdoptOpenJDKInstaller.performInstallation(AdoptOpenJDKInstaller.java:121)
		at hudson.tools.InstallerTranslator.getToolHome(InstallerTranslator.java:69)
		at hudson.tools.ToolLocationNodeProperty.getToolHome(ToolLocationNodeProperty.java:109)
		at hudson.tools.ToolInstallation.translateFor(ToolInstallation.java:206)
		at hudson.model.JDK.forNode(JDK.java:148)
		at hudson.model.JDK.forNode(JDK.java:60)
		at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:152)
		at org.jenkinsci.plugins.workflow.steps.ToolStep$Execution.run(ToolStep.java:133)
		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
		at java.lang.Thread.run(Thread.java:745)
java.io.FileNotFoundException: /var/lib/jenkins/caches/adoptopenjdk/LINUX/amd64/jdk-11.0.7+10.zip (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
	at java.net.URL.openStream(URL.java:1045)
	at hudson.FilePath$Unpack.invoke(FilePath.java:948)
	at hudson.FilePath$Unpack.invoke(FilePath.java:942)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3069)
	at hudson.remoting.UserRequest.perform(UserRequest.java:212)
	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
	at hudson.remoting.Request$2.run(Request.java:369)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
	at java.lang.Thread.run(Thread.java:748)

The problem seems to be at code line 121

expected.getParent().installIfNecessaryFrom(cache.toURI().toURL(), log, ..)

cache is a file on the master node, and the installer is performing a MasterToSlave callable that uses a file:// URL to fetch a resource from another machine. This cannot work. The file content should be streamed.
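
As a rough illustration of the streaming idea (not the plugin's actual code), the sketch below uses only the JDK. The side that owns the cached archive opens it and hands over an InputStream, and the receiving side writes whatever bytes arrive without ever resolving a path itself. The class name, the method names, and the two temporary directories standing in for the master and agent file systems are assumptions made for the example. Inside Jenkins the hand-off would presumably go through the remoting channel, for instance with FilePath.copyFrom(InputStream), rather than a local copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CacheStreamSketch {

    // Receiving side: it only sees bytes, so it does not matter
    // where (or on which machine) the source file lives.
    static void installFromStream(InputStream archive, Path target) throws IOException {
        Files.createDirectories(target.getParent());
        Files.copy(archive, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Hypothetical round trip: two temp directories stand in for the
    // master cache and the agent tool directory.
    static String roundTrip(String content) {
        try {
            Path masterDir = Files.createTempDirectory("master-cache");
            Path agentDir = Files.createTempDirectory("agent-tools");
            Path cached = masterDir.resolve("jdk.zip");
            Files.write(cached, content.getBytes(StandardCharsets.UTF_8));

            // Owning side opens the file and passes the stream along.
            try (InputStream in = Files.newInputStream(cached)) {
                installFromStream(in, agentDir.resolve("jdk.zip"));
            }
            return new String(Files.readAllBytes(agentDir.resolve("jdk.zip")),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("fake archive bytes"));
    }
}
```

The point of the split is that only the side holding the file ever touches its path, whereas a file:// URL is resolved against the local disk of whichever machine opens it.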

I also see a concurrency issue at line 137

Path tmp = new File( cache.getPath()+".tmp").toPath();

Two different Jenkins slaves on 64-bit Linux that run a build for the first time will use the same tmp filename.
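
One possible way around the shared name, sketched here with plain JDK calls and not taken from the plugin, is to let each download write to its own uniquely named temporary file next to the cache entry and then move it into place. The class and method names are hypothetical, and the sketch assumes the cache directory sits on a file system that supports atomic moves.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CacheWriteSketch {

    // Writes the download to a unique temp file, then moves it onto the cache path.
    static Path storeInCache(InputStream download, Path cache) throws IOException {
        Path dir = cache.getParent();
        Files.createDirectories(dir);
        // createTempFile adds a random part to the name, so two concurrent
        // downloads of the same archive never share a temp file.
        Path tmp = Files.createTempFile(dir, cache.getFileName().toString(), ".tmp");
        try {
            Files.copy(download, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Whether an existing target is replaced by an atomic move is
            // implementation specific; this sketch assumes it is absent.
            Files.move(tmp, cache, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
        return cache;
    }

    // Hypothetical helper: stores the content in a fresh temp directory and reads it back.
    static String demo(String content) {
        try {
            Path dir = Files.createTempDirectory("adoptopenjdk-cache");
            Path cache = dir.resolve("jdk-11.0.7+10.zip");
            InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
            storeInCache(in, cache);
            return new String(Files.readAllBytes(cache), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("fake archive bytes"));
    }
}
```

With this shape a reader of the cache sees either no file or a complete one, never a half-written archive.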

I remember Jenkins handled installers using a semaphore, but I do not remember whether this semaphore is per node or shared across all nodes. In the latter case the installer would not need to handle concurrency itself.

 

The JVM property to disable the cache requires opening an IT ticket. Since the default Oracle JDK installer does not use a cache, and cached files must be cleaned up manually over an SSH session on the master node to free space, I think it would be better to change the default to disabled.

This message was sent by Atlassian Jira (v7.13.12#713012-sha1:6e07c38)

nfalco79@hotmail.com (JIRA)

May 2, 2020, 8:04:04 AM
to jenkinsc...@googlegroups.com
Nikolas Falco updated an issue
Change By: Nikolas Falco
We are testing our code on JDK 11. I set up a Tool to use JDK 11.

We run a build on Jenkins nodeA for the first time. It installs JDK 11 as expected.

We then run a build on Jenkins nodeB and get this error:
{noformat}
at java.lang.Thread.run(Thread.java:748){noformat}

The problem seems to be at code line 121
{code:java}
expected.getParent().installIfNecessaryFrom(cache.toURI().toURL(), log, ..)
{code}
cache is a file on the master node, and the installer is performing a MasterToSlave callable that uses a file:// URL to fetch a resource from master to slave. This cannot work; it should be something like http://jenkins-master/context/..., otherwise the file content should be streamed by the callable.


I also see a concurrency issue at line 137
{code:java}

Path tmp = new File( cache.getPath()+".tmp").toPath();
{code}

Two different Jenkins slaves on 64-bit Linux that run a build for the first time will use the same tmp filename.

I remember Jenkins handled installers using a semaphore, but I do not remember whether this semaphore is per node or shared across all nodes. In the latter case the installer would not need to handle concurrency itself.

 

The JVM property to disable the cache requires opening an IT ticket. Since the default Oracle JDK installer does not use a cache, and cached files must be cleaned up manually over an SSH session on the master node to free space, I think it would be better to change the default to disabled.

nfalco79@hotmail.com (JIRA)

May 2, 2020, 8:18:02 AM
to jenkinsc...@googlegroups.com
Nikolas Falco commented on Bug JENKINS-62146
 
Re: Cache does not work with Slaves nodes

Another side effect is that, when the installation fails, the entire "/var/lib/jenkins/tools/hudson.model.JDK" folder content is deleted before the second retry.

This makes other builds stop and causes the JDK to be downloaded again and again.

nfalco79@hotmail.com (JIRA)

May 2, 2020, 8:57:02 AM
to jenkinsc...@googlegroups.com
Nikolas Falco edited a comment on Bug JENKINS-62146
Another side effect is that when it fails it retries the download twice, but before the second retry the entire "/var/lib/jenkins/tools/hudson.model.JDK" folder content is deleted.

This causes other builds to stop and also to redownload other JDKs again and again.

hr.mohr@gmail.com (JIRA)

May 3, 2020, 4:47:03 PM
to jenkinsc...@googlegroups.com

I have tried to replicate the error and for me the cache seems to work. The error message in this issue looks like the same noise as reported in JENKINS-61913 and not actually a hard error.

What I did to test the cache:

  1. Started local master
  2. Booted a centos/7 VM using Vagrant
  3. Set up a slave over SSH
  4. Started a job that used a JDK tool to set up the initial cache on master
  5. Disabled network on host computer
  6. Deleted tool installation on slave
  7. Started job using JDK tool to download cache from master

This downloaded the JDK tool from the cache on the master for me. 

 

nfalco79@hotmail.com (JIRA)

May 3, 2020, 4:54:06 PM
to jenkinsc...@googlegroups.com

In our environment it always happens. The implementation is similar to (but still different from) the Oracle plugin. In the Oracle plugin the FileNotFound issue never happens. I will debug both to find the difference in behaviour.

nfalco79@hotmail.com (JIRA)

May 3, 2020, 4:55:02 PM
to jenkinsc...@googlegroups.com

In our case the slaves are independent and not handled by the master (we cannot do that).
