Inconsistent job failures because of NoClassDefFoundError on hudson.util.ProcessTree


Girish Adat

Feb 11, 2018, 7:51:52 AM
to Jenkins Users
Hello,

Sorry for a long post!

We recently upgraded to the latest Jenkins, 2.89.3, and started getting intermittent build failures on our main build server, which is a remote slave of the Jenkins server. They look similar to some "postponed" JDK 9 bugs (https://issues.jenkins-ci.org/browse/JENKINS-46523).

I tried the following:
  1. Upgraded to the latest JDK, same version on Jenkins and slave. Initially JDK 1.8u152, and later 1.8u144, the version the latest Jenkins is built with.
  2. Tried downgrading to 1.7 on the slave.
  3. Connected via JNLP with the latest agent.jar, and also had Jenkins start slave.jar via SSH.
  4. Cleared the -jar-cache location. I think the stack trace is slightly different after clearing. Earlier it complained about hudson.util.ProcessTree$UnixReflection, similar to https://issues.jenkins-ci.org/browse/JENKINS-21341 (a bug fixed in 2015, in a 1.5xx version).
  5. Checked hudson.util.ProcessTree from the slave's Groovy console. I have yet to get into the slave/agent jar code to see how class loading works there.
    1. The below works (thanks to https://issues.jenkins-ci.org/browse/JENKINS-6068):
      • import hudson.util.ProcessTree
      • println(ProcessTree.class)
    2. The below does NOT work:
      • ClassLoader.systemClassLoader.loadClass "hudson.util.ProcessTree"
    3. Checked that the hudson core jar is fetched to the jar cache and that it contains the required class. lsof lists the jar as open in the slave.jar java process.
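
The difference between 5.1 and 5.2 can be reproduced outside the Groovy console. Below is a minimal sketch in plain Java, assuming that on an agent the system classloader only sees the agent's own classpath while Jenkins core classes arrive through a separate remoting classloader. The class name LoaderCheck and the method canLoad are made up for illustration and are not part of Jenkins.

```java
public class LoaderCheck {

    /**
     * Returns true if the given loader (or one of its parents) can resolve
     * the class name. LoaderCheck and canLoad are hypothetical names.
     */
    public static boolean canLoad(ClassLoader loader, String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from the stack trace; pass another name as the first argument
        String name = args.length > 0 ? args[0] : "hudson.util.ProcessTree";
        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("system loader:  " + canLoad(system, name));
        System.out.println("context loader: " + canLoad(context, name));
    }
}
```

If that assumption holds, a failure through the system classloader alone would not prove that the class is missing from the jar cache. It would only show that this particular loader cannot see it.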
I have a few more TODOs, though since I am hitting errors in a very straightforward case, I would rather solve this than go back to older versions.
  1. Revert to our older 2.46.1 version.
  2. Experiment with 2.73 versions.
The slave logs show the error below. It does not happen for every build, but for at least two thirds of them, and, as you can imagine, the builds do not fail at a common place.


Feb 10, 2018 8:43:44 PM hudson.remoting.UserRequest perform
WARNING: LinkageError while performing UserRequest:UserRPCRequest(4,join)
java.lang.NoClassDefFoundError: hudson/util/ProcessTree
    at hudson.Proc$LocalProc.destroy(Proc.java:384)
    at hudson.Proc$LocalProc.join(Proc.java:357)
    at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:922)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:896)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
    at hudson.remoting.UserRequest.perform(UserRequest.java:207)
    at hudson.remoting.UserRequest.perform(UserRequest.java:53)
    at hudson.remoting.Request$2.run(Request.java:358)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: hudson.util.ProcessTree
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 18 more


I cannot tell whether I am doing something wrong in the configuration. Please share if you have come across a similar problem and managed to solve it.


Thanks and Regards,

Girish Adat


Girish Adat

Feb 12, 2018, 6:07:55 AM
to Jenkins Users
I had a look at the hudson git repo and could see that my assumption was wrong.

In fact, the connection between Jenkins and the slave is failing, and that failure triggers the destroy. In my setup, the destroy itself also fails on the latest Jenkins, even after the steps mentioned above.
I arrived at this conclusion after running long pings. A continuous ping shows ICMP redirects now and then (e.g. ~100 times in 3 hours).

So, to make a long story short, if I can ensure good connectivity between Jenkins and the slave, there won't be any issue.

When pinging from the Jenkins server, I can see pings giving "From <slave IP minus 1>: icmp_seq=7094 Redirect Host(New nexthop: <slave IP>)". I am now looking at how to solve this. Both servers are VMs on the same VMware host, using the same physical network.
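
To put a number on the redirects during a long ping run, a small helper along these lines could count them from the captured ping output. This is only a sketch: the class name RedirectCounter is made up, and the regular expression assumes the redirect lines look like the one quoted above (an icmp_seq value followed by "Redirect Host").

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RedirectCounter {

    // Assumed line shape, based on the ping output quoted in the post:
    // "From <ip>: icmp_seq=<n> Redirect Host(New nexthop: <ip>)"
    private static final Pattern REDIRECT =
            Pattern.compile("icmp_seq=\\d+\\s+Redirect Host");

    /** Counts the lines of ping output that report an ICMP host redirect. */
    public static long countRedirects(List<String> pingLines) {
        return pingLines.stream()
                .filter(line -> REDIRECT.matcher(line).find())
                .count();
    }

    public static void main(String[] args) {
        // Reads ping output from standard input, e.g. piped from a saved log file
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines()
                .collect(Collectors.toList());
        System.out.println("ICMP redirects seen: " + countRedirects(lines));
    }
}
```

Feeding it a saved ping log on standard input prints a single count, which makes it easier to compare runs before and after a network change.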