AgentBootStrap sending SIGTERM to Master, Drivers, etc.


A S

Jan 28, 2016, 4:25:53 PM
to faban-users
Guys,

Let me share some of what I figured out. A few days or a week back, I ran into a problem where the MasterImpl process was killed for no apparent reason and my benchmarks ended abruptly. It turns out that this was because the AgentBootStrap was sending a SIGTERM to it and to the Drivers, etc.

At the time I didn't know this was not directly related to performance, so I decided to distribute the load across multiple machines to see if that helped. I followed the instructions on the faban.org site, which say to run the harness on one machine and the agent daemon on another. Initially even this didn't work. The reason is that the agent daemon on the remote machine is passed four parameters [Name of Agent Host, IP1, IP2, JVM location]. On a simple private network, IP1 needs to be the same as IP2, which is the IP of the Master. However, IP1 is determined by a script that apparently gets it wrong. I think the script is meant to handle multihomed devices; it does not work well for my basic setup and incorrectly identifies IP1 as the IP of the agent host.

The symptom of the script not working is a "Retrying Connection to agent@..." message in the log.

I have copied the script below so you can see what it tries to do.

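------------------------------
#!/bin/sh
#########################################################################
# The interface script determines the ip address of the interface used to
# talk to the given remote host. If remote host cannot be contacted, it
# will exit with an exit value of 1.
#########################################################################

COMMAND="$0"
TARGET="$1"

usage() {
    echo "usage: ${COMMAND} host" >&2
    exit 1
}

if [ -z "${TARGET}" ] ; then
    usage
fi

# Send one ping with the Record Route option (-R) and take the first address
# on the "RR:" line. ping's errors go to the temp file (the original put this
# redirect on awk, so ping's errors were never captured).
INTERFACE=`ping -R -c1 "${TARGET}" 2>/tmp/interface.$$.err | grep RR | awk '{ print $2 }'`

# Honour the header comment: exit 1 if the host could not be contacted.
if [ -z "${INTERFACE}" ] ; then
    exit 1
fi

# Lowercase echo: "ECHO" is not a command on a case-sensitive Linux system.
echo "${INTERFACE}"
------------------------------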
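To see what the script's `grep RR | awk '{ print $2 }'` step actually extracts, here is a minimal sketch that runs the same pipeline over captured `ping -R` output. The sample output is illustrative only: the addresses are made up (10.0.0.10 standing in for the master, 10.0.0.20 for the agent host), and it assumes the Linux iputils layout where the first recorded address shares the `RR:` line. The script trusts that this first entry is the sending host's own interface; if the network between the Xen guests records something else first, IP1 comes out wrong, which would be one plausible explanation for the symptom described above.

```shell
#!/bin/sh
# Same extraction step as the interface script: keep lines containing "RR"
# and print the second whitespace-separated field of each.
first_rr_entry() {
    grep RR | awk '{ print $2 }'
}

# Illustrative ping -R output with hypothetical addresses. Only the "RR:"
# line matches the grep, so only its second field is printed.
SAMPLE_OUTPUT='PING agent-host (10.0.0.20) 56(124) bytes of data.
64 bytes from 10.0.0.20: icmp_seq=1 ttl=64 time=0.41 ms
RR:     10.0.0.10
        10.0.0.20
        10.0.0.20
        10.0.0.10'

printf '%s\n' "$SAMPLE_OUTPUT" | first_rr_entry   # prints 10.0.0.10
```

Only the first recorded hop is ever used; the remaining route entries on the following lines are ignored because they do not contain "RR".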

I should mention that both of my machines run Ubuntu and are connected to the same switched network (nothing fancy here). However, they are virtual machines running on DIFFERENT Xen servers. I'm not sure whether this is a bug, but I thought I would share.

After I worked around this (by simply having the script echo IP2, i.e. the master IP), the connection problem went away, but the Master process was still being killed midway. So, unfortunately, distributing the load was not the answer.
Note that the Master and the other processes are killed only when I run the benchmark for more than about 10 minutes. If I run it for less than 10 minutes, I don't see the problem. I am currently going through the code (luckily I have some of the SPECJ and Faban source) to figure out what is going on, but if you have any hints or things to check, please do share.
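The workaround mentioned above (having the interface script just print the master IP) can be sketched as follows. This is a hypothetical replacement, not Faban's own script: the master address comes from an assumed `MASTER_IP` environment variable (the original workaround hard-coded it), and the function `interface_for` keeps the original calling convention of one host argument.

```shell
#!/bin/sh
# Hypothetical replacement for the interface script: instead of probing the
# network with ping -R, print the master's address supplied via MASTER_IP.
interface_for() {
    target="$1"
    if [ -z "${target}" ] ; then
        echo "usage: interface_for host" >&2
        return 1
    fi
    if [ -z "${MASTER_IP}" ] ; then
        echo "MASTER_IP is not set" >&2
        return 1
    fi
    # The target is deliberately ignored: this assumes a single flat network
    # where every agent reaches the master through the same address.
    echo "${MASTER_IP}"
}

# Only run when called with an argument, e.g. MASTER_IP=... ./interface agent1
if [ $# -gt 0 ] ; then
    interface_for "$1"
fi
```

The trade-off is that this sketch gives up the multihomed support the original script was aiming for; it only makes sense when the master has one address that all agents can reach.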

Ashwin




A S

Jan 28, 2016, 4:28:38 PM
to faban-users
The stack trace looks like this:

Run end failed!

Details:

Host  Sequence  Date                 Millis
      32        2016-01-28T15:06:04  1454011564608

Logger: com.sun.faban.harness.engine.GenericBenchmark

Thread  Class                                          Method
44      com.sun.faban.harness.engine.GenericBenchmark  start

Exception:

Message:
java.lang.Exception: Driver failed to complete benchmark run

Stack Trace:
Class                                                    Method   Line
com.sun.faban.harness.DefaultFabanBenchmark2             end      400
sun.reflect.NativeMethodAccessorImpl                     invoke0
sun.reflect.NativeMethodAccessorImpl                     invoke   57
sun.reflect.DelegatingMethodAccessorImpl                 invoke   43
java.lang.reflect.Method                                 invoke   606
com.sun.faban.harness.util.Invoker                       invoke   118
com.sun.faban.harness.engine.AnnotationBenchmarkWrapper  end      168
com.sun.faban.harness.engine.GenericBenchmark            start    301
com.sun.faban.harness.engine.RunDaemon                   run      338
java.lang.Thread

A S

Feb 24, 2016, 9:22:05 PM
to faban-users
Hello Guys,

I eventually resolved the problem. There is a very good chance this is a bug in the OpenJDK implementation and how it handles subprocesses (if you google this you will find some related bugs that were resolved in 1.8; I am using 1.7). After swapping the OpenJDK runtime for the Hotspot runtime, the benchmark ran without even a hint of a problem.
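Since the fix here was swapping JVMs, it helps to confirm which runtime is actually being picked up. A minimal sketch, assuming the `java` on PATH is the one that matters (Faban may instead use the JVM location passed to the agent, as described earlier): `classify_runtime` reads a `java -version` banner and checks its runtime line, and the `--check` flag is a hypothetical convenience, not part of any Faban tooling.

```shell
#!/bin/sh
# Classify a `java -version` banner read from stdin by its runtime line.
# Prints "openjdk", "oracle" or "unknown".
classify_runtime() {
    banner=$(cat)
    case "${banner}" in
        *"OpenJDK Runtime Environment"*) echo "openjdk" ;;
        *"Java(TM) SE Runtime Environment"*) echo "oracle" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical --check flag: report the java binary on PATH and its runtime.
if [ "${1:-}" = "--check" ] ; then
    if ! JAVA_BIN=$(command -v java) ; then
        echo "no java on PATH" >&2
        exit 1
    fi
    # Resolve symlinks such as /usr/bin/java -> /etc/alternatives/java.
    echo "java binary: $(readlink -f "${JAVA_BIN}")"
    # java -version writes its banner to stderr, so merge it into stdout.
    echo "runtime: $(java -version 2>&1 | classify_runtime)"
fi
```

On Ubuntu, `sudo update-alternatives --config java` switches which installed JVM `/usr/bin/java` points to, which is one way to make the swap system-wide.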

A S


Jeremy Whiting

Apr 13, 2016, 4:26:55 AM
to faban-users
Hi A S,
I see you are trying to attribute these failures to OpenJDK. I have been running multiple versions of Faban over the years with various versions of OpenJDK without trouble from the runtime.

You need to do more digging into what is causing the issue, then report back here with your findings to help the community diagnose the problem. What you have provided so far is the general fault report, without the low-level details.
Check for things like ClassNotFoundException, configuration errors, incorrect paths, etc.
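A quick way to act on this advice is to grep the run logs. A minimal sketch: the log directory is whatever your Faban installation writes to (no specific path is assumed), and the pattern list is illustrative. It combines the ClassNotFoundException mentioned here and the "Retrying Connection" message from the first post with two additions of my own (NoClassDefFoundError for missing classes, FileNotFoundException for bad paths).

```shell
#!/bin/sh
# Illustrative pattern list: missing classes, bad paths, and the agent
# connection retry message. Extend it for your own setup.
PATTERNS='ClassNotFoundException|NoClassDefFoundError|FileNotFoundException|Retrying Connection'

scan_logs() {
    log_dir="$1"
    if [ ! -d "${log_dir}" ] ; then
        echo "usage: scan_logs <log-directory>" >&2
        return 2
    fi
    # -r: recurse, -n: show line numbers, -E: extended regex for the | list.
    grep -rnE "${PATTERNS}" "${log_dir}"
}

# Only run when given a directory argument, e.g. ./scan_logs.sh /path/to/logs
if [ $# -gt 0 ] ; then
    scan_logs "$1"
fi
```

Each hit is printed as `file:line:text`, so it is easy to jump from a match to the surrounding log record.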

Regards,
Jeremy

A S

May 16, 2016, 12:08:31 PM
to faban-users
Hi Guys,

Once I swapped the OpenJDK runtime (IcedTea) for the Oracle runtime (Hotspot), I didn't see any problem.
Sorry, earlier I said Hotspot instead of IcedTea. I wanted to share this in the forum for other users who may or may not run into the issue.

Ashwin