Multiple agent issues on ec2


Shanti Subramanyam (gmail)

May 10, 2012, 5:33:47 PM
to faban...@googlegroups.com
I am trying to set up Faban to run from 2 EC2 instances (1 master, 1 agent). I can ping, ssh, etc., and the master is able to start the CmdAgent via ssh (I'm using Ubuntu on both nodes) and registers it successfully.
The agent then pulls down all the benchmark files - so far so good.
However, when CmdService tries to connect to the CmdAgent and FileAgent on this node, it fails with the error "Error accessing command agent on system <host>"
The Exception is:
java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exception is:
java.net.ConnectException: Connection refused
Stack Trace:
Class                                          Method            Line
sun.rmi.transport.tcp.TCPEndpoint              newSocket         619
sun.rmi.transport.tcp.TCPChannel               createConnection  216
sun.rmi.transport.tcp.TCPChannel               newConnection     202
sun.rmi.server.UnicastRef                      invoke            128
com.sun.faban.harness.agent.CmdAgentImpl_Stub  getHostName
com.sun.faban.harness.engine.CmdService        getCmdAgent       641
com.sun.faban.harness.engine.CmdService        setup             503
com.sun.faban.harness.engine.GenericBenchmark  start             154
com.sun.faban.harness.engine.RunDaemon         run               338
java.lang.Thread                               run               679


I have tried everything and am now out of ideas. I have opened up all ports on both instances for TCP, enabled ICMP, ssh, etc. 

What could be causing this problem?
Here are the entries from the 'hosts' file on the agent instance. Note that the 127.0.1.1 entry was added by EC2 (via cloud-init).

127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-245-123-136.ec2.internal ip-10-245-123-136
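
For anyone checking their own instance, here is a minimal Python sketch (not part of Faban) that scans hosts-file text and reports any non-localhost name mapped to a loopback address, which is the pattern visible in the entry above. The function name `loopback_hostnames` and the `LOCAL_NAMES` set are made up for illustration, and the sample text is a trimmed copy of the file shown here.

```python
import ipaddress

# Names that are expected to point at loopback (assumption: typical Ubuntu defaults).
LOCAL_NAMES = {"localhost", "ip6-localhost", "ip6-loopback"}

def loopback_hostnames(hosts_text):
    """Return {name: address} for names, other than LOCAL_NAMES, mapped to loopback."""
    flagged = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # not a parseable address; skip the line
        if not ip.is_loopback:
            continue
        for name in names:
            if name not in LOCAL_NAMES:
                flagged[name] = address
    return flagged

SAMPLE = """\
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
# Added by cloud-init
127.0.1.1 ip-10-245-123-136.ec2.internal ip-10-245-123-136
"""

print(loopback_hostnames(SAMPLE))
```

On a real machine the same function could be fed the contents of /etc/hosts. Any name it reports is one that local lookups will resolve to a loopback address, which matches the 127.0.1.1 in the exception above.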


Shanti Subramanyam (gmail)

May 14, 2012, 12:54:04 PM
to faban...@googlegroups.com
I finally fixed this problem, so I thought I'd share the fix in case anyone else runs into it.
The problem is that there is an RMI connection back from the agent on the second driver to the master, and it uses the local IP address that the specified driver's hostname resolves to.
If you look in the hosts file of an EC2 instance, the IP address for the hostname is specified as '127.0.1.1' (see the hosts file in my original message).
When I changed this line to use the actual IP address of the host, the problem went away.
For good measure, I also changed this on the master machine, as Akara mentioned that the CmdAgent on the drivers uses the IP address of the master that gets passed in on invocation.
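
The edit described above can be sketched in Python as follows. This is an illustration only, not a Faban tool: `repoint_hostname` is a hypothetical helper, and the address 10.245.123.136 is an assumption inferred from the instance's hostname (the thread does not state the actual IP that was used). It works on the file's text and returns the new text rather than writing to /etc/hosts.

```python
import ipaddress

def repoint_hostname(hosts_text, hostname, real_ip):
    """Return hosts_text with any loopback entry naming `hostname` moved to real_ip."""
    ipaddress.ip_address(real_ip)  # raises ValueError if real_ip is malformed
    out = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore trailing comments
        if len(fields) >= 2 and hostname in fields[1:]:
            try:
                is_loopback = ipaddress.ip_address(fields[0]).is_loopback
            except ValueError:
                is_loopback = False
            if is_loopback:
                # Keep all the names on the line, swap only the address.
                out.append(" ".join([real_ip] + fields[1:]))
                continue
        out.append(raw)  # every other line is preserved untouched
    return "\n".join(out) + "\n"

before = (
    "127.0.0.1 localhost\n"
    "# Added by cloud-init\n"
    "127.0.1.1 ip-10-245-123-136.ec2.internal ip-10-245-123-136\n"
)
# 10.245.123.136 is assumed from the hostname, not confirmed in the thread.
after = repoint_hostname(before, "ip-10-245-123-136", "10.245.123.136")
print(after)
```

Note that cloud-init may regenerate the hosts file on reboot depending on its configuration, so the change may need to be reapplied or made persistent there.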

Shanti