runScriptOnNode sometimes fails in 1.6.0-rc2 because the wrong IP is used

Jai

Apr 23, 2013, 8:19:00 PM
to jcl...@googlegroups.com
Hi,

I recently upgraded to 1.6.0 (rc2). I am using the runScriptOnNode call on an EC2 instance, and in some cases I get the following error.
Apparently, the SSH connection is being attempted on the private IP of the instance instead of the public IP. I did not face this problem in the earlier version I was using.
I am attaching a screenshot of the Amazon instance details that shows the private IP of the instance. I need help resolving this.

6:45:40.505 [SimpleAsyncTaskExecutor-1] ERROR SLF4JLogger << (root:rsa[fingerprint(20:05:08:81:8c:01:99:fc:30:29:23:e6:c3:6b:12:42),sha1(64:43:e5:ec:da:8f:a6:88:f9:a2:9b:8c:0e:d9:41:cb:e5:4e:51:41)]@10.195.7.24:22) error acquiring {hostAndPort=10.195.7.24:22, loginUser=root, ssh=null, connectTimeout=7200000, sessionTimeout=7200000} (out of retries - max 7): Exhausted available authentication methods
net.schmizz.sshj.userauth.UserAuthException: Exhausted available authentication methods
at net.schmizz.sshj.userauth.UserAuthImpl.authenticate(UserAuthImpl.java:114) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.auth(SSHClient.java:205) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:305) ~[sshj-0.8.1.jar:na]
at net.schmizz.sshj.SSHClient.authPublickey(SSHClient.java:324) ~[sshj-0.8.1.jar:na]
at org.jclouds.sshj.SSHClientConnection.create(SSHClientConnection.java:144) ~[jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SSHClientConnection.create(SSHClientConnection.java:40) ~[jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SshjSshClient.acquire(SshjSshClient.java:193) [jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.sshj.SshjSshClient.connect(SshjSshClient.java:223) [jclouds-sshj-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.compute.callables.RunScriptOnNodeUsingSsh.call(RunScriptOnNodeUsingSsh.java:80) [jclouds-compute-1.6.0-rc.4.jar:1.6.0-rc.4]
at org.jclouds.compute.internal.BaseComputeService.runScriptOnNode(BaseComputeService.java:614) [jclouds-compute-1.6.0-rc.4.jar:1.6.0-rc.4]



Jai

Apr 23, 2013, 8:23:42 PM
to jcl...@googlegroups.com
Attaching the screenshot.
Screen Shot 2013-04-23 at 5.06.09 PM.png

Andrew Phillips

Apr 24, 2013, 2:38:10 AM
to jcl...@googlegroups.com
Hi Jai

Quick question: what was the earlier version you were using before
trying 1.6.0-rc.2?

ap

Jai

Apr 24, 2013, 2:57:54 AM
to jcl...@googlegroups.com, aphi...@qrmedia.com
Hi Andrew,

I was on 1.5.2 before this.

Rgds
Jai

Jai

Apr 24, 2013, 8:52:04 PM
to jcl...@googlegroups.com, aphi...@qrmedia.com
I investigated the code a little and found the following:

The ConcurrentOpenSocketFinder class tries to identify the reachable IP from the two IPs available for the node (public and private).
In this case, my local network happened to contain a machine with exactly the IP that Amazon assigned to the node as its private IP, so the socket connect test to the private IP succeeded.
jclouds then tried to SSH to that address, and SSH failed with an authentication error, for the obvious reason that it was a different machine.
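To make the failure mode concrete, here is a minimal sketch of a plain TCP "socket connect test" using only the JDK. It is not the ConcurrentOpenSocketFinder code; the SocketProbe class and its isPortOpen method are hypothetical names. What it illustrates: a successful connect only proves that something is listening at that address and port, not that it is the EC2 node, so an address collision on the local network passes the test and only surfaces later as an SSH authentication failure.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper illustrating a plain TCP reachability probe (not jclouds code). */
final class SocketProbe {

    private SocketProbe() {
    }

    /**
     * Returns true if a TCP connection to host:port can be opened within timeoutMillis.
     * This says nothing about WHICH machine answered: any host on the caller's network
     * that happens to own the same address will pass.
     */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, unknown host, or unreachable network.
            return false;
        }
    }
}
```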

The following method builds the FluentIterable by concatenating the public addresses first, yet the SSH connection was still attempted on the private IP. Can anyone shed light on why the private IP was used instead of the public one? Or does it pick one of the two IPs at random for SSH?

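   import static com.google.common.base.Preconditions.checkState;
   import static com.google.common.collect.Iterables.concat;
   import static com.google.common.collect.Iterables.size;

   import com.google.common.collect.FluentIterable;
   import org.jclouds.compute.domain.NodeMetadata;

   private static FluentIterable<String> checkNodeHasIps(NodeMetadata node) {
      // Public addresses first, then private addresses.
      FluentIterable<String> ips = FluentIterable.from(concat(node.getPublicAddresses(), node.getPrivateAddresses()));
      checkState(size(ips) > 0, "node does not have IP addresses configured: " + node);
      return ips;
   }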

Adrian Cole

Apr 25, 2013, 10:28:15 AM
to jcl...@googlegroups.com
I think the reason would be evident in the code that calls the method you pasted. At any rate, I'd guess it is more about which socket test completes first, given that the tests run in parallel. The code should prefer the local address, as that's cheaper in public clouds. Custom routing is possible by subclassing this class and binding the subclass in a Guice module passed to ContextBuilder.modules.
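The "whichever socket test completes first" behaviour can be sketched with the JDK's ExecutorService.invokeAny, which returns the result of a task that completed successfully, in practice the first one to finish. This is an illustration under that assumption, not jclouds' actual implementation; FirstReachableFinder and the Predicate-based probe are hypothetical. It shows why listing the public addresses first does not guarantee they win: the list order does not decide the winner, response time does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

/** Hypothetical illustration of a "first reachable address wins" search (not jclouds code). */
final class FirstReachableFinder {

    private FirstReachableFinder() {
    }

    /**
     * Probes every candidate in parallel and returns whichever probe succeeds first.
     * Returns null if the list is empty or no candidate is reachable.
     */
    static String findFirstReachable(List<String> candidates, Predicate<String> probe) {
        if (candidates.isEmpty()) {
            return null;
        }
        ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String address : candidates) {
                tasks.add(() -> {
                    if (!probe.test(address)) {
                        // A failed task makes invokeAny keep waiting for the others.
                        throw new IllegalStateException("unreachable: " + address);
                    }
                    return address;
                });
            }
            return pool.invokeAny(tasks);
        } catch (ExecutionException e) {
            return null; // every probe failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            pool.shutdownNow(); // cancel probes that are still running
        }
    }
}
```

With a probe that answers faster for the 10.x address (as a machine on the caller's own LAN would), the private address wins even though it is listed second.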

Hth
--
You received this message because you are subscribed to the Google Groups "jclouds" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jclouds+u...@googlegroups.com.
To post to this group, send email to jcl...@googlegroups.com.
Visit this group at http://groups.google.com/group/jclouds?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Jai

Apr 25, 2013, 12:41:13 PM
to jcl...@googlegroups.com
Thanks, Adrian, for clarifying. I will try the subclassing option.
However, I think this should be treated as a bug, because it can happen whenever jclouds runs inside an enterprise network while talking to a public cloud. Could the routing mechanism be made a configuration option for a context?
Let me know if it is OK to file a bug for this.

Adrian Cole

Apr 25, 2013, 12:56:58 PM
to jcl...@googlegroups.com
I'm not sure I agree with the term "bug", as the code wasn't written to support the scenario you describe. I'm sure there are bugs in it nonetheless :)

Regardless, feel free to log an issue on GitHub, but keep in mind that we will in all likelihood be an Apache Incubator project in less than a week. An issue logged on GitHub won't make it into this weekend's release, and probably won't be addressed until an Apache incubation release, which will most likely be tracked in JIRA. In other words, don't forget to champion this as we transition.

-A 



Andrew Phillips

Apr 25, 2013, 5:54:14 PM
to jcl...@googlegroups.com
> I'm not sure I agree with the term bug, as it wasn't written to support the
> scenario you describe.

+1. The code doesn't seem "broken" as such. What's interesting is that
two potential improvements have been suggested here:

- The code should prefer the local address as that's cheaper in public clouds
- The code should prefer the public address since the local address
might match a different local machine

The differentiating factor would seem to be whether the *calling*
machine (i.e. where jclouds is running) is in the same local network
as the machine-being-provisioned.

In the absence of a robust way to determine this automatically (can't
immediately think of one, to be honest) it would seem like a strategy
that could support either would be useful.
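A strategy that supports either preference could look like the sketch below: order the candidates according to a configured preference, then probe them one at a time so that the order, not network timing, decides. AddressPreference, PreferenceOrderedFinder and the probe predicate are hypothetical names, not jclouds settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical, configurable ordering strategy for choosing an SSH address (not jclouds code). */
final class PreferenceOrderedFinder {

    /** Which address group to try first; illustrative only, not a jclouds property. */
    enum AddressPreference { PRIVATE_FIRST, PUBLIC_FIRST }

    private PreferenceOrderedFinder() {
    }

    /** Returns all candidates, ordered according to the preference. */
    static List<String> order(List<String> publicAddresses, List<String> privateAddresses,
                              AddressPreference preference) {
        List<String> ordered = new ArrayList<>();
        if (preference == AddressPreference.PUBLIC_FIRST) {
            ordered.addAll(publicAddresses);
            ordered.addAll(privateAddresses);
        } else {
            ordered.addAll(privateAddresses);
            ordered.addAll(publicAddresses);
        }
        return ordered;
    }

    /** Probes sequentially, so the configured order (not response time) picks the winner. */
    static Optional<String> find(List<String> publicAddresses, List<String> privateAddresses,
                                 AddressPreference preference, Predicate<String> probe) {
        for (String address : order(publicAddresses, privateAddresses, preference)) {
            if (probe.test(address)) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }
}
```

The trade-off is latency: probing sequentially means an unreachable preferred address costs a full connect timeout before the next candidate is tried.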

ap

Adrian Cole

Apr 25, 2013, 6:20:47 PM
to jcl...@googlegroups.com
Well, the process is probably already too magical. Also, if it is truly concurrent, then this is a tie-break concern rather than an ordering one: concurrent means "find first", so we would basically need to find all reachable addresses and then apply tie-break logic among them. This can get lovely and complicated.
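The "find all, then tie-break" idea can be sketched with ExecutorService.invokeAll, which waits for every probe to finish before a comparator chooses among the reachable addresses. This is a hypothetical sketch (FindAllThenTieBreak is not a jclouds class), and it also shows where the complexity comes from: the caller now waits for the slowest probe, and someone has to define the tie-break rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical "find all, then tie-break" search (not jclouds code). */
final class FindAllThenTieBreak {

    private FindAllThenTieBreak() {
    }

    /**
     * Probes every candidate in parallel, waits for ALL probes to finish, then picks
     * the reachable address that sorts first under the tie-break comparator.
     */
    static Optional<String> find(List<String> candidates, Predicate<String> probe,
                                 Comparator<String> tieBreak) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String address : candidates) {
                tasks.add(() -> probe.test(address));
            }
            // invokeAll blocks until every task is done; futures keep the input order.
            List<Future<Boolean>> results = pool.invokeAll(tasks);
            List<String> reachable = new ArrayList<>();
            for (int i = 0; i < candidates.size(); i++) {
                if (results.get(i).get()) {
                    reachable.add(candidates.get(i));
                }
            }
            return reachable.stream().min(tieBreak);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // A probe threw instead of returning false; treat the search as failed.
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, a comparator that ranks non-10.x addresses first picks the public address even when the private one answered sooner.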

I'm more inclined to offer an option for a filter that throws out local addresses when SSHing over a long line from a distant network. In other words, I can see permitting an option to filter out networks, but keeping the default as is.
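An opt-in filter along these lines could drop private addresses before any probing happens, while leaving the default behaviour unchanged. The sketch relies on the JDK's InetAddress.isSiteLocalAddress(), which for IPv4 matches the RFC 1918 ranges (10/8, 172.16/12, 192.168/16); PrivateAddressFilter and the excludePrivate flag are hypothetical, not an existing jclouds option. In jclouds itself this would still need to be wired in through a custom socket finder bound in a Guice module, as mentioned earlier in the thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical opt-in filter that drops private (site-local) addresses (not jclouds code). */
final class PrivateAddressFilter {

    private PrivateAddressFilter() {
    }

    /** For IPv4 literals, true for the RFC 1918 ranges, as judged by the JDK. */
    static boolean isPrivate(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the string; no DNS lookup happens.
            return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /**
     * Returns the addresses to probe. With excludePrivate == false (the default) nothing
     * is removed; with true, private addresses are dropped so that a colliding address
     * on the caller's own LAN can never be chosen.
     */
    static List<String> candidates(List<String> addresses, boolean excludePrivate) {
        return addresses.stream()
                .filter(a -> !excludePrivate || !isPrivate(a))
                .collect(Collectors.toList());
    }
}
```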

I also petition again for us to table this for a few days and pick it up at the ASF, as none of the options will go into the code in the next 48 hours.

-A

Andrew Phillips

Apr 26, 2013, 4:23:23 AM
to jcl...@googlegroups.com
> I also petition again for us to table this a few days and pick it up
> in ASF as none of the options will go into the code in the next 48hrs

+1. I can imagine that getting this right and keeping it simple will
take some time.

ap

Jai

Apr 27, 2013, 12:54:54 PM
to jcl...@googlegroups.com, aphi...@qrmedia.com
I have opened an issue to track this.

Thanks
Jai