Error on launch with HDP 2.0 GA


Ruhollah Farchtchi

Oct 25, 2013, 6:22:20 PM
to storm...@googlegroups.com
Hi All. Looking for Andy or someone to respond. I've got the latest branch from the git repository and can't figure this one out. I am able to execute storm-yarn launch successfully; however, the YARN application dies after a few seconds of running. I get the following error when looking at the YARN ResourceManager UI.

Application application_1382717489443_0006 failed 2 times due to AM Container for appattempt_1382717489443_0006_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.


Any ideas on how to troubleshoot this one?

aperep...@hortonworks.com

Oct 27, 2013, 6:20:10 PM
to storm...@googlegroups.com
Hi,

If you dig deeper in various logs, you will see the root cause. Basically:

* As an hdfs user, create /user/yarn and chown it to 'yarn' (this is in HDFS, not local)
* Run storm-yarn as hdfs

I'm digging through internals myself, but above worked for me :)

Andrew
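
A minimal sketch of the steps described above, for readers following along. It assumes sudo access to the hdfs account and that YARN log aggregation is enabled (otherwise the container logs stay in the NodeManager's local log directory). The run helper and the DRY_RUN variable are illustrative additions, not part of Hadoop or storm-yarn; by default the script only prints each command instead of executing it.

```shell
#!/bin/sh
# Illustrative helper (not part of Hadoop or storm-yarn): print each command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Application id taken from the error message in the first post.
APP_ID="application_1382717489443_0006"

# 1. Pull the ApplicationMaster container logs to find the root cause.
run yarn logs -applicationId "$APP_ID"

# 2. Create /user/yarn in HDFS (not on the local disk) and give it to 'yarn'.
run sudo -u hdfs hdfs dfs -mkdir -p /user/yarn
run sudo -u hdfs hdfs dfs -chown yarn /user/yarn

# 3. Launch storm-yarn as the hdfs user.
run sudo -u hdfs storm-yarn launch
```

Review the printed commands first, then rerun with DRY_RUN=0 once they match your cluster.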


Andy Feng

Oct 28, 2013, 2:10:10 AM
to aperep...@hortonworks.com, storm...@googlegroups.com
Folks:

I have not tried it with HDP 2.0 GA yet, and will try to do so early this week.

Andy

--
You received this message because you are subscribed to the Google Groups "storm-yarn" group.
To unsubscribe from this group and stop receiving emails from it, send an email to storm-yarn+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Andrew Perepelytsya

Oct 28, 2013, 6:56:07 AM
to Andy Feng, storm...@googlegroups.com

No worries, I already had it set up with HDP 2 GA on both single-node and distributed clusters. Just collecting my thoughts to file a few issue reports.

Andrew

Andy Feng

Oct 28, 2013, 10:17:50 AM
to and...@hortonworks.com, storm...@googlegroups.com
Why don't you propose a pull request? That will help all of us.

Andy Feng


Ruhollah Farchtchi

Oct 28, 2013, 4:52:45 PM
to aperep...@hortonworks.com, storm...@googlegroups.com
Yes that seemed to work if I chown'd it. Thanks for the help. 

Ruhollah Farchtchi
ruhollah....@gmail.com



Ruhollah Farchtchi

Oct 28, 2013, 6:17:59 PM
to aperep...@hortonworks.com, storm...@googlegroups.com
So now my problem is this. I can submit the application to YARN, but Nimbus seems to be failing to start properly to accept Storm jobs. Looking at nimbus.log from the ResourceManager console, I see several Thrift connection attempts to localhost:2181 where the connection is refused. I'm not sure what is supposed to be running on 2181 on that node, but it seems I'm stuck. Any ideas?

Ruhollah Farchtchi
ruhollah....@gmail.com



Olivier Renault

Oct 29, 2013, 6:46:35 AM
to Ruhollah Farchtchi, aperep...@hortonworks.com, storm...@googlegroups.com
Port 2181 is normally ZooKeeper. You might need to configure Storm so that it checks your ZK quorum instead of looking on localhost.

Olivier 
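
One way to confirm which hosts actually run ZooKeeper is to send each candidate the "ruok" four-letter command; a healthy server replies "imok". The sketch below assumes an nc (netcat) build that supports the -w timeout flag, and that four-letter commands are enabled on the servers (newer ZooKeeper releases may restrict them). The zk_ok function name is made up for this example.

```shell
#!/bin/sh
# zk_ok HOST [PORT]: succeed if a ZooKeeper server answers "imok" to "ruok".
# Assumes an nc (netcat) build that supports -w (timeout in seconds).
zk_ok() {
    zk_host="$1"
    zk_port="${2:-2181}"
    zk_reply="$(printf 'ruok' | nc -w 2 "$zk_host" "$zk_port" 2>/dev/null)"
    [ "$zk_reply" = "imok" ]
}

# Check every host given on the command line.
for h in "$@"; do
    if zk_ok "$h"; then
        echo "$h: ZooKeeper answered on 2181"
    else
        echo "$h: no ZooKeeper answer on 2181"
    fi
done
```

Saved as, say, check_zk.sh, it could be run as "sh check_zk.sh node1 node2 node3", where the host names are placeholders for your own nodes.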



Andrew Perepelytsya

Oct 29, 2013, 5:44:15 PM
to Olivier Renault, storm...@googlegroups.com, Ruhollah Farchtchi

Hi,

Some delays due to the Strata conference.

Basically, you have to update your Storm config to point to a ZooKeeper quorum: list all ZooKeeper instances after checking where they are deployed via Ambari (Hosts - Components - ZK Servers).

Run storm-yarn getStormConfig and update the resulting YAML file. Then restart with storm-yarn myconfig.yaml launch.

The reason is that your YARN-based Storm cluster is running on nodes which don't have ZooKeeper (which is fine).

HTH,
Andrew
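
The config change described above amounts to replacing the localhost ZooKeeper entry with the real quorum. A minimal sketch of what the edited entries could look like follows. storm.zookeeper.servers and storm.zookeeper.port are standard Storm settings; the zk_yaml helper and the host names node1 to node3 are placeholders for illustration only.

```shell
#!/bin/sh
# zk_yaml HOST...: print the storm.yaml entries for a ZooKeeper quorum.
# Hypothetical helper for illustration; it is not part of storm-yarn.
zk_yaml() {
    echo "storm.zookeeper.servers:"
    for zk_host in "$@"; do
        echo "  - \"$zk_host\""
    done
    echo "storm.zookeeper.port: 2181"
}

# Workflow (commands as given in the post above; not executed here):
#   1. storm-yarn getStormConfig  -> writes the generated YAML file
#   2. replace its ZooKeeper entries with the output of zk_yaml
#   3. relaunch storm-yarn with the edited file
zk_yaml node1 node2 node3
```

Run as is, it prints a storm.zookeeper.servers list with one quoted entry per host, followed by storm.zookeeper.port: 2181. Substitute the hosts that Ambari reports as ZK Servers.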

Ruhollah Farchtchi

Oct 31, 2013, 9:30:07 AM
to and...@hortonworks.com, Olivier Renault, storm...@googlegroups.com
Thanks so much for all the help. The response on this list has been awesome. I really appreciate it. I've been able to get Storm up and running with storm-yarn launch; however, when submitting one of the example topologies with the following command

storm jar lib/storm-starter-0.0.1-SNAPSHOT.jar storm.starter.WordCountTopology WordCountTopology


I get an exception with regard to Thrift. My hunch is that there is another parameter I am missing in the storm.yaml config or some permission is set wrong, but I can't tell what that is. I can't even tell where this exception is logged, if at all. Any pointers in the right direction are appreciated.

Exception in thread "main" java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:21)
at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:70)
at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:41)
at storm.starter.WordCountTopology.main(WordCountTopology.java:78)
Caused by: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift7.transport.TSocket.open(TSocket.java:183)
at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81)
at backtype.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:66)
at backtype.storm.security.auth.ThriftClient.<init>(ThriftClient.java:46)
at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:30)
at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:26)
at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:19)
... 3 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift7.transport.TSocket.open(TSocket.java:178)
... 9 more



Ruhollah Farchtchi
ruhollah....@gmail.com

Andrew Perepelytsya

Oct 31, 2013, 9:52:05 AM
to Ruhollah Farchtchi, storm...@googlegroups.com, Andrew Perepelytsa, Olivier Renault

Identify the IP of the host where the Storm UI is running. Edit the client config file and point the nimbus.host value to that IP. Again, this is the config used by storm when submitting topologies.

Andrew
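
A quick way to see which Nimbus the client will try to reach, and whether that port accepts connections, is sketched below. It assumes the client config lives at ~/.storm/storm.yaml (pass another path as the first argument), that 6627 is the Nimbus Thrift port unless nimbus.thrift.port overrides it, and that the installed nc supports -z. The conf_value helper is made up for this example and only handles simple top-level "key: value" lines.

```shell
#!/bin/sh
# conf_value KEY FILE: print the value of a top-level "key: value" line.
# Deliberately simple (not a YAML parser); double quotes are stripped.
conf_value() {
    awk -F': *' -v key="$1" '$1 == key { gsub(/"/, "", $2); print $2; exit }' "$2"
}

# Client-side config used by "storm jar"; override by passing a path.
CONF="${1:-$HOME/.storm/storm.yaml}"

if [ -f "$CONF" ]; then
    host="$(conf_value nimbus.host "$CONF")"
    port="$(conf_value nimbus.thrift.port "$CONF")"
    port="${port:-6627}"   # Storm's default Nimbus Thrift port
    echo "client submits to ${host:-<unset>}:$port"
    # Assumes an nc build that supports -z (connect without sending data).
    if [ -n "$host" ] && nc -z -w 2 "$host" "$port" 2>/dev/null; then
        echo "Nimbus port is reachable"
    else
        echo "cannot connect; compare nimbus.host with the node running the Storm UI"
    fi
fi
```

If the port is not reachable, the "Connection refused" from the Thrift client is most likely the same mismatch: nimbus.host in the client config points at a node where Nimbus is not running.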

Ruhollah Farchtchi

Oct 31, 2013, 10:13:29 AM
to and...@hortonworks.com, storm...@googlegroups.com, Olivier Renault
Thanks. I have been looking at the configuration file and trying to rationalize the different "hosts". What is the difference between master.host and nimbus.host in the config? Currently my config shows nimbus.host on a different machine than the UI, which is running on the machine that YARN says the storm-yarn app is running on. Should master.host and nimbus.host be the same machine that YARN has allocated for the storm-yarn app? This would make sense, but I just want to be sure about the two config parameters, as it seems nimbus.host was set by Storm while master.host still points to "localhost".

Ruhollah Farchtchi
ruhollah....@gmail.com

Ruhollah Farchtchi

Oct 31, 2013, 10:26:41 AM
to and...@hortonworks.com, storm...@googlegroups.com, Olivier Renault
OK, I got word-count running, but I am really confused. YARN says that storm-yarn is running on node4, and I've confirmed that the Storm UI is accessible on node4; however, I had configured nimbus.host to be node5. When I went to the Storm UI, I saw that the Storm Supervisor was set to node3. After launching the WordCount topology, I no longer see the Supervisor entries, but I am able to see the topology summary. I've got a 5-node HDP 2.0 cluster with ZooKeeper on nodes 1-3. I'm wondering what is going on and how I should lay out Storm to have a consistent config. Thanks for all the help here.

Ruhollah Farchtchi
ruhollah....@gmail.com

Andrew Perepelytsya

Oct 31, 2013, 10:40:25 AM
to Ruhollah Farchtchi, storm-yarn, Olivier Renault
Are you saying the supervisor disappears after a while?

I think you ran into the same issue as me before when installing the whole cluster from scratch. Long story short - in my case supervisor was trying to execute a native 'unzip' on the node, and the package wasn't installed. Ensure e.g. 'yum install unzip -y' on every node and re-submit the topology.

Cheers,
Andrew
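
The unzip check can be scripted so that it covers every node. This is a sketch under the assumptions that the nodes are yum-based, that passwordless ssh and sudo are available, and that the node names are passed on the command line; the has_cmd helper is made up for this example.

```shell
#!/bin/sh
# has_cmd NAME: succeed if NAME is an executable command on this machine.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Local check first.
if has_cmd unzip; then
    echo "unzip found on $(uname -n)"
else
    echo "unzip missing on $(uname -n)"
fi

# Then every node named on the command line (assumes passwordless ssh and sudo).
for node in "$@"; do
    ssh "$node" 'command -v unzip >/dev/null 2>&1 || sudo yum install -y unzip'
done
```

After installing the package everywhere, re-submit the topology so the supervisors can unpack it.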

Andy Feng

Oct 31, 2013, 10:57:26 AM
to Ruhollah Farchtchi, and...@hortonworks.com, storm...@googlegroups.com, Olivier Renault
The storm-yarn master should create a storm.yaml for you. After the Storm cluster is launched, you should download storm.yaml via the getStormConfig command.

Currently, the storm-yarn master launches Nimbus and the UI on the same node as the master. Supervisors are launched on whichever nodes YARN assigns.


Andy Feng


Dhanasekaran Anbalagan

Jan 6, 2014, 2:38:52 PM
to storm...@googlegroups.com
Hi Andrew,

I am getting the same exception.

I have created the yarn user and changed the ownership as well, and I am running storm-yarn launch as the hdfs user, but I still get this exception. I am using CDH5.

due to AM Container for appattempt_1389036527411_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

RM -logs


Please guide me on how to fix this.