Problem running topology on a remote cluster

jayati

Sep 28, 2011, 11:11:45 AM
to storm-user
I have set up a cluster of 3 nodes, each with ZooKeeper and the
native dependencies installed.

1. conf/storm.yaml on nimbus looks like this:
storm.zookeeper.servers:
- "192.168.41.67"
- "192.168.41.87"
storm.local.dir: "/home/ilab/storm_setups/storm_temp"
nimbus.host: "192.168.41.53"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703

2. conf/storm.yaml on both worker nodes looks like this:
storm.zookeeper.servers:
- "192.168.41.67"
- "192.168.41.87"
storm.local.dir: "/home/ilab/storm_setups/storm_temp"
nimbus.host: "192.168.41.53"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703

I have set up Storm locally on my machine and I am trying to run the
WordCountTopology in Java (which I have tested on my local machine)
from a jar.

Command: storm jar /home/jayati/Desktop/StormProj/WCTopology.jar storm.starter.WordCountTopology

It gets submitted successfully with the message:
0 [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to
master yet. Submitting jar...
99 [main] INFO backtype.storm.StormSubmitter - Uploading topology
jar /home/jayati/Desktop/StormProj/WCTopology1.jar to assigned
location: /home/ilab/storm_setups/storm_temp/inbox/stormjar-
cff19274-9fa4-4a26-a1f9-a5610a1d5736.jar
1190 [main] INFO backtype.storm.StormSubmitter - Successfully
uploaded topology jar to assigned location: /home/ilab/storm_setups/
storm_temp/inbox/stormjar-cff19274-9fa4-4a26-a1f9-a5610a1d5736.jar
1192 [main] INFO backtype.storm.StormSubmitter - Submitting topology
word-count in distributed mode with conf
{"nimbus.host":"192.168.41.53","topology.workers":
5,"nimbus.thrift.port":6627,"topology.max.spout.pending":500}
1526 [main] INFO backtype.storm.StormSubmitter - Finished submitting
topology: word-count

I am not able to use the UI to track the topology, because
http://192.168.41.53:8080/ displays the Tomcat server homepage.
Moreover, I can see some errors in the worker logs as well:

2011-09-28 20:44:38 worker [INFO] Launching worker for
wordcount-1-1317216287 on 1ddabfc9-bca5-4149-a937-3cd2c1c3cdee:6700
with id 9c6a9a54-a301-4064-8861-a9c6b547cbe4
2011-09-28 20:44:38 ZooKeeper [INFO] Initiating client connection,
connectString=192.168.41.67:2181,192.168.41.87:2181/
sessionTimeout=10000 watcher=backtype.storm.zookeeper$mk_client
$reify__1169@1f11507
2011-09-28 20:44:38 ClientCnxn [INFO] Opening socket connection to
server /192.168.41.67:2181
2011-09-28 20:44:38 ClientCnxn [INFO] Socket connection established to
hadoop-datanode1/192.168.41.67:2181, initiating session
2011-09-28 20:44:38 ClientCnxn [INFO] Session establishment complete
on server hadoop-datanode1/192.168.41.67:2181, sessionid =
0x1324d3d6fc900e9, negotiated timeout = 10000
2011-09-28 20:44:38 zookeeper [INFO] Zookeeper state
update: :connected:none
2011-09-28 20:44:38 ZooKeeper [INFO] Session: 0x1324d3d6fc900e9 closed
2011-09-28 20:44:38 ClientCnxn [INFO] EventThread shut down
2011-09-28 20:44:38 ZooKeeper [INFO] Initiating client connection,
connectString=192.168.41.67:2181,192.168.41.87:2181/storm
sessionTimeout=10000 watcher=backtype.storm.zookeeper$mk_client
$reify__1169@5b38d7
2011-09-28 20:44:38 ClientCnxn [INFO] Opening socket connection to
server /192.168.41.87:2181
2011-09-28 20:44:38 ClientCnxn [INFO] Opening socket connection to
server /192.168.41.87:2181
2011-09-28 20:44:38 ClientCnxn [INFO] Socket connection established to
hadoop-datanode2/192.168.41.87:2181, initiating session
2011-09-28 20:44:38 ClientCnxn [INFO] Session establishment complete
on server hadoop-datanode2/192.168.41.87:2181, sessionid =
0x2324d46f1a700cc, negotiated timeout = 10000
2011-09-28 20:44:39 worker [ERROR] Error on initialization of server
mk-worker
java.lang.UnsatisfiedLinkError: /usr/local/lib/libjzmq.so.0.0.0:
libzmq.so.1: cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1750)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1675)
at java.lang.Runtime.loadLibrary0(Runtime.java:840)
at java.lang.System.loadLibrary(System.java:1047)
at org.zeromq.ZMQ.<clinit>(ZMQ.java:34)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at zilch.mq__init.load(Unknown Source)
at zilch.mq__init.<clinit>(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at clojure.lang.RT.loadClassForName(RT.java:1578)
at clojure.lang.RT.load(RT.java:399)
at clojure.lang.RT.load(RT.java:381)
at clojure.core$load$fn__4511.invoke(core.clj:4905)
at clojure.core$load.doInvoke(core.clj:4904)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at clojure.core$load_one.invoke(core.clj:4729)
at clojure.core$load_lib.doInvoke(core.clj:4766)
at clojure.lang.RestFn.applyTo(RestFn.java:143)
at clojure.core$apply.invoke(core.clj:542)
at clojure.core$load_libs.doInvoke(core.clj:4800)
at clojure.lang.RestFn.applyTo(RestFn.java:138)
at clojure.core$apply.invoke(core.clj:542)
at clojure.core$require.doInvoke(core.clj:4869)
at clojure.lang.RestFn.invoke(RestFn.java:422)
at backtype.storm.messaging.zmq
$loading__4410__auto__.invoke(zmq.clj:1)
at backtype.storm.messaging.zmq__init.load(Unknown Source)
at backtype.storm.messaging.zmq__init.<clinit>(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at clojure.lang.RT.loadClassForName(RT.java:1578)
at clojure.lang.RT.load(RT.java:399)
at clojure.lang.RT.load(RT.java:381)
at clojure.core$load$fn__4511.invoke(core.clj:4905)
at clojure.core$load.doInvoke(core.clj:4904)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at clojure.core$load_one.invoke(core.clj:4729)
at clojure.core$load_lib.doInvoke(core.clj:4766)
at clojure.lang.RestFn.applyTo(RestFn.java:143)
at clojure.core$apply.invoke(core.clj:542)
at clojure.core$load_libs.doInvoke(core.clj:4800)
at clojure.lang.RestFn.applyTo(RestFn.java:138)
at clojure.core$apply.invoke(core.clj:542)
at clojure.core$require.doInvoke(core.clj:4869)
at clojure.lang.RestFn.invoke(RestFn.java:409)
at backtype.storm.messaging.loader
$mk_zmq_context.doInvoke(loader.clj:8)
at clojure.lang.RestFn.invoke(RestFn.java:437)
at backtype.storm.daemon.worker
$fn__2965$exec_fn__837__auto____2966.invoke(worker.clj:109)
at clojure.lang.AFn.applyToHelper(AFn.java:187)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:540)
at backtype.storm.daemon.worker
$fn__2965$mk_worker__3103.doInvoke(worker.clj:78)
at clojure.lang.RestFn.invoke(RestFn.java:513)
at backtype.storm.daemon.worker$_main.invoke(worker.clj:244)
at clojure.lang.AFn.applyToHelper(AFn.java:174)
at clojure.lang.AFn.applyTo(AFn.java:151)
at backtype.storm.daemon.worker.main(Unknown Source)
2011-09-28 20:44:39 util [INFO] Halting process: ("Error on
initialization")


I might sound silly, but please help!

Jayati
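
The `UnsatisfiedLinkError` above says that `libjzmq.so.0.0.0` itself was found, but that its dependency `libzmq.so.1` could not be opened. For anyone hitting the same error, here is a minimal sketch of how one might confirm that on a Linux node, assuming `ldd` is available. The helper name `missing_libs` is made up for illustration, and the library path is simply the one from the stack trace.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the shared libraries that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Path taken from the stack trace above; adjust if jzmq lives elsewhere.
lib=/usr/local/lib/libjzmq.so.0.0.0

# Only run the check when ldd and the library are actually present.
if command -v ldd >/dev/null 2>&1 && [ -e "$lib" ]; then
    ldd "$lib" | missing_libs
fi
```

If this prints `libzmq.so.1`, the JNI wrapper is loading but the linker cannot see the zeromq library it depends on.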

nathanmarz

Sep 28, 2011, 4:11:14 PM
to storm-user
Hey Jayati,

I have a few questions that should help me debug this:

1. What OS are you installing on?
2. To what path were the native libs installed?
3. What Storm release are you running?

-Nathan

jayati tiwari

Sep 28, 2011, 4:49:33 PM
to storm...@googlegroups.com
Hi,

I have installed it on Linux machines (all of them Ubuntu, version 10+), and for all the native libs I have used my home folder. The exact path is "/home/jayati/storm_setups". I am using storm-0.5.2.
 
Jayati

nathanmarz

Sep 28, 2011, 4:56:55 PM
to storm-user
Can you try adding the following line to your storm.yaml files and
then restart the daemons?

java.library.path: /home/jayati/storm_setups

-Nathan

jayati tiwari

Sep 29, 2011, 8:14:40 AM
to storm...@googlegroups.com

Hi,

As per your suggestion, I added that line to the storm.yaml files, but one of the worker nodes stopped working.
I was also unsure whether you wanted me to specify the path where Java is set up, so I next tried
java.library.path: /usr/lib/jvm
since I have Java installed at this location. The error still persists.

I am attaching the log files of the master and the worker machines in case they make the problem clearer.

- Jayati
nimbus.log
supervisor.log
supervisor.log

nathanmarz

Sep 29, 2011, 2:20:13 PM
to storm-user
java.library.path needs to be where the zeromq binaries are installed.
For example, on my system the path looks like this:
ls -lh /usr/local/lib | grep zmq
-rwxr-xr-x  1 root  wheel    27K Jul  7 13:32 libjzmq.0.dylib
-rw-r-----  1 root  wheel   273K Jul  7 13:32 libjzmq.a
lrwxr-x---  1 root  wheel    15B Jul  7 13:32 libjzmq.dylib
-rwxr-xr-x  1 root  wheel   943B Jul  7 13:32 libjzmq.la
-rwxr-xr-x  1 root  wheel   320K Jul  7 13:30 libzmq.1.dylib
-rw-r-----  1 root  wheel   5.1M Jul  7 13:30 libzmq.a
lrwxr-x---  1 root  wheel    14B Jul  7 13:30 libzmq.dylib
-rwxr-xr-x  1 root  wheel   938B Jul  7 13:30 libzmq.la

The logs indicate that the workers are unable to start up. I have a few
recommendations:

1. Upgrade to Storm 0.5.3
2. Set the "java.library.path" conf to the location described above
3. Test that it's working with ExclamationTopology (since it doesn't
do multilang)
4. If that doesn't work, send me the worker logs, which will contain
the cause of the error

-Nathan
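
Before pointing "java.library.path" at a directory, it can help to check that the directory really holds both native libraries. The following is only a rough sketch under that assumption: the function name `has_native_libs` is hypothetical, and the file name patterns are taken from the `ls` listings in this thread (`.so` on Linux, `.dylib` on macOS).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory holds both a jzmq and
# a zmq shared library (.so* on Linux, *.dylib on macOS).
has_native_libs() {
    dir=$1
    for stem in libjzmq libzmq; do
        found=0
        for f in "$dir/$stem".so* "$dir/$stem".*dylib; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -ne 1 ]; then
            echo "$stem not found in $dir"
            return 1
        fi
    done
    echo "$dir contains both libjzmq and libzmq"
}

# Example call with the directory used in this thread.
has_native_libs /usr/local/lib || true
```

A passing check only shows that the files are present. It does not prove that the dynamic linker will find `libzmq` when `libjzmq` is loaded.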



jayati tiwari

Oct 1, 2011, 5:45:47 PM
to storm...@googlegroups.com


Hi,

I have upgraded to version 0.5.3, then configured the .yaml file and placed a copy of it in the ~/.storm/ folder, since the cluster was unable to start up without this.

I have also added "java.library.path" to the .yaml files and tried to run the ExclamationTopology, but that did not solve the problem.

I am attaching the supervisor and worker logs of one of the two worker nodes, as the logs of the other node are just the same. I have also attached the nimbus log and the .yaml files of one of the worker nodes and the master node.

Here is the output of the command "ls -lh /usr/local/lib | grep zmq":


ls -lh /usr/local/lib | grep zmq
-rw-r--r-- 1 root root  295K 2011-09-28 15:16 libjzmq.a
-rwxr-xr-x 1 root root  1007 2011-09-28 15:16 libjzmq.la
lrwxrwxrwx 1 root root    16 2011-09-28 15:16 libjzmq.so -> libjzmq.so.0.0.0
lrwxrwxrwx 1 root root    16 2011-09-28 15:16 libjzmq.so.0 -> libjzmq.so.0.0.0
-rwxr-xr-x 1 root root  176K 2011-09-28 15:16 libjzmq.so.0.0.0
-rw-r--r-- 1 root root  2.9M 2011-09-28 13:52 libzmq.a
-rwxr-xr-x 1 root root   958 2011-09-28 13:52 libzmq.la
lrwxrwxrwx 1 root root    15 2011-09-28 13:52 libzmq.so -> libzmq.so.1.0.0
lrwxrwxrwx 1 root root    15 2011-09-28 13:52 libzmq.so.1 -> libzmq.so.1.0.0
-rwxr-xr-x 1 root root  1.6M 2011-09-28 13:52 libzmq.so.1.0.0

Thanks,
Jayati
nimbus.log
supervisor.log
worker-6700.log
storm_master.yaml
storm_worker.yaml

nathanmarz

Oct 2, 2011, 6:47:13 PM
to storm-user
It looks like the jzmq binaries are loading, but jzmq is having trouble finding
zmq. Can you try setting the LD_LIBRARY_PATH environment variable to
"/usr/local/lib"? Like this:
export LD_LIBRARY_PATH=/usr/local/lib
Let me know if that fixes things.
-Nathan
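
Some background on why this helps: "java.library.path" only tells the JVM where to look for `libjzmq`, while `libzmq.so.1` is resolved by the system's dynamic linker, which consults LD_LIBRARY_PATH. The plain `export` line above overwrites any value that was already set. The sketch below is one possible variant that prepends the directory only when it is missing. The function name `prepend_lib_path` is invented for this example, and the variable presumably needs to be in effect in the shell that starts the Storm daemons on each node.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is
# already listed, keeping any existing entries.
prepend_lib_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_lib_path /usr/local/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

On many Linux systems, running `ldconfig` as root after installing libraries into /usr/local/lib is another way to make them visible to the linker. Whether that applies here is an assumption, not something confirmed in this thread.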



jayati tiwari

Oct 4, 2011, 9:53:23 AM
to storm...@googlegroups.com
Hi,

I have set LD_LIBRARY_PATH, and I also found that JAVA_HOME was not set on my worker machines, although it was set on nimbus.

After setting these two, I could successfully run the ExclamationTopology on the cluster.

Many thanks for the patient help.

- Jayati
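
Since the fix came down to two environment variables that differed between nimbus and the workers, a small check run on every node before starting the daemons might catch this earlier. This is only a sketch: `check_env` is a made-up helper, and the variable list simply mirrors the two named above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each listed variable that is unset
# or empty in the current environment; return non-zero if any is missing.
check_env() {
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        fi
    done
    return $status
}

# The two variables that had to be set on the worker machines in this thread.
check_env JAVA_HOME LD_LIBRARY_PATH || echo "set the variables listed above before starting the daemons"
```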

nathanmarz

Oct 4, 2011, 5:19:37 PM
to storm-user
Awesome! Glad we finally got things working.


