Processing stops with several supervisors


Linus Wallgren

May 14, 2013, 10:46:05 AM
to storm...@googlegroups.com
Hi,

I have been trying to get a Storm cluster up and running for a while
now. When I run nimbus and one supervisor everything works great, but
as soon as I add another supervisor the topology stops processing.

I am testing everything with storm.starter.ExclamationTopology from the
storm-starter project on Storm 0.8.2.

It seems to me that the workers cannot be started and are therefore
being killed by the supervisor, but I cannot find a reason why they fail
to start.

The supervisor keeps outputting lines like the following:

2013-05-14 16:21:09 supervisor [INFO] fcdad4bc-12cc-4570-a74e-45f04ba71956 still hasn't started
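
To see how many distinct workers are stuck and how often each one is reported, the "still hasn't started" lines can be tallied per worker ID. The following is only a rough sketch: the function name is made up for illustration, and the pattern assumes worker IDs always look like the UUID in the line above.

```python
import re
from collections import Counter

# Assumption: worker IDs are UUIDs formatted like the one in the log line above.
STALLED = re.compile(
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) still hasn't started"
)


def count_stalled_workers(log_text):
    """Count 'still hasn't started' messages per worker ID (hypothetical helper)."""
    return Counter(STALLED.findall(log_text))
```

Calling it on the contents of supervisor.log gives a mapping from worker ID to the number of times that worker was reported. Many different IDs would suggest the supervisor keeps relaunching workers that never come up, rather than waiting on a single slow one.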

I have attached the log files for nimbus, the supervisors and one of the
workers that fail to start. supervisor_nimbus.log is from the supervisor
running on the same machine as nimbus and the worker log is from the
machine running the other supervisor.

I have also searched the log files with the log level set to ALL,
without any success.

Any help on how to further debug the issue would be appreciated.

Have a nice day!
Linus
supervisor.log
worker-6703.log
nimbus.log
supervisor_nimbus.log

Kurtis Mullins

May 16, 2013, 10:35:34 PM
to storm...@googlegroups.com
We started facing a (potentially) similar problem after adding an additional two supervisors to the cluster this past week. I haven't had time to debug the problem -- but as soon as we figure out the solution, I'll see if it correlates with your issue and share our results. Good luck!




--
You received this message because you are subscribed to the Google Groups "storm-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to storm-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Viral Bajaria

May 16, 2013, 10:41:17 PM
to storm...@googlegroups.com
Have you tried starting the worker process manually to see the error? The supervisor logs contain the "java" command used to launch the worker. Try running it by hand after you submit the topology to see why it is not starting up.
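
If the log is long, a small script can pull that command out. This is only a sketch: the marker text "Launching worker with command:" is an assumption about how the supervisor words that log line, and the function name is made up for illustration, so adjust both to match what your supervisor.log actually contains.

```python
from typing import List

# Assumption: the supervisor logs the worker launch line with this wording.
# Check your own supervisor.log and change the marker if it differs.
LAUNCH_MARKER = "Launching worker with command:"


def extract_worker_commands(log_text: str, marker: str = LAUNCH_MARKER) -> List[str]:
    """Return the text after the marker for every log line that contains it."""
    commands = []
    for line in log_text.splitlines():
        _, found, rest = line.partition(marker)
        if found and rest.strip():
            commands.append(rest.strip())
    return commands
```

Reading supervisor.log into a string and taking the last element of the returned list gives the most recent launch command, which can then be pasted into a shell on the machine where the worker fails.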

Thanks,
Viral

Linus Wallgren

May 17, 2013, 3:36:49 AM
to storm...@googlegroups.com, Viral Bajaria
Yes, I have. Normally when I do that I get an IOException (No such file
or directory), which I previously tracked down to a temporary directory
not existing in storm.local.dir (I'm guessing the supervisor deletes it
when it considers that particular worker dead).

If I am fast enough (that is, before the directory is deleted) and
change the port (just to be safe), the process is killed after a few
seconds, without anything of interest in the log (which I have attached).

I have a feeling it is something very trivial I am missing, but I
haven't yet been able to figure out what :(

Have a nice day
Linus
asdf.log