Re: [storm-user] stopping supervisor and workers


Shrijeet Paliwal

Jul 27, 2012, 6:53:22 PM
to storm...@googlegroups.com
Bryan,
I think you want this https://github.com/nathanmarz/storm/pull/238

--
Shrijeet


On Fri, Jul 27, 2012 at 2:43 PM, Bryan Talbot <bata...@gmail.com> wrote:
> When running storm's supervisor under supervision using supervisord
> (confusing naming I know), what signal should supervisord send to storm's
> supervisor so that storm's supervisor kills any running workers before
> storm's supervisor exits?
>
> All signals I've tried (INT, TERM, QUIT, HUP, USR1, USR2, etc) are either
> ignored by storm's supervisor or cause it to exit and orphan the workers.
> The workers must then be manually killed. In most cases, shouldn't storm's
> supervisor process kill any of its worker children before it exits?
>

Bryan Talbot

Jul 27, 2012, 7:33:21 PM
to storm...@googlegroups.com
I am using that patched version of the storm start script.  The issue is that supervisord (the supervisor) sends a signal directly to storm's supervisor, and storm's supervisor does indeed stop; however, the workers that storm's supervisor had spawned remain as orphans with PPID=1.  Maybe the workers should exit when they receive a HUP after their parent dies?

root      3311  0.0  0.0 160664  9576 ?        Ss   Jul26   0:00 /usr/bin/python /usr/bin/supervisord
storm    23025  0.4  1.0 1417020 188576 ?      Sl   18:39   0:13  \_ java -server -Dstorm.options= -Dstorm.home=/virtual/storm/storm-0.8.0-SNA
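For anyone hitting the same thing, here is one way to confirm which workers got orphaned (a sketch, assuming workers run as user storm and show the 0.8.x class name backtype.storm.daemon.worker on their command line, as in the ps output above; adjust for your install):

```shell
# List Storm worker JVMs whose parent died (re-parented to init, PPID 1).
ps -u storm -o pid=,ppid=,args= \
  | awk '$2 == 1 && /backtype\.storm\.daemon\.worker/ {print $1}'
```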

-Bryan

Nathan Marz

Jul 30, 2012, 4:54:08 AM
to storm...@googlegroups.com
The supervisors are required for killing workers / worker subprocesses appropriately. If you want to clean up all Storm processes on a machine, a killall java or a simple shell command can get the job done.
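A sketch of that cleanup, assuming the Storm daemons run as user storm and show the 0.8.x class names on their command lines (the user and pattern are illustrative, not required values):

```shell
# Blunt: kill every Java process on the box (fine on a dedicated Storm host).
# killall java

# Narrower: match only Storm's worker/supervisor JVMs by command line.
pkill -TERM -u storm -f 'backtype\.storm\.daemon\.(worker|supervisor)' || true
```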
--
Twitter: @nathanmarz
http://nathanmarz.com

mayank gururani

Jul 30, 2012, 5:11:58 AM
to storm...@googlegroups.com
Thanks for your reply.

But my question is not quite that. I am asking how the workers communicate with each other. In a typical master-slave architecture the slaves never communicate with each other; they always communicate with the master node only. In Storm, however, Nimbus is only responsible for accepting the client request and initiating the process. Here the workers communicate with the master, and they also communicate among themselves. So how do they communicate? (Storm does not have a single point of failure.)

Let's take an example:
We have 4 supervisors, A, B, C, D, on different nodes.
Each supervisor has 3 workers.
Some topology is created on Nimbus and the process is initiated by the client.
During processing, the workers have to communicate with each other to pass on the data they emit.
How would a worker know about its peer workers, and how would it tell them about the data it emits?

Regards,
Mayank Kumar Gururani

Bryan Talbot

Jul 30, 2012, 2:46:54 PM
to storm...@googlegroups.com
So the supervisors won't kill their child processes before they exit, even when they have the chance (it's not a SIGKILL)?  Can the workers at least die gracefully if they receive a SIGHUP, which most OSes will send to orphaned children?  I don't know Clojure (yet), so finding the right places to patch is not trivial.

Having to manually kill workers when restarting supervisors due to a deployment change seems pretty messy.  It means that restarting a supervisor is a two-step process to avoid wasting a lot of resources:

1) supervisorctl restart storm-supervisor
2) pkill -TERM -P 1 -u storm -f ' backtype.storm.daemon.worker '

Puppet takes care of restarting storm-supervisor for me when the configuration changes, but workers must be manually killed for now.

-Bryan

Nathan Marz

Jul 30, 2012, 2:58:49 PM
to storm...@googlegroups.com
Storm is designed so that supervisors can die/restart without affecting the workers. Having the life of the workers be independent of the life of Nimbus or the supervisors makes things much easier to manage. It's a pretty minor tradeoff given how easy it is to just kill all the worker processes if that's what you want to do.

Bryan Talbot

Jul 30, 2012, 3:41:28 PM
to storm...@googlegroups.com
Oh, I was assuming this was a bug but it sounds like it's a feature.  I assumed that the parent-child relationship between supervisor and worker was needed for the worker to continue to function properly, but that doesn't seem to be the case.  Should I assume that any new supervisor will discover (via zookeeper) that workers from a previous supervisor are still running on its host and will not attempt to spawn new workers on the still-in-use ports?

-Bryan




Nathan Marz

Jul 30, 2012, 3:42:18 PM
to storm...@googlegroups.com
That's correct, as long as it uses the same local dir as the previous supervisor.
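For the record, the relevant setting is storm.local.dir in storm.yaml: the replacement supervisor recovers worker state from that directory, so it must point at the same path the old supervisor used. The path below is only an example, not a required value:

```yaml
# storm.yaml on the supervisor host; must match the previous supervisor's dir
storm.local.dir: "/var/storm"
```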