We're evaluating Storm and looking into how we can use it in an HA
setup for handling real-time requests.
From the wiki I can see there is quite a bit of automated fault
tolerance in the design, but I've got a few questions on parts of it.
With Nimbus, if we start a pair of them for fail-over in case a
machine dies, can both be running and connected to ZooKeeper, with one
taking over when the other goes down? Or would we have to use linux-ha
to ensure only one is running at a time? We'll need linux-ha for IP
fail-over for the Nimbus anyway.
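For context, our rough plan is to point every supervisor at a floating
IP managed by linux-ha, something like the following in storm.yaml (the
"nimbus-vip" and "zkN" hostnames are just placeholders for whatever
linux-ha and our ZooKeeper ensemble end up using):

    # storm.yaml on the supervisor machines (sketch)
    nimbus.host: "nimbus-vip"   # resolves to the linux-ha managed virtual IP
    storm.zookeeper.servers:
      - "zk1"
      - "zk2"
      - "zk3"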
For the worker processes, if a machine with a set of workers on it
dies, what timeouts have to expire before those workers get reassigned
to other task slots?
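For reference, these are the storm.yaml settings I think are involved,
with what I believe are the defaults (please correct me if I've picked
the wrong ones or the numbers differ between versions):

    nimbus.task.timeout.secs: 30        # tasks with no heartbeat for this long get reassigned
    nimbus.supervisor.timeout.secs: 60  # supervisors with no heartbeat are considered dead
    nimbus.monitor.freq.secs: 10        # how often Nimbus checks the above
    supervisor.worker.timeout.secs: 30  # supervisor restarts a worker that stops heartbeating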
Also, on task slots: if the Storm cluster has fewer task slots free
than a topology asks for, does the topology just get assigned what's
left and spread its workers over those instead?
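To make that concrete, I mean the case where the requested worker count
is larger than the free slots add up to, e.g. something like:

    # each supervisor advertises 4 slots (sketch, using the default ports)
    supervisor.slots.ports:
      - 6700
      - 6701
      - 6702
      - 6703

    # topology config: asking for more workers than may currently be free
    topology.workers: 6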
Any other tips or experiences with production setups for Storm would
be great to hear about too!
Thanks,
Dan