We're evaluating Storm at Knewton, and I am currently putting together
proofs of concept with it. So far I am very impressed. Great work.
One of our needs is the ability to dynamically edit a topology from
within a given bolt. A trivial example: bolt A counts the tuples it has
received and emits the running count to bolt B. Once bolt A receives
the 1,000th tuple, it stops emitting to bolt B and instead emits all
tuples to bolt C. It appears this can currently be handled by
predefining the topology, but what are your thoughts on extending
Nimbus to accept requests such as "addFieldGrouping"? Our current needs
only involve adding and deleting connections between spouts and bolts,
not deploying any new application code to the cluster. Is this
consistent with the general direction Storm is heading? We would be
interested in contributing these features to Storm, but wanted to hear
the authors' thoughts first.
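For the example above, the "predefined topology" route can be sketched in Java as a small router object. It assumes bolt A declares two output streams (the names "toB" and "toC" are hypothetical), with bolt B subscribed to one and bolt C to the other, so A only has to pick a stream per tuple. The StreamEmitter interface is a hypothetical stand-in for the emit side of Storm's output collector so the sketch runs without Storm, and the sketch assumes the 1,000th tuple itself is the first one sent to C.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a collector's emit-to-stream call, so this runs without Storm. */
interface StreamEmitter {
    void emit(String streamId, List<Object> values);
}

/**
 * Sketch of bolt A's routing logic under a predefined topology:
 * A declares two output streams, B subscribes to one and C to the other,
 * and A chooses the stream at emit time. Stream names are hypothetical.
 */
class CountingSwitchRouter {
    static final String TO_B = "toB"; // hypothetical stream consumed by bolt B
    static final String TO_C = "toC"; // hypothetical stream consumed by bolt C

    private final long threshold;
    private long count = 0;

    CountingSwitchRouter(long threshold) {
        this.threshold = threshold;
    }

    /** Called once per incoming tuple, as bolt A's execute method would be. */
    void onTuple(List<Object> tuple, StreamEmitter out) {
        count++;
        if (count < threshold) {
            // Before the threshold: report the running count to bolt B.
            List<Object> values = new ArrayList<>();
            values.add(count);
            out.emit(TO_B, values);
        } else {
            // From the threshold-th tuple on: forward every tuple to bolt C.
            out.emit(TO_C, tuple);
        }
    }
}
```

In Storm itself, this logic would presumably live in bolt A's execute method, with the two streams declared through declareStream and with B and C subscribing to A by stream id. No wiring changes at runtime, which is why this approach only covers switches that can be anticipated when the topology is defined.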
Thank you.
Trevor
Thanks for your reply. The problems are:
1) I want a given bolt to decide, at emit time, which bolt or bolts
from a given "group" of bolts to emit to, i.e., not a broadcast.
Assume that all possible consuming bolts take tuples of the same
type.
2) Ideally I would like to be able to spin up extra consuming bolts
dynamically and cheaply. At the moment it seems this would require a
topology restart. The "swapping" feature you described would
probably solve this, and I am interested in hearing more about it.
Disclaimer: I have only spent a few hours looking into the Storm
source. I can see three ways of solving problem 1:
1) Add a field to the tuple emitted by the bolt, and let each
consuming bolt decide whether to process the tuple. Cons: lots of
unnecessary traffic.
2) Add one output stream for each consumer. Cons: one stream per bolt
seems a bit excessive (though I am not that familiar with Storm, so
perhaps that is fine), and the topology definition becomes more
complex.
3) Use "directGrouping". Have the emitting bolt keep a mapping between
components and task ids, and send to a given bolt by emitting directly
to every task id of that component. This seems like the best
solution.
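Option 3 can be sketched in plain Java so it runs without Storm. The DirectEmitter interface is a hypothetical stand-in for a collector's direct-emit call (Storm's OutputCollector has an emitDirect(int taskId, List<Object> tuple) method), and the component-to-tasks map would presumably be filled at prepare time from TopologyContext.getComponentTasks. Component names and task ids below are illustrative only. As described above, "sending to a bolt" here means emitting the tuple to every task of that component.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a collector's direct-emit call, so this runs without Storm. */
interface DirectEmitter {
    void emitDirect(int taskId, List<Object> values);
}

/**
 * Sketch of option 3: the emitting bolt keeps a component -> task ids map
 * and sends to a chosen component by emitting directly to each of its tasks.
 */
class DirectRouter {
    private final Map<String, List<Integer>> tasksByComponent = new HashMap<>();

    /** Record the task ids of one consuming component (e.g. at prepare time). */
    void registerComponent(String componentId, List<Integer> taskIds) {
        tasksByComponent.put(componentId, new ArrayList<>(taskIds));
    }

    /** Emit the tuple to every task of the chosen component; returns the number of tasks reached. */
    int sendTo(String componentId, List<Object> values, DirectEmitter out) {
        List<Integer> tasks = tasksByComponent.get(componentId);
        if (tasks == null || tasks.isEmpty()) {
            throw new IllegalArgumentException("unknown component: " + componentId);
        }
        for (int taskId : tasks) {
            out.emitDirect(taskId, values);
        }
        return tasks.size();
    }
}
```

In a real topology, the consuming bolts would need to subscribe with directGrouping and the emitting bolt's stream would need to be declared as direct. The routing decision then stays entirely inside the emitting bolt, which avoids both the wasted traffic of option 1 and the per-consumer streams of option 2.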
Thoughts? Thanks.
Trevor
Does the need for bolts to participate in multiple flows break the intended use case for Storm? Or is there a workaround you've run into for cases like this, which might typically be implemented as a federation of custom pub-sub interfaces?
Sincerely yours,
Apostolis Xekoukoulotakis