Split and merge between machines

suhridk

Apr 6, 2012, 8:17:48 AM
to storm-user
Hello,

I have a very simple storm topology that I have running on three
machines - A, B and C. (The topology has a spout and a bolt.)

I have some data (e.g. in a file) which I would like to feed to each
of the three topologies,
i.e. I need to split the data equally among the three different
machines.
Once each topology finishes its computation, I would then like to
merge the results of the computation into a single place (e.g. a file)
on one machine.

How can I achieve this in Storm?

Regards,
Suhrid.

James Xu

Apr 6, 2012, 9:49:01 AM
to storm...@googlegroups.com
In Storm, topologies do not interact with each other. In your case, to split the file contents you can use a shuffle grouping; to merge the results, you can have a merger bolt subscribe to all of the bolts that process the file contents. For more about stream groupings, read this: https://github.com/nathanmarz/storm/wiki/Concepts
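
To make the split-then-merge shape concrete, here is a rough sketch in plain Java. It does not use Storm's API: the class, the method names, and the "count characters" computation are all made up for illustration, and a thread pool stands in for the processing bolts. In a real topology the split would presumably be a `shuffleGrouping` on the processing bolt, and the merge a single-task bolt subscribed to it (for example via `globalGrouping`), but treat that mapping as an assumption rather than a recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration of the dataflow only; none of these names come from Storm.
public class SplitMergeSketch {

    // Stand-in for the split step: deal lines out evenly, round-robin.
    // (A shuffle grouping distributes randomly but evenly; round-robin keeps
    // this sketch deterministic.)
    public static List<List<String>> split(List<String> lines, int workers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            parts.get(i % workers).add(lines.get(i));
        }
        return parts;
    }

    // Stand-in for a processing bolt: a hypothetical computation that
    // counts the characters in its share of the input.
    public static int countChars(List<String> part) {
        int total = 0;
        for (String line : part) {
            total += line.length();
        }
        return total;
    }

    // Stand-in for the merger: one place that collects every worker's result.
    public static int runAndMerge(List<String> lines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<String> part : split(lines, workers)) {
                Callable<Integer> task = () -> countChars(part);
                partials.add(pool.submit(task));
            }
            int merged = 0;
            for (Future<Integer> partial : partials) {
                merged += partial.get(); // blocks until that worker is done
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "bb", "ccc", "dddd");
        System.out.println(runAndMerge(lines, 3));
    }
}
```

The point of the sketch is that the split and the merge both live inside one dataflow, which is why a single topology spread across the three machines fits better than three separate topologies.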

suhridk

Apr 6, 2012, 10:33:02 AM
to storm-user
Thanks for your email.

I understand that topologies cannot interact with each other. Also,
within a topology I'm already using shuffle grouping to split data
between tasks.

Are there any good techniques for dealing with this problem across
machines? I think it may be a common use case, because in many cases
the source of the data will be available on only one machine. Also,
we would like to merge the results into a single location.

I suppose this will have to be implemented manually. Is Kestrel a good
solution for this?

Cheers,
Suhrid.
