About the parallelism of Spout

Zheng Xue
Mar 11, 2014, 8:52:10 AM
to storm...@googlegroups.com
Hi, all:

   I am new here and have a question about the parallelism of a spout. In the nextTuple method of my spout, I read an HDFS file, sleep for 100 ms, and then start the next round. If I set the number of executors for the spout to 4, will all 4 instances read the same HDFS files?


--
------------------
Best Regards!

Zheng Xue
http://netlab.sysu.edu.cn/~zhxue
Computer Science and Technology College, Sun Yat-sen University
E-mail: xuez...@gmail.com, xuez...@acm.org

Bobby Evans
Mar 11, 2014, 10:01:51 AM
to Zheng Xue, storm...@googlegroups.com
That depends on how you wrote your spout. If it is something like

open(…) {
   _input = fs.open(new Path("/foo/bar"));
}

then yes, they will all read the exact same file.

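If you put in some intelligence so that each spout task takes its own task index into account, like

open(…) {
  int index = context.getThisTaskIndex();
  _input = fs.open(new Path("/foo/bar_" + index));
}

then they would not; each task would open its own file. But that logic is up to you to write. Storm does not split the input among spout tasks for you.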
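
If the input is a list of files rather than one file per task, the same idea can be sketched as a round-robin split. This is only an illustration, not anything Storm provides: the class SpoutFilePartitioner and the method assignFiles are hypothetical names, and the code is plain Java with no Storm or HDFS dependency. In a real spout you would presumably pass in context.getThisTaskIndex() as the task index and context.getComponentTasks(context.getThisComponentId()).size() as the task count, and it assumes every task sees the file list in the same order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which input files a given spout task should read.
final class SpoutFilePartitioner {
    private SpoutFilePartitioner() {}

    // Returns every numTasks-th file starting at position taskIndex, so that
    // each file is assigned to exactly one task (round-robin by list position).
    static List<String> assignFiles(List<String> files, int taskIndex, int numTasks) {
        if (numTasks <= 0 || taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException(
                "taskIndex must be in [0, numTasks), got " + taskIndex + " of " + numTasks);
        }
        List<String> mine = new ArrayList<>();
        for (int i = taskIndex; i < files.size(); i += numTasks) {
            mine.add(files.get(i));
        }
        return mine;
    }
}
```

With 4 spout tasks and 5 files, task 0 would get the first and fifth file, tasks 1 to 3 would get one file each, and no file would be read twice.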

—Bobby
