Trying to understand implications of executors in 0.8.0

1,023 views
Skip to first unread message

Curt Kohler

unread,
Aug 20, 2012, 9:44:36 PM8/20/12
to storm-user
Having recently upgraded to Storm 0.8.0, I've been trying to
understand the impact of executors as I modify my old 0.7.x topology
to take advantage of the new scalability capabilities. In 0.7.x
topologies, you basically specified the number of JVMs available for
use by the topology (workers) and the number of copies of each
component you wanted to run (parallelism/tasks). Storm would then
assign the requested number of tasks across the workers. As I
understand the changes in 0.8.0, the semantics of the parallelism
parameter in the setBolt() call has been modified (being a new
abstract container referred to as an executor). In order to keep the
same equivalent topology between a 0.7.x topology and an 0.8.0
topology, the statement:

builder.setBolt('name', new BoltSubClass(),
copies).setShuffleGrouping('field')

would need to turn into

builder.setBolt('name', new BoltSubClass(),
copies).setNumTasks(copies).setShuffleGrouping('field')

Furthermore, as I read things, the setNumTasks() call effectively sets
the number of copies of a component for the life of the topology (you
can't grow/shrink it without restarting everything) while you can use
the storm re-balance APIs to add/remove workers/executors as needed.

Are there any benefits/drawbacks to boosting the number of tasks for
components at startup beyond what you would normally need? For
example, if you have "bursty" stream where occasionally you get large
spikes in processing where it would be nice to have extra copies of
components available to scale up across more workers/executors but
aren't necessary most of the time?

Thanks,
Curt

Nathan Marz

unread,
Aug 21, 2012, 3:16:31 AM8/21/12
to storm...@googlegroups.com
0.8.0 behaves in the same fashion as 0.7.x by default. If you don't explicitly set the number of tasks, the number of tasks is set to the initial parallelism for that component for the lifetime of the topology (which is set via the parallelism hint). The parallelism hint used to indicate the number of tasks, now it indicates the number of executors. So this:

topology.setBolt("bolt", new MyBolt(), 8).shuffleGrouping("spout")

"bolt" will have 8 tasks, and it will initially have 8 executors. The number of executors for a component must be <= than the number of tasks. So if you want to be able to scale up a topology later on, you can do this:

topology.setBolt("bolt", new MyBolt(), 8).setNumTasks(24).shuffleGrouping("spout")

Initially "bolt" will have 8 executors with 3 tasks per executor, and you can adjust the number of executors up to 24 later on.

Tasks are very cheap, so having lots of them per executor is reasonable.
--
Twitter: @nathanmarz
http://nathanmarz.com

Curt Kohler

unread,
Aug 21, 2012, 10:02:42 AM8/21/12
to storm-user
Thanks
Reply all
Reply to author
Forward
0 new messages