Parallelism in Trident

P. Taylor Goetz

Mar 9, 2013, 2:17:24 PM
to storm...@googlegroups.com
(Storm version 0.8.1)

I'm in the process of performance tuning some topologies we recently converted over to Trident, and I'm having trouble getting the resulting bolts parallelized the way I want.

In Storm, it's pretty straightforward. For example, if I set a bolt in a topology like so:

builder.setBolt(SPLIT_BOLT_ID, splitBolt, 3).shuffleGrouping(SENTENCE_SPOUT_ID);

Then the "splitBolt" will be assigned 3 tasks in the topology.

With Trident, however, it's not as clear (at least not to me), since you set parallelism on the Stream class. We have a Trident topology that's not unlike the one depicted here:


So, looking at the first spout and bolt in that diagram (upper left), if I wanted to assign the spout a parallelism hint of 1 and the first bolt a parallelism hint of 3, I would expect to do something like the following:

Stream stream = topology.newStream("myStream", spout);
stream = stream.parallelismHint(1).each(…).each(…).parallelismHint(3).groupBy(…);

But I'm not seeing the results I expect. I've tried moving the "parallelismHint()" calls around within the topology definition, and I'm completely baffled by how it plays out when deployed to a cluster. I'm using the Storm UI to determine how each resulting bolt got parallelized (which may be the problem). In some cases, attempting to set the parallelism of a bolt actually altered the parallelism of the spout.

I'm assuming (perhaps wrongly) that if a Trident topology compiles down to 5 bolts, they will be numbered ("bolt0" through "bolt4") consistently between topology submissions. That is, if a topology is submitted and killed multiple times, can I safely assume that "bolt0" always represents the same bolt?

Am I missing something simple? I can't share the actual topology code, but could put together a simple example if that would help.

Thanks in advance,

- Taylor

Nathan Marz

Mar 10, 2013, 4:58:00 PM
to storm...@googlegroups.com
I recommend using the "name" function to name portions of your stream so that the UI shows you what bolts correspond to what sections.

Trident packs operations into as few bolts as possible. In addition, it *never* repartitions your stream unless you've done an operation that explicitly involves a repartitioning (e.g. shuffle, groupBy, partitionBy, global aggregation, etc.). This property of Trident ensures that you can control the ordering/semi-ordering of how things are processed. So in this case, everything before the groupBy has to have the same parallelism, or else Trident would have to repartition the stream. And since you didn't say you wanted the stream repartitioned, it can't do that. You can get a different parallelism for the spout than for the each operations that follow it by introducing a repartitioning operation, like so:

stream.parallelismHint(1).shuffle().each(…).each(…).parallelismHint(3).groupBy(…);
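
To make the rule above concrete, here is a toy model in plain Java. It is only a sketch and not Trident's actual planner: the class name, the string encoding of operations, and the choice to take the largest hint when one segment carries several are all assumptions made for illustration. It splits a chain of operations into segments at each repartitioning operation and gives each segment a single parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; none of these names come from Storm or Trident.
class PartitionSegments {
    // Operations treated as explicit repartitionings in this model.
    static final List<String> REPARTITIONING =
            Arrays.asList("shuffle", "groupBy", "partitionBy");

    // Returns one parallelism per segment, where a segment is the run of
    // operations between two repartitionings. Hints are written "hint:N".
    static List<Integer> segmentParallelism(List<String> ops) {
        List<Integer> result = new ArrayList<>();
        int hint = 0;            // 0 means no hint seen in this segment yet
        boolean nonEmpty = false;
        for (String op : ops) {
            if (op.startsWith("hint:")) {
                // Assumption: conflicting hints in one segment resolve to the max.
                hint = Math.max(hint, Integer.parseInt(op.substring(5)));
            } else if (REPARTITIONING.contains(op)) {
                if (nonEmpty) {
                    result.add(hint == 0 ? 1 : hint);
                }
                hint = 0;
                nonEmpty = false;
            } else {
                nonEmpty = true; // spout, each, and other non-repartitioning ops
            }
        }
        if (nonEmpty) {
            result.add(hint == 0 ? 1 : hint);
        }
        return result;
    }

    public static void main(String[] args) {
        // Original chain: both hints land in the same segment.
        System.out.println(segmentParallelism(Arrays.asList(
                "spout", "hint:1", "each", "each", "hint:3", "groupBy")));
        // With shuffle(): the spout and the each operations are separate segments.
        System.out.println(segmentParallelism(Arrays.asList(
                "spout", "hint:1", "shuffle", "each", "each", "hint:3", "groupBy")));
    }
}
```

In this model the first chain yields a single segment, so the spout and both each operations share one parallelism. The second chain yields two segments, so the spout can stay at 1 while the each operations run at 3.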

--
Twitter: @nathanmarz
http://nathanmarz.com

ak

Dec 10, 2013, 3:47:13 AM
to storm...@googlegroups.com, nat...@nathanmarz.com
Apparently, doing it this way does not work.

On Monday, March 11, 2013 at 4:58:00 AM UTC+8, Nathan Marz wrote:

ak

Dec 10, 2013, 4:13:23 AM
to storm...@googlegroups.com, nat...@nathanmarz.com
In my test under Storm 0.9 rc3, this does not work.

On Tuesday, December 10, 2013 at 4:47:13 PM UTC+8, ak wrote: