(Storm version 0.8.1)
I'm in the process of performance tuning some topologies we recently converted over to Trident, and I'm having trouble getting the resulting bolts parallelized the way I want.
In plain Storm, it's pretty straightforward. For example, if I set a bolt in a topology like so:
builder.setBolt(SPLIT_BOLT_ID, splitBolt, 3).shuffleGrouping(SENTENCE_SPOUT_ID);
Then the "splitBolt" will be assigned 3 tasks in the topology.
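To be explicit about what I mean by "3 tasks", here is a toy model in plain Java of how I picture shuffle grouping spreading the spout's tuples across the bolt's tasks. It uses no Storm classes, the class and method names are made up for illustration, and it is only a sketch of my mental model, not how Storm actually implements it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy model only: this is NOT Storm code and uses no Storm classes.
class ShuffleGroupingSketch {

    // Returns how many tuples each task receives when numTuples tuples are
    // spread over numTasks tasks by walking a shuffled list of task indices,
    // reshuffling at the start of every pass through the list.
    static int[] countPerTask(int numTasks, int numTuples, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            order.add(i);
        }
        Random rng = new Random(seed);
        Collections.shuffle(order, rng);

        int[] counts = new int[numTasks];
        for (int t = 0; t < numTuples; t++) {
            int pos = t % numTasks;
            if (pos == 0 && t > 0) {
                Collections.shuffle(order, rng); // new random order for the next pass
            }
            counts[order.get(pos)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Parallelism hint of 3, as in the setBolt(...) call above.
        int[] counts = countPerTask(3, 300, 42L);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("task " + i + ": " + counts[i] + " tuples");
        }
    }
}
```

In this model every task ends up with the same share of tuples whenever the tuple count is a multiple of the task count, which is the kind of even spread I'd expect to see reflected per bolt in the UI.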
With Trident, however, it's not as clear (at least not to me), since you set parallelism on the Stream class. We have a Trident topology that's not unlike the one depicted here:
So, looking at the first spout and bolt in that diagram (upper left), if I wanted to give the spout a parallelism hint of 1 and the first bolt a parallelism hint of 3, I would expect to do something like the following:
Stream stream = topology.newStream("myStream", spout);
stream = stream.parallelismHint(1).each(…).each(…).parallelismHint(3).groupBy(…);
But I'm not seeing the results I'm expecting. I've tried moving the "parallelismHint()" calls around within the topology definition, and am completely baffled by how it plays out when deployed to a cluster. I'm using the Storm UI to determine how each resulting bolt got parallelized (which may be the problem). In some cases, attempting to set the parallelism of a bolt actually altered the parallelism of the spout.
I'm assuming (perhaps wrongly) that if a Trident topology compiles down to 5 bolts, they will be numbered ("bolt0" through "bolt4") consistently between topology submissions. In other words, if a topology is submitted and killed multiple times, can I safely assume that "bolt0" always represents the same bolt?
Am I missing something simple? I can't share the actual topology code, but could put together a simple example if that would help.
Thanks in advance,
- Taylor