topology.newStream("spout1", spout).each(new Fields("entityID"), new SplitViewEntitiesIntoBits(), new Fields("EntityInstanceID", "BitID", "BitValueID")).groupBy(new Fields("BitID")).aggregate(new Fields("BitID", "BitValueID"), new StatisticsAggregator(), new Fields("SampleCount", "Total", "Mean", "Variance")).parallelismHint(6);
As you can see I gave the last step, which is the part I benchmarked, a parallelism of 6, assuming that would certainly get its performance up higher than a single thread. The StatisticsAggregator function is a CombinerAggregator implementation.
And here's the corresponding single-threaded Java code which does the same thing as the entire topology:
for (EntityInstance entityInstance : allEntities) {
for (final Bit b : entityInstance.getBitValues().keySet()) {
for (final BitValue bv : entityCoord.getBitValues().get(b).keySet()) {
aggregates.get(b).offer(bv);
}
}
}
Anything you can spot that might cause the large performance difference?