Trident/Storm Performance

Jason Kolb

unread,

Aug 16, 2012, 9:42:57 AM8/16/12

to storm...@googlegroups.com

Hi,

I recently wrote a Trident topology that seemed to be running slow, and so I benchmarked it against a single-threaded implementation of the same code in plain 'ol Java code. The time difference was pretty staggering--on my Macbook Pro it's 2 seconds for plain Java code, 34 seconds for the equivalent Trident topology--I was hoping you might be able to provide some feedback or point out anything I'm doing wrong.

Here's the topology:

topology.newStream("spout1", spout).each(new Fields("entityID"), new SplitViewEntitiesIntoBits(), new Fields("EntityInstanceID", "BitID", "BitValueID")).groupBy(new Fields("BitID")).aggregate(new Fields("BitID", "BitValueID"), new StatisticsAggregator(), new Fields("SampleCount", "Total", "Mean", "Variance")).parallelismHint(6);

As you can see I gave the last step, which is the part I benchmarked, a parallelism of 6, assuming that would certainly get its performance up higher than a single thread. The StatisticsAggregator function is a CombinerAggregator implementation.

And here's the corresponding single-threaded Java code which does the same thing as the entire topology:

for (EntityInstance entityInstance : allEntities) {
for (final Bit b : entityInstance.getBitValues().keySet()) {
for (final BitValue bv : entityCoord.getBitValues().get(b).keySet()) {
aggregates.get(b).offer(bv);
}
}
}

Anything you can spot that might cause the large performance difference?

Ted Dunning

unread,

Aug 16, 2012, 10:33:42 AM8/16/12

to storm...@googlegroups.com

Serialization is an obvious difference between your code and what Storm has to do. Communication and coordination of batches is another big difference.

Have you profiled anything?

Jason Kolb

unread,

Aug 16, 2012, 11:21:04 AM8/16/12

to storm...@googlegroups.com

No I haven't, I was curious if I was doing something obviously wrong before I attempt that. Nothing's very simple when dealing with the multiple processes that Storm requires :) Besides, I've noticed that any external interference with Storm (such as any type of debug operations) tends to spin up CPU utilization, and I'm afraid introducing a profiler might cause similar problems. Serialization makes sense as overhead, I assume you mean the serialization of the bolts/spouts/etc?

What's your feeling in general about that then... does it follow that Storm will only make sense performance-wise for data that's too large for a single machine to handle, and that the overhead associated with the distribution is the price you pay to handle that much data? If that's the case does it make sense to use Storm as a simple load-balancing mechanism to split the tasks into machine-sized chunks and do the heavy lifting using traditional code (as opposed to something like Trident)?

Nathan Marz

unread,

Aug 17, 2012, 4:31:12 AM8/17/12

to storm...@googlegroups.com

There is overhead to each batch of tuples that Trident processes. So if you only have a handful of tuples in each batch, the overhead will dominate (in addition to serialization that Ted brought up). In normal operation – on a cluster with non-trivial batches – Trident's performance is excellent. I benchmarked it as having essentially the same performance as vanilla Storm.

--
Twitter: @nathanmarz
http://nathanmarz.com

Reply all

Reply to author

Forward