Question about Improving performance


Harold Lim

May 29, 2012, 7:31:39 PM
to storm...@googlegroups.com
Hi,

I am trying to figure out where the bottleneck is in my topology, so I have simplified it down to a single spout and a single bolt.
The spout reads from a file (~10MB). Each call to nextTuple reads one line from the file, parses it, and emits it. The bolt currently does nothing except ack the tuple. I have also disabled the reliability mechanism (#ackers = 0).
The issue is that it takes minutes to finish reading the whole file. At first I thought there might be a delay between calls to nextTuple(), so I commented out all of the emit calls in the spout and measured how long it takes to read the whole file: only a few seconds (4-5s). So a delay between calls does not seem to be the cause.

Any ideas on how to improve the performance? I tried changing the zmq.threads and zmq.linger.millis values, but that doesn't seem to help. I also tried changing the parallelism of the bolt, and that doesn't help either.

Thanks.


Steven Siebert

May 29, 2012, 8:14:26 PM
to storm...@googlegroups.com
I'm wondering if it's the Spout implementation, specifically the file IO part.  Could you post your spout code?

What kind of performance do you get if you read the file-based tuples into an in-memory queue in the ISpout#open method and then just poll from that queue in nextTuple?

Regards,

Steve
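
A minimal sketch of the in-memory queue idea suggested above. It deliberately does not depend on Storm: `PreloadedLineSource` is a hypothetical stand-in whose `open` and `nextTuple` methods only mirror the spout lifecycle, and the comma delimiter and the sample input are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a spout; it mirrors the open()/nextTuple()
// lifecycle but does not use any Storm classes.
final class PreloadedLineSource {
    private final Queue<List<String>> pending = new ArrayDeque<>();

    // Analogous to ISpout#open: read and parse every line once, up front.
    // BufferedReader.lines() reports I/O failures as UncheckedIOException.
    void open(BufferedReader reader) {
        reader.lines().forEach(line -> pending.add(Arrays.asList(line.split(","))));
    }

    // Analogous to nextTuple: no I/O here, just return the next parsed
    // record, or null once the queue is drained.
    List<String> nextTuple() {
        return pending.poll();
    }

    public static void main(String[] args) {
        PreloadedLineSource source = new PreloadedLineSource();
        source.open(new BufferedReader(new StringReader("a,b\nc,d\n")));
        List<String> tuple;
        while ((tuple = source.nextTuple()) != null) {
            // A real spout would hand the record to its collector instead.
            System.out.println(tuple);
        }
    }
}
```

If draining the queue this way is fast while the real spout is still slow, file I/O inside nextTuple is unlikely to be the bottleneck.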

Harold Lim

May 29, 2012, 9:46:14 PM
to storm...@googlegroups.com
Hi Steve,

I don't think it's the file IO part. My file is stored in HDFS and I
am using the standard HDFS read API. Basically, in the open method, I
open a reader on the file. In nextTuple, I read a line. I then
perform some post-processing, such as splitting the string, and then
emit the result.

I tested this by commenting out the emit call and simply printing a
message when the file has been completely read. That takes only a few
seconds, but with emit left in, it takes longer to finish.

-Harold
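
The with/without-emit comparison described above can be approximated locally with a small timing harness. This is only a sketch under assumptions: `EmitTiming`, the comma delimiter, and the synthetic input are hypothetical, and the "emit" here is a plain in-memory list, so it says nothing about Storm's own transfer cost. It only isolates the read-and-parse part of the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: times the parse loop with a pluggable sink.
final class EmitTiming {
    // Parses every line, passes each record to the sink, and returns
    // the elapsed time in nanoseconds.
    static long timeRun(List<String> lines, Consumer<List<String>> sink) {
        long start = System.nanoTime();
        for (String line : lines) {
            sink.accept(Arrays.asList(line.split(",")));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Synthetic input standing in for the file contents.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add(i + ",payload");
        }

        // Run 1: the sink discards records (emit "commented out").
        long withoutEmit = timeRun(lines, record -> { });

        // Run 2: the sink keeps records (a stand-in for emit).
        List<List<String>> emitted = new ArrayList<>();
        long withEmit = timeRun(lines, emitted::add);

        System.out.println("without emit: " + withoutEmit / 1_000_000 + " ms");
        System.out.println("with emit:    " + withEmit / 1_000_000 + " ms");
    }
}
```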

Nathan Marz

May 29, 2012, 11:50:10 PM
to storm...@googlegroups.com
First of all – if you want to understand where your performance bottleneck is, you should use a Java profiler rather than try to guess. I highly recommend YourKit, as it's really easy to use.

Storm 0.8.0 (in development) has significant performance improvements (4-5x). It's possible the perf improvements from that branch will help with your situation. The branch is pretty stable now, so you can give it a shot: https://github.com/nathanmarz/storm/tree/0.8.0
--
Twitter: @nathanmarz
http://nathanmarz.com

Harold Lim

May 30, 2012, 12:02:37 AM
to storm...@googlegroups.com
Hi Nathan,

Thanks. I'll try both of them.

Are there instructions on how to compile the branch? Also, does
storm-deploy work with 0.8.0?


-Harold

Nathan Marz

May 30, 2012, 12:05:59 AM
to storm...@googlegroups.com
To compile the branch you'll need zeromq/jzmq installed locally, leiningen installed, and then you run "sh bin/build_release.sh" to build a release zipfile.

You can get storm-deploy to deploy 0.8.0 by searching for (sh "bin/build_release.sh") in the codebase, and then adding:

(git checkout "0.8.0")

before that line.

Then run the deploy without a --version argument, e.g.:

lein run :deploy --start --mynewcluster

Harold Lim

May 30, 2012, 3:33:46 PM
to storm...@googlegroups.com
Hi Nathan,

I compiled my code against 0.8.0 (also updating my CustomStreamGrouping
implementation to the new version). However, when I run my topology, I
get the following error:

java.lang.AbstractMethodError:
.grouper.CustomShuffleGrouping.prepare(Lbacktype/storm/task/WorkerTopologyContext;Lbacktype/storm/tuple/Fields;Ljava/util/List;)V


Any idea what's causing this problem? I checked my cluster from
storm-deploy and it is running storm 0.8.0.


-Harold

Nathan Marz

May 30, 2012, 4:01:52 PM
to storm...@googlegroups.com
Is the "storm" script you're using also from the 0.8.0 release that you built?

Harold Lim

May 30, 2012, 5:03:33 PM
to storm...@googlegroups.com
Hi Nathan,

Thanks. Turns out I was using the incorrect jar file. It's working now.

I do see an improvement in performance with 0.8.0. Before, it
would take minutes for the spout to completely emit the whole
file; it now takes only ~10s.

Another question: Have you done any benchmarking, or do you have any
benchmark numbers for Storm, such as the I/O throughput between tasks,
etc?


-Harold

Nathan Marz

May 30, 2012, 5:15:28 PM
to storm...@googlegroups.com
A lot of it is hardware dependent, and there's a tradeoff between the size of your tuples and the amount of CPU you use vs. network. For example, on EC2 I saw my benchmark topology become bottlenecked by network at tuple sizes around 100 bytes (emitting about 320K tuples per second per node for 100-byte tuples). For 10-byte tuples, it's CPU constrained, emitting about 600K tuples per second per node.

I'll be publishing some more thorough numbers soon.
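
As a rough cross-check of the figures quoted above, the payload bandwidth implied by each case is simply tuples per second times bytes per tuple. The sketch below is an assumption-laden estimate: it counts payload bytes only, ignores serialization and framing overhead, and takes 1 Mbit as 10^6 bits. The class and method names are hypothetical.

```java
// Hypothetical helper: payload-only bandwidth implied by a tuple rate.
final class TupleBandwidth {
    // Payload bytes per second = tuples per second * bytes per tuple.
    static long bytesPerSecond(long tuplesPerSecond, long bytesPerTuple) {
        return tuplesPerSecond * bytesPerTuple;
    }

    // Converts bytes per second to megabits per second (1 Mbit = 10^6 bits).
    static double megabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        long networkBound = bytesPerSecond(320_000, 100); // 100-byte tuples
        long cpuBound = bytesPerSecond(600_000, 10);      // 10-byte tuples

        // prints "100-byte tuples: 256.0 Mbit/s"
        System.out.println("100-byte tuples: " + megabitsPerSecond(networkBound) + " Mbit/s");
        // prints "10-byte tuples: 48.0 Mbit/s"
        System.out.println("10-byte tuples: " + megabitsPerSecond(cpuBound) + " Mbit/s");
    }
}
```

By this estimate the 100-byte case moves roughly five times more payload per second than the 10-byte case, even though it emits fewer tuples. That fits the description of the larger tuples hitting the network limit while the smaller ones are limited by CPU.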