Thanks. I'll try both of them.
On Tue, May 29, 2012 at 11:50 PM, Nathan Marz <nathan.m...
> First of all, if you want to understand where your performance bottleneck
> is, you should use a Java profiler rather than try to guess. I highly
> recommend YourKit, as it's really easy to use.
> Storm 0.8.0 (in development) has significant performance improvements
> (4-5x). It's possible the perf improvements from that branch will help with
> your situation. The branch is pretty stable now, so you can give it a shot:
> On Tue, May 29, 2012 at 6:46 PM, Harold Lim <harold.c....@gmail.com> wrote:
>> Hi Steve,
>> I don't think it's the file IO part. My file is stored in HDFS and I
>> am using the standard HDFS read API. Basically, in the open method, I
>> open a reader on the file. In nextTuple, it reads a line. I then
>> perform some post-processing, such as splitting the string, and then
>> emit the pieces.
>> I also tested this by commenting out the emit call and simply printing a
>> message when the file has been completely read: it takes only a few
>> seconds. With the emit call in place, it takes much longer to finish.
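A minimal sketch of the spout described above, to make the with/without-emit comparison concrete. Everything here is assumed for illustration: `LineSpout` and `EmitTiming` are hypothetical names (not Storm classes), a `Consumer<List<Object>>` stands in for Storm's `SpoutOutputCollector`, an in-memory string stands in for the HDFS stream, and splitting on commas stands in for the real post-processing. Passing a no-op emitter reproduces the "emit commented out" run.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the spout described above (not Storm's API):
// open() creates the reader once; each nextTuple() reads, parses and emits one line.
class LineSpout {
    private final String source;                   // stands in for the HDFS file (assumption)
    private final Consumer<List<Object>> emitter;  // stands in for SpoutOutputCollector.emit (assumption)
    private BufferedReader reader;

    LineSpout(String source, Consumer<List<Object>> emitter) {
        this.source = source;
        this.emitter = emitter;
    }

    void open() {
        reader = new BufferedReader(new StringReader(source));
    }

    // Returns false once the input is exhausted.
    boolean nextTuple() {
        try {
            String line = reader.readLine();
            if (line == null) {
                return false;
            }
            // Assumed post-processing: split on commas and emit the fields as one tuple.
            emitter.accept(Arrays.<Object>asList((Object[]) line.split(",")));
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class EmitTiming {
    // Opens the spout, calls nextTuple() until the input runs out, returns elapsed ms.
    static long drain(LineSpout spout) {
        long start = System.nanoTime();
        spout.open();
        while (spout.nextTuple()) {
            // keep pulling lines
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("key").append(i).append(',').append(i * 2).append('\n');
        }
        String data = sb.toString();
        long[] emitted = {0};
        // "Emit commented out": a no-op emitter isolates read + parse cost.
        long noEmitMs = drain(new LineSpout(data, tuple -> { }));
        // Emit enabled: here a counting emitter; in a topology this would be the collector.
        long emitMs = drain(new LineSpout(data, tuple -> emitted[0]++));
        System.out.println("no-op emit: " + noEmitMs + " ms; counting emit: "
                + emitMs + " ms; tuples: " + emitted[0]);
    }
}
```

This stand-in does not model what Storm itself does inside emit (serialization and message transfer to the bolt), so it only isolates the read-and-parse cost; a profiler is the better tool for the emit path.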
>> On Tue, May 29, 2012 at 8:14 PM, Steven Siebert <smsi...@gmail.com> wrote:
>> > I'm wondering if it's the Spout implementation, specifically the file IO
>> > part. Could you post your spout code?
>> > What kind of performance do you get if you read the file-based tuples
>> > into an in-memory queue in the ISpout#open method and then just poll
>> > from that queue in nextTuple?
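A minimal sketch of that suggestion, under stated assumptions: `PreloadedLineSpout` is a hypothetical name (not a Storm class), a `Consumer<List<Object>>` stands in for Storm's `SpoutOutputCollector`, an in-memory string stands in for the HDFS file, and comma splitting stands in for the parse step. Every line is read into a queue in `open()`, so `nextTuple()` does no I/O at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical spout variant (not Storm's API): all file I/O happens in open(),
// so nextTuple() only polls an in-memory queue, parses and emits.
class PreloadedLineSpout {
    private final String source;                   // stands in for the HDFS file (assumption)
    private final Consumer<List<Object>> emitter;  // stands in for SpoutOutputCollector.emit (assumption)
    // ArrayDeque is not thread-safe; assumes open() and nextTuple() run on one thread.
    private final Queue<String> pending = new ArrayDeque<>();

    PreloadedLineSpout(String source, Consumer<List<Object>> emitter) {
        this.source = source;
        this.emitter = emitter;
    }

    // Reads every line up front; a ~10MB file should fit in the heap.
    void open() {
        try (BufferedReader reader = new BufferedReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                pending.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Emits at most one tuple per call; returns false once the queue is empty.
    boolean nextTuple() {
        String line = pending.poll();
        if (line == null) {
            return false;
        }
        emitter.accept(Arrays.<Object>asList((Object[]) line.split(",")));
        return true;
    }

    int remaining() {
        return pending.size();
    }

    public static void main(String[] args) {
        long[] emitted = {0};
        PreloadedLineSpout spout = new PreloadedLineSpout("a,1\nb,2\nc,3\n", t -> emitted[0]++);
        spout.open();
        System.out.println("queued after open(): " + spout.remaining());
        while (spout.nextTuple()) {
            // drain the queue
        }
        System.out.println("tuples emitted: " + emitted[0]);
    }
}
```

If end-to-end time barely changes with the queue in place, that points away from HDFS reads and toward the emit path, which is consistent with Harold's observation that reading alone takes only a few seconds.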
>> > Regards,
>> > Steve
>> > On Tue, May 29, 2012 at 7:31 PM, Harold Lim <harold.c....@gmail.com> wrote:
>> >> Hi,
>> >> I am trying to figure out where the bottleneck is in my topology and
>> >> have simplified my topology into a spout and bolt.
>> >> The spout simply reads from a file (~10MB). Each call to nextTuple
>> >> reads one line from the file, parses it and emits it. The bolt
>> >> currently does nothing except ack the tuple. I also disabled the
>> >> reliability mechanism (#ackers = 0).
>> >> The issue I have is that it takes minutes to finish reading the whole
>> >> file. At first I thought there might be a delay between calls to
>> >> nextTuple(), so I commented out all of the emit calls in the spout to
>> >> measure the time it takes to read the whole file: only a few seconds
>> >> (4-5s). So that doesn't seem to be the case.
>> >> Any ideas how to improve the performance? I tried changing the
>> >> zmq.threads and zmq.linger.millis values, and it doesn't seem to help. I
>> >> also tried changing the parallelism of the bolt, and that doesn't seem to
>> >> help either.
>> >> Thanks.
> Twitter: @nathanmarz