I've found the reason, and it's quite simple to understand. I don't know
why I missed it.
The scan was slow because the specified time range was too narrow.
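To make "too narrow" concrete, here is a minimal plain-Java sketch of the window from the code quoted below. It assumes the usual HBase convention that a time range is half-open, with the lower bound inclusive and the upper bound exclusive. The class name, helper methods, and sample cell timestamps are hypothetical and only for illustration. It does not model HBase internals or explain the timing by itself.

```java
import java.util.Arrays;

public class TimeRangeSketch {

    /** Width in milliseconds of the half-open range [minStamp, maxStamp). */
    public static long widthMillis(long minStamp, long maxStamp) {
        if (maxStamp < minStamp) {
            throw new IllegalArgumentException("maxStamp is smaller than minStamp");
        }
        return maxStamp - minStamp;
    }

    /** True if ts lies in [minStamp, maxStamp): lower bound inclusive, upper exclusive. */
    public static boolean inRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    /** Counts how many of the given cell timestamps fall inside the range. */
    public static long countMatching(long[] cellTimestamps, long minStamp, long maxStamp) {
        return Arrays.stream(cellTimestamps)
                .filter(ts -> inRange(ts, minStamp, maxStamp))
                .count();
    }

    public static void main(String[] args) {
        // Bounds taken from the quoted scan.setTimeRange call.
        long min = 1348114401600L;
        long max = 1348114401700L;

        // Hypothetical cell timestamps, one second apart, centered near the window.
        long[] cells = new long[10];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = min - 5_000L + i * 1_000L;
        }

        System.out.println("window width (ms): " + widthMillis(min, max));
        System.out.println("cells in narrow window: " + countMatching(cells, min, max));
        System.out.println("cells in widened window: "
                + countMatching(cells, min - 5_000L, max + 5_000L));
    }
}
```

The window in the quoted code is only 100 ms wide, so cells written even a second apart almost never land in it, while widening the bounds by a few seconds lets all of the sample cells through.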
> Hi Eugeny,
> The mailing list stripped your attachment (as it often does), so you
> might want to put it on a public web server.
> I don't have much to contribute except to point to a recent
> conversation that you can find here:
> http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28722
> Hope this helps,
> J-D
> On Fri, Sep 21, 2012 at 5:03 AM, Eugeny Morozov
> <emoro...@griddynamics.com> wrote:
> > Hello!
> > It is known, and I saw it in the code, that the time range set by
> > scan.setTimeRange is used to filter out HFiles for the subsequent scan.
> > This means that the following scanner.next call should take almost no
> > time if I set a time range far in the future. I am sure that I do not
> > have HFiles that fall into the set time range.
> > But, and here is the question, scanning with the time range set
> > surprisingly takes far longer than without it.
> > My results are as follows:
> > Use range [false]. Time spent (avg): [0]
> > Use range [true]. Time spent (avg): [525]
> > KeyValues are returned when the time range is not used.
> > The code is as follows:
> > public static void run(boolean useRange, HTable table) throws Exception {
> >     Scan scan = new Scan();
> >     scan.addFamily(family);
> >     scan.setCaching(-1);
> >     scan.setCacheBlocks(false);
> >     scan.setStartRow(startRow); // a random start row
> >     if (useRange) scan.setTimeRange(1348114401600L, 1348114401700L);
> >     ResultScanner scanner = table.getScanner(scan);
> >     long total = 0;
> >     // There were a bunch of measurements, where N was from 10 to 50
> >     for (int i = 0; i < N; i++) {
> >         long time = System.currentTimeMillis();
> >         Result result = scanner.next();
> >         total += System.currentTimeMillis() - time;
> >     }
> >     scanner.close();
> >     sum += total / N; // divide once; per-sample division truncates toward 0
> > }
> > Of course, such measurements include all sorts of noise, like network
> > overhead, but I'm using a virtual machine on my own box, and while I
> > measure there is no other activity on either my box or the virtual
> > machine, so the noise is minimal.
> > I've also used YourKit to trace and sample the running HRegionServer,
> > but didn't find anything suspicious, though I didn't look at heap and
> > GC performance. The trace is attached.
> > So, the question is: why is it so slow when the time range is set and
> > so fast without it?
> > --
> > Evgeny Morozov
> > Developer Grid Dynamics
> > Skype: morozov.evgeny
> > www.griddynamics.com
> > emoro...@griddynamics.com