Asynchbase use case

Michael Morello

May 2, 2012, 5:32:52 AM
to async...@googlegroups.com
Hi all,

I'm trying to understand how AsyncHBase works and in which cases it is better to use it rather than HBase's own client (HTable) or a map/reduce job.
I have set up a simple client that reads a whole file and imports it into HBase:

public static void main(String[] args) throws IOException {
    ....
    HBaseClient hBaseClient = new HBaseClient(zookeeper_quorum);
    ...
    while ((line = reader.readLine()) != null) {
        ....
        PutRequest putRequest = new PutRequest(TABLE_NAME, rowkey,
                                               COLUMN_FAMILY_NAME, valueAsBytes);
        hBaseClient.put(putRequest);
    }

Unfortunately I see strange behavior: when I run it, the CPU is fully loaded and nothing is added to the table (hBaseClient.stats().numBatchedRpcSent() remains at 0).
After some investigation, the bottleneck seems to be RegionClient.acquireMetaLookupPermit(). It appears to be called many times (maybe each time put is called), and the HBase regions for .META. and for my table are never looked up.
I think this is a starvation situation, but I'm not sure how to prove it.

A workaround is to call:

Deferred<Object> ensureTableExists = hBaseClient.ensureTableExists(TABLE_NAME);
ensureTableExists.joinUninterruptibly();

before the while loop. Then everything is OK: all regions are successfully looked up and lines are inserted with excellent throughput.

My question is whether my test case is simply dumb :) (I understand that asynchbase fits better into a pure async framework like Netty, and that map/reduce is a better choice in this case) or whether this is unexpected behavior.

Anyway thank you for your great work.

Best regards,
Michael

P.S.: I cloned asynchbase from your GitHub repo, and the HBase version is 0.92.2 (git branch).

tsuna

May 2, 2012, 12:44:06 PM
to Michael Morello, async...@googlegroups.com
On Wed, May 2, 2012 at 2:32 AM, Michael Morello
<michael...@gmail.com> wrote:
> A workaround is to call :
>
> Deferred<Object> ensureTableExists =
> hBaseClient.ensureTableExists(TABLE_NAME);
> ensureTableExists.joinUninterruptibly();

Yes you need to do this. You can get away without the Deferred and
simply write:
hBaseClient.ensureTableExists(TABLE_NAME).joinUninterruptibly();

This is required because if you send a bazillion writes to asynchbase
without first finding where the table is, every single write will
cause a META lookup. As you found out, there is a semaphore that
tries to rate-limit META lookups, but it's still not enough to
prevent you from causing what I call a "META storm".

The other solution would be to have asynchbase block until all
outstanding META lookups have completed, but one of the premises of
asynchbase is to never block, so I'm a little hesitant to add an
exception to the non-blocking guarantee. But I agree that, arguably,
asynchbase should do a better job at rate-limiting META lookups in
order to make your use case work without requiring a call to
ensureTableExists().
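
To make the "META storm" concrete, here is a toy model in plain Java. It is not asynchbase's code: the class `MetaStormDemo`, its fields, and the `regionserver-1` location are all made up for illustration, the lookup is simulated by a hand-completed `CompletableFuture`, and the rate-limiting semaphore is left out. It only sketches why writes issued against a cold region cache each start their own lookup, while writes issued after the table has been located start none.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Toy, single-threaded model of a client with a region cache. NOT asynchbase's implementation. */
class MetaStormDemo {
    // table name -> region location, filled in when a simulated lookup completes
    private final Map<String, String> regionCache = new HashMap<>();
    // lookups that were started but whose answers have not "arrived" yet
    private final List<Runnable> pendingLookups = new ArrayList<>();
    int metaLookupsStarted = 0;
    int putsSent = 0;

    /** Sends the write right away on a cache hit; otherwise starts a lookup first. */
    CompletableFuture<Void> put(String table) {
        if (regionCache.containsKey(table)) {
            putsSent++;
            return CompletableFuture.completedFuture(null);
        }
        return lookup(table).thenRun(() -> putsSent++);
    }

    /** Stand-in for priming the cache before any write is issued. */
    CompletableFuture<Void> ensureTableExists(String table) {
        return lookup(table);
    }

    private CompletableFuture<Void> lookup(String table) {
        metaLookupsStarted++;
        CompletableFuture<Void> done = new CompletableFuture<>();
        // The answer is delivered later, when deliverPendingLookups() is called.
        pendingLookups.add(() -> {
            regionCache.put(table, "regionserver-1"); // hypothetical location
            done.complete(null);
        });
        return done;
    }

    /** Simulates the network: delivers the answers of all lookups in flight. */
    void deliverPendingLookups() {
        List<Runnable> batch = new ArrayList<>(pendingLookups);
        pendingLookups.clear();
        batch.forEach(Runnable::run);
    }

    /** Returns how many lookups were started for the given number of writes. */
    static int lookupsFor(boolean primeFirst, int puts) {
        MetaStormDemo client = new MetaStormDemo();
        if (primeFirst) {
            CompletableFuture<Void> primed = client.ensureTableExists("t");
            client.deliverPendingLookups();
            primed.join(); // the table is now in the cache
        }
        for (int i = 0; i < puts; i++) {
            client.put("t"); // no answer can arrive while this loop runs
        }
        client.deliverPendingLookups();
        return client.metaLookupsStarted;
    }
}
```

Under this model, `MetaStormDemo.lookupsFor(false, 1000)` starts 1000 lookups, whereas `MetaStormDemo.lookupsFor(true, 1000)` starts exactly one.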

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Gautam Borah

Jun 7, 2013, 9:36:34 PM
to async...@googlegroups.com, Michael Morello
Hi Benoit,

I have started using asynchbase for one of our products. I applied your suggestion and added this call to my code:

hBaseClient.ensureTableExists(TABLE_NAME).joinUninterruptibly(); 

That solved my problem for bulk loads.

Please advise whether I need to call this method once in the lifetime of the HBaseClient, or every time I batch a bunch of put requests for some table.
My application stores data in multiple HBase tables, and each batch stores data into one table.

I am using asynchbase 1.5 and hbase 0.94.6.1.

Thanks,
Gautam

tsuna

Jun 7, 2013, 10:05:44 PM
to Gautam Borah, Michael Morello, AsyncHBase

It's good practice to call it once per table, only once at the very beginning.
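
One way to get "once per table" without tracking it by hand is to remember the check per table name. The sketch below uses only the JDK; `TableWarmup` and `ensureOnce` are hypothetical names, and the `Function` passed to the constructor is a stand-in for whatever performs the real check (for asynchbase, presumably a call to `ensureTableExists` adapted to a `CompletableFuture`, which is an assumption and not shown here).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: runs a table check at most once per table name. */
class TableWarmup {
    // table name -> the (possibly still running) check for that table
    private final ConcurrentMap<String, CompletableFuture<Void>> checked =
        new ConcurrentHashMap<>();
    // Stand-in for the real check, e.g. a wrapper around ensureTableExists (assumption).
    private final Function<String, CompletableFuture<Void>> tableCheck;

    TableWarmup(Function<String, CompletableFuture<Void>> tableCheck) {
        this.tableCheck = tableCheck;
    }

    /** Blocks until the table has been checked; later calls reuse the first result. */
    void ensureOnce(String table) {
        checked.computeIfAbsent(table, tableCheck).join();
    }

    /** Calls ensureOnce for two tables across three batches; returns how often the check ran. */
    static int demo() {
        AtomicInteger checks = new AtomicInteger();
        TableWarmup warmup = new TableWarmup(table -> {
            checks.incrementAndGet();
            return CompletableFuture.<Void>completedFuture(null);
        });
        for (int batch = 0; batch < 3; batch++) {
            warmup.ensureOnce("metrics"); // hypothetical table names
            warmup.ensureOnce("events");
        }
        return checks.get();
    }
}
```

In `demo()` the stand-in check runs twice, once for each of the two tables, even though `ensureOnce` is called six times. Note that a failed check stays cached in this sketch; a real client would probably want to evict it and retry.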

--
Sent from my Android phone

Gautam Borah

Jun 13, 2013, 2:53:20 PM
to async...@googlegroups.com, Gautam Borah, Michael Morello
Thanks Benoit.
 
Regards,
Gautam