Coprocessor / threading model

Coprocessor / threading model Adrien Mogenet 1/12/13 4:06 PM
Hi there,

I'm experiencing some issues with coprocessors (CPs). I'm trying to implement an indexing
solution (inspired by Anoop's slides). In pre-put, I trigger another Put()
on an external table (to build the secondary index). It works perfectly with
one client, but when I insert data from 2 separate clients, I run into
issues with the HTable object (the one used in pre-Put()), because it's not
thread-safe. I decided to move to HTablePool, and that fixed my issue.

But if I increase the write load (and concurrency), HBase throws an OutOfMemoryError
because it can't create new native threads. Looking at the HBase "threads count"
metric, I see that roughly 3500 threads are created.
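A quick way to cross-check a thread-count metric from inside a JVM is the standard java.lang.management API. The sketch below (the class name is made up for illustration) prints the live and peak thread counts. A jstack dump gives the same count plus each thread's name and stack, which is what actually shows where the threads come from.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Illustrative helper: reads JVM thread counts via the standard
// java.lang.management API (no HBase dependency).
class ThreadCountReport {
    // Current number of live threads (daemon and non-daemon).
    static int liveThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.getThreadCount();
    }

    // Highest live-thread count since the JVM started (or since the last reset).
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live=" + liveThreads() + " peak=" + peakThreads());
    }
}
```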

I'm looking for documentation about how CPs work with threads: what should I
protect against concurrency issues, and when? How can I solve my
issue?

Help is welcome :-)

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
Re: Coprocessor / threading model anil gupta 1/12/13 6:30 PM
I also ran into a similar problem with one of my secondary index
implementations, but I could not dig into it because I had to shift
focus to other work. I am also interested in how this kind of problem
is resolved in coprocessors.

On Sat, Jan 12, 2013 at 5:38 PM, Ted <yuzh...@gmail.com> wrote:

> Please take a look at HBASE-6651, which improves the thread safety of
> HTablePool.
>
> Are you using HBase 0.94?
>
> Thanks
>
> On Jan 12, 2013, at 4:06 PM, Adrien Mogenet <adrien....@gmail.com>
--
Thanks & Regards,
Anil Gupta
Re: Coprocessor / threading model Andrew Purtell 1/12/13 6:39 PM
> In pre-put, I trigger another Put() in an external table (to build the
> secondary index).

We should probably call this a Coprocessor anti-pattern.

Coprocessors are meant to operate on the region to which they are
associated. They are a way to extend HBase functionality while it operates
within a region, on that region's data. Think of them as loadable kernel modules.
They are not a general-purpose server-side platform for writing code as if
you were building an HBase client (with HTable, etc.). Just because you can
do this doesn't mean you should.
--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Re: Coprocessor / threading model Ted Yu 1/12/13 6:48 PM
bq. Coprocessors are meant to operate on the region to which they are
associated.

In Anoop's case, the secondary table(s) have their regions aligned with
the corresponding regions of the primary table; that is, related regions are
served by the same region server.
Would writes to such regions of the secondary table(s) be acceptable?

Thanks
Re: Coprocessor / threading model Andrew Purtell 1/12/13 7:58 PM
Yes, especially if the cross-region communication is in-process.
Re: Coprocessor / threading model ramkrishna vasudevan 1/13/13 2:04 AM
In Anoop's solution, the put is basically applied directly to the index region
rather than going through HTable.

Regards
Ram
Re: Coprocessor / threading model Adrien Mogenet 1/13/13 2:42 AM
Thanks for pointing me to the JIRA; that's useful for my understanding.
I'm using HBase 0.94.3, and the regions of the main and index tables are co-located
on the same RS, as in Anoop's design. I'll browse the API tomorrow to find
out how to use inter-CP communication instead of HTable.
Re: Coprocessor / threading model Anoop John 1/13/13 8:12 AM
In your CP methods you get an ObserverContext object, from which you can
get hold of the HRS (region server) services:
ObserverContext.getEnvironment().getRegionServerServices()
From this you can get hold of any region served by that RS,
then call methods on HRegion directly to insert data. :)
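The in-process pattern described above can be sketched without an HBase dependency. Region, RegionServer and prePut below are hypothetical in-memory stand-ins, not the HBase 0.94 API; in a real observer the co-located index region would be reached through ObserverContext.getEnvironment().getRegionServerServices() as described, and the write would go to an HRegion method. The point is only that the index write is a plain method call on an object in the same JVM, with no HTable, no RPC, and no extra threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for a region and a region server; NOT the HBase API.
class InProcessIndexDemo {
    // A "region": a sorted row -> value map.
    static final class Region {
        final String name;
        final TreeMap<String, String> rows = new TreeMap<>();
        Region(String name) { this.name = name; }
        void put(String row, String value) { rows.put(row, value); }
    }

    // A "region server": the regions it currently hosts, keyed by name.
    static final class RegionServer {
        final Map<String, Region> online = new HashMap<>();
        Region getOnlineRegion(String name) { return online.get(name); }
    }

    // Observer-style hook: write the index entry straight into the
    // co-located index region with a direct method call.
    static void prePut(RegionServer rs, String indexRegionName, String row, String indexedValue) {
        Region index = rs.getOnlineRegion(indexRegionName);
        if (index == null) {
            throw new IllegalStateException("index region not co-located: " + indexRegionName);
        }
        index.put(indexedValue + "/" + row, row); // index key = indexed value + row key
    }

    // Data write path: run the hook first, then apply the data put.
    static void put(RegionServer rs, String dataRegion, String indexRegion, String row, String value) {
        prePut(rs, indexRegion, row, value);
        rs.getOnlineRegion(dataRegion).put(row, value);
    }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer();
        rs.online.put("users,a", new Region("users,a"));
        rs.online.put("users_idx,a", new Region("users_idx,a"));
        put(rs, "users,a", "users_idx,a", "alice", "paris");
        System.out.println(rs.getOnlineRegion("users_idx,a").rows); // {paris/alice=alice}
    }
}
```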
Good luck..


-Anoop-
Re: Coprocessor / threading model Wei Tan 1/15/13 10:44 AM
Andrew, could you explain in more detail why doing cross-table operations is an
anti-pattern for CPs?
Durability might be an issue, as far as I understand. Thanks,


Best Regards,
Wei
Re: Coprocessor / threading model Varun Sharma 1/15/13 10:56 AM
You should look at the jstack output; I think HTablePool is the reason for the
large number of threads. Note that HTablePool is a reusable pool of HTable(s),
and each HTable contains an ExecutorService with 1 thread by
default. Are you closing the HTable you obtain from HTablePool? If you are
not closing it, your thread count will keep increasing.
Also, on 64-bit machines, I think each thread is allocated 256K or 512K of
stack by default.
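A minimal sketch of the leak described above, assuming (as stated in this message) that each pooled handle owns its own single-thread executor. Table and TablePool are hypothetical stand-ins, not HBase's HTablePool: when a handle is returned after use it is reused, and when it is not, every request creates a new handle and therefore a new executor thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical pool of HTable-like handles; NOT HBase's HTablePool.
class PoolLeakDemo {
    static final class Table {
        // Each handle owns its own single-thread executor.
        final ExecutorService executor = Executors.newSingleThreadExecutor();
    }

    static final class TablePool {
        private final Deque<Table> idle = new ArrayDeque<>();
        private final List<Table> all = new ArrayList<>();

        synchronized Table borrow() {
            Table t = idle.poll();
            if (t == null) {          // nothing idle to reuse: create a new handle
                t = new Table();
                all.add(t);
            }
            return t;
        }

        synchronized void release(Table t) { idle.push(t); }

        synchronized int created() { return all.size(); }

        synchronized void shutdownAll() {
            for (Table t : all) t.executor.shutdownNow();
        }
    }

    // Runs n sequential "requests"; returns how many handles (and executors) were created.
    static int run(int n, boolean releaseAfterUse) {
        TablePool pool = new TablePool();
        try {
            for (int i = 0; i < n; i++) {
                Table t = pool.borrow();
                try {
                    t.executor.submit(() -> { }); // simulated work on the handle's own thread
                } finally {
                    if (releaseAfterUse) pool.release(t); // the "close" that gives it back
                }
            }
            return pool.created();
        } finally {
            pool.shutdownAll();
        }
    }

    public static void main(String[] args) {
        System.out.println("with release:    " + run(100, true));  // 1
        System.out.println("without release: " + run(100, false)); // 100
    }
}
```

Whatever pool is used, the important part is returning the handle in a finally block, so it goes back even when the work throws.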

Varun
Re: Coprocessor / threading model Andrew Purtell 1/15/13 11:20 AM
HTable is a blocking interface. When a client issues a put, for example, we
do not want to return until we can confirm the store has been durably
persisted. For client convenience, many additional details of remote region
invocation are hidden, for example META table lookups for relocated
regions, reconnection, and retries. Just about all coprocessor upcalls for the
Observer interface happen within the RPC handler context. RPC handlers are
drawn from a fixed pool of threads, so your CP code is tying up one unit of a fixed
resource for as long as it has control. And in there you are running the
complex HTable machinery. For many reasons your method call on HTable may
block (potentially for a long time), and therefore the RPC handler your
invocation is executing within will also block. An accidental cycle can
cause a deadlock once there are no free handlers somewhere, which will
happen as part of normal operation when the cluster is loaded; the
higher the load, the more likely it is.
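The starvation mechanism above can be reproduced in a single JVM with a fixed-size pool standing in for the RPC handlers. This is a simplification: in a cluster the nested call lands on another region server's handler pool, and the cycle spans servers. The outer task plays the coprocessor upcall and blocks on a nested call that needs a handler from the same pool; the timeout exists only so the demo terminates instead of hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation: a fixed thread pool stands in for the RPC handler pool.
class HandlerStarvationDemo {
    // Returns true if the nested call finished within timeoutMs,
    // false if it never got a free handler in time.
    static boolean nestedCallCompletes(int handlers, long timeoutMs) {
        ExecutorService rpcHandlers = Executors.newFixedThreadPool(handlers);
        try {
            Future<Boolean> upcall = rpcHandlers.submit(() -> {
                // The "coprocessor upcall" makes a blocking call that
                // needs another handler from the same pool.
                Future<String> nested = rpcHandlers.submit(() -> "index put done");
                try {
                    nested.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false; // all handlers busy: a real blocking call would hang here
                }
            });
            return upcall.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rpcHandlers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 handler:  " + nestedCallCompletes(1, 200));
        System.out.println("2 handlers: " + nestedCallCompletes(2, 2000));
    }
}
```

With one handler the nested call can never be scheduled while the upcall waits. Adding handlers only moves the threshold: N concurrent upcalls doing this can occupy all N handlers.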

Instead, you can do what Anoop has described in this thread: install a CP
into the master that ensures index regions are assigned to the same
regionserver as the corresponding primary table regions, and then call from a region of the
primary table into a colocated region of the index table, or vice versa,
bypassing HTable and the RPC stack. This is just an in-process
method call on one object from another.
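The co-location rule itself is simple to state in code. The sketch assumes, as in Anoop's design, that the index table is split on the same start keys as the primary table; assignIndexRegions is a hypothetical helper that only computes a placement plan and is not the master coprocessor or balancer API. Keeping the two tables aligned after splits is a separate problem this does not address.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: place each index region on the server hosting the
// primary region with the same start key.
class ColocatedAssignmentDemo {
    static Map<String, String> assignIndexRegions(Map<String, String> primaryStartKeyToServer,
                                                  List<String> indexStartKeys) {
        Map<String, String> plan = new TreeMap<>();
        for (String startKey : indexStartKeys) {
            String server = primaryStartKeyToServer.get(startKey);
            if (server == null) {
                // Only works if both tables are split on the same keys.
                throw new IllegalArgumentException("no primary region starts at '" + startKey + "'");
            }
            plan.put(startKey, server);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> primary = new TreeMap<>();
        primary.put("", "rs1");  // first region has an empty start key
        primary.put("m", "rs2");
        System.out.println(assignIndexRegions(primary, Arrays.asList("", "m"))); // {=rs1, m=rs2}
    }
}
```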

Or, you could allocate a small executor pool for cross-region RPC. When the
upcall into your CP happens, dispatch work to the executor and return
immediately to release the RPC worker thread back to the pool. This would
avoid the possibility of deadlock, but it may not give you the semantics
you want, because that background work could lag unpredictably.
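A minimal sketch of the executor hand-off, with hypothetical names and an in-memory map standing in for the index table. prePut only enqueues the index write and returns, so the calling thread is released immediately; the entry becomes visible some time later, which is the lag mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a small dedicated pool for index writes, separate
// from the (simulated) RPC handler threads.
class AsyncIndexDispatchDemo {
    private final ExecutorService indexWriters = Executors.newFixedThreadPool(2);
    // In-memory stand-in for the index table: "value/row" -> row.
    final ConcurrentHashMap<String, String> index = new ConcurrentHashMap<>();

    // Runs on the caller's (handler's) thread: enqueue and return at once.
    void prePut(String row, String value) {
        indexWriters.execute(() -> index.put(value + "/" + row, row));
    }

    // Demo only: stop accepting work and wait for queued index writes.
    boolean drain(long timeoutMs) {
        indexWriters.shutdown();
        try {
            return indexWriters.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncIndexDispatchDemo demo = new AsyncIndexDispatchDemo();
        demo.prePut("alice", "paris");
        demo.prePut("bob", "oslo");
        // At this point the index may not contain either entry yet: that is the lag.
        demo.drain(5_000);
        System.out.println(demo.index.size()); // 2
    }
}
```

newFixedThreadPool uses an unbounded queue, so under sustained overload the lag (and memory use) can grow without limit; bounding the queue forces a choice between blocking the handler again and dropping or deferring index updates.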


Re: Coprocessor / threading model Wei Tan 1/15/13 2:41 PM
Thanks Andrew for your detailed clarification.
Now I understand that, in general, the system is subject to the CAP theorem.
If you want good consistency AND latency, then partition tolerance needs to
be sacrificed: this is the "local index" approach, i.e., colocate index
and data and avoid RPC.

Otherwise, if you can tolerate weaker consistency but not higher latency, you put RPCs
in a queue and process them in the background. This way you can have
a "global" index with some lag.


Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598
wt...@us.ibm.com; 914-945-4386
RE: Coprocessor / threading model Anoop Sam John 1/15/13 8:39 PM
Thanks Andrew. A detailed and useful reply. Nothing more is needed to explain the anti-pattern. :)

-Anoop-