HBaseWD and coprocessor

Yanke Qiao
May 13, 2014, 6:11:49 AM
to hba...@googlegroups.com
Hi,
I want to use HBaseWD to handle sequential data,
but I don't know how to use an endpoint coprocessor with salted row keys. Can anybody help me?
Thank you,
Qiao

Alex Baranau
May 13, 2014, 3:59:54 PM
to hba...@googlegroups.com
Hi qiao,

Can you give more details? What are you doing in the coprocessor?

Alex
> --
> You received this message because you are subscribed to the Google Groups
> "HBaseWD - Distribute Sequential HBase Writes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to hbasewd+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Yanke Qiao
May 13, 2014, 10:06:07 PM
to hba...@googlegroups.com
Thanks for your reply.
The original row key is timestamp + another ID, and
I add a one-byte prefix in front of the sequential timestamp.
When I use an endpoint coprocessor to do sum or average operations,
I find that the InternalScanner only scans locally on the region server. If I use
salted row keys, can I still use an endpoint coprocessor, or is there no need for it, since
the data is already distributed across the cluster?

Many thanks,
Qiao
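To make the key layout Qiao describes concrete, here is a minimal Java sketch, assuming a row key of [1-byte prefix][8-byte big-endian timestamp][ID bytes] and a hypothetical bucket count of 4. The hash-based prefix is chosen only for illustration; HBaseWD ships its own distributor classes, so check which one you actually use. The point it shows is that one time range in the original key space turns into one scan range per prefix, and those ranges can live in different regions on different RegionServers.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaltedKeys {
    // Hypothetical bucket count (number of distinct one-byte prefixes).
    static final int BUCKETS = 4;

    // Build a salted key: [prefix][8-byte timestamp][id bytes].
    // The prefix here is a hash of the original key modulo BUCKETS (illustrative only).
    static byte[] saltedKey(long timestamp, byte[] id) {
        byte[] original = ByteBuffer.allocate(8 + id.length).putLong(timestamp).put(id).array();
        byte prefix = (byte) Math.floorMod(Arrays.hashCode(original), BUCKETS);
        byte[] salted = new byte[original.length + 1];
        salted[0] = prefix;
        System.arraycopy(original, 0, salted, 1, original.length);
        return salted;
    }

    // A time range [startTs, stopTs) becomes BUCKETS separate [start, stop) key ranges,
    // one per prefix; each may be served by a different region.
    static List<byte[][]> bucketRanges(long startTs, long stopTs) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            byte[] start = ByteBuffer.allocate(9).put((byte) b).putLong(startTs).array();
            byte[] stop = ByteBuffer.allocate(9).put((byte) b).putLong(stopTs).array();
            ranges.add(new byte[][] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[] key = saltedKey(1400000000000L, "id-42".getBytes());
        System.out.println("prefix=" + key[0] + " length=" + key.length);
        System.out.println("ranges=" + bucketRanges(1400000000000L, 1400003600000L).size());
    }
}
```

Because consecutive timestamps are spread over several prefixes, no single region (and so no single endpoint invocation) sees all rows of a time range.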

On Wednesday, May 14, 2014, 3:59:54 AM UTC+8, Alex Baranau wrote:

Alex Baranau
May 16, 2014, 11:45:22 PM
to hba...@googlegroups.com
A coprocessor endpoint resides in one RegionServer (RS). So if you salt
your keys and the data is distributed across multiple RSs, you need to
talk to all of them and then aggregate the results returned by
each one on the client side.

> or there is no need to use it, since
> the data is distributed in the cluster.

So what is your coprocessor doing? "sum or avg" of values in
different rows? It seems like in this case you'd aggregate on the
server side (in the RS coprocessor) and also on the client side after you get
the results from the RSs. I believe this is the standard way of doing
aggregation with CP endpoints, whether you use the HBaseWD lib or not:
you never know how many regions the selected data spans, right?

Alex
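The two-phase aggregation Alex describes can be sketched in plain Java, with regions simulated as in-memory lists so the example runs without a cluster. The names `Partial`, `aggregateRegion` and `mergeAverage` are hypothetical, not HBase or HBaseWD APIs. In a real setup each partial would come back from one endpoint invocation per region; in HBase 0.96+ that fan-out is, for example, the table's `coprocessorService` call, which returns one result per region. The key detail for "avg" is that each region returns a sum and a row count, not its own average, so the client can combine them correctly.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoPhaseAverage {
    // Partial result one endpoint call would return for its region (hypothetical shape).
    record Partial(long sum, long count) {}

    // Server-side phase (simulated): each region aggregates only the rows it holds.
    static Partial aggregateRegion(List<Long> regionValues) {
        long sum = 0;
        for (long v : regionValues) {
            sum += v;
        }
        return new Partial(sum, regionValues.size());
    }

    // Client-side phase: merge per-region sums and counts, then derive the average.
    static double mergeAverage(Iterable<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum();
            count += p.count();
        }
        if (count == 0) {
            throw new IllegalStateException("no rows matched");
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical layout: salted rows ended up in three regions.
        Map<String, List<Long>> regions = new TreeMap<>();
        regions.put("region-a", List.of(10L, 20L));
        regions.put("region-b", List.of(30L));
        regions.put("region-c", List.of(40L, 50L, 60L));

        Map<String, Partial> partials = new TreeMap<>();
        regions.forEach((name, values) -> partials.put(name, aggregateRegion(values)));

        System.out.println(mergeAverage(partials.values())); // prints 35.0
    }
}
```

Averaging the per-region averages instead (15, 30 and 50) would give about 31.67, which is wrong whenever regions hold different numbers of rows; that is why the server side should return sum and count.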