HBaseWD and coprocessor

Yanke Qiao

May 13, 2014, 6:11:49 AM

to hba...@googlegroups.com

Hi,

I want to use HBaseWD to handle sequential data, but I don't know how to use an endpoint coprocessor with salted row keys. Can anybody help me?

Thank you,

qiao

Alex Baranau

May 13, 2014, 3:59:54 PM

to hba...@googlegroups.com

Hi qiao,

Can you give more details? What are you doing in the coprocessor?

Alex

> --
> You received this message because you are subscribed to the Google Groups
> "HBaseWD - Distribute Sequential HBase Writes" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to hbasewd+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Yanke Qiao

May 13, 2014, 10:06:07 PM

to hba...@googlegroups.com

Thanks for your reply.

The original row key looks like timestamp + other ID, and I add a one-byte prefix in front of the sequential timestamp.

When I use an endpoint coprocessor to do sum or average operations, I find that the InternalScanner only scans data on the local region server. If I salt the row key, can I still use an endpoint coprocessor? Or is there no need for it, since the data is already distributed across the cluster?
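To make the salting concrete: with a one-byte prefix in front of timestamp + ID, rows written in time order are spread over several buckets, so a single time range on the original key turns into one key range per prefix value. The sketch below is plain Java with no HBase dependency. The 16-bucket count, the 8-byte big-endian timestamp/ID encoding, and deriving the prefix from a hash of the ID are all assumptions for illustration; HBaseWD has its own distributor classes and may assign prefixes differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SaltedKeys {
    // Assumed bucket count for this sketch (the prefix takes values 0..15).
    static final int BUCKETS = 16;

    // Builds prefix (1 byte) + timestamp (8 bytes) + id (8 bytes).
    // Assumption: the prefix comes from a hash of the id; this is not
    // necessarily how HBaseWD chooses it.
    static byte[] saltedKey(long timestamp, long id) {
        byte prefix = (byte) Math.floorMod(Long.hashCode(id), BUCKETS);
        return ByteBuffer.allocate(17).put(prefix).putLong(timestamp).putLong(id).array();
    }

    // A time range [start, stop) on the original keys becomes one
    // [prefix+start, prefix+stop) range per bucket. For non-negative
    // timestamps, big-endian longs sort correctly as unsigned bytes,
    // which is how HBase compares row keys.
    static List<byte[][]> scanRanges(long start, long stop) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            byte[] from = ByteBuffer.allocate(9).put((byte) b).putLong(start).array();
            byte[] to = ByteBuffer.allocate(9).put((byte) b).putLong(stop).array();
            ranges.add(new byte[][] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[] key = saltedKey(1400000000000L, 42L);
        System.out.println("prefix=" + key[0] + " length=" + key.length); // prints prefix=10 length=17
        System.out.println("ranges=" + scanRanges(1400000000000L, 1400003600000L).size()); // prints ranges=16
    }
}
```

Each of those per-bucket ranges may live in a different region, which is why an endpoint invoked on a single region only ever sees part of the answer.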

Many thanks,

qiao

On Wednesday, May 14, 2014 at 3:59:54 AM UTC+8, Alex Baranau wrote:

Alex Baranau

May 16, 2014, 11:45:22 PM

to hba...@googlegroups.com

A coprocessor endpoint runs inside a RegionServer (RS) and only sees the
data of the region it is invoked on. So if you salt your keys and the
data is distributed across multiple regions and RSs, you need to call
the endpoint on all of them and then aggregate the results returned by
each on the client side.

> or there is no need to use it, since
> the data is distributed in the cluster.

So what is your coprocessor doing? Computing a "sum or avg" of values
across different rows? In that case you'd aggregate on the server side
(in the RS coprocessor) and then again on the client side after you get
the results from the RSs. I believe this is the standard way of doing
aggregation with CP endpoints, whether or not you use the HBaseWD lib:
you never know how many regions the selected data spans, right?
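The two-level pattern described above can be sketched in plain Java (16+, for records): each region's endpoint returns a partial (sum, count) for the rows its scanner sees, and the client merges the partials and divides once. The Partial type and the method names are hypothetical, and the per-region endpoint invocation is simulated with in-memory arrays rather than real HBase client calls.

```java
import java.util.List;

public class TwoLevelAverage {
    // Partial result that one region's endpoint would return: a sum and a
    // count, not an average, so partials from regions of different sizes
    // can be combined correctly.
    record Partial(long sum, long count) {
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Server side (simulated): what one region's endpoint would compute
    // over the rows its InternalScanner sees.
    static Partial aggregateRegion(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return new Partial(sum, values.length);
    }

    // Client side: merge the per-region partials, then divide once.
    static double average(List<Partial> partials) {
        Partial total = new Partial(0, 0);
        for (Partial p : partials) {
            total = total.merge(p);
        }
        if (total.count() == 0) {
            throw new IllegalArgumentException("no rows in any region");
        }
        return (double) total.sum() / total.count();
    }

    public static void main(String[] args) {
        List<Partial> partials = List.of(
                aggregateRegion(new long[] {1, 2, 3}), // region 1
                aggregateRegion(new long[] {10}));     // region 2
        System.out.println(average(partials)); // prints 4.0
    }
}
```

Returning a count alongside the sum is the key design choice: averaging the per-region averages here would give (2 + 10) / 2 = 6.0 instead of the correct 16 / 4 = 4.0, because the regions hold different numbers of rows.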

Alex
