max buckets

Andre

Jun 21, 2012, 5:08:42 AM
to hba...@googlegroups.com
Hi folks,

I'm starting to integrate HBaseWD into our project.
The open question for us is still how many buckets we want to use.

At the moment we have a very small cluster with just 3 region servers.
So there would be no benefit in setting max buckets to more than 3 (assuming our CPUs have only one core), is that correct?

The disadvantage of too many buckets is that a scan / MapReduce job will take longer, right?

Now the question is: can we set the number to a small value for now and increase it later?
AFAIK increasing will work fine, but decreasing will not, since in that case the method "getAllPossiblePrefixes" will not work properly.

Please correct me if I'm wrong.
Thanks in advance for any help,
Andre

Alex Baranau

Jun 21, 2012, 4:19:36 PM
to hba...@googlegroups.com
Hi Andre,

1. Having more than 3 buckets makes sense even if you have only 3 RSs. With 3 buckets, in the ideal situation you will be writing to 3 regions. But there is still a chance that those regions are not located on 3 different RSs (depending on many things). So a bigger value makes sense here as well.
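To make the bucket idea concrete, here is a minimal sketch of one-byte prefix bucketing. It is not HBaseWD's actual code: the class `BucketPrefixSketch`, its method names, and the choice of `Arrays.hashCode` as the hash are all assumptions made for illustration. It only shows how a bucket byte in front of the key spreads sequential keys over `maxBuckets` key ranges, which HBase can then place in separate regions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of one-byte prefix bucketing; not HBaseWD's actual code. */
final class BucketPrefixSketch {

    /** Picks a bucket in [0, maxBuckets) from the key bytes (assumes 1 <= maxBuckets <= 128). */
    static byte bucketFor(byte[] originalKey, int maxBuckets) {
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), maxBuckets);
    }

    /** Builds the stored row key: one bucket byte followed by the original key. */
    static byte[] distributedKey(byte[] originalKey, int maxBuckets) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucketFor(originalKey, maxBuckets);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    /** Recovers the original key by dropping the bucket byte. */
    static byte[] originalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    public static void main(String[] args) {
        // Sequential keys, similar to what timestamps or counters would produce.
        for (int i = 0; i < 6; i++) {
            byte[] original = ("event-" + i).getBytes(StandardCharsets.UTF_8);
            byte[] stored = distributedKey(original, 12);
            System.out.println("event-" + i + " -> bucket " + stored[0]);
        }
    }
}
```

With 12 buckets on 3 RSs there are more key ranges than servers, so even if two of the resulting regions end up on the same RS, the remaining ones can still be spread over the others.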

2.
> The disadvantage of too many buckets is that a scan / MapReduce job will take longer, right?

A scan can be slower, as it will run multiple scanners under the hood. It shouldn't be N times slower (where N is the number of buckets), as the scanners will (likely) be distributed over many regions, i.e. RSs. So you should test the performance degradation. It should be very noticeable. Can't tell the numbers right now.

As for the MR job, in most cases you will not see it run longer with an increased number of buckets. Even better, you may see a speed improvement (with a reasonable number of buckets), as more buckets will produce more Map tasks, which (depending on the situation) may be desirable.
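The "multiple scanners under the hood" in point 2 can be pictured as one key range per bucket. The sketch below is an illustration under stated assumptions, not HBaseWD's implementation: `ScanFanOutSketch` and its methods are hypothetical names, bucket prefixes are assumed to be the single bytes 0..maxBuckets-1, and both the start and stop row are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one logical scan range becomes one range per bucket. */
final class ScanFanOutSketch {

    /** Prepends a single bucket byte to a row key. */
    static byte[] withPrefix(byte prefix, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = prefix;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    /**
     * Returns one {start, stop} pair per bucket.
     * Assumes non-empty keys and prefixes 0..maxBuckets-1 (maxBuckets <= 128).
     */
    static List<byte[][]> bucketRanges(byte[] startRow, byte[] stopRow, int maxBuckets) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int bucket = 0; bucket < maxBuckets; bucket++) {
            byte prefix = (byte) bucket;
            ranges.add(new byte[][] { withPrefix(prefix, startRow), withPrefix(prefix, stopRow) });
        }
        return ranges;
    }
}
```

Each pair would back its own scanner, and the client merges the results. The same per-bucket split is also why a MapReduce job can end up with more input splits, and so more Map tasks, as the number of buckets grows.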

3. Depending on the KeyDistributor implementation you use, you can later increase the number of buckets very easily, without any issues. See some info about that on the wiki (I guess). I will explain it in detail in the next blog post about HBaseWD at blog.sematext.com (see the other posts there if you haven't so far).
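One way to see why growing the bucket count is the safe direction for scans: if the set of possible prefixes for N buckets is simply the bytes 0..N-1, then the prefixes for a smaller N are a subset of those for a larger N. The sketch below checks exactly that and nothing more. It assumes this 0..N-1 prefix layout, and `PrefixCoverageSketch` and its methods are hypothetical names rather than part of HBaseWD. It says nothing about point lookups, which depend on how the distributor derives a prefix from a key.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: does a scan over the current prefixes see rows written earlier? */
final class PrefixCoverageSketch {

    /** All prefixes for a given bucket count, assumed to be the bytes 0..maxBuckets-1. */
    static byte[] allPrefixes(int maxBuckets) {
        byte[] prefixes = new byte[maxBuckets];
        for (int i = 0; i < maxBuckets; i++) {
            prefixes[i] = (byte) i;
        }
        return prefixes;
    }

    /** True if every prefix used at write time is still visited at scan time. */
    static boolean scanCovers(int bucketsAtWriteTime, int bucketsAtScanTime) {
        Set<Byte> scanned = new HashSet<>();
        for (byte p : allPrefixes(bucketsAtScanTime)) {
            scanned.add(p);
        }
        for (byte p : allPrefixes(bucketsAtWriteTime)) {
            if (!scanned.contains(p)) {
                return false; // rows under this prefix would be skipped
            }
        }
        return true;
    }
}
```

Under these assumptions `scanCovers(3, 8)` holds while `scanCovers(8, 3)` does not, which matches the concern about decreasing the number of buckets raised in the question.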

Sorry for the not very detailed response. I will be back from vacation in several days and will be able to provide more info if you still have questions by then.

Thanks for the interest in the HBaseWD project!

Alex