length of RowKeyDistributorByHashPrefix.OneByteSimpleHash

Max Gensthaler

Jun 11, 2012, 5:34:01 AM

to HBaseWD - Distribute Sequential HBase Writes

For anyone using RowKeyDistributorByHashPrefix.OneByteSimpleHash, I
wanted to share my experience with it.

I'm still using RowKeyDistributorByHashPrefix.OneByteSimpleHash, which I
first parameterized in the constructor with 16 buckets. I thought this
would be more than enough for our small test cluster of three nodes.
This small number of buckets was enough to balance the jobs so that all
nodes in the cluster were used, but it led to comparatively large inputs
for the MapReduce jobs. As a consequence, the progress displayed in the
JobTracker jumped very quickly to ~30% and after that no longer gave a
usable estimate.
The solution to this problem was to migrate all data to a
RowKeyDistributorByHashPrefix.OneByteSimpleHash(255) row key prefix
with 255 buckets. Since this change I have been completely satisfied.
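
To illustrate why the bucket count matters, here is a minimal Python sketch of one-byte hash prefixing. It is a simplified, hypothetical model, not HBaseWD code: the helper names (bucket_of, distributed_key, strip_prefix, rekey) are made up for this example, and CRC32 stands in for the hash purely for illustration; it is not claimed to be what OneByteSimpleHash computes internally. Because the prefix is a single byte, at most 256 buckets are possible. Assuming the scan feeding the MapReduce job yields roughly one input range per bucket, each range covers about 1/N of the rows: about 6% with 16 buckets versus about 0.4% with 255.

```python
import zlib
from collections import Counter


def bucket_of(original_key: bytes, max_buckets: int) -> int:
    """Map an original row key to a bucket index in [0, max_buckets).

    Hypothetical: CRC32 stands in for whatever hash HBaseWD really uses.
    """
    if not 1 <= max_buckets <= 256:
        # A single prefix byte can encode at most 256 distinct values.
        raise ValueError("max_buckets must be between 1 and 256")
    return zlib.crc32(original_key) % max_buckets


def distributed_key(original_key: bytes, max_buckets: int) -> bytes:
    """Prepend the one-byte bucket prefix to the original row key."""
    return bytes([bucket_of(original_key, max_buckets)]) + original_key


def strip_prefix(distributed: bytes) -> bytes:
    """Drop the one-byte prefix to recover the original row key."""
    return distributed[1:]


def rekey(distributed: bytes, new_max_buckets: int) -> bytes:
    """Re-prefix an existing row for a new bucket count (e.g. 16 -> 255)."""
    return distributed_key(strip_prefix(distributed), new_max_buckets)


if __name__ == "__main__":
    # Sequential keys (e.g. time-ordered IDs) that would otherwise all
    # land at the end of the table, in a single region.
    keys = [f"event-{i:08d}".encode() for i in range(100_000)]
    for buckets in (16, 255):
        counts = Counter(bucket_of(k, buckets) for k in keys)
        # Assuming one scan range per bucket, the largest bucket's share
        # approximates the largest chunk of input given to one range.
        largest = max(counts.values()) / len(keys)
        print(f"{buckets:3d} buckets: {len(counts)} in use, "
              f"largest holds {largest:.2%} of rows")
```

The rekey helper mirrors the migration step: strip the old prefix, then prefix the original key again for the new bucket count. Since an HBase row key cannot be changed in place, each row has to be written under its new key and the old row deleted.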

Alex, thanks a lot for this project!

Alex Baranau

Jun 11, 2012, 3:13:53 PM

to hba...@googlegroups.com

Thank you, Max, for sharing your experience using the HBaseWD lib!

Btw, in the next post about HBaseWD (at http://blog.sematext.com) I will describe some use cases, including how to smoothly change the prefixing strategy when there is already existing data. I will post a link to it on this ML.