length of RowKeyDistributorByHashPrefix.OneByteSimpleHash
Max Gensthaler
Jun 11, 2012, 5:34:01 AM
to HBaseWD - Distribute Sequential HBase Writes
For anyone using RowKeyDistributorByHashPrefix.OneByteSimpleHash, I
wanted to share my experience with it.
I'm still using RowKeyDistributorByHashPrefix.OneByteSimpleHash,
which I first parametrized in the constructor with 16 buckets. I
thought this would be more than enough for our small test cluster of
three nodes. This small number of buckets was enough to balance the
jobs across the cluster so that all nodes were used, but it led to
comparatively large inputs for the MapReduce jobs. As a consequence,
the progress displayed in the JobTracker jumped very quickly to ~30%,
but after that it no longer represented any useful value.
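To make the bucket arithmetic concrete: with a one-byte hash prefix, every row key lands in one of N buckets, so each bucket holds roughly 1/N of the rows (about 6.25% for 16 buckets versus about 0.4% for 255). The Java sketch below is a hypothetical illustration, not HBaseWD code: the class `BucketSpreadSketch`, the methods `bucketOf`, `distribute` and `histogram`, and the use of `Arrays.hashCode` as the hash are all assumptions and need not match what `OneByteSimpleHash` actually computes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical illustration of a one-byte hash prefix (NOT the HBaseWD source).
 * A distributed row key is one prefix byte in [0, buckets) followed by the original key.
 */
public class BucketSpreadSketch {

    // Bucket index in [0, buckets); the hash is an assumption chosen for illustration.
    static int bucketOf(byte[] originalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256] to fit in one byte");
        }
        return Math.floorMod(Arrays.hashCode(originalKey), buckets);
    }

    // Distributed row key = one prefix byte + original key bytes.
    static byte[] distribute(byte[] originalKey, int buckets) {
        byte[] out = new byte[originalKey.length + 1];
        out[0] = (byte) bucketOf(originalKey, buckets);
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    // Count how many of n sequential keys ("event-00000000", "event-00000001", ...) fall into each bucket.
    static int[] histogram(int n, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < n; i++) {
            byte[] key = String.format("event-%08d", i).getBytes(StandardCharsets.UTF_8);
            counts[bucketOf(key, buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int buckets : new int[] {16, 255}) {
            int[] counts = histogram(n, buckets);
            int max = Arrays.stream(counts).max().getAsInt();
            System.out.printf("%d buckets: ideal %d rows/bucket, largest bucket %d rows%n",
                    buckets, n / buckets, max);
        }
    }
}
```

Running `main` prints, for each bucket count, the ideal number of rows per bucket next to the largest bucket actually observed. With more buckets, each bucket (and any scan or job input organized per bucket) covers a smaller slice of the data.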
The solution to this problem was to migrate all data to a
RowKeyDistributorByHashPrefix.OneByteSimpleHash(255) row key prefix
with 255 buckets. Since this change I have been completely satisfied.
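The migration described above amounts to re-keying every row: strip the old one-byte prefix to recover the original key, then prepend a prefix computed for 255 buckets. A minimal Java sketch under stated assumptions: both layouts are exactly one prefix byte followed by the original key, and `RePrefixSketch`, `prefixFor`, `addPrefix`, `stripPrefix` and `migrate` are hypothetical names; the `Arrays.hashCode`-based hash is illustrative and is not the HBaseWD implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of moving row keys from a 16-bucket to a 255-bucket
 * one-byte hash prefix. Assumes both layouts are "1 prefix byte + original key";
 * the hash is illustrative, not HBaseWD's OneByteSimpleHash.
 */
public class RePrefixSketch {

    // Prefix byte for a given bucket count (assumed hash, for illustration only).
    static byte prefixFor(byte[] originalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]");
        }
        return (byte) Math.floorMod(Arrays.hashCode(originalKey), buckets);
    }

    // Prepend the one-byte prefix to the original key.
    static byte[] addPrefix(byte[] originalKey, int buckets) {
        byte[] out = new byte[originalKey.length + 1];
        out[0] = prefixFor(originalKey, buckets);
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    // Drop the single leading prefix byte to recover the original key.
    static byte[] stripPrefix(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    // Old distributed key -> new distributed key; only the key is computed here,
    // reading each row and writing it under the new key is left out.
    static byte[] migrate(byte[] oldKey, int newBuckets) {
        return addPrefix(stripPrefix(oldKey), newBuckets);
    }

    public static void main(String[] args) {
        byte[] original = "event-00000042".getBytes(StandardCharsets.UTF_8);
        byte[] oldKey = addPrefix(original, 16);
        byte[] newKey = migrate(oldKey, 255);
        System.out.println("old bucket " + (oldKey[0] & 0xFF) + ", new bucket " + (newKey[0] & 0xFF));
    }
}
```

Because the prefix depends only on the original key, the new key can be computed from the old one without any lookup table; copying the rows themselves (for example in a MapReduce job) is outside this sketch.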
Alex, thank you very much for this project!!!
Alex Baranau
Jun 11, 2012, 3:13:53 PM
to hba...@googlegroups.com
Thank you Max for sharing your experience of using HBaseWD lib!
By the way, in the next post about HBaseWD (at http://blog.sematext.com) I will describe some use cases, including how to smoothly change the prefixing strategy when there is already existing data. I will post a link to it on this mailing list.