Asking for explanations regarding RowKeyDistributorByHashPrefix implementation


Ionut

Sep 17, 2011, 2:20:40 PM
to HBaseWD - Distribute Sequential HBase Writes
Hi,

I want to ask why the range of keys for
RowKeyDistributorByHashPrefix includes negative values. For example, if I
choose to have 32 values, the range is from -15 to 16. Is
there any special reason to have buckets with negative values?

Regards,
Ionut

Alex Baranau

Sep 17, 2011, 5:10:02 PM
to hba...@googlegroups.com
Hm... Which hash implementation do you use in RowKeyDistributorByHashPrefix?

E.g. in the case of OneByteSimpleHash, the prefix is a single byte b, such
that (b & 0xFF) falls in the interval [0...N-1], where N is the number of
buckets. Since byte is a signed type, the prefix values can be 0, 1, ...,
127, -128, -127, ..., -1. E.g. if N = 32, then b belongs to [0...31].
If, e.g., N = 130, then b belongs to {0, 1, ..., 127, -128, -127}.
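
To make the signed-byte arithmetic concrete, here is a small self-contained
sketch (plain Java, not HBaseWD code) showing how bucket numbers above 127
show up as negative byte values and how & 0xFF recovers the unsigned bucket
number:

public class BytePrefixDemo {
    public static void main(String[] args) {
        int numBuckets = 130; // more than 128 buckets, so some prefixes are negative bytes
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            byte prefix = (byte) bucket;   // bucket 128 becomes -128, 129 becomes -127, ...
            int recovered = prefix & 0xFF; // masking restores the unsigned bucket number
            if (bucket >= 126) {           // print only the boundary values of interest
                System.out.println("bucket=" + bucket
                        + " prefix byte=" + prefix
                        + " recovered=" + recovered);
            }
        }
    }
}

Running it prints, e.g., "bucket=128 prefix byte=-128 recovered=128", which
matches the negative prefixes you observed.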

There's no special reason for such a sequence in the case of
OneByteSimpleHash, though. The prefixes are sorted in ascending order (if
HBase's native raw bytes comparator is used, which compares b1 & 0xFF
against b2 & 0xFF), but the prefixes returned by
getAllPossiblePrefixes() are not actually required to be ordered.
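
For illustration, the unsigned ordering mentioned above boils down to
something like this (a simplified sketch, not HBase's actual comparator
implementation):

// Treats both bytes as unsigned values in [0, 255] before comparing,
// so the ordering runs 0, 1, ..., 127, 128 (-128), ..., 255 (-1).
static int compareUnsigned(byte b1, byte b2) {
    return (b1 & 0xFF) - (b2 & 0xFF);
}

// e.g. compareUnsigned((byte) 5, (byte) -128) < 0, since -128 is treated as 128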

(Note that I'm referring to the latest code on GitHub:
OneByteSimpleHash was adjusted this week.)

Alex

Ionut

Sep 18, 2011, 4:20:09 PM
to HBaseWD - Distribute Sequential HBase Writes
Hi!

Yes, you're right. I hadn't taken your latest changes into account.
After I found the bug I mentioned in a previous post here, I
implemented my own version of Hasher. My implementation is based on the
absolute value of the hash code from the Arrays class, modulo maxBuckets.
I find this approach easier to understand than bit-level operations.
That said, your current approach may well be faster - a performance test
would be needed to tell. I'm also curious, and in the coming days I'll
see how much faster the distributed scanner version is.
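
For clarity, here is roughly what that idea looks like in code (a sketch
based on the description above, not my actual Hasher; the class and method
names are made up for illustration):

import java.util.Arrays;

// Sketch: map an original row key to a bucket using Arrays.hashCode, as described above.
// Taking the absolute value of the remainder (rather than of the hash itself) avoids the
// Math.abs(Integer.MIN_VALUE) pitfall, since the remainder's magnitude is below maxBuckets.
public class SimpleHashBucketer {
    private final int maxBuckets;

    public SimpleHashBucketer(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    public byte getBucketPrefix(byte[] originalKey) {
        int bucket = Math.abs(Arrays.hashCode(originalKey) % maxBuckets);
        return (byte) bucket; // buckets >= 128 show up as negative byte values, as discussed above
    }
}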

Also, do you intend to add support for a distributed delete?

Regards,
Ionut I.


Alex Baranau

Sep 19, 2011, 2:14:33 AM
to hba...@googlegroups.com
Hello,

It would be great if you could share the test results.

There's a plan for delete:
https://github.com/sematext/HBaseWD/issues/1. For now you have to
delete records one by one, the same way as you do a Get. This is the usual
way to delete data in HBase (there is no delete-range operation). But it
would be easier/cleaner for users to have explicit "distributed" analogs of
the Get and Delete operations.
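
For reference, deleting a single record today looks roughly like this (a
minimal sketch assuming HBaseWD's AbstractRowKeyDistributor.getDistributedKey
method and the HTable/Delete client API of that HBase generation; adjust the
names to the versions you use):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;

import com.sematext.hbase.wd.AbstractRowKeyDistributor;

// Sketch: deleting a single record written through the key distributor,
// mirroring how a Get is done today: compute the distributed key first,
// then issue a normal Delete for that key.
public class DistributedDeleteExample {
    public static void deleteRecord(HTable table,
                                    AbstractRowKeyDistributor keyDistributor,
                                    byte[] originalKey) throws IOException {
        // use the same distributor instance (same bucket count / hash) that was used on write
        byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
        table.delete(new Delete(distributedKey));
    }
}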

I'll try to find time this week to complete this in case it's
important for you.

Alex.
