Hashing and Key Distribution in Voldemort

132 views
Skip to first unread message

Mark Rambacher

unread,
Nov 2, 2009, 11:02:57 AM11/2/09
to project-voldemort
Has anyone done any profiling of how well Voldemort distributes keys
amongst the partitions? The algorithm for consistent hashing does not
appear like it would favor a random distribution of keys and other
issues:

- The key is hashed using an FVN algorithm
- The partition ID for the key is determined by modulo division of the
key by the number of partitions

This algorithm makes sure that each partition is exactly the same
distance from the other partitions on the ring. However, it has some
other issues:
- The "randomness" of the hash becomes less important. Instead, only
the last few bits/bytes of the hash (the modulo partition count bits)
are ever evaluated. The remainder of the hash is ignored for the
consistent hashing.
- If a new partition is added to the system, every node is affected as
their share of the key space was reduced.

A different consistent hashing algorithm (such as suggested here:
http://weblogs.java.net/blog/2007/11/27/consistent-hashing) may not
have these same issues (but may have others, such as how long it takes
to calculate the partition ID may increase or the key space may not be
evenly distributed).

My questions are as follows:

1. Do people see issues with certain partitions becoming dominant or
overloaded using the current algorithms? Or is this masked by having
a relatively small number of nodes?
2. How many partitions are typically used by Voldemort adopters?
Choosing the number carefully (a factor of 2) looks like it would
speed performance -- we were considering using either 512 or 1024
partitions.
3. Were different algorithms tried for Voldemort and how was this
choice settled upon?
4. Has the Rebalancing project considered adding partitions to the
cluster or is that part of a separate effort?

I would like to do some profiling here to see how well our "real" and
"simulated" data maps onto the partitions using the current and other
algorithms but have not had the opportunity to do so yet.

Thanks,

Mark

Jay Kreps

unread,
Nov 2, 2009, 12:59:39 PM11/2/09
to project-...@googlegroups.com
Hi Mark,

Couple of quick things
1. We have done some statistical analysis on the key distribution and
it is fairly good. There is a unit test that does a chi-square test to
verify this (ConsistentRoutingStrategyTest.java).
2. We have seen pretty good load balancing in practice.
3. I see what you are saying about adding partitions, but the method
of data redistribution is to redistribute partitions, not to add them.
It is assumed that the # of partitions is fixed (as in the dynamo
paper). There is no effort to add partitions that i am aware of, the
ability to redistribute partitions seems good enough.
4. The performance of hashing is more or less constant with respect to
the number of partitions (it is just a mapping done via array lookup)

-Jay

anderson guo

unread,
Dec 3, 2009, 11:07:40 PM12/3/09
to project-voldemort
Hi Jay

Could you add the statistical analysis case to voldemort repository.
I'm really very interesting in how to verify the equally distribution
using consistent hashing.

BR/anderson

Jay Kreps

unread,
Dec 3, 2009, 11:17:49 PM12/3/09
to project-...@googlegroups.com
Hi Anderson,

There is a junit test called ConsistentRoutingStrategyTest which is
checked in now which does a simple chi-sq test. The other thing I have
is the method TestDistribution.java which tests the assignment of the
integers between 0 and N for a given cluster.xml and prints out
statistics on the skew--this is an important use case for us as our
member ids are integers. I found that there is actually some skew here
for our clusters, around +/-5% which gives me some worry, but is not
too unreasonable. I have been meaning to test if a change of the
underlying hash function we use from fnv to md5 would resolve this or
not.

-Jay
> --
>
> You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> To post to this group, send email to project-...@googlegroups.com.
> To unsubscribe from this group, send email to project-voldem...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/project-voldemort?hl=en.
>
>
>

anderson guo

unread,
Dec 4, 2009, 2:16:37 AM12/4/09
to project-voldemort
Hi, Jay

Thanks for your quick reply. I have the source code of voldermort
v0.51 and found voldemort.routing.ConsistentRoutingStrategyTest class
under test folder, meanwhile, TestDistribution.java doesn't exist. Can
you post source code of it or where can I download or view it?
> >> 4. The performance ofhashingis more or less constant with respect to
> >> the number of partitions (it is just a mapping done via array lookup)
>
> >> -Jay
>
> >> On Mon, Nov 2, 2009 at 8:02 AM, Mark Rambacher <mramb...@gmail.com> wrote:
>
> >> > Has anyone done any profiling of how well Voldemort distributes keys
> >> > amongst the partitions?  The algorithm for consistenthashingdoes not
> >> > appear like it would favor a random distribution of keys and other
> >> > issues:
>
> >> > - The key is hashed using an FVN algorithm
> >> > - The partition ID for the key is determined by modulo division of the
> >> > key by the number of partitions
>
> >> > This algorithm makes sure that each partition is exactly the same
> >> > distance from the other partitions on the ring.  However, it has some
> >> > other issues:
> >> > - The "randomness" of the hash becomes less important.  Instead, only
> >> > the last few bits/bytes of the hash (the modulo partition count bits)
> >> > are ever evaluated.  The remainder of the hash is ignored for the
> >> > consistenthashing.
> >> > - If a new partition is added to the system, every node is affected as
> >> > their share of the key space was reduced.
>
> >> > A different consistenthashingalgorithm (such as suggested here:

Jay Kreps

unread,
Dec 4, 2009, 2:48:22 AM12/4/09
to project-...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages