We have created our own custom storage engine where each partition is
stored in a separate database (similar to what is proposed for BDB in
http://code.google.com/p/project-voldemort/issues/detail?id=179).
However, these changes do not "play nice" with rebalancing, because
the rebalancing decisions are made at a layer *above* the storage
engine. The rebalance code ends up calling storageEngine.keys() and
then storageEngine.get(key) for each key:
    ByteArray key = keyIterator.next();
    if(validPartition(key.get()) && counter % skipRecords == 0) {
        for(Versioned<byte[]> value: storageEngine.get(key, null)) {
            throttler.maybeThrottle(key.length());
            if(filter.accept(key, value)) {
This code seems very inefficient, at least when the storage engine
can already make some of these determinations itself. It may be
inefficient in other cases as well, since the store must be traversed
once to get the keys and then searched again to get each value,
rather than returning both in one pass.
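To make the cost difference concrete, here is a minimal sketch using a toy in-memory store. ToyStore, TraversalDemo, and the lookups counter are hypothetical names for illustration only; this is not Voldemort's StorageEngine, and versioning, throttling, and filtering are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy store, NOT Voldemort's StorageEngine.
class ToyStore {
    private final Map<String, String> data = new TreeMap<>();
    int lookups = 0; // number of point lookups performed through get()

    void put(String key, String value) { data.put(key, value); }

    // Pass 1 of the two-pass approach: traverse the keys only.
    Iterable<String> keys() { return data.keySet(); }

    // Pass 2 of the two-pass approach: one search per key.
    String get(String key) { lookups++; return data.get(key); }

    // Single-pass alternative: keys and values come back together.
    Iterable<Map.Entry<String, String>> entries() { return data.entrySet(); }
}

class TraversalDemo {
    // Mirrors the keys() + get(key) pattern: N keys cost N extra lookups.
    static List<String> valuesViaKeysThenGet(ToyStore store) {
        List<String> out = new ArrayList<>();
        for (String key : store.keys()) {
            out.add(store.get(key));
        }
        return out;
    }

    // Reads the same values in one traversal, with no point lookups.
    static List<String> valuesViaEntries(ToyStore store) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : store.entries()) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");

        valuesViaKeysThenGet(store);
        System.out.println("lookups after keys()+get(): " + store.lookups);

        valuesViaEntries(store);
        System.out.println("lookups after entries():    " + store.lookups);
    }
}
```

In this sketch the second call adds no lookups, which is the saving a combined iterator would offer; how large that saving is for a real on-disk engine would need to be measured.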
Is there a reason not to change the method signatures for
storageEngine.keys() and storageEngine.entries() to also take the
filter and a list of partitions to read from, e.g.,
    public ClosableIterator<K> keys(List<Integer> partitions,
                                    VoldemortFilter filter);
    public ClosableIterator<Pair<K, Versioned<V>>>
        entries(List<Integer> partitions, VoldemortFilter filter);
If filter is null, no filtering is applied. If partitions is null or
empty, entries from all partitions are returned. If the interfaces are
being changed anyway, it might also be nice to add a transform type T
to the StorageEngine and add it as an argument to entries() as well.
Finally, there should probably also be a method like:
    public void deleteEntries(List<Integer> partitions,
                              VoldemortFilter filter);
that deletes all entries matching the filter in the given
partitions.
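As a rough sketch of how an engine with one database per partition could serve these calls, the toy class below keeps one sorted map per partition and applies the null/empty conventions described above. Everything here is an assumption for illustration: EntryFilter is a simplified stand-in for VoldemortFilter, PartitionedToyEngine and partitionOf() are made-up names, keys and values are plain strings instead of ByteArray and Versioned<byte[]>, and plain Iterators replace ClosableIterator.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for VoldemortFilter, reduced to plain strings.
interface EntryFilter {
    boolean accept(String key, String value);
}

// Toy engine with one sorted map per partition ("one database per partition").
class PartitionedToyEngine {
    private final Map<Integer, TreeMap<String, String>> partitions = new TreeMap<>();
    private final int numPartitions;

    PartitionedToyEngine(int numPartitions) {
        this.numPartitions = numPartitions;
        for (int p = 0; p < numPartitions; p++) {
            partitions.put(p, new TreeMap<>());
        }
    }

    // Assumed placement rule; a real cluster would use its routing strategy.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    void put(String key, String value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    // Null or empty list means "all partitions".
    private List<Integer> resolve(List<Integer> wanted) {
        if (wanted == null || wanted.isEmpty()) {
            return new ArrayList<>(partitions.keySet());
        }
        return wanted;
    }

    // Only the requested partition maps are opened; a null filter accepts everything.
    Iterator<Map.Entry<String, String>> entries(List<Integer> wanted, EntryFilter filter) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (int p : resolve(wanted)) {
            TreeMap<String, String> db = partitions.get(p);
            if (db == null) continue; // unknown partition id: nothing to read
            for (Map.Entry<String, String> e : db.entrySet()) {
                if (filter == null || filter.accept(e.getKey(), e.getValue())) {
                    out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
                }
            }
        }
        return out.iterator();
    }

    // Same selection as entries(), returning keys only.
    Iterator<String> keys(List<Integer> wanted, EntryFilter filter) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = entries(wanted, filter);
        while (it.hasNext()) {
            out.add(it.next().getKey());
        }
        return out.iterator();
    }

    // Removes matching entries in place, touching only the requested partitions.
    void deleteEntries(List<Integer> wanted, EntryFilter filter) {
        for (int p : resolve(wanted)) {
            TreeMap<String, String> db = partitions.get(p);
            if (db == null) continue;
            db.entrySet().removeIf(e -> filter == null || filter.accept(e.getKey(), e.getValue()));
        }
    }
}
```

The point of the sketch is that partition selection happens before any key is read, so the caller never needs a validPartition() check per key. A real implementation would stream results rather than buffer them in a list.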
Before we go about making such a change, is there any reason anyone
can think of NOT to do this? It seems like it could be beneficial to
have more of the decisions made closer to the data, as implementations
might be able to optimize some of the decision processing.
Thanks,
Mark