Range queries and secondary indexes

erdody

unread,

Jun 28, 2010, 5:17:48 PM6/28/10

to project-voldemort

Hello,

We have decided to use Voldemort and there is one problem we need to
address.
We have a few fields (we use JsonSerializer and the value is a map of
several fields) we need to query by, on rare occasions, and we would
need range queries on some of these fields. These queries would be
made as maintenance tasks so performance is not critical, but we need
to be able to perform them somehow.
One option I was considering is to take advantage of the Secondary
index support in BDB and create an extension of BdbStorageEngine that
creates a BDB secondary database for each secondary index we need. The
SecondaryKeyCreator would need to deserialize the main value, extract
the corresponding value and create the secondary key. This creates a
performance overhead, but based on our preliminar tests, it's not very
significant.
The main advantage of this approach, as opposed to maintaining a
secondary index by hand (double StoreClient.put) is that the secondary
insert is transactional, handled by BDB.

Then, to make the range query, I was thinking of using something
similar to the AdminClient.fetchKeys, but taking advantage of the BDB
range queries support. I would need to make such query for each node
(could be done in parallel), of course, and merge the results.

A) Is our idea too crazy?
B) Other suggestions?
C) Are range queries or secondary indexes support in the Voldemort
roadmap?

Thanks in advance,

Diego Erdody

Lee

unread,

Jun 30, 2010, 4:43:50 AM6/30/10

to project-voldemort

You should take a look at Carbonado - http://carbonado.sourceforge.net/

It already has the ability to take a key-value BDB and add range
queries and alternate indexes to it.

erdody

unread,

Jul 2, 2010, 1:44:01 AM7/2/10

to project-voldemort

Thanks for the reference, it might be interesting to add more complex
queries support, but for range queries and secondary indexes (local to
each node), BDB already has what we need.

On Jun 30, 5:43 am, Lee <leely...@gmail.com> wrote:
> You should take a look at Carbonado -http://carbonado.sourceforge.net/

Jeffstyr

unread,

Jul 9, 2012, 6:13:36 PM7/9/12

to project-...@googlegroups.com

Unfortunately BDB doesn't allow secondary databases (indexes) when the primary DB allows duplicates (as is the case for Voldemort).

Vinoth Chandar

unread,

Jul 10, 2012, 5:52:45 PM7/10/12

to project-...@googlegroups.com

Hi Diego,

I am working on handling the duplicates in voldemort land itself, so we can turn off bdb sorted duplicates. we discussed your solution to pad the key with the version and do a range query to fetch all values for a key.

But, we went ahead with flattening out the values into a byte array and directly storing it as the bdb value. This helps us get rid of the bdb delete() call in put, which is good. I have a basic implementation done . Of course this needs a lot more testing and needs to be super safe. I will post the update here once its ready..

Thanks
Vinoth

erdody

unread,

Jul 11, 2012, 4:10:39 AM7/11/12

to project-...@googlegroups.com

Thanks for the update Vinoth,

Why is avoiding the delete a good thing? AFAIK, since bdb uses an append only format, updates are not very different from put and delete.

Also, if you flatten several revisions within the same value, now you need to handle read - update concurrency (make sure concurrent puts are not conflicting)

Thanks,

Diego

Vinoth Chandar

unread,

Jul 13, 2012, 12:42:01 PM7/13/12

to project-...@googlegroups.com

Hi Diego,

I was talking about how we implement voldemort put() - as a BDB get(), delete(), put() transaction. There are obvious advantages to rewriting without delete().
update concurrency can be handled the same way using transaction with a RMW cycle.

Thanks
Vinoth

Reply all

Reply to author

Forward