Simulating Column-Oriented Stores in Voldemort

Mark Rambacher

Nov 23, 2010, 9:57:01 AM
to project-voldemort
Has anyone given any thought to an API or code to simulate a
column-oriented store in Voldemort? For example, it would be nice to be
able to retrieve only a subset of the data in a value (such as just the
"last name" from a user record). Similarly, it would be nice to be able
to send only a subset of the record to the backend for updates.

Obviously, this requires the backend server to have some understanding
of the layout of the encoded object. Since the serialization methods
are defined in the store, the server is already partly there.
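
A minimal sketch of the server-side half of a partial update, assuming (purely for illustration) that a record decodes into a field-name to value map and that the server knows the record's field names. `PartialUpdate` and `applyUpdate` are hypothetical names, not Voldemort APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not a Voldemort API. Assumes a stored record has been
// deserialized into a field-name -> value map, and that `schema` lists the
// field names the server knows how to decode.
final class PartialUpdate {

    // Merge a client-supplied subset of fields into the stored record.
    // Fields absent from `partial` keep their stored values.
    static Map<String, String> applyUpdate(Set<String> schema,
                                           Map<String, String> stored,
                                           Map<String, String> partial) {
        for (String field : partial.keySet()) {
            if (!schema.contains(field)) {
                throw new IllegalArgumentException("unknown field: " + field);
            }
        }
        Map<String, String> merged = new LinkedHashMap<>(stored);
        merged.putAll(partial); // overwrites only the fields that were sent
        return merged;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("firstName", "lastName", "email");
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("firstName", "Ada");
        stored.put("lastName", "Lovelace");
        stored.put("email", "ada@example.com");

        // The client sends only the field it changed.
        Map<String, String> merged =
                applyUpdate(schema, stored, Map.of("email", "ada@example.org"));
        System.out.println(merged);
        // prints {firstName=Ada, lastName=Lovelace, email=ada@example.org}
    }
}
```

In a real store this merge would be a read-modify-write on the server, so it would also have to respect the versioning of the stored value; the sketch ignores that.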

Could Views be of use here? If new development is needed, what work
do folks believe would be required to simulate column storage?

Thanks,

Mark

Alex Feinberg

Nov 23, 2010, 5:12:41 PM
to project-...@googlegroups.com
Hey Mark,

We do have views functionality for retrieving only a subset of the
data. It has been merged into master, but it should be more thoroughly
documented with examples. The idea is to allow simple predicate or
projection push-down on the server.

The catch is that while the view approach does minimize network
latency, the whole value still has to be fetched from disk (as opposed
to true columnar storage, where there is some equivalent of an index
into the value, allowing only specific columns to be retrieved from
disk). It *may* be possible to do something like this with BerkeleyDB's
ReadPartial mode, but we haven't looked into it.
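
To illustrate what "an index into the value" could mean, here is a minimal sketch of a hypothetical value layout with a small header of column offsets, so one column can be located and fetched with a few ranged reads instead of reading the whole value. `IndexedValue` and `RangeReader` are made-up names; `RangeReader` stands in for whatever partial-read facility a storage engine offers, and the sketch does not call BerkeleyDB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical layout, not a Voldemort or BerkeleyDB format:
//   [int n][int off_0] ... [int off_n][column bytes]
// Column i occupies data bytes [off_i, off_{i+1}); off_n is the data length.
final class IndexedValue {

    // Stand-in for a storage engine's partial read of [offset, offset + length).
    interface RangeReader {
        byte[] read(int offset, int length);
    }

    static int headerSize(int n) {
        return 4 * (n + 2); // column count plus n + 1 offsets
    }

    static byte[] encode(List<String> columns) {
        int n = columns.size();
        byte[][] data = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            data[i] = columns.get(i).getBytes(StandardCharsets.UTF_8);
            total += data[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(headerSize(n) + total);
        buf.putInt(n);
        int offset = 0;
        for (byte[] d : data) {
            buf.putInt(offset);
            offset += d.length;
        }
        buf.putInt(offset); // end offset of the last column
        for (byte[] d : data) {
            buf.put(d);
        }
        return buf.array();
    }

    // Fetch one column using three small ranged reads, never the whole value.
    static String readColumn(RangeReader reader, int i) {
        int n = ByteBuffer.wrap(reader.read(0, 4)).getInt();
        if (i < 0 || i >= n) {
            throw new IndexOutOfBoundsException("column " + i + " of " + n);
        }
        ByteBuffer offsets = ByteBuffer.wrap(reader.read(4 + 4 * i, 8));
        int start = offsets.getInt();
        int end = offsets.getInt();
        byte[] bytes = reader.read(headerSize(n) + start, end - start);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = encode(List.of("Ada", "Lovelace", "ada@example.com"));
        // Simulate the engine's partial read over an in-memory copy.
        RangeReader reader = (off, len) -> Arrays.copyOfRange(value, off, off + len);
        System.out.println(readColumn(reader, 1)); // prints Lovelace
    }
}
```

The trade-off is a few extra header bytes per value and extra round trips to the storage engine per column, in exchange for not reading unrelated columns.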

Thanks,
- Alex

Mark Rambacher

Nov 23, 2010, 5:25:17 PM
to project-voldemort
Alex,

Is there a mechanism in Views for retrieving different sets of data
on each API call, or would one need to set up different "Views" in
advance? Defining views in advance might mean defining a view for
every possible use case at configuration/install time, which could
prove difficult.

Thanks,

Mark

Alex Feinberg

Nov 23, 2010, 5:31:50 PM
to project-...@googlegroups.com
Hey Mark,

Views are dynamic in the sense that you can pass any objects (with
appropriate serialization) as parameters to the view class. The view
class is just a piece of Java code. This way, you can program it to
retrieve all sorts of data.
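
A minimal sketch of that idea, assuming a hypothetical `ParameterizedView` interface (it is not Voldemort's actual view interface, whose signature this thread doesn't show). The caller passes the set of fields with each request, so a single view class covers every projection rather than one pre-defined view per use case. `FieldProjectionView` and `ToyStore` (an in-memory stand-in for the server) are also hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of a server-side view with a per-call argument.
// NOT Voldemort's actual View interface; for illustration only.
interface ParameterizedView<V, P, R> {
    R apply(V storedValue, P params);
}

// One view class serves any projection: the field names arrive per call.
final class FieldProjectionView
        implements ParameterizedView<Map<String, String>, Set<String>, Map<String, String>> {
    @Override
    public Map<String, String> apply(Map<String, String> stored, Set<String> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (fields.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}

// In-memory stand-in for the server: the view runs before the result is
// returned, so only the projected fields would need to cross the network.
final class ToyStore<K, V> {
    private final Map<K, V> data = new HashMap<>();

    void put(K key, V value) {
        data.put(key, value);
    }

    <P, R> R get(K key, ParameterizedView<V, P, R> view, P params) {
        V stored = data.get(key);
        return stored == null ? null : view.apply(stored, params);
    }
}

final class ViewDemo {
    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("firstName", "Ada");
        user.put("lastName", "Lovelace");
        user.put("email", "ada@example.com");

        ToyStore<String, Map<String, String>> store = new ToyStore<>();
        store.put("user:1", user);

        FieldProjectionView view = new FieldProjectionView();
        // Different field sets on each call, with no extra configuration.
        System.out.println(store.get("user:1", view, Set.of("lastName")));
        System.out.println(store.get("user:1", view, Set.of("firstName", "email")));
    }
}
```

Because the parameter travels with the request, adding a new "column subset" is a client-side change rather than a new view definition at install time.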

There's an example of a range filter view in the unit test:

https://github.com/voldemort/voldemort/blob/master/test/unit/voldemort/store/views/ViewTransformsTest.java
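
For a feel of predicate push-down, here is a minimal range-filter sketch in the same spirit as that test. It is not the test's code: it assumes, for illustration, that the stored value is a list of `long` timestamps and that the inclusive range bounds arrive as per-call parameters. `RangeFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predicate push-down: keep only stored elements within an
// inclusive [lo, hi] range supplied by the caller on each request.
final class RangeFilter {

    static List<Long> apply(List<Long> stored, long lo, long hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo must not exceed hi");
        }
        List<Long> out = new ArrayList<>();
        for (long x : stored) {
            if (x >= lo && x <= hi) {
                out.add(x);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(100L, 250L, 300L, 475L, 900L);
        System.out.println(apply(timestamps, 200L, 500L)); // prints [250, 300, 475]
    }
}
```

Filtering on the server means only the matching elements are serialized and sent back, although, as noted earlier in the thread, the full value is still read from disk.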

There are some unsolved issues with views, however: namely, how to
perform read repair and hinted handoff.

Thanks,
- Alex