custom index


Yiannis Volos

Feb 9, 2014, 12:54:15 PM
to voltd...@googlegroups.com
Hi,
Where should someone start if they want to write a custom index, such as a spatial index?



Paul Martel

Feb 11, 2014, 12:14:04 PM
to voltd...@googlegroups.com
Hi Yiannis,

There are a few possible approaches to defining a custom index, depending on the kinds of queries you are trying to optimize.
The C++ index API is defined by the base class TableIndex in src/ee/indexes/tableindex.h. This allows "arbitrary implementations" to be used by the existing index scan and nest loop index join executor code.
The choice of index implementation is made in src/ee/indexes/tableindexfactory.cpp. This is where a "fork" would have to be inserted to detect that the custom index is the most suitable one for a given index definition.
src/ee/indexes/indexkey.h defines classes for "index key" structures, which are all variations on a tuple of values.
The common API of these classes is also implicitly part of the index API. Depending on your needs, new key structures may need to be defined.
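
To make the factory "fork" idea concrete, here is a minimal, self-contained C++ sketch. It is not VoltDB code: IndexDefinition, SketchIndex, OrderedIndex, CustomSpatialIndex, and makeIndex are all hypothetical names invented for illustration, and the real TableIndex interface in tableindex.h is larger and differs in its details. The sketch only shows the general shape: one abstract base class, several implementations, and a single factory function that inspects the index definition and branches to the custom implementation when it applies.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an index definition (NOT a VoltDB type).
struct IndexDefinition {
    std::string name;
    std::string kind;  // hypothetical tag, e.g. "tree" or "spatial"
};

// Hypothetical stand-in for an abstract index base class.
class SketchIndex {
public:
    virtual ~SketchIndex() = default;
    virtual void insert(int64_t key, int64_t tupleId) = 0;
    // Returns the tuple ids whose keys fall in the closed range [lo, hi].
    virtual std::vector<int64_t> lookupRange(int64_t lo, int64_t hi) const = 0;
    virtual std::string typeName() const = 0;
};

// Default implementation: an ordered map from key to tuple id.
class OrderedIndex : public SketchIndex {
public:
    void insert(int64_t key, int64_t tupleId) override {
        m_entries.emplace(key, tupleId);
    }
    std::vector<int64_t> lookupRange(int64_t lo, int64_t hi) const override {
        std::vector<int64_t> out;
        for (auto it = m_entries.lower_bound(lo);
             it != m_entries.end() && it->first <= hi; ++it) {
            out.push_back(it->second);
        }
        return out;
    }
    std::string typeName() const override { return "ordered"; }

private:
    std::multimap<int64_t, int64_t> m_entries;
};

// Placeholder for a custom index. It reuses the ordered storage here;
// a real spatial index would replace the storage and the lookup logic.
class CustomSpatialIndex : public OrderedIndex {
public:
    std::string typeName() const override { return "custom-spatial"; }
};

// The factory: the "fork" is the branch that picks the custom index.
std::unique_ptr<SketchIndex> makeIndex(const IndexDefinition& def) {
    if (def.kind == "spatial") {
        return std::make_unique<CustomSpatialIndex>();
    }
    return std::make_unique<OrderedIndex>();
}
```

Because callers only see the base class, code that scans or probes the index does not need to know which implementation the factory chose. That is the property that lets the existing executors use a new index type.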

That covers one basic facet of custom index implementation. Here are a few others:

-- Custom indexes typically optimize custom SQL functions, which have to be supported both in the C++ execution engine and in the Java SQL parser.
There is a framework for defining new SQL functions, and a recipe is described at: https://github.com/VoltDB/voltdb/wiki/Implementing-sql-functions

-- Some thought should be given to indexed column and expression types.
VoltDB's type system is not easily extensible, so there is a great advantage to storing your indexed values using existing types like INTEGER, VARCHAR, VARBINARY, or some combination (using multiple columns/arguments in parallel).

-- In some cases, it may be necessary to change the Java DDL parser and compiler; other cases may be covered by the existing code for defining indexes on functions/expressions of columns and constants.

-- In some cases, it may be necessary to change the Java query planner; other cases may be covered by the existing code for applying a given index to a given query.
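
As an illustration of the point above about storing indexed values in existing types such as INTEGER, one possible approach (an assumption for illustration, not something VoltDB provides) is to fold a longitude/latitude pair into a single Z-order (Morton) key. The sketch below quantizes each coordinate to 15 bits and interleaves them into a non-negative 30-bit value that fits in a signed 32-bit integer. The function names quantize15, spreadBits, and mortonKey are hypothetical.

```cpp
#include <cstdint>

// Quantize v from [minV, maxV] to a 15-bit cell number in 0..32767.
// Out-of-range inputs are clamped to the nearest bound.
inline uint32_t quantize15(double v, double minV, double maxV) {
    if (v < minV) v = minV;
    if (v > maxV) v = maxV;
    double scaled = (v - minV) / (maxV - minV) * 32767.0;
    return static_cast<uint32_t>(scaled + 0.5);  // round to nearest cell
}

// Spread the low 16 bits of x so that a zero bit separates each of them:
// bit i of the input moves to bit 2*i of the result.
inline uint32_t spreadBits(uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Interleave the quantized longitude (even bits) and latitude (odd bits).
// With 15 bits per axis the result uses at most 30 bits, so it is never
// negative when stored as a signed 32-bit integer.
inline int32_t mortonKey(double lon, double lat) {
    uint32_t x = quantize15(lon, -180.0, 180.0);
    uint32_t y = quantize15(lat, -90.0, 90.0);
    return static_cast<int32_t>(spreadBits(x) | (spreadBits(y) << 1));
}
```

Points that are close together often, but not always, get nearby keys, so a bounding-box query generally has to be broken into several key ranges and the candidates filtered afterwards. Whether an ordinary ordered index on such a key is good enough depends on the queries; this is only a prototyping workaround, not a substitute for a true spatial index.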


Spatial indexing is a topic of interest to the VoltDB team, but we have not yet committed to a development/delivery schedule.
My initial take on it is that it would not strictly require support for new TYPES, but WILL likely (at least eventually) require some support in all of these other code areas.
It may be possible to "work around" some of these requirements for the purposes of initial prototyping.

Are you just interested in exploring the technology, or do you have a particular application in mind?

We ask that anyone who would like to contribute to our open source repo send us a signed contributor license agreement. See http://voltdb.com/contributor-license-agreement/

Thanks for your interest.
--paul




Yiannis Volos

Feb 12, 2014, 12:03:42 PM
to voltd...@googlegroups.com
Hi Paul,

Thanks for such an in-depth answer.

Since VoltDB's type system is not easily extensible, maybe we could take advantage of VoltDB's JSON capabilities and support the GeoJSON format: http://geojson.org/geojson-spec.html

Then would it only take creating the spatial index and changing the planners?

 
--
I am looking to use spatial features for an application.

But basically, the landscape of open-source databases with spatial support is unclear, if not confusing, unless I am not looking in the right places.

Although databases such as Couchbase and Neo4j supposedly have spatial features, their documentation on spatial performance is again unclear, if not nonexistent.

PostGIS seems a good candidate, but I am not sure of its performance or whether it would be suitable for real-time apps with many queries. For now I may stick with MySQL for spatial features, which seems to be a better performer, at least for polygon-related queries, but if VoltDB had spatial features it would be a no-brainer for me to use it. I expect it would be faster too, since it would all be in memory.

So I was thinking that I could use MySQL or PostGIS for now and move to VoltDB when spatial support is there, hopefully by the point when I need that extra performance. And what better way to bring that support to VoltDB than helping to add spatial indexes to it?