> Can anyone comment on whether the H2 MultiDimensional index can be
> used to store and query objects with an extent (i.e. of dimension
> higher than 0 - in other words, not just points).
No, just points.
> AFAIK this is not possible with an indexing scheme based on a
> linearization of space. But I'd be happy to be shown that I'm wrong.
You are right.
Regards,
Thomas
> So I guess to offer a true spatial index there are two choices:
> 1. implement a native R-tree
> 2. implement some sort of grid index over top of the existing B-tree
> (this is the approach that ArcSDE and SQLServer use, so it's not a
> total hack)
Yes. The advantage of 2. is that it works for all database, and can be
written independent of the database used. I would prefer this
solution. Optimizations are still possible if the database supports
more features.
> It seems like your FullText indexing might provide a nice pattern for
> doing #2. The query re-writing is a bit hideous, but presumably will
> work, given the prior examples.
That's a good idea. I wouldn't directly re-use fulltext index, but
create a new table with the relation between object and point. The
bounding box doesn't necessarily need to be rectangular. One idea is:
class GeoPoint { number[] location; number getLocation1D(); }
class GeoObject { number uniqueId; GeoPoint[] getPoints(); }
in the database, you would have a table that is the relation between
GeoObject and GeoPoint.
Maybe it would be better to have not just one, but a number of indexes
(let's say a coarse grained index, and a fine grained).
Regards,
Thomas
> there are some downsides to grid-like indexing:
> 1. they requires some ugly parameters. All the ones I've seen require
> you to specify the extent of your data, and also some parameters to
> define the density of the grid at each level. These are pretty opaque
> to tune
Could it be self-tuning? The index should be re-created after
inserting data (or after you know what the distribution is), but that
sounds like updating index statistics - I think it's not that bad.
> 2. the query structure is more complex. It pretty much requires a
> query-rewriting step (which may be more or less hidden by the SQL
> interface)
I agree. But this logic would be on a higher level, and can be easily
and independently debugged.
> 3. I expect that performance is less (partly as a result of 2)
I'm not sure - why don't we test it :-) My feeling is that Z-order as
described in http://en.wikipedia.org/wiki/Z-order_%28curve%29 is fast
if implemented correctly. The H2 MultiDimensional index is based on
that.
> I would much prefer to see an R-tree implemented. Most "quality"
> spatial DBs have gone this route (Oracle, mySQL, PostGIS).
OK, it's your project ;-) I will help you as much as I can, but I will
not be able to implement the R-tree index.
> Every other spatial DB I'm aware of uses bounding
> boxes. I would be cautious about doing something radically different,
> at least for the first version.
Do you have any links, I don't know too much about spatial indexing.
My idea is: For each object, store the location and bounding box. But
for the index, use the grid approach. I think we need to agree on use
cases. One use case is: store and index buildings and streets to
create a map. We should then agree on some demo data and an
application. I would store things like this:
Object 1 { type: street, name: "A1", level: 5, bounds: (0, 10)/(100,
60), polygon: ..., segments: 1001 - 1100 }
Object 1001 { level: 8, bounds: (0, 59)/(1, 60), polygon: ..., parent: 1 }
So each larger street contains street segments, knows the bounding box
and the polygon. Level is the zoom level (1: world, 10: house). There
are multiple indexes, one for each zoom level. A zoom level could
contain 4 times the space than the previous zoom level, or 16 times.
Each object is indexed as follows: calculate the "squares" (each
square has a number) it occupies in the given zoom level; add a link
to this object to the index. With the data above it could be:
index_level_5: square 500: street 1; square 501: street 1, square 505: street 1
When you want to look up what to draw for the given zoom level, you
search in the right index all data for the given squares you need to
draw. So that's basically like a fulltext search index. This is how I
would do it using the current H2 MultiDimensional index. I guess my
description is not that clear...
Regards,
Thomas
> index parameter settings.
> changing params would basically require a full reindex.
Yes, that's true. However I don't think that this will be happening a lot.
> Real testing would require a pretty much full implementation of both
> paradigms, I suspect.
I'm not sure, but if yes then let's do that :-) What about a simple
in-memory R-Tree implementation first? We can then better decide if
it's easy to add this to the database or not.
Regards,
Thomas
> I'm not familiar with how H2 implements in-memory indexes. Are they
> created and maintained for the duration of a user connection, or for
> the lifetime of the DB process? (I presume the latter...)
Yes, for the lifetime of the database.
> If so, that could be a very useful feature in its own right - it should be a
> lot faster than a disk-based index, right?
Yes.
> As I mentioned in my other post, there seems to be lots of existing
> Java R-tree implementations.
But I guess many are complex and / or don't have the right license.
And my experience is that to understand it fully you need to implement
at least a part of it yourself.
> I guess this will require lots of little changes to the SQL DML syntax
> & language processor, right?
Yes, at some point.
What I meant is: I would first implement the R tree outside of the
database, as a standalone package. At that time the API would be Java.
Adding the R tree to the parser and adding the features to the parser
does not have priority in my view. I would first try to solve the
hardest problem.
Also I would implement a test case first.
> (And I guess I should just look at the
> code instead of asking all these questions... 8^)
If you can't find things in H2 let me know. I need to improve the
source code documentation anyway. The parser is Parser.java, and the
index is Index.java, but you could extend BaseIndex.
Regards,
Thomas