Hi Dick,
> I have tried this approach, but I am afraid that I have
> failed to properly create the spatial index. Do I understand
> correctly
> that the table to be indexed should have an integer autonumber
> primary
> key, be named in lower case, with the geometry column also named in
> lower case?
>
none of the above is strictly necessary:
- using upper- or lowercase names for tables and columns is not at all
relevant; SQL functions such as AddGeometryColumn() or
CreateSpatialIndex()
will then silently apply any case conversion when and if required.
- using an integer primary key is surely a good idea; otherwise the
spatial index could eventually become corrupted/damaged immediately
after performing a VACUUM; but in this case you can actually recover
a properly working spatial index by invoking CheckSpatialIndex and
RecoverSpatialIndex.
- using an Integer AutoIncrement Primary Key is a requirement for the
QGIS data provider, but there is nothing compelling to absolutely
declare an AutoIncrement clause from pure SQL perspective.
> Also, I am unclear as to how to examine the spatial index
> once it has been created, as the notes indicate that SpatialIndex is
> not directly queryable. The examples require a parameter for
> search_frame, and I am not clear as to what that should be.
>
the spatial index simply is a spatial filter; only the features/rows
declaring an MBR (aka BBOX) intersecting the "search_frame" will
pass such filter.
more precisely: the ROWID corresponding to any row whose Geometry
will intersect the "search frame" will be returned from a Spatial
Index query.
so "search_frame" simply is any generic Geometry intended to set
up a filter MBR (aka BBOX); it could eventually be a Point or
Linestring, but more often it's some Polygon (i.e. a "frame").
if I intend well you are mainly interested in some kind of
filtering based on a distance (i.e. a radius): so the the best
way to take profit from the Spatial Index is to set a square
"search_frame" circumscribing you target circle, then using
ST_Distance() or PtDistWithin() in order to get a very
precise selection.
> As well, I am interested in limiting the search to those points that
> are within 100km, i.e. the closest point in table2 to a point in
> table1, as long as that point is within 100k geodetic distance. In my
> test situation, table1 has 200,000 rows, but the real table has 9
> million rows. table2 has 5000 rows, and is the table that I am trying
> to spatially index.
>
a spatial index can effectively optimize the speed of your query
only if it really is a "very strict/selective" filter.
it's not at all relevant if the target table contains few rows or
100 million rows: the unique relevant factor is how effective is
the filter you are attempting to impose.
a very effective spatial index will extract let say only 1% (or
even less than this) of the whole table while discarding the other
99% because it's clearly not interesting.
on the other hand a spatial index returning about 50% of the
whole table will be practically useless; and a spatial index
returning about 25% of the total rows will probably perform
very poorly.
if your data are reasonably well structured and if your search_frame
is enough restricted/selective, don't be afraid.
a spatial index query will be damn fast even when accessing a
table containing many million rows: it's purposely designed for
a task like this.
bye Sandro