CreateSpatialIndex doesn't speed up queries for points-within-polygons... must be doing something wrong

431 views
Skip to first unread message

Karl Lehenbauer

unread,
Jul 10, 2017, 10:02:47 AM7/10/17
to SpatiaLite Users
Hi Sandro and spatialite fans everywhere!

I've got a spatialite database containing 15,802 polygons defining airport runways around the world.

I have created a geometry column containing the polygons and can successfully find if points are within one of them using for example select * from runways where within(GeomFromText('POINT(-95.346824 29.969761)', 4326), runways.runway_polygon);

The table definition looks like this:

CREATE TABLE runways (

airport_id text,

rwy_ident text,

length integer,

width integer,

base_ident varchar,

base_lat_end real,

base_long_end real,

base_brng integer,

recp_ident varchar,

recp_lat_end real,

recp_long_end real,

recp_brng integer,

source varchar,

PRIMARY KEY (airport_id, rwy_ident)

);


SELECT AddGeometryColumn('runways','runway_polygon', 4326, 'POLYGON', 2, 0);


Each query takes about 16 ms, limiting performance to about 60 lookups a second.  OK, this calls for a geospatial index...

SELECT CreateSpatialIndex('runways','runway_polygon');


This runs and returns 1.

But after creating the spatial index, the queries don't run any faster; they still take about 16 ms each.  It doesn't make any difference if I create the spatial index before or after populating the table.

As you can see the query doesn't include any other terms that could've confused sqlite into picking a bad strategy for implementing the query.  "Explain" with ".scanstats on" produces the same results with or without the index.

The polygons seem OK, and as I said the queries do work.  Here's a bit of an extract from the runways table:

spatialite> select airport_id, rwy_ident, AsText(runway_polygon) from runways limit 5;

00C|01/19|POLYGON((-107.872333 37.196775, -107.872172 37.196726, -107.866057 37.209582, -107.866218 37.20963))

00CA|04/22|POLYGON((-116.89309 35.349381, -116.89291 35.349219, -116.87791 35.360219, -116.87809 35.360381))

00CL|16/34|POLYGON((-121.763332 39.432016, -121.763474 39.432017, -121.763524 39.422361, -121.763382 39.422361))

00F|10/28|POLYGON((-105.464784 45.473207, -105.464913 45.473022, -105.449494 45.467732, -105.449365 45.467916))

00M|16/34|POLYGON((-89.236822 31.957662, -89.236975 31.957618, -89.233827 31.949818, -89.233675 31.949862))


I figure I've made some rookie mistake.  Can someone tell me what I did wrong?  (I can make the database available and a little Tcl program that demonstrates the performance, if needed.)

Regards...

Karl

Karl Lehenbauer

unread,
Jul 10, 2017, 10:06:54 AM7/10/17
to SpatiaLite Users
Sorry, forgot to say I'm using spatialite 4.3.0 / sqlite 3.19.3.

mj10777

unread,
Jul 10, 2017, 10:27:54 AM7/10/17
to SpatiaLite Users
Yes, that is true. 

As you can see the query doesn't include any other terms that could've confused sqlite into picking a bad strategy for implementing the query.  "Explain" with ".scanstats on" produces the same results with or without the index.
sqlite know nothing about the SpatialIndex.

The polygons seem OK, and as I said the queries do work.  Here's a bit of an extract from the runways table:

spatialite> select airport_id, rwy_ident, AsText(runway_polygon) from runways limit 5;

00C|01/19|POLYGON((-107.872333 37.196775, -107.872172 37.196726, -107.866057 37.209582, -107.866218 37.20963))

00CA|04/22|POLYGON((-116.89309 35.349381, -116.89291 35.349219, -116.87791 35.360219, -116.87809 35.360381))

00CL|16/34|POLYGON((-121.763332 39.432016, -121.763474 39.432017, -121.763524 39.422361, -121.763382 39.422361))

00F|10/28|POLYGON((-105.464784 45.473207, -105.464913 45.473022, -105.449494 45.467732, -105.449365 45.467916))

00M|16/34|POLYGON((-89.236822 31.957662, -89.236975 31.957618, -89.233827 31.949818, -89.233675 31.949862))


I figure I've made some rookie mistake.
Yes, you must also use the SpatialIndex in your query, since sqlite will NOT do it for you. 
 Can someone tell me what I did wrong?  (I can make the database available and a little Tcl program that demonstrates the performance, if needed.)

Try this, 
- note: the use of MakePoint
- the returned ROWID will be those geometries (runway_polygon) that contain the point
 
SELECT * FROM runways
WHERE 
(
 ROWID IN 
 (
  SELECT ROWID FROM SpatialIndex
  WHERE 
  (
   (f_table_name = 'runways') AND
   (f_geometry_column = 'runway_polygon') AND
   (search_frame = MakePoint(-95.346824,29.969761))
  )
 )
);

Mark
 

Regards...

Karl

mj10777

unread,
Jul 10, 2017, 10:48:34 AM7/10/17
to SpatiaLite Users
Correction:  the returned ROWID will be those geometries (runway_polygon) where the point is contain in the BoundingBox of the geometry.
Also the 'f_geometry_column' portion is only needed when the TABLE contains more than 1 geometry (but does no harm)
Message has been deleted

Karl Lehenbauer

unread,
Jul 10, 2017, 2:40:35 PM7/10/17
to SpatiaLite Users
OK I'm still having problems...  If I create a spatial index by doing "SELECT CreateSpatialIndex('runways','runway_polygon');" then that should populate a row in the SpatialIndex table for each row in the runways table that defines a runway_polygon, right?  The triggers and stuff handle it so that if I do something like (using the Tcl interface to Spatialite)...

"INSERT OR REPLACE INTO runways (airport_id, rwy_ident, length, width, base_ident, base_lat_end, base_long_end, base_brng, recp_ident, recp_lat_end, recp_long_end, recp_brng, source, runway_polygon) VALUES (:row(airport_id), :row(rwy_ident), :row(length), :row(width), :row(base_ident), :row(base_lat_end), :row(base_long_end), :base_brng, :row(recp_ident), :row(recp_lat_end), :row(recp_long_end), :recp_brng, :row(source), ST_GeomFromText(:polygon, 4326))"


...the triggers set up by CreateSpatialIndex should cause the SpatialIndex table to be inserted, deleted or updated in step with inserts, deletes and updates to the runways table, with no extra work on my part, regardless of whether CreateSpatialIndex was invoked before or after inserting into the runways table, right?

Either way, right now "select count(*) from spatialindex" returns a row count of zero.

a.fu...@lqt.it

unread,
Jul 10, 2017, 4:17:10 PM7/10/17
to spatiali...@googlegroups.com
On Mon, 10 Jul 2017 11:40:35 -0700 (PDT), Karl Lehenbauer wrote:
> ...the triggers set up by CreateSpatialIndex should cause the
> SpatialIndex table to be inserted, deleted or updated in step with
> inserts, deletes and updates to the runways table, with no extra work
> on my part
>

Hi Karl,

yes, you are absolutely right.
The installed triggers will automatically intercept every INSERT,
UPDATE or DELETE affecting the "runways" table, thus robustly
ensuring that the companion "spatial index table" will be
always correctly synchronized.

note: in SQLite a Spatial Index isnt really an index, it simply
is another table of the very special R*Tree type [1].

[1] https://sqlite.org/rtree.html

SQLite itself hasn't the slightest idea about the logical relation
between the "mother table" and the corresponding "r*tree table";
it's the application logic that is expected to be aware of such
a relationship.

SpatiaLite always adopts the following convention for identifying
which "mother table" corresponds to which "r*tree table":

1. assuming a table named <TABLE> and a geometry column named
<GEOMETRY>

2. the corresponding R*Tree will be named as:
idx_<TABLE>_<GEOMETRY>

so in your specific case, the R*Tree supporting "runways"
will be named "idx_runways_runway_polygon".
for a direct confirm, please test this SQL query:

SELECT * FROM idx_runways_runway_polygon;

btw the above tables layout explains why SpatiaLite is
unable to automatically take profit of an eventual R*Tree;
this is because the "mother table" and the "r*tree table"
are _TWO_ distinct and independent tables, so the
SQL query _MUST_ explicitly reference in an appropriate
way the R*Tree table; this will usually imply adding
an internal SUBQUERY against the "SpatialIntdex" so
to quickly filter the rows to be further processed.


> regardless of whether CreateSpatialIndex was invoked
> before or after inserting into the runways table, right?
>

yes, it's right.
when you call "CreateSpatialIndex" this SQL function will
automatically check if the "mother table" already contains
any row; if yes, all existing Geometries will be immediately
inserted into the R*Tree table.
this explains why you can indifferently create the R*Tree
before or after inserting any row into the "mother table".


> Either way, right now "select count(*) from spatialindex" returns a
> row count of zero.
>

note: the "SpatialIndex" table isn't at all a real table.
it's a very special VIRTUAL TABLE, i.e. it's kind of a
software driver returning (if possible) a standard SQL
resultset that you can easily reuse in the context of
an ordinary SQL query.
the driver implementing "SpatialIndex" will return a
valid resultset _ONLY_ if the SQL statement has the
one of the following two forms:

SELECT ROWID FROM SpatialIndex
WHERE f_table_name = <table_name> AND
search_frame = <blob_geom>;

or

SELECT ROWID FROM SpatialIndex
WHERE f_table_name = <table_name> AND
f_geometry_column = <column_name> AND
search_frame = <blob_geom>;

any other syntax will simply return a NULL resultset.
in other worlds: the SpatialIndex is just a logic
interface intended to make easier, simpler and safer
accessing an R*Tree spatial index, and can never be
used as if it was an ordinary table.

if you really want to check how many entries have
been stored into an R*Tree the correct SQL query
to be used is something like:

SELECT Count(*) FROM idx_runways_runway_polygon;

bye Sandro

Karl Lehenbauer

unread,
Jul 10, 2017, 6:10:46 PM7/10/17
to SpatiaLite Users
Thank you very much for the detailed explanation.  I've got it working now (!).  And you were right (of course) "select count(*) from idx_runways_runway_polygon" returned 15,802, same as the number of rows in the table.

Reply all
Reply to author
Forward
0 new messages