query with spatial index running much slower than without spatial index

167 views
Skip to first unread message

David Anderson

unread,
Jan 7, 2016, 5:28:49 PM1/7/16
to SpatiaLite Users

When I run this query
create table wtemp
as
select smp.simple_id,smp.slink,lau.lau_name
from smp
 left join
lau
on st_coveredby
(smp.shape,lau.shape)=1


It takes about 110 seconds. 

The green lines are the lau geometry, the hard to see black lines are the smp geometry

I'd like to speed this up.  It seems natural for a spatial index as each lau polygon covers only a portion of the area and should filter out the day from smp going into the st_coveredby test.

Adding this clause

where smp.rowid in
(select rowid from spatialindex where f_table_name='smp' and search_frame=lau.shape)

now make the query run in about 2600 seconds.

Not quite the result I was looking to achieve.

Any ideas on why using an index is substantially slower than without?

David


Brad Hards

unread,
Jan 7, 2016, 5:36:11 PM1/7/16
to spatiali...@googlegroups.com
On Thu, 7 Jan 2016 02:28:49 PM David Anderson wrote:
> Any ideas on why using an index is substantially slower than without?
Without the actual data, its hard to say, but you might be able to debug it
with some EXPLAIN QUERY PLAN application: https://www.sqlite.org/eqp.html

Brad

David Anderson

unread,
Jan 7, 2016, 6:28:41 PM1/7/16
to SpatiaLite Users
This is what I get from explain query
W/o index
0 0 1 SCAN TABLE simpplle_attributes AS sa
0 1 0 SEARCH TABLE slink_lu AS slu USING INDEX slu_slink_idx (slink=?)
0 2 2 SEARCH TABLE gis_data AS g USING INDEX gis_data_simpleid_idx (SIMPLE_ID=?)
0 3 3 SCAN TABLE lau

w
/ index
0 0 1 SCAN TABLE simpplle_attributes AS sa
0 1 0 SEARCH TABLE slink_lu AS slu USING INDEX slu_slink_idx (slink=?)
0 2 2 SEARCH TABLE gis_data AS g USING INDEX gis_data_simpleid_idx (SIMPLE_ID=?)
0 3 3 SCAN TABLE lau
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SCAN TABLE spatialindex VIRTUAL TABLE INDEX 2

I haven't used explain query much but it appears that the query planner is tacking the spatial index on at the end of the query and not using it to limit rows going into the join condition.
As an aside the table smp is actually a spatial view

Brad Hards

unread,
Jan 7, 2016, 11:36:50 PM1/7/16
to spatiali...@googlegroups.com
On Thu, 7 Jan 2016 03:28:40 PM David Anderson wrote:
> As an aside the table smp is actually a spatial view
Could you show the view construction?

Brad

David Anderson

unread,
Jan 8, 2016, 12:04:28 PM1/8/16
to SpatiaLite Users

Here it is.
CREATE VIEW smp AS
 SELECT g
.rowid AS rowid,
 slu
.simple_id AS simple_id,
 slu
.slink AS slink,
 g
.shape AS shape
 FROM slink_lu slu
,
 simpplle_attributes sa
,
 gis_data g
 WHERE g
.simple_id = slu.simple_id AND
 slu
.slink = sa.slink AND
 sa
.ownership = 'NF-OTHER';


Details.  The gis_data table has a spatial index, the spatial view has the index registered like so.

smp shape rowid gis_data shape 1


I did so a test of running a query against the  table storing the geometry data used in the view:

select smp.simple_id,lau.lau_name
from gis_data smp
 left join
lau
on st_coveredby
(smp.shape,lau.shape)=1
where
smp
.rowid in
(select rowid from spatialindex where f_table_name='gis_data' and search_frame=lau.shape)


and did not see any difference in speed.

a.fu...@lqt.it

unread,
Jan 8, 2016, 2:33:59 PM1/8/16
to spatiali...@googlegroups.com
On Fri, 8 Jan 2016 09:04:27 -0800 (PST), David Anderson wrote:
> I did so a test of running a query against the  table storing the
> geometry data used in the view:
>
> select smp.simple_id,lau.lau_name
> from gis_data smp
>  left join
> lau
> on st_coveredby(smp.shape,lau.shape)=1
> where
> smp.rowid in
> (select rowid from spatialindex where f_table_name='gis_data' and
> search_frame=lau.shape)
>
> and did not see any difference in speed.
>

Hi David,

the above query is not well written; let's see why.

FROM gis_data AS smp
LEFT JOIN lau ...

we have two different tables and a LEFT JOIN relationship;
so all rows from the first table (gis) will be unconditionally
inserted into the resultset, and a join relation to the second
table (lau) is expected to be possibly unfulfilled in several
cases.

WHERE smp.rowid IN (
SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'gis_data' AND
search_frame = lau.shape)

now there is a WHERE clause intended to filter the rows
from the first table (smp) depending on a value from
the second table (lau.shape); this is rather bizarre,
and seems to contradict your own expectations declared
in the above LEFT JOIN, i.e. filtering the rows from
the second table by checking a spatial relationship.

I've set up a similar testcase so to directly check
what really happens under such odd conditions.
I've used a Polygon table (about 8,000 italian
municipalities) and a Point table (about 1,000
railway stations); several municipalities have
more than a single stations, but the vast majority
of municipalities have no station at all.

test #1 - not using the spatial index
-------------------------------------
SELECT c.nome AS municipality, s.den AS railway_station
FROM com2011 AS c
LEFT JOIN stazioni AS s ON (
ST_CoveredBy(s.geometry, c.geometry) = 1);
-----
CPU Time: user 118.498360 sys 0.202801


test #2 - correctly using the spatial index
-------------------------------------------
SELECT c.nome AS municipality, s.den AS railway_station
FROM com2011 AS c
LEFT JOIN stazioni AS s ON (
ST_CoveredBy(s.geometry, c.geometry) = 1 AND
s.rowid IN (SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'stazioni'
AND search_frame = c.geometry));
-----
CPU Time: user 1.248008 sys 0.280802

as you can see the spatial index has introduced an
amazing performance boost of about 100X.


test #3 - misusing the spatial index
------------------------------------
SELECT c.nome AS municipality, s.den AS railway_station
FROM com2011 AS c
LEFT JOIN stazioni AS s ON
ST_Intersects(s.geometry, c.geometry) = 1
WHERE c.rowid IN (SELECT rowid FROM SpatialIndex
WHERE f_table_name = 'com2011'
AND search_frame = s.geometry);
-----
CPU Time: user 119.013163 sys 0.327602

we've simply swapped the spatial index selection; this time
we'll use the spatial index based on the _first_ table (not
on the second, as it was in the previous case).
note: this query is formally equivalent to your own.

the measured process timing definitely confirms that no spatial
index at all has been used under such extravagant conditions.

lesson to learn: when you have more tables and more spatial
indices you absolutely have to query the "right" spatial
index, i.e. the one directly supporting the table on the
(LEFT) JOIN side.

bye Sandro

David Anderson

unread,
Jan 8, 2016, 4:35:57 PM1/8/16
to SpatiaLite Users
Thanks Sandro,
recasting the query as this:
select smp.simple_id,smp.slink,lau.lau_name
from
smp
 left join
lau
on st_coveredby
(smp.shape,lau.shape)=1
where lau.rowid in
(select rowid from spatialindex where f_table_name='lau' and search_frame=smp.shape)


cut the time down to 72 seconds.  Which is better but not the level of improvement that I was hoping to get.


One thought I had is the complexity of the search, ie the geometry from the lau table is complex.  I did something similar to your reply (http://www.gaia-gis.it/spatialite-3.0.0-BETA1/WorldBorders.pdf) in this thread https://groups.google.com/forum/#!searchin/spatialite-users/index/spatialite-users/hV43tQPnPgo/SBnO8cJd7PIJ.

using the newer squaregrid and elementarygeometries functions to create the simpler geometry version of the lau layer.


The final join query ran quicker, about 38 seconds but there was some time, about 10 seconds, in generating the grid.


Just quick comment about the query itself.  It is a left join for a reason, as the query shown here is  really a subquery for a query that groups by the lau_name field.  Doing the left join instead of an inner join only adds a few seconds to the query.


David

Reply all
Reply to author
Forward
0 new messages