Need help with Spatialite index

193 views
Skip to first unread message

Jason Williams

unread,
Aug 2, 2022, 1:57:55 PM8/2/22
to SpatiaLite Users
I am using Spatialite 5. Built an index idx_mytable_geometry.  
select * from idx_mytable_geometry returns pkid, xmin, xmax, ymin, ymax.

If I want to use index in the place of
 WHERE ST_Intersects({pointSrc}, geometry)

Is this the correct way:

                       WHERE pk_uid in
                        (select pkid from   idx_mytable_geometry
                        where xmin < X({pointSrc}) and xmax > X({ pointSrc })
                          and ymin < Y({pointSrc}) and ymax > Y({pointSrc }))";

Any help with Spatialite 5 index examples appreciated!

i-s-o

unread,
Aug 2, 2022, 6:29:25 PM8/2/22
to SpatiaLite Users
Here is the pattern I use to engage the spatial index in SpatiaLite:

SELECT ...
FROM pointSrc a, mytable b
WHERE
  ST_Intersects(a.geometry, b.geometry)
  AND b.rowid in (
    select rowid from SpatialIndex
    where f_table_name = 'mytable' AND search_frame = a.geometry
  )

You can find more information here:
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex

Jason Williams

unread,
Aug 2, 2022, 8:33:04 PM8/2/22
to SpatiaLite Users
Thank you for the ideas! Got it! I have to add "and"...
BTW, I am using spatialite5, there is no  f_table_name in my index, did I do something wrong?

a.fu...@lqt.it

unread,
Aug 3, 2022, 1:57:47 AM8/3/22
to spatiali...@googlegroups.com
On Tue, 2 Aug 2022 17:33:04 -0700 (PDT), Jason Williams wrote:
> Thank you for the ideas! Got it! I have to add "and"...
> BTW, I am using spatialite5, there is no f_table_name in my index,
> did
> I do something wrong?
>

Jason,

if you notice, you are querying the SpatialIndex table.
this one is a very special table, that actually is a VIRTUAL TABLE,
that is kind of a driver simulating to be a table but being in
reality just a sw component.

said in other words: there is no table SpatialIndex at all
into the database, it's just a symbolic interface that you
can query using standard SQL syntax.

SpatialIndex virtually defines the following columns:

1. f_table_name [input]
name of the main Geometry table

2. f_column_name [input]
name of the Geometry column supported by a corresponding
R*Tree SpatialIndex
it can be omitted if the main Geometry table defines
just a single Geometry (will be automatically detected).

3. search_frame [input]
a Geometry acting as a Spatial Filter

4. rowid [output[
the unique IDs of the rows within f_table_name satisfying
the given Spatial Filter.

all together now:

SELECT rowid
FROM SpatialIndex
WHERE f_table_name = 'mytable' AND
search_frame = a.geometry

explained in a verbose form this practically means:

"please SpatialIndex interface, search for a table named
mytable that is expected to be supported by a corresponding
R*Tree. Query this Spatial Index using a.geometry as a
Spatial Filter and finally return back to the caller a list
of all ROWIDs satisfying this filter."

a short rationale:
SQLite supports a very effective form of Spatial Index
represented by R*Trees, but differently from any other
SpatialDBMS they are not at all an "index" because
they actually are separate tables with no explicit
relation with their parent table.

SpatiaLite on its own adds the "syntactic sugar" of
virtual SpatialIndex which allows for an easier
access to spatial indexing completely masking all
low level details of the R*Trees implementation.

bye Sandro

Jason Williams

unread,
Aug 3, 2022, 10:23:15 AM8/3/22
to SpatiaLite Users
Thank you Sandro for the detailed information.

So if I want to search a point (lon, lat) in a table <polygon_table> with a lot of polygons with srid 26918, I will use query like:

WHERE ST_Intersects(ST_Transform(SetSRID(MakePoint(<lon>, <lat>), 4326), 26918), SetSRID(geometry, 26918))
                            AND ROWID in
                            (SELECT ROWID
                                FROM SpatialIndex
                                WHERE f_table_name = 'DB=main.<polygon_table>' AND
                                search_frame = ST_Transform(SetSRID(MakePoint(<lon>, <lat>), 4326), 26918))

is that right?

I do not have a table with the lat/lon stored, I got the lat/lon on-the-fly as input to my function. Also the SRID 26918 is an input to my function on the fly, the same geometry column in the <polygon_table> will have different SRID for different rows, so the <polygon_table> stored as SRID 0.

Thank you very much!

Jason Williams

unread,
Aug 3, 2022, 10:25:17 AM8/3/22
to SpatiaLite Users
Basically I need to search a polygon where the point <lon, lat> is in.

a.fu...@lqt.it

unread,
Aug 3, 2022, 11:02:27 AM8/3/22
to spatiali...@googlegroups.com
On Wed, 3 Aug 2022 07:23:14 -0700 (PDT), Jason Williams wrote:
> I do not have a table with the lat/lon stored, I got the lat/lon
> on-the-fly as input to my function.
>

this surely is a far by optimal strategy because all of those
ST_Transform() will almost certainly cause a severe bottleneck
resulting in slowness.


> Also the SRID 26918 is an input to
> my function on the fly, the same geometry column in the
> <polygon_table> will have different SRID for different
> rows, so the <polygon_table> stored as SRID 0.
>

applying the spatial indexind on behalf of geometries
belonging to different SRIDs seems to be a fantastic
recipe for future disasters.

always remember: an R*Tree spatial index simply is
an ordered hierarchy of overlapping rectangles allowing
for quick spatial filtering.
but it works only if all the rectangles adopts the
same indentical reference system, otherwise it
will be a messy chaos leading to unpredictable
results.

short conclusion: there are very strong reasons
explaining why all the Geometries in the same
column must absolutely share the same SRID, as
usually required by any Spatial DBMS (this
including SpatiaLite too).

bye Sandro


Jason Williams

unread,
Aug 4, 2022, 10:57:03 AM8/4/22
to SpatiaLite Users
Sandro, thank you! Now we made the SRID consistent for each table, each table is for different region so SRIDs are different for each table.
As for the input point <lon, lat> we have to transform to the SRID on-the-fly because we only calculate the region for the point on-the-fly.

Any advices on how to solve the transform issue?

mj10777

unread,
Aug 4, 2022, 2:10:50 PM8/4/22
to SpatiaLite Users
On Thursday, 4 August 2022 at 16:57:03 UTC+2 Jason Williams wrote:
Sandro, thank you! Now we made the SRID consistent for each table, each table is for different region so SRIDs are different for each table.
As for the input point <lon, lat> we have to transform to the SRID on-the-fly because we only calculate the region for the point on-the-fly.

Any advices on how to solve the transform issue?

See the samples at:
SpatiaLite Cookbook (Version 5.0.0) Topic: Haute cuisine
recipe #16a: SpatialIndex as BoundingBox


Notice that samples do the SpatialIndex first and then the ST_Intersects as second. In this way the costly (i.e. slow) ST_Intersects will only be done on geometries that are within the range of the search area.

The goal is to filter out all geometries that out of range (where ST_Intersects will never be true).

If the table has 1000 geometries and 50 are within the range of the given SpatialIndex, then ST_Intersects will be executed 50 time.

Your sample, ST_Intersects will be executed 1000 times (being done first). When it is then found you ask if it is with the search range: this will always be true.

To our great horror, when rewriting the Version 5 of the Cookbook, we noticed that the original samples did this the wrong way around!

First filter out what you don't need. Inside the WHERE statement do the SpatialIndex as first subquery, afterwitch comes the AND where an second subquery (costly) is ONLY done when the first subquery is true. if the first subquery is false the everything after the AND is skipped and it goes on to the next record.

mj10777

unread,
Aug 4, 2022, 3:34:29 PM8/4/22
to SpatiaLite Users
On Thursday, 4 August 2022 at 16:57:03 UTC+2 Jason Williams wrote:
Sandro, thank you! Now we made the SRID consistent for each table, each table is for different region so SRIDs are different for each table.
As for the input point <lon, lat> we have to transform to the SRID on-the-fly because we only calculate the region for the point on-the-fly.

Any advices on how to solve the transform issue?

To spare unneeded, costly, transformations (i.e. do it only once) create the source_point as a subquery datasource inside the FROM portion and use it where required.

SELECT
 source_geometry.*
FROM
 idx_mytable_geometry AS source_geometry,
 (
  SELECT ST_Transform(SetSRID(MakePoint(<lon>,<lat>), 4326), 26918))
 ) source_point
WHERE
(
 source_geometry.ROWID IN
 (
  SELECT ROWID FROM SpatialIndex WHERE
  (
   -- the BoundingBox is a rectangle around the (likley wiggly) boundry of the Community MULTIPOLYGON
   (f_table_name = 'idx_mytable_geometry') AND
   (search_frame = source_point)
  )
 ) AND
 ST_Intersects(source_point,source_geometry) 


This should work, but not tested.

mj10777

unread,
Aug 4, 2022, 3:39:25 PM8/4/22
to SpatiaLite Users
correction:

ST_Intersects(source_point,source_geometry.geometry)

Jason Williams

unread,
Aug 5, 2022, 12:04:28 AM8/5/22
to SpatiaLite Users
Will try this!

Antonio Valanzano

unread,
Aug 10, 2022, 1:05:33 AM8/10/22
to SpatiaLite Users
Hi Mark,
I am not an expert of SQL but from what I have read the order of condition in WHERE  clause doesn't matter,

I made the following test

2 tables
points: google_131_comuni (177951 rows)
polygons:  comuni_italiani_2021 (7903 rows)

--  test 1: first intersection and then spatialindex
-----------------------------------------------------------------------------
CREATE  TABLE app_comune_1 AS
SELECT g.pk_uid, c.comune
FROM google_131_comuni as g, comuni_italiani_2021 as c
WHERE ST_INTERSECTS(g.geom, c.geom) = 1
  AND c.ROWID IN (SELECT ROWID
                    FROM Spatialindex
                   WHERE f_table_name = 'comuni_italiani_2021'
                     AND search_frame = g.geom);
-- tempo di elaborazione: 00:01:09.510


EXPLAIN QUERY PLAN
SELECT g.pk_uid, c.comune
FROM google_131_comuni as g, comuni_italiani_2021 as c
WHERE ST_INTERSECTS(g.geom, c.geom) = 1
  AND c.ROWID IN (SELECT ROWID
                    FROM Spatialindex
                   WHERE f_table_name = 'comuni_italiani_2021'
                     AND search_frame = g.geom);
-- 4 rows
3    0    0    SCAN g
5    0    0    SEARCH c USING INTEGER PRIMARY KEY (rowid=?)
7    0    0    CORRELATED LIST SUBQUERY 1
9    7    0    SCAN Spatialindex VIRTUAL TABLE INDEX 2:     


-- test 2: first spatialindex and then spatialindex intersection
-------------------------------------------------------------------------------------------------
CREATE  TABLE app_comune_2 AS
SELECT g.pk_uid, c.comune
FROM google_131_comuni as g, comuni_italiani_2021 as c
WHERE c.ROWID IN (SELECT ROWID
                    FROM Spatialindex
                   WHERE f_table_name = 'comuni_italiani_2021'
                     AND search_frame = g.geom)
  AND ST_INTERSECTS(g.geom, c.geom) = 1;
-- tempo di elaborazione: 00:01:04.000


EXPLAIN QUERY PLAN
SELECT g.pk_uid, c.comune
FROM google_131_comuni as g, comuni_italiani_2021 as c
WHERE c.ROWID IN (SELECT ROWID
                    FROM Spatialindex
                   WHERE f_table_name = 'comuni_italiani_2021'
                     AND search_frame = g.geom)
  AND ST_INTERSECTS(g.geom, c.geom) = 1;
-- 4 rows
3    0    0    SCAN g
5    0    0    SEARCH c USING INTEGER PRIMARY KEY (rowid=?)
7    0    0    CORRELATED LIST SUBQUERY 1
9    7    0    SCAN Spatialindex VIRTUAL TABLE INDEX 2:


As you can see the output of the EXPLAIN QUERY PLAN is the same and there is only a small difference in the time of execution (few seconds).

Do you know if in SQLite there is a way to force the plan of execution as it exists in SQL Server ?

Could you give me more explanations about your assertion of the importance of the order of condition?

Regards
Antonio


mj10777

unread,
Aug 10, 2022, 8:47:33 AM8/10/22
to SpatiaLite Users
After consultation with Sandro, I can give you the following answer:
This is the result of your EXPLAIN QUERY PLAN:
- which is the same for both versions

3 0 0 SCAN g
5 0 0 SEARCH c USING INTEGER PRIMARY KEY (rowid=?)
7 0 0 CORRELATED LIST SUBQUERY 1
9 7 0 SCAN Spatialindex VIRTUAL TABLE INDEX 2:    

What exactly is it referring to?

- 3: is search_frame = g.geom [SCAN]
   - within the SpatialIndex query
- 5: is c.ROWID [SEARCH]
- 7: is the subquery of the SpatialIndex
Zitierten Text ausblenden


- 9: is FROM Spatialindex
Each are components of standard SQL that SQLite uses and understands.

Where is ST_INTERSECTS(g.geom, c.geom) being shown in this PLAN?
- there is no 'SCAN c'

But more importantly how should the SQLite optimizer evaluate what any external function does?

Since all Spatialite functions are external functions, the optimizer cannot determine where they should be best placed. How can it, having no idea what it does?

Therefore we conclude that the optimizer simply ignores it, leaving it where the user, in their wisdom, has placed it.

Your query result times confirms that the second version brings better time results, since SpatialIndex had reduced the amount rows that ST_Intersects needs to work on.

Therefore when using EXPLAIN QUERY PLAN, you must take into consideration that the SQLite optimizer can only deal with components that SQLite is aware of (i.e. which are not external functions such as any Spatialite function).

Why then does the optimizer see and evaluates the SpatialIndex?
Answer: because it is in a sub-query and looks (in the eyes of SQLite) like a TABLE.

The Stackoverflow answer doesn't take external functions into account (as used by SQLite).

Their statement is however true for sql components that SQLite understands (internal functions).

Hope this helps.

Mark

---
From Sandros reply:

>EXPLAIN QUERY PLAN simply reports the order of operations affecting the tables, but always ignores the evaluation of functions and other comparisons.
>
>I'm assuming that there is an implicit rule stating "as soon as possible, immediately when any required operand is available"
>
>and for sure SQLite can't have the slightest idea about the relative cost of executing ST_Intersects.
>
>So I'm still convinced that the right order of the whole query is higly important; 

Regards
Antonio


Antonio Valanzano

unread,
Aug 11, 2022, 1:18:10 AM8/11/22
to SpatiaLite Users
Mark and Sandro
thanks for the detailed answer which contains information that are  well beyond  my knowledge of SQL and SQLite.

Antonio


Reply all
Reply to author
Forward
0 new messages