Using the SpatialIndex correctly

48 views
Skip to first unread message

Jim

unread,
Apr 1, 2024, 2:55:22 PMApr 1
to SpatiaLite Users
I have a database created from a shapefile by spatialite_gui where I set the SRID to 4326 and requested that a SpatialIndex be created and UTF-8.   All other settings for loading a shape file were left as presented.

The database contains a Polygon named 'Geometry'.

If I query the database using the following, I get the expected result.

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE  ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1 

However, if I use this query, I get an empty result set:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1
AND uszip.PK_UID IN  
(SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'uszip'  
AND search_frame = 'Geometry') ;

I am at a loss.

Is there anything obviously wrong with the second query?

How do I tell if, in fact,  the Spatialndex was created and properly populated?

Regards,
Jim


a.fu...@lqt.it

unread,
Apr 1, 2024, 3:27:27 PMApr 1
to spatiali...@googlegroups.com
On Mon, 1 Apr 2024 11:55:22 -0700 (PDT), Jim wrote:
> I have a database created from a shapefile by spatialite_gui where I
> set the SRID to 4326 and requested that a SpatialIndex be created and
> UTF-8.   All other settings for loading a shape file were left as
> presented.
>
> The database contains a Polygon named 'Geometry'.
>
> If I query the database using the following, I get the expected
> result.
>
> SELECT PO_NAME, STATE, ZIP_CODE  
> FROM uszip  
> WHERE  ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610,
> 4326)) = 1 
>
> However, if I use this query, I get an empty result set:
>
> SELECT PO_NAME, STATE, ZIP_CODE  
> FROM uszip  
> WHERE
> ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) =
> 1
> AND uszip.PK_UID IN  
> (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'uszip'  
> AND search_frame = 'Geometry') ;
>
> I am at a loss.
>
> Is there anything obviously wrong with the second query?
>

YES ;-)

... AND search_frame = 'Geometry'

if you write 'geometry' (quoted) it becomes a text constant,
it cannot be the reference to the name of a column where you
expect to find somem BLOB-Geometry.
when SpatiaLite finds a text string where it expects to find
a BLOB-Geometry it will obviously fail returning a NULL.

but there is a second more substantial error.

... SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'uszip'

you are querying the SpatialIndex which supports the "uszip" table;
therefore the filter-geometry defined by search_frame certainly
cannot be the one taken from the same table.
it should necessarily be the other one coming fro
MakePoint().


> How do I tell if, in fact,  the Spatialndex was created and properly
> populated?
>

a quick and fast check:

SELECT * FROM idx_<table>_<geometry>;

where <table> is the name of the Table
and <geometry> is the name of the Column
containing the Geometries.


a more sophisticated method:

SELECT CheckSpatialIndex('table', 'geometry');

if the SpatialIndex is actually present and is
correctly populated it will return 1 (TRUE).

best regards,
Sandro

Mark Johnson

unread,
Apr 1, 2024, 3:38:25 PMApr 1
to SpatiaLite Users
On Monday 1 April 2024 at 21:27:27 UTC+2 a.fu...@lqt.it wrote:
On Mon, 1 Apr 2024 11:55:22 -0700 (PDT), Jim wrote:
> I have a database created from a shapefile by spatialite_gui where I
> set the SRID to 4326 and requested that a SpatialIndex be created and
> UTF-8.   All other settings for loading a shape file were left as
> presented.
>
> The database contains a Polygon named 'Geometry'.
>
> If I query the database using the following, I get the expected
> result.
>
> SELECT PO_NAME, STATE, ZIP_CODE  
> FROM uszip  
> WHERE  ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610,
> 4326)) = 1 
Also the 'ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1' should come after the spatial query.

As is, you are performing ST_Contains on every record and if found then the spatial query.

The spatial query returns candidates where the ST_Contains is possible. The records NOT returned by the spatial query are not possible - thus no need to call ST_Contains for those records.

The resulting query will be much swifter. 

Jim

unread,
Apr 1, 2024, 3:44:19 PMApr 1
to SpatiaLite Users
OK, I made changes and the query is now:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1  
AND uszip.PK_UID IN  
(SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip'
AND search_frame = Geometry) ; 

And it works just fine except that it is .1 sec slower, on average,  than without the reference to the SpatialIndex.
Is this to be expected? 
The performance is important as I have a large number of such requests to do.

Regards,
Jim

a.fu...@lqt.it

unread,
Apr 1, 2024, 3:53:39 PMApr 1
to spatiali...@googlegroups.com
On Mon, 1 Apr 2024 12:44:19 -0700 (PDT), Jim wrote:
> And it works just fine except that it is .1 sec slower, on average, 
> than without the reference to the SpatialIndex.
> Is this to be expected? 
>

Querying the SpatialIndex offers great benefits, but it also comes
at a cost.

When you have huge tables (with millions of geometries) the
SpatialIndex is able to give lightning-fast answers.

But if the geometries are limited in number (tens or hundreds),
using the SpatialIndex could paradoxucally be a cause of slowness.

bye Sandro

Jim

unread,
Apr 1, 2024, 3:56:29 PMApr 1
to SpatiaLite Users
Mark,

I'm sorry, but I don't understand your comment.  

Is this what you meant:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1  
AND uszip.PK_UID IN  
(SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip' AND search_frame = Geometry  
AND ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1)

I got the correct result, but it was still a but slower that without the SpatialIndex query

Could you please modify the query in the way you suggested?

Regards,
Jim

Jim

unread,
Apr 1, 2024, 4:03:37 PMApr 1
to SpatiaLite Users
Sandro,

OK, that is starting to explain the speed issue. The 'uszip" table only has 35000 rows,

I have used the SpatialIndex approach with a table of 4M rows with great success.

As I watch spatialite_gui do this query, it appears that it is scanning all of the rows before is presents the result.

I guess I didn't expect to see that.

Regards,
Jim

Mark Johnson

unread,
Apr 1, 2024, 4:06:16 PMApr 1
to SpatiaLite Users
On Monday 1 April 2024 at 21:44:19 UTC+2 Jim wrote:
OK, I made changes and the query is now:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1  
AND uszip.PK_UID IN  
(SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip'
AND search_frame = Geometry) ; 

And it works just fine except that it is .1 sec slower, on average,  than without the reference to the SpatialIndex.
Is this to be expected? 
The performance is important as I have a large number of such requests to do.
Using the spatial query, when used correctly (the ST_Contains after the spatial query), should be noticeably swifter.

The ST_Contains is very CPU expensive and should only be called when required.

The spatial query uses a rectangle around the geometry (which contain an wiggely border). The point may be inside or outside of the wiggely border (ST_Contains checks this).

If out of 10000 records, 500 are returned by the spatial query, the the ST_Contains will be called 500 times.

At present you are calling ST_Contains 10000 times (of which less than 500 will be successfull) and then, if successfull, you are THEN calling the spatial index (which is a waste of time - since ST_Contains has allready told you that it is inside the rectangle). 

a.fu...@lqt.it

unread,
Apr 1, 2024, 4:09:04 PMApr 1
to spatiali...@googlegroups.com
On Mon, 1 Apr 2024 12:44:19 -0700 (PDT), Jim wrote:
> SELECT PO_NAME, STATE, ZIP_CODE  
> FROM uszip  
> WHERE
> ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) =
> 1  
> AND uszip.PK_UID IN  
> (SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip'
> AND search_frame = Geometry) ; 
>

note that this query continues to be wrong because

search_frame = Geometry

this way you are asking the SpatialTree if any Geometry
from the uszip table matches some BBOX in the R*Tree,
which will be always TRUE.
in other words, the SpatialIndex will no longer act
as a filter as intended.

------

it should be instead

search_freme = MakePoint(-122.365680,47.763610, 4326)

in order to check if the reference point matches the
BBOX of some geometry from the uszip table.

bye Sandro


Mark Johnson

unread,
Apr 1, 2024, 4:09:55 PMApr 1
to SpatiaLite Users
On Monday 1 April 2024 at 21:56:29 UTC+2 Jim wrote:
Mark,

I'm sorry, but I don't understand your comment.  

Is this what you meant:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1  
AND uszip.PK_UID IN  
(SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip' AND search_frame = Geometry  
AND ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1)

I got the correct result, but it was still a but slower that without the SpatialIndex query
That is because you are calling  ST_Contains twice. 
Remove the first one and it should bring faster results.

Mark Johnson

unread,
Apr 1, 2024, 4:32:13 PMApr 1
to SpatiaLite Users
On Monday 1 April 2024 at 22:06:16 UTC+2 Mark Johnson wrote:
On Monday 1 April 2024 at 21:44:19 UTC+2 Jim wrote:
OK, I made changes and the query is now:

SELECT PO_NAME, STATE, ZIP_CODE  
FROM uszip  
WHERE
ST_Contains(uszip.Geometry, MakePoint(-122.365680,47.763610, 4326)) = 1  
AND uszip.PK_UID IN  
(SELECT uszip.PK_UID FROM SpatialIndex WHERE f_table_name = 'uszip'
AND search_frame = Geometry) ; 

And it works just fine except that it is .1 sec slower, on average,  than without the reference to the SpatialIndex.
Is this to be expected? 
The performance is important as I have a large number of such requests to do.
Using the spatial query, when used correctly (the ST_Contains after the spatial query), should be noticeably swifter.

The ST_Contains is very CPU expensive and should only be called when required.

The spatial query uses a rectangle around the geometry (which contain an wiggely border). The point may be inside or outside of the wiggely border (ST_Contains checks this).

If out of 10000 records, 500 are returned by the spatial query, the the ST_Contains will be called 500 times.

At present you are calling ST_Contains 10000 times (of which less than 500 will be successfull) and then, if successfull, you are THEN calling the spatial index (which is a waste of time - since ST_Contains has allready told you that it is inside the rectangle). 

The spatialite cookbook-5 (https://www.gaia-gis.it/gaia-sins/spatialite-cookbook-5/index.html) contains (I believe) a good sample.

List of italian cities, which include the Vatican City (an enclave of Rome) and the city of Rome.

Your point is on the steps of St. Peters (inside the Vatican City, but outside of the city of Rome).

The spatialite query will return 2 records (out of a few 100 thousand) and ST_Contains will be called 2 times.

ST_Contains will return true for the Vatican City.
ST_Contains will return false for the city of Rome.
Reply all
Reply to author
Forward
0 new messages