A good tutorial/reference to better understand spatial indices and inner sub-queries.

354 views
Skip to first unread message

Stefano Polloni

unread,
Oct 25, 2017, 4:54:20 PM10/25/17
to SpatiaLite Users
Hi there,

I recently switched my spatial analysis workflow from pyGIS (manipulating exclusively shapefiles) to Spatialite w/ pysqlite2. I had no prior experience with SQL but finally managed to set up my first database and execute some simple spatial queries. With that said, I am struggling to fully utilize the SpatialIndex.  For example,  the concepts of search_frame and ROWID are still elusive to me and I have trouble setting up a proper inner sub-query when invoking the spatial index. Does anyone a good step-by-step guide for idiots?

Thanks so much,
Stefano

mj10777

unread,
Oct 26, 2017, 3:18:08 AM10/26/17
to SpatiaLite Users


On Wednesday, 25 October 2017 22:54:20 UTC+2, Stefano Polloni wrote:
Hi there,

I recently switched my spatial analysis workflow from pyGIS (manipulating exclusively shapefiles) to Spatialite w/ pysqlite2. I had no prior experience with SQL but finally managed to set up my first database and execute some simple spatial queries. With that said, I am struggling to fully utilize the SpatialIndex.  For example,  the concepts of search_frame and ROWID are still elusive to me and I have trouble setting up a proper inner sub-query when invoking the spatial index. Does anyone a good step-by-step guide for idiots?
The main page for the use of the SpatialIndex can be found here:

As for a 'good' step-by-step guide is more difficult, since everyone perceives things differently.
Often it is combination of different samples, where during one, a 'click' happens and things begin to be come clear.

At the moment a new spatialite tool is under review, so a sample based on that may help as an alternative to the above link.

The working name of the tool is 'spatialite_dem' which will
- import a standard dem/xyz file often issued with height data
- query the table with a x/y position, returning the nearest z-value found
- set the z-value on prepared existing geometries, where the z-value should be replaced with the nearest point found

This uses a query using the SpatialiIndex, that looks like this
- with comments inside the Sql-Query that may help, as a reference, to create your own queries

SELECT
 ST_Z(utm_point)
FROM
 "berlin_dhhn92_2007"
WHERE
(
 -- First condition
 ( -- Out of the 1.116.000.000 records, select only those that are in range [4 will be returned]
  -- for a SpatialView, the Primary-Key must be used, otherwise ROWID can be used
  id_dem IN
  (
   -- Use the Spatialite internal logic to simplify the SQL-Query, avoiding rounding errors that could miss a valid geometry
   SELECT ROWID FROM SpatialIndex
   WHERE
   (
    -- Use the created index for the given TABLE
    -- To query an ATTACHED Database, replace 'main' with the used schema-name
    -- > 'DB=main.' for a non-ATTACHED Database is not mandatory
    (f_table_name = 'DB=main.berlin_dhhn92_2007') AND
    -- Use the given Geometry-Column [mandatory only where there is more than 1 Geometry-Column]
    (f_geometry_column = 'utm_point') AND
    -- search an area 0.5 meters around the given point (= 1 meter width/height)
    (search_frame = ST_Buffer(ST_Transform(MakePoint(24700.552,20674.744,3068),25833),0.5))
   )
  )
 ) AND
 -- Second condition [will run only against records that fulfill the previous condition(s)]
 (  -- From the 4 records within the 1 meter range returned, select the nearest that is NOT NULL, insuring a valid result
  ST_ClosestPoint(utm_point, ST_Transform(MakePoint(24700.552,20674.744,3068),25833)) IS NOT Null)
 )
 -- Order the 4 returned results by distance
ORDER BY ST_Distance(utm_point,ST_Transform(MakePoint(24700.552,20674.744,3068),25833)) ASC
 -- force the return of 1 result, being the nearest point
LIMIT 1;

The Database-Table being queried contains 1.116 billion points, one point per meter.

Goal is to return the z-value of the nearest point found from the given point (MakePoint(24700.552,20674.744,3068))
- so the first task is to filter out all points that are more than 1 meter away from the given point

Important is, that within the 'WHERE' Statement
- the first condition filters out everything that is out of range
- then searches for the nearest point with what is left over

Since commands such as
- ST_ClosestPoint
- ST_Contains
- ST_Within
- ST_Intersects

are very 'expensive', they should be avoided when not needed.
That is why here, it is used after the SpatialIndex has been queried.

In this sample, 4 records are returned from the SpatialIndex
- thus ST_ClosestPoint is called 4 times
The total time needed, in this sample, is 02 milli-secs.

If it was done the other way around, it would probably take hours, since ST_ClosestPoint would have been called 1.116 billion times.

In a other form of this query, the 4 results looks like this:

id_dem                       x_value                      y_value                       z_value                      ST_Z(utm_point)        distance (here in meters)
630745428 391427.000000 5819298.000000 33.560000 33.560000 0.703269
630745429 391428.000000 5819298.000000 33.600000 33.600000 0.705196
630709428 391427.000000 5819297.000000 33.620000 33.620000 0.709038
630709429 391428.000000 5819297.000000 33.610000 33.610000 0.710950

With the ORDER BY ST_Distance( ... ) ASC
- we make sure that the nearest POINT is the first result

With LIMIT 1 we make sure that one one result is returned.

In QGis the result looks like this:

with the agua/blue point being the given point and the four green points, that are partially covered, the points returned by the use of the SpatialIndex.

The circles are 0.5 meters around the points (1 meter width/height).

The final result is the upper-left green point partially covered by the agua/blue point.


Hope this helps,

Mark

 

Thanks so much,
Stefano

a.fu...@lqt.it

unread,
Oct 26, 2017, 7:17:27 AM10/26/17
to spatiali...@googlegroups.com
On Wed, 25 Oct 2017 13:54:20 -0700 (PDT), Stefano Polloni wrote:
> Hi there,
>
> I recently switched my spatial analysis workflow from pyGIS
> (manipulating exclusively shapefiles) to Spatialite w/ pysqlite2. I
> had no prior experience with SQL but finally managed to set up my
> first database and execute some simple spatial queries.
>

Hi Stefano,

SQLite/SpatiaLite are fully based on SQL, and consequently using them
at their best certainly requires possessing fairly good competences
about SQL (never forgetting that SQL is a complex programming language
supporting sophisticated and complex operations).

You are going to encapsulate your SQL statements within a Python
script, so your mind will probably continue to be mainly focused
on Python own syntax and to consider SQL syntax just as an odd and
extravagant accessory, but this is a conceptual error.

Reality is that your application will now be mainly driven by SQL
syntax, whilst Python syntax will just gluing together single SQL
elementary steps into more complex logical sequences; and consequently
the real strategic component will be SQL and not Python itself.

My personal suggestion; start by initially using only pure SQL
tools such as sqlite3, spatialite and spatialite_gui.
take your right time so to acquire a good familiarity with SQLite
alone; in a second time, when you'll feel confident enough in
mastering even rather complex SQL statements, you can then start
playing with the SpatiaLite spatial extensions.
And finally you'll be ready to adventure yourself in encapsulating
your own Spatial SQL statements into a Python program.
But at this point you'll probably discover that using Python isn't
strictly required, and that you'll be probably able to resolve even
the most complex spatial data processing problems just by using
some advanced SQL scripting.

All this preliminary learning about SQL and Spatial SQL will probably
require several days of intensive studying and testing, but it will
certainly save a lot of troubles and wasted time in your following
development activities.


> With that
> said, I am struggling to fully utilize the SpatialIndex. For
> example, the concepts of search_frame and ROWID are still elusive to
> me and I have trouble setting up a proper inner sub-query when
> invoking the spatial index.
>

It's mainly a false problem; you are now perceiving the Spatial
Index (and related frills) as mysterious and overly complex
simply because you still lack a suffiently deep comprehension
of the SQL language and more specifically of the SQLite's
specific SQL "dialect".
Just to say, you are attempting to directly build the roof of
your house completely ignoring anything about the underlaying
walls and foundations; it could never work.


> Does anyone a good step-by-step guide for idiots?
>

suggested readings:

[1] https://www.sqlite.org/lang.html
[2] https://www.sqlite.org/lang_corefunc.html
[3] https://www.sqlite.org/pragma.html
[4] http://www.gaia-gis.it/spatialite-2.4.0-4/spatialite-cookbook/
[5] https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex
[6] http://www.frogmouth.net/blog/?p=23

bye Sandro

Stefano Polloni

unread,
Oct 26, 2017, 12:06:52 PM10/26/17
to SpatiaLite Users
Fantastic! Thank you both.
Reply all
Reply to author
Forward
0 new messages