On Sun, 4 Aug 2024 10:18:25 +0100,
ckgoo...@gmail.com wrote:
> SELECT COUNT (*) FROM spt_Cities
> WHERE ST_Contains( BuildCircleMbr( 0.0, 51.5, 1.0), location );
> Returns me 12 and when I look at the cities returned they all make
> sense from my knowledge of southeast United Kingdom.
>
> So this will do for now though I would still like to get the results
> limited to a genuine circle rather than an MBR at some point.
>
> The thing I really don't know at the moment is whether this is taking
> advantage of the spatial index I created on my location column.
>
Hello,
This is intended to be a comprehensive answer to all your many
questions
regarding SpatialIndex.
Suggested preliminary readings:
https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/rtree.html
https://blog.jrg.com.br/2016/05/23/Querying-spatial-data-using-spatialite-with-indexes/index.html
https://en.wikipedia.org/wiki/R-tree
https://en.wikipedia.org/wiki/R*-tree
Let's start from a first rough classification of Spatial SQL
problems which leads us to identify the following categories:
1. simple problems: dataset with a few hundred or thousands of rows.-
2. problems of medium difficulty: dataset with many hundreds of
thousands of rows.
3 complex problems: dataset with millions of rows.
Class 1. is quite trivial; any strategy will be reasonably fast
and efficient, even those based on pure brute force.
Class 2. will highlight the first flaws; brute force strategies
will start to feel too slow, but on a fast CPU the response
times could still be acceptable.
Class 3. will bring all the problems to the surface; and
inefficient and unintelligent strategies will be intolerably
slow (response times in the order of minutes, but also hours
and perhaps even days).
There is only one way to make a Spatial Query efficient and fast,
and is to be able to use the Spatial Index intelligently.
To take any advantage of the Spatial Index, SQL Queries must be
written in a very specific way.
There is no automatism whatsoever, there are no shortcuts,
there are no silver bullets.
If your Spatial SQL Query doesn't directly takes care
of the Spatial Index it will always be completely
ignored, even if it's present.
----------
Now let's see how and why the spatial Index can give
considerable efficiency to our Spatial Queries.
In a typical brute force approach any Spatial SQL
function in our query will affect all the rows of
the dataset (this is the infamous full table scan).
The more rows there are in the dataset the slower
the query will become.
Remember that any Spatial SQL function always requires
an heavy computational cost; it should start to become
clear that the more rows we have to process, the slower
the query will become.
The Spatial Index can be miraculous because it's nothing
more than a spatial filter allowing to drastically reduce
the number of rows to be processed.
the more we are able to narrow the filter, the faster we'll
be able to go.
Explained in simple words, a Spatial Index can be used as
a rectangular window allowing to very quickly extract only
the geometries that fall within that rectangle, ignoring
all the others.
The smaller the rectangle, the more efficient the filter
will be.
In the most typical case the rectangle to be passed to the
Spatial Index simply coincides with the reference
geometry itself.
(remember that all geometries of any type can always be
represented schematically by their MBR aka BBOX).
It's trivial to demonstrate that it'is useless to try
to calculate ST_Intersects() or ST_Contains() when the
two Geometries have their BBOXes that don't overlap;
we can discard them immediately because we can safely
anticipate the negative result.
However, the opposite condition is not true; two
geometries can have their BBOXes overlapped even if
they don't intersect at all.
Concluding: the Spatial Index is only a fairly coarse
but fast first filter.
Everything that passes this first check will then be
verified more accurately using a real Spatial SQL
function; but since the filter is expected to be
very selective, the impact on the performance of
the extra work necessary to discard false positives
will have a very modest impact.
---------------------------------------
Last step: how to use the Spatial Index for
ST_Distance(); the problem is that now we don't
expect the two BBOXes to be overlapping at all,
especially if we're talking about POINTs.
The best solution is to define the size of the
search rectangle to pass to the Spatial Index
based on some heuristic considerations and
knowledge of the reference dataset.
Just to give a very banal example: if I intend
to search for the nearest restaurant and I am
inside a town, then a search radius of 1 km
can be very reasonable.
But if I find myself in the middle of a forest,
a search radius of several tens of km seems
decidedly preferable.
A smarter generalized approach can be adaptive
in nature: we start with a small search radius,
and if nothing is found we'll try again by doubling
the search radius, and so on until a satisfactory
number of solutions is found.
best regards
Sandro