ST_Distance taking a long time

23 views
Skip to first unread message

ckgoo...@gmail.com

unread,
Aug 19, 2025, 5:01:29 PMAug 19
to spatiali...@googlegroups.com

Hi all, the query below takes around 15 seconds to run.  When I comment out the ST_Distance function call it completes in 1.5seconds.

There are about 18,000 polygons and about 1200 points  but the return set is only 77.

So I would think the ST_Distance would be calculated only on the 77 results.

The purpose of this query is to find the cities (held as points/nodes) that are in a specific country (polygons make up a country), along with various bits of info such as distance and bearing of current location from cities. 

Any thoughts on why I’m seeing this behaviour most welcome. 

 

SELECT

    COALESCE(t_name_en.value, t_name.value) AS pointName,

     ST_Distance

     (

     ST_Transform( MakePoint( -1.5, 52, 4326 ), 6933 ),

     ST_Transform( p.point, 6933 )

     ) / 1000.0 AS distance, --from metres to kilometres

     ST_Azimuth

     (

         ST_Transform( MakePoint( -1.5, 52,4326), 6933 ),

         ST_Transform( p.point, 6933 )

     ) * 180.0 / PI() AS bearing, --Radians to degrees

    p.nodeID,

    X(point),

    Y(point)

  FROM

    spt_Points AS p

  JOIN spt_NTWPolygons AS poly

    JOIN tbl_Tags AS t_name ON p.nodeID = t_name.elementID AND t_name.key = 'name'

    LEFT JOIN tbl_Tags AS t_name_en ON p.nodeID = t_name_en.elementID AND t_name_en.key = 'name:en'

  WHERE

    poly.elementID = 62149

    AND p.ROWID IN (

        SELECT ROWID

        FROM SpatialIndex

        WHERE

            f_table_name = 'spt_Points'

            AND f_geometry_column = 'point'

            AND search_frame = poly.polygon1

    )

    AND ST_Contains( poly.polygon1, p.point );

Best, Chris

a.fu...@lqt.it

unread,
Aug 20, 2025, 2:47:40 AMAug 20
to spatiali...@googlegroups.com
On Tue, 19 Aug 2025 21:20:21 +0100, ckgoo...@gmail.com wrote:
> Hi all, the query below takes around 15 seconds to run. When I
> comment
> out the ST_Distance function call it completes in 1.5seconds.
>

Hi Chris,

Let's start with the first detail (which is perhaps not the most
relevant).
I notice that nested inside ST_Distance() there are 2 calls to
ST_Transform();
I imagine you need them because your geometries are Lat/Long (4326) but
you
need a distance expressed in meters and not in degrees.

I would like to point out that ST_Transform() is a rather expensive
function,
not only because it involves complex geodetic calculations, but above
all
because behind the scenes it triggers several accesses to the DB to
retrieve
the geodetic parameters for the required transformation.
Placing one or more calls to ST_Transform() inside other Spatial
functions
is not the best way to achieve maximum speed.

In this case you had an alternative solution available, probably less
expensive.
ST_Distance() in its 3-argument form (g1, g2, use_ellipsoid) is meant
to directly get a length in meters when using Lat/Long Geometries, and
should be faster because it doesn't require any additional DB access.
please see:
https://www.gaia-gis.it/gaia-sins/spatialite-sql-5.1.0.html#p13

It would at least be worth doing some tests to objectively measure the
difference between the two approaches.


> So I would think the ST_Distance would be calculated only on the 77
> results.
>


Yours is a very dangerous assumption, which could be false.
There's no guarantee that SQLite will execute the query in the order
you
expect; at runtime, it may decide to go in a completely different
direction.

One of SQLite's strengths, which often makes it so surprisingly fast,
is its
Query Planner module. please see:
https://www.sqlite.org/queryplanner.html

Simply put, SQLite takes the liberty of rewriting your query before
executing
it, possibly modifying it heavily when it deems appropriate.
Very often, the Query Planner is able to find the most efficient and
fastest
data access strategy. But there are also cases where it ends up making
the
worst choice, leading to a very slow query.

Generally, the Query Planner attempts to select the indexes that appear
most promising.
This, however, doesn't apply to Spatial Indexes, because for SQLite an
R*Tree
isn't an index at all but a table unrelated to any other table, and
therefore tends to receive the lowest priority, especially when dealing
with a complex query, or worse, a query that contains subqueries.
Which is exactly your case.

There's a way to check in advance which access strategy the Query
Planner
will effectively use at runtime:

EXPLAIN QUERY PLAN <your sql query>;

Will return a resultset listing all operations sorted by their
priority.
It's a SQL tool that many overlook or ignore, but it's absolutely
essential
for getting high-performance queries and to avoid reaching completely
unfounded conclusions.

In this specific case, I'll bet you two cents that you'll eventually
discover
that that query is executed in a very different way than you imagined,
and
that ST_Distance() is not the main cause of slowness.

bye Sandro

ckgoo...@gmail.com

unread,
Aug 20, 2025, 5:38:58 PMAug 20
to spatiali...@googlegroups.com
Hi Sandro
Thanks for all your thoughts here. Using the third parameter set to 0 takes the runtime down from about 15 seconds to about 1.3 seconds.
The output from EXPLAIN QUERY PLAN looked sensible though I'm new to examining this info. Each element of the query ran as a parent step apart from two steps which seemed to consolidate the spatial query.
I could do with a further magnitude of improvement in query time though. I only have European cities in my database at the moment, around 1270 entries, but I'll add cities across the world shortly which will only increase the run time. And when my user is bashing around the world on their keyboard they don't want to wait 1.3 seconds just to get a list of cities in the current country.
I'm thinking I'm going to need to pre-calculate the spatial query.
CREATE TABLE tbl_CountryNodes
countryID INTEGER,
pointID INTEGER
);

CREATE INDEX idx_tbl_CountryNodes_countryID_nodeID ON tbl_CountryNodes( 'countryID', 'nodeID' );

And prepopulate it using a spatial index query to get all the nodes/cities in each country. My data is after all static. I feel this would have a good chance of getting me the performance I'm after.
Would you expect this approach to get me increased performance also?
Kind regards, Chris
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spatialite-users/768c413b09b8372168c405c9376e6d6e%40lqt.it.

Reply all
Reply to author
Forward
0 new messages