Intersects query takes long time to run

29 views
Skip to first unread message

ckgoo...@gmail.com

unread,
Jul 22, 2025, 4:17:18 AMJul 22
to spatiali...@googlegroups.com

Hi, I have a query which takes a very long time to run so would like to know if I'm approaching my problem correctly.

The general problem is that I have polygons in my Spatialite database representing countries which is derived from OSM data.

These polygons include the territorial waters of the country and I want to prune these back to the actual coastline data.

So I plan to do an intersection with coastline data derived from OSM data.

Number of country polygons is  981.

Number of coastline data polygons is  783,507

This coastline data I presume will include a huge polygon covering the continents of Africa, Europe and Asia plus the other continents and then hundreds of thousands of islands.

As an example of what I'm looking to achieve, I have a single country polygon for Spain which includes its territorial waters. When I intersect this with the coastline data I get 138 polygons because of all the islands within the territorial waters of Spain but which I now know belong to Spain which I didn't know before.

My query looks like this:

INSERT INTO spt_PureLandPolygons (elementType, elementID, polygon, point)

SELECT

    spt.elementType,  --elementType = 3 indicates a OSM relation

    spt.elementID,  --The OSM relation ID

    ST_Intersection( spt.polygon, c.lpolygon ),

  spt.point  --A label position which probably needs moving elsewhere

  FROM

    spt_Polygons AS spt

  JOIN

    cdb.spt_LandPolygons AS c

    ON ST_Intersects( spt.polygon, c.lpolygon)  --Ensures polygons overlap before finding intersection

  JOIN

    tbl_Tags AS t

    ON spt.elementID = t.elementID

  WHERE

    spt.elementType = 3  --Relations only

    AND ( t.key='admin_level' AND t.value='2' ) --Admin level 2 relations only

    AND spt.elementID=62149;

 

The last AND clause limits the query to the United Kingdom and it takes several minutes to create 18,000 or so polygons.    If I removed this to cover the whole world then the query is still running after 12 hours. No surprise.

I believe the ST_INTERSECTS function will inherently use any spatial indexes available. Is that correct?

I did write a version of this query which added a specific spatial index  check but there was no change in performance.

So I'm after advice on whether this is a sensible approach and what I can do to reduce the runtime?

Is it simply that INTERSECTS and INTERSECTION are costly functions.

Or what I have done is doomed to failure and I should do it all a different way?

Best, Chris

a.fu...@lqt.it

unread,
Jul 22, 2025, 5:22:42 AMJul 22
to spatiali...@googlegroups.com
On Tue, 22 Jul 2025 08:47:45 +0100, ckgoo...@gmail.com wrote:
> Hi, I have a query which takes a very long time to run so would like
> to know if I'm approaching my problem correctly.
>
> The general problem is that I have polygons in my Spatialite database
> representing countries which is derived from OSM data.
>
> Number of country polygons is 981.
> Number of coastline data polygons is 783,507
>

which means your query will have to calculate just under 1 billion
of polygon intersections.
it’s no surprise that it takes forever, because that’s a really
impressive workload.


> This coastline data I presume will include a huge polygon covering
> the
> continents of Africa, Europe and Asia plus the other continents and
> then hundreds of thousands of islands.
>

This is pure poison.
A polygon of this type will surely have an enormous number of vertices,
and calculating any intersections involving it will require a truly
exceptional computational workload (and therefore will take a lot
of time).

Monsters of this nature are real performance killers, and
should be avoided at all costs.
for example, by breaking them down into many smaller polygons,
perhaps cutting them out along a reasonably sized square grid.


> I believe the ST_INTERSECTS function will inherently use any spatial
> indexes available. Is that correct?
>

SpatiaLite's spatial indexes are quite special; they can be very
efficient,
but they don't support any automation.
If you want your query to actually take advantage of a Spatial Index,
you
need to write it appropriately by explicitly referencing the
appropriate
Spatial Index.

here you can find few practical examples:
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex


> I did write a version of this query which added a specific spatial
> index check but there was no change in performance.
>

However, there is a general problem in your query that prevents it
from running efficiently.

1) You are using JOINs between multiple tables.
2) You are using too many filters in the WHERE clause.

In short, it is a fairly complex query that can easily fool
the SQLite's optimizer, which will end up choosing the least
appropriate data access strategy.
Always remember that SQLite has no idea about spatial data,
and that the priority of the various actions is decided by
SQLite alone... SpatiaLite only comes into play at the very end.

The general rule for writing efficient spatial queries is always
to keep things as simple as possible.
a) prepare the data for processing: perhaps using some Temporary
Tables to store the already filtered and cleaned data.
b) so that in the end there will be a very simple query that simply
calculate the spatial functions and nothing else.
c) possibly using an appropiate Spatial Index

I'm sorry to be very general and not very specific, but if you want
a better calibrated answer you should send me a copy of your DB so
I can take a closer look at your data until I can fine-tune the
optimal access strategy.

Writing highly efficient spatial queries in SpatiaLite is more
of an art than a science, and each specific problem requires
its own unique approach.

bye Sandro

ckgoo...@gmail.com

unread,
Jul 22, 2025, 7:37:22 AMJul 22
to spatiali...@googlegroups.com
Hi Sandro,

Thanks for your reply. General comments was all I was hoping for.

You wrote:
Monsters of this nature are real performance killers, and
should be avoided at all costs.
for example, by breaking them down into many smaller polygons,
perhaps cutting them out along a reasonably sized square grid.

The coastline data is available in tiled format, each tile being one degree square I believe. I avoided this because it would mean my country polygons would get split into the same tiles when I do the intersection. Then when I draw my country polygons, each with a 1 pixel edge, it would look all cut up into squares.

However, I'm just thinking if this actually matters. If I draw my polygons without an edge line, it would visually look all joined up. I have allocated a different colour for each country, so as long as I painnt all French polygons in the same colour, visually, France will look all like one big area on the screen.

If I still did want to get a polygon for the external border of a country, and it was made up of several polygons as a result of using the tiled coastline data, is there a spatialite function to help merge them into one geometry? I don't know if the tiled coastline data is structured so the tiles overlap slightly or whether they just touch.

I'll have a go at using the tiled coastline data and see if I can work out a way to use the spatial indexes on the two tables. Hopefully the runtime will be much quicker.

One complication I forgot to mention last time is that they are in different .db files so I have to attach them. I seem to remember having trouble with spatial indexes on attached databases before but I'll see how it goes.

Best, Chris
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spatialite-users/971f7f70c32992d10f348e079e363438%40lqt.it.

a.fu...@lqt.it

unread,
Jul 22, 2025, 9:51:10 AMJul 22
to spatiali...@googlegroups.com
On Tue, 22 Jul 2025 12:25:28 +0100, ckgoo...@gmail.com wrote:
> is there a spatialite function to help merge
> them into one geometry? I don't know if the tiled coastline data is
> structured so the tiles overlap slightly or whether they just touch.
>

you can use ST_Union() as an aggregate function, something like this:

SELECT country_code, ST_Union(geom)
FROM my_countries
GROUP BY country_code;

The final result will be a single geometry for each county; don't
bother
Don't worry about any overlaps; this function is designed specifically
to handle these situations.


> One complication I forgot to mention last time is that they are in
> different .db files so I have to attach them. I seem to remember
> having trouble with spatial indexes on attached databases before but
> I'll see how it goes.
>

The Spatial Index correctly works even when more datases are attacched
together.
You just have to be careful and use a slightly special syntax so to
identify the DB where the Spatial Index is located.

see the exmples you'll find here:
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex

carefully read the final paragraph about ATTACHE DB

bye Sandro

ckgoo...@gmail.com

unread,
Jul 23, 2025, 6:21:14 PMJul 23
to spatiali...@googlegroups.com
Hi Sandro,
Gemini suggested a double spatial index query for my problem.
INSERT INTO spt_PureLandPolygons (elementType, elementID, polygon, point)
SELECT
p.elementType,
p.elementID,
ST_Intersection( p.polygon, c.lpolygon ) AS intPolygon,
p.point
FROM
spt_Polygons AS p
INNER JOIN
cdb.spt_LandPolygons AS c
ON
p.ROWID IN (
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'spt_Polygons'
AND f_geometry_column = 'polygon'
AND search_frame = ST_Envelope(c.lpolygon)
)
AND c.ROWID IN (
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'DB=cdb.spt_LandPolygons' -- Ensure this matches your schema
AND search_frame = ST_Envelope(p.polygon)
)
WHERE
p.elementID IN ( 62149, 2202162, 1311341, 50046, 2323309 )
AND IntPolygon IS NOT NULL;

This query works on the north west of Europe for UK, France, Norway and one other country I can't remember! But the good news is that it only takes about 2 minutes to complete. This is using tiled coastline data. I think that is probably as good as I'm going to get and I suspect doing my whole list of countries will take only an hour or so. But haven't done that yet.

I'm now playing with ST_UNION to merge my polygons by elementID. Here, elementID represents one relation in OSM terminology. So this works pretty well. I end up with a single multipolygon geometry. My previous query though to retrieve data from the visible area of my map is now returning nothing. But that is a problem to be solved another day.

I do seem to be making progress.
Thanks, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of a.fu...@lqt.it
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spatialite-users/1d35c3997ac0a514b4ac85a28583d1e4%40lqt.it.

Reply all
Reply to author
Forward
0 new messages