[postgis-users] ST_DWithin question

66 views
Skip to first unread message

Marco Boeringa

unread,
Oct 22, 2021, 7:31:23 AM10/22/21
to PostGIS Users Discussion
Hi all,

I have a specific requirement where I would like to find all unique
'IDs' of polygon geometries having a 'neighbor', that is, another
polygon bordering them (sharing a segment). For my requirements, it
doesn't actually have to be a really exact result, e.g., if 'touching'
at one point neighbors are found or not is irrelevant. I just want the
most efficient and fast way to separate out those polygons having a
'neighbor' from 'single' disjunct polygons (e.g. think terraced housing
versus stand alone buildings).

I currently use the following two queries. This first query creates a
table with all unique 'pairs' of polygons as identified by their ID
using 'ST_DWithin with a 0 range from the <MY_TABLE1> original polygon
table. The second one deduplicates the two columns of the first table to
create a unique list of IDs.

This works quite well, however, especially the first step can still be
quite time consuming on very large datasets (note that there is a proper
GiST spatial index on the dataset). From the documentation of
ST_DWithin, it is clear it is already optimized to use the spatial index
and bounding boxes to speed up the search, so I guess there is nothing
more I can do. It is properly using multiple parallel workers in
PostgreSQL as well for creating the table from a look in PgAdmin.

Or does someone potentially know a more efficient way to achieve this
specific result?

CREATE TABLE <MY_TABLE2> AS SELECT
 a.id AS a_id,
 b.id AS b_id
FROM
 <MY_TABLE1> AS a, <MY_TABLE1> AS b
WHERE
 a.id < b.id AND ST_DWithin(a.<GEOMETRY_COLUMN> , b.<GEOMETRY_COLUMN> , 0)
ORDER BY a.id, b.id

I then de-duplicate the ids using:

SELECT DISTINCT c.id FROM (
  SELECT DISTINCT a_id AS id FROM <MY_TABLE2> a
  UNION ALL
  SELECT DISTINCT b_id AS id FROM <MY_TABLE2> b) c ORDER BY c.id

which I can subsequently join back to the original table.

Marco
_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

Arnaud L.

unread,
Oct 22, 2021, 7:59:15 AM10/22/21
to postgi...@lists.osgeo.org
Hi Marco


Le 22/10/2021 à 13:31, Marco Boeringa a écrit :
I currently use the following two queries. This first query creates a 
table with all unique 'pairs' of polygons as identified by their ID 
using 'ST_DWithin with a 0 range from the <MY_TABLE1> original polygon 
table. The second one deduplicates the two columns of the first table to 
create a unique list of IDs.
I think I would write this in a single query, using an EXISTS subquery instead of a join step and a dedup step :

SELECT tbl.id
FROM <MYTABLE1> AS tbl
WHERE EXISTS (
    SELECT 1
    FROM <MYTABLE1> AS subq
    WHERE subq.id != tbl.id AND ST_Intersects(tbl.<GEOMETRY_COLUMN> , subq.<GEOMETRY_COLUMN>)
);

No need to DISTINCT or anything here, the result should already be correct.


From the documentation of 
ST_DWithin, it is clear it is already optimized to use the spatial index 
and bounding boxes to speed up the search, so I guess there is nothing 
more I can do. It is properly using multiple parallel workers in 
PostgreSQL as well for creating the table from a look in PgAdmin.

ST_DWithin(a,b,0) is logically equivalent to ST_Intersects(a,b), but as per this post on SE, ST_Intersects might be faster due to internal optimizations :
https://gis.stackexchange.com/questions/400700/st-intersects-vs-st-dwithin-behavior-on-polygons-using-postgiseo.org/mailman/listinfo/postgis-users


Regards
--
Arnaud

Marco Boeringa

unread,
Oct 22, 2021, 8:04:54 AM10/22/21
to postgi...@lists.osgeo.org

Thanks Arnaud,

This is a very useful answer. I will try the ST_Intersects option you propose.

Minor correction to the link provided, which should read:

https://gis.stackexchange.com/questions/400700/st-intersects-vs-st-dwithin-behavior-on-polygons-using-postgis

Marco

Op 22-10-2021 om 13:52 schreef Arnaud L.:

Arnaud L.

unread,
Oct 22, 2021, 8:07:58 AM10/22/21
to postgi...@lists.osgeo.org
Le 22/10/2021 à 14:04, Marco Boeringa a écrit :

Minor correction to the link provided, which should read:

https://gis.stackexchange.com/questions/400700/st-intersects-vs-st-dwithin-behavior-on-polygons-using-postgis


Thanks, I don't know what happened here...
Also, in the EXISTS (subquery) version you don't have to rejoin to your original table, since you are already querying your original table.

Regards
--
Arnaud

Marco Boeringa

unread,
Oct 22, 2021, 10:02:05 AM10/22/21
to postgi...@lists.osgeo.org

Hi Arnaud,

I have now done some preliminary testing with a > 10M records dataset, but my first impression is that 'ST_Intersects' is not faster than 'ST_DWithin' with range 0.

In fact, I have the feeling it may be about 20-25% slower (e.g. 2m48s versus for 'ST_Intersects' versus 2m07s for 'ST_DWithin' was one result). At least, that is what I am currently seeing when I replace the 'ST_Intersects' call with 'ST_DWithin' in your optimized query, and this difference seems quite consistent when I re-run this multiple times.

So the query below appears to be the most efficient:

SELECT tbl.id
FROM <MYTABLE1> AS tbl
WHERE EXISTS (
    SELECT 1
    FROM <MYTABLE1> AS subq

    WHERE subq.id != tbl.id AND ST_DWithin(tbl.<GEOMETRY_COLUMN> , subq.<GEOMETRY_COLUMN>, 0)
);

Marco

Op 22-10-2021 om 14:07 schreef Arnaud L.:

Arnaud L.

unread,
Oct 22, 2021, 10:04:34 AM10/22/21
to postgi...@lists.osgeo.org
Le 22/10/2021 à 16:01, Marco Boeringa a écrit :
I have now done some preliminary testing with a > 10M records dataset, but my first impression is that 'ST_Intersects' is not faster than 'ST_DWithin' with range 0.

In fact, I have the feeling it may be about 20-25% slower (e.g. 2m48s versus for 'ST_Intersects' versus 2m07s for 'ST_DWithin' was one result). At least, that is what I am currently seeing when I replace the 'ST_Intersects' call with 'ST_DWithin' in your optimized query, and this difference seems quite consistent when I re-run this multiple times.


Thanks for the feedback Marco.
Quite surprising to me (but I know nothing about Postgis' internals), but good to know.

Regards
--
Arnaud
Reply all
Reply to author
Forward
0 new messages