I have a specific requirement where I would like to find all unique
'IDs' of polygon geometries having a 'neighbor', that is, another
polygon bordering them (sharing a segment). For my requirements, it
doesn't actually have to be a really exact result, e.g., if 'touching'
at one point neighbors are found or not is irrelevant. I just want the
most efficient and fast way to separate out those polygons having a
'neighbor' from 'single' disjunct polygons (e.g. think terraced housing
versus stand alone buildings).
I currently use the following two queries. This first query creates a
table with all unique 'pairs' of polygons as identified by their ID
using 'ST_DWithin with a 0 range from the <MY_TABLE1> original polygon
table. The second one deduplicates the two columns of the first table to
create a unique list of IDs.
This works quite well, however, especially the first step can still be
quite time consuming on very large datasets (note that there is a proper
GiST spatial index on the dataset). From the documentation of
ST_DWithin, it is clear it is already optimized to use the spatial index
and bounding boxes to speed up the search, so I guess there is nothing
more I can do. It is properly using multiple parallel workers in
PostgreSQL as well for creating the table from a look in PgAdmin.
Or does someone potentially know a more efficient way to achieve this
specific result?
CREATE TABLE <MY_TABLE2> AS SELECT
a.id AS a_id,
b.id AS b_id
FROM
<MY_TABLE1> AS a, <MY_TABLE1> AS b
WHERE
a.id < b.id AND ST_DWithin(a.<GEOMETRY_COLUMN> , b.<GEOMETRY_COLUMN> , 0)
ORDER BY a.id, b.id
I then de-duplicate the ids using:
SELECT DISTINCT c.id FROM (
SELECT DISTINCT a_id AS id FROM <MY_TABLE2> a
UNION ALL
SELECT DISTINCT b_id AS id FROM <MY_TABLE2> b) c ORDER BY c.id
which I can subsequently join back to the original table.
Marco
_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users
I currently use the following two queries. This first query creates a table with all unique 'pairs' of polygons as identified by their ID using 'ST_DWithin with a 0 range from the <MY_TABLE1> original polygon table. The second one deduplicates the two columns of the first table to create a unique list of IDs.
From the documentation of ST_DWithin, it is clear it is already optimized to use the spatial index and bounding boxes to speed up the search, so I guess there is nothing more I can do. It is properly using multiple parallel workers in PostgreSQL as well for creating the table from a look in PgAdmin.
Thanks Arnaud,
This is a very useful answer. I will try the ST_Intersects option
you propose.
Minor correction to the link provided, which should read:
Marco
Minor correction to the link provided, which should read:
Hi Arnaud,
I have now done some preliminary testing with a > 10M records
dataset, but my first impression is that 'ST_Intersects' is not
faster than 'ST_DWithin' with range 0.
In fact, I have the feeling it may be about 20-25% slower (e.g.
2m48s versus for 'ST_Intersects' versus 2m07s for 'ST_DWithin' was
one result). At least, that is what I am currently seeing when I
replace the 'ST_Intersects' call with 'ST_DWithin' in your
optimized query, and this difference seems quite consistent when I
re-run this multiple times.
So the query below appears to be the most efficient:
SELECT tbl.id
FROM <MYTABLE1> AS tbl
WHERE EXISTS (
SELECT 1
FROM <MYTABLE1> AS subq
WHERE subq.id != tbl.id AND
ST_DWithin(tbl.<GEOMETRY_COLUMN> ,
subq.<GEOMETRY_COLUMN>, 0)
);
Marco
I have now done some preliminary testing with a > 10M records dataset, but my first impression is that 'ST_Intersects' is not faster than 'ST_DWithin' with range 0.
In fact, I have the feeling it may be about 20-25% slower (e.g. 2m48s versus for 'ST_Intersects' versus 2m07s for 'ST_DWithin' was one result). At least, that is what I am currently seeing when I replace the 'ST_Intersects' call with 'ST_DWithin' in your optimized query, and this difference seems quite consistent when I re-run this multiple times.