[postgis-users] Tricks to find polygon/line intersection faster

765 views
Skip to first unread message

Evan Martin

unread,
Jul 2, 2013, 9:39:04 AM7/2/13
to postgi...@postgis.refractions.net
Hi,

I have tables of ~25,000 polygons and ~80,000 lines and I want to find which lines intersect which polygons using PostGIS 2.1. Both are geographies and can span the dateline. Doing this the simple way using ST_Intersects(geog, geog) takes about 3 hours on my machine and I'd to see if there's a way to speed this up.

I already have indexes on the geography columns and one of them is being used (the one on the lines). Each line only has 2 points, but the polygons have anywhere from 4 to 20,000 points and some of them are very large. It would be OK to miss some of the smaller intersections (ie. where the two only just barely intersect), but I wouldn't want the query to return false positives. In fact, ideally, I'd like to find only the lines that "substantially" intersect a polygon, eg. at least x km or x% of the line is in the polygon, but finding any intersections at all would be a start.

One trick I tried is ST_SimplifyPreserveTopology. I used that to create simplified version of the polygons (at least those that don't span the dateline) and check those first, then if they intersect then check the real polygons. This seems to work, but the performance gains are marginal compared to the simple approach.

Is there another trick I can use to do this faster? I know ST_Intersects() internally calls ST_Distance(), which calculates the distance to a fraction of a metre. I don't need that kind of precision, so surely there's some "shorcut" to be found?

Thanks,

Evan

Lelo - Luiz Rogério De Pieri

unread,
Jul 2, 2013, 10:18:51 AM7/2/13
to PostGIS Users Discussion
Hi Evan,

What about instead of verify the intersection with Simplified polygon, verify the intersection with the BBox ?
I've been using this strategy on my system but using geotools and gives me nice results.

Good luck


_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users




--
Rogério De Pieri  (Lelo)
MBA em Gerenciamento de Projetos - FGV
SCJP 5
Buscando melhorar a cada dia
Áudio, Hardware & Software

Evan Martin

unread,
Jul 4, 2013, 1:26:31 PM7/4/13
to postgi...@lists.osgeo.org
Hi Lelo,

I don't think that will help - ST_Envelope only works on geometry and I need this calculated on the sphere. Besides ST_Intersects() already does a bounding box check.

Evan

Paul Ramsey

unread,
Jul 5, 2013, 9:48:25 PM7/5/13
to PostGIS Users Discussion
Without seeing your data it's quite hard to say. Many large things vs
many large things yields a problem where indexes and so on don't have
a lot of leverage on the problem.

Evan Martin

unread,
Jul 6, 2013, 1:10:24 PM7/6/13
to postgi...@lists.osgeo.org
It's not really "many large things vs many large things". Most lines are
< 100 km long (but there are some over 1000 km). Here's a percentile
chart: https://imageshack.us/a/img16/940/w5s.png

Most of the polygons are also quite small and simple, but there are a
few really large complex ones. From my testing it looks like a few of
the "worst" polygons (multi-polygons, actually) take all the time, so
that 25,000 count was a bit misleading. 96% of them have < 100 points,
but the worst one has > 23,000. I couldn't get the area, because
ST_Area(geog) is returning some ridiculously high numbers, but it would
be millions of sq km.

Stephen Woodbridge

unread,
Jul 6, 2013, 1:19:19 PM7/6/13
to postgi...@lists.osgeo.org
The standard way of dealing this this is to chop you really large
polygons into tiles. Or if the multipolygons can be split into multiple
individual polygons you might get better performance.

google: postgis tiling large polygons

if you need the distance that the line intersects the multiple tiles or
multiple split multipolygons you will need to sum() and group on the
original id of the split object.

-Steve

Simon ----------------

unread,
Jul 8, 2013, 7:13:04 AM7/8/13
to PostGIS Users Discussion
Try something like this.
 
 
CREATE TABLE interlines(geom geometry,id numeric) WITH (OIDS=FALSE); ALTER TABLE interlines OWNER TO postgres;
 
CREATE OR REPLACE FUNCTION interprob() RETURNS SETOF your_polygon_table AS
$BODY$
DECLARE
    r your_polygon_table%rowtype;
BEGIN
    FOR r IN SELECT * FROM your_polygon_table 
    LOOP
    insert into interlines(geom,id) SELECT a.geom,r.id from your_line_table a where st_intersects(a.geom,r.geom);
        RETURN NEXT r; 
    END LOOP;
    RETURN;
END
$BODY$
LANGUAGE 'plpgsql' ;
 
select * from interprob();
 
Greetings
Simon
 
Gesendet: Samstag, 06. Juli 2013 um 19:19 Uhr
Von: "Stephen Woodbridge" <woo...@swoodbridge.com>
An: postgi...@lists.osgeo.org
Betreff: Re: [postgis-users] Tricks to find polygon/line intersection faster

Evan Martin

unread,
Jul 8, 2013, 11:36:00 AM7/8/13
to postgi...@lists.osgeo.org
Thanks, Steve, that seems to do the trick. Of course the results change
a bit, so it's a trade-off of accuracy vs. speed. I presume the change
is because I do the tiling on the plane - ST_Intersection(geom, geom).
When I tried doing tiling on geography the results changed much more
(compared to no tiling). Would be interesting to understand why that is.
Am I doing something wrong? I create a grid of 1x1 degree polygons and
then do this:

SELECT poly_id, ST_Intersection(poly_border::geometry,
tile)::geography AS poly_tile
FROM my_polygon p
JOIN world_tile_1 t ON ST_Intersects(p.border::geometry, t.tile)

The intersection with lines is then done on geography, as before. I only
do this for polygons that don't span the dateline (which is 99% of them,
luckily).

Evan

Stephen Woodbridge

unread,
Jul 8, 2013, 11:52:17 AM7/8/13
to postgi...@lists.osgeo.org
Evan,

I don't work with geography types that much but I'm guessing the error
is a result of the fact that your 1 degree edges are straight lines in
geographic projection (ie: unprojected lat/lon) but when you use a
geographic they really should be curved lines but the edge has only the
end points so they get projected internally to the geography projection
and the edge is a straight line between the endpoints rather than a
curve, hence the error. If your lines are long with just the endpoints
also that will generate an error like this too. If you add points along
the edges and along your lines then when they are projected by the
geography functions they will more accurately follow the curve in space
that you would expect them to. Look at function ST_Segmentize().

If I'm wrong about how this works, someone jump in here please.

-Steve

Evan Martin

unread,
Feb 10, 2014, 3:33:00 PM2/10/14
to postgi...@lists.osgeo.org
I've discovered a slight problem with the handy "tiled intersection"
trick suggested earlier: some of my lines run exactly along a meridian
or along a parallel and so do the tiles, so those intersections get
counted twice! For example, LINESTRING(-18 14.5,-18 15.5) results in the
following intersections with a particular tiled polygon

LINESTRING(-17.9999148001148 14.7502863979935,-17.9998863986648
15.0000002498516)
LINESTRING(-18.0000851998852 14.7502863979933,-18.0001136013352
15.0000002498516)

LINESTRING(-18.0001136013352 15.0000002498516,-18.0000852013123
15.2502965758817)
LINESTRING(-17.9998863986648 15.0000002498516,-17.9999147986877
15.250296575882)

Could someone suggest the best (fastest while still accurate) way to
filter out such duplicates? As you can see, they're not exactly the
same, so ST_Equals() returns false on them.

Evan

Rémi Cura

unread,
Feb 11, 2014, 2:39:20 AM2/11/14
to PostGIS Users Discussion
Hey,
geometry equality can be defined in many ways.


For your duplication problem, it is a simple postgres problem :
you want that for any couple (line, poly), you have at most one result (given polygon are convex, which they are if they are squares).

so at the end of you computing you just add a filtering select :
SELECT DISTINCT ON (line_id,poly_id)

If you don't want random result, you should order your result to know which line of the 4 you get (you could order by length, centroid, first point, whatever.)

WITH (your computing)
SELECT DISTINCT ON (line_id, poly_id) , poly, line
FROM your computing
ORDER BY ST_Length(line) ASC

Now if you want to solve this the PostGIS way it would be harder, as you would need a distance between shape.
Of course you could also use implicit distance, for example by snapping your result line to a given precision and doing a regular test after
(WHERE ST_Equals(ST_SnapToGrid(line1, 0.01),ST_SnapToGrid(line2, 0.01))=TRUE )

Cheers,

Rémi-C


Evan Martin

unread,
Feb 11, 2014, 5:39:45 PM2/11/14
to postgi...@lists.osgeo.org
Thanks, but I realised later that just eliminating the duplicate intersections is not enough, because sometimes only part of the intersection is doubled and maybe there are other inaccuracies there as well. (Perhaps the line running along tile border somehow falls "in-between" tiles when re-projected by ST_Intersection?) Anyway, my fix was to detect such lines and intersect them with the original polygon instead of the tiled one. Since there are relatively few of them it doesn't affect the overall time noticeably.

Regards,

Evan
Reply all
Reply to author
Forward
0 new messages