[postgis-users] problem with points in polygon edge

43 views
Skip to first unread message

Gustavo Martinez

unread,
Mar 14, 2022, 9:52:27 AM3/14/22
to postgi...@lists.osgeo.org

Hi all,

I am working with fishing vessels positions. I have millions of points, and when doing some analysis I need to do some intersections with polygons.

All the database is on EPSG 4326 (Lat Long)

I've noticed some problems when the points are on the edges of the polygons. When I use ST_Within I lose the points on the edges, while when using ST_Intersects those points get counted twice.

To fix that I wrote a function:

CREATE OR REPLACE FUNCTION "monitoreo"."dentro" (punto geometry, rect geometry)  RETURNS boolean
  VOLATILE
AS $body$
begin
IF (st_within(punto, rect) OR
(
    st_x(punto)=st_xmax(rect) AND st_y(punto) BETWEEN st_ymin(rect) AND st_ymax(rect)
)
OR
(
    st_y(punto)=st_ymax(rect) AND st_x(punto) BETWEEN st_xmin(rect) AND st_xmax(rect)
)
) THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
end
$body$ LANGUAGE plpgsql


I have two questions:

1- that function works but it is slow. Do you have suggestions to optimize it?

2- the function works for rectangles, do you have an example of how to handle irregular polygons?

Many thanks in advance.


--

Gustavo Martínez Puljak


Linkedin:www.linkedin.com/in/gustavomartinezpuljak


Que la tierra se vaya haciendo camino ante tus pasos,
que el viento sople siempre a tus espaldas,
que el sol brille cálido sobre tu cara,
que la lluvia caiga suavemente sobre tus campos y,
hasta tanto volvamos a encontrarnos,
que Dios te lleve en la palma de su mano.



Greg Troxel

unread,
Mar 14, 2022, 10:41:37 AM3/14/22
to Gustavo Martinez, postgi...@lists.osgeo.org

Gustavo Martinez <gmpu...@yahoo.com.ar> writes:

> Hi all,
>
> I am working with fishing vessels positions. I have millions of
> points, and when doing some analysis I need to do some intersections
> with polygons.
>
> All the database is on EPSG 4326 (Lat Long)
>
> I've noticed some problems when the points are on the edges of the
> polygons. When I use ST_Within I lose the points on the edges, while
> when using ST_Intersects those points get counted twice.

I would suggest figuring out what is wrong with the queries that ought
to make sense and fixing that instead of trying to code a function that
behaves like that and then optimizing that funciton.


You say "lose points on the edge", but you did not post your query and
you did not post a sample point that is mishandled.

Do you mean something like a pseudo-square from say 40-41N, 20-21E and a
point that is 40.5N 20E and wanting that to be included in ST_Within?
If you don't mean that please explain precisely.

I also don't understand "with ST_Intersects I get duplicates", but again
you didn't post your query or sample data.

It seems like if you want to select all vessel points within some
polyon (including points that are on the edge), ST_Intersects should be
want you want.


I would also wonder why you care about points that are strictly on the
polygon. First, boats are not points, but I am guessing you have "Boat
Reference Position", perhaps implicitly a GPS antenna.

All measurements have error, and thus for any actual boat
position near the polygon edge, a recorded position could be inside or
outside. So worrying that a position exactly on the line numerically --
which seems pretty unlikely with real data of high -- is not something
that I understand.

Hope this helps,
Greg
signature.asc

Shaozhong SHI

unread,
Mar 14, 2022, 10:56:01 AM3/14/22
to PostGIS Users Discussion, Gustavo Martinez
_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

Hi, Greg,

This is interesting.  A detector function should be developed so that we can find such points and their locations.

Regards,

David 

Greg Troxel

unread,
Mar 15, 2022, 7:08:22 PM3/15/22
to Gustavo Martinez, postgi...@lists.osgeo.org

Gustavo Martinez <gmpu...@yahoo.com.ar> writes:

> attached you can see an example of the situation. I am interested in
> all the points. The arrow indicates one point that lays on the edge of
> polygon 1 and polygon 2.

Ah, now I understand that you are trying to use polygons that have
overlapping edges and that you further want each point to be assigned to
exactly one polygon.

> If I use st_within(), that point is not counted in either polygon, but
> if I use st_intersects() it is counted in both polygons. For that
> reason, I wrote the function.
> The data comes from a VMS device, that reports the GPS position, along
> speed and vessel ID with a fixed frequency.
> Hope it is more clear now.

The question is then what you are really trying to do. If you are
trying to count all the points in the union of some polygons, you can
union the polygons and then count. If you want to know how many points
are in each polygon, with a definition of "in" that says

a point is in a polygon if it is really in it, or if it is on the
boundary and the point is not in some other polygon sharing the
boundary, however it is arbitrary which polygon the piont is
considered to be in
with the stipulation that polygons do not really overlap, but just
touch at outer edges.

You might consider a function that assigns points to polygons, perhaps
taking the first member of selecting all polygons such that
ST_Intersects(point, polygon) and assigning that in a table of points
and assigned polygons, and then using that for your statistics.

I guess you could also think about a function like ST_HalfOpenIntersect but
that somehow treats each edge as either a closed or open interval
depending on angle, so that points on shared edges would end up matching
one and not the other.
signature.asc

Gustavo Martinez

unread,
Mar 16, 2022, 1:27:44 PM3/16/22
to Greg Troxel, postgi...@lists.osgeo.org

Hi Greg,

El 15/3/22 a las 20:08, Greg Troxel escribió:
Gustavo Martinez <gmpu...@yahoo.com.ar> writes:

attached you can see an example of the situation. I am interested in
all the points. The arrow indicates one point that lays on the edge of
polygon 1 and polygon 2.
Ah, now I understand that you are trying to use polygons that have
overlapping edges and that you further want each point to be assigned to
exactly one polygon.

If I use st_within(), that point is not counted in either polygon, but
if I use st_intersects() it is counted in both polygons. For that
reason, I wrote the function.
The data comes from a VMS device, that reports the GPS position, along
speed and vessel ID with a fixed frequency.
Hope it is more clear now.
The question is then what you are really trying to do.   If you are
trying to count all the points in the union of some polygons, you can
union the polygons and then count.  If you want to know how many points
are in each polygon, with a definition of "in" that says

  a point is in a polygon if it is really in it, or if it is on the
  boundary and the point is not in some other polygon sharing the
  boundary, however it is arbitrary which polygon the piont is
  considered to be in
   with the stipulation that polygons do not really overlap, but just
   touch at outer edges.

I have two data sources: vessel positions (VMS) and logbooks with fish landings for each trip. I developed an algorithm that links the catches with some points that show "fishing behaviour". From this database (point  based), we get data at different spatial and temporal aggregations (polygons). I need to consider all the points, either on the boundaries or not. That's why I wrote the function, that has a fixed criteria for the selection. The question is, can it be optimized? It is quite slow.

On the other hand, it works for rectangular (lat-long) polygons, but I don't know how to obtain this for irregular polygons (exclusive economic zones, marine protected areas, etc.)


You might consider a function that assigns points to polygons, perhaps
taking the first member of selecting all polygons such that
ST_Intersects(point, polygon) and assigning that in a table of points
and assigned polygons, and then using that for your statistics.

I guess you could also think about a function like ST_HalfOpenIntersect but
that somehow treats each edge as either a closed or open interval
depending on angle, so that points on shared edges would end up matching
one and not the other.
--

Gustavo Martínez Puljak


cel: +549 11 64089000
Linkedin:www.linkedin.com/in/gustavomartinezpuljak

Martin Davis

unread,
Mar 16, 2022, 1:44:44 PM3/16/22
to PostGIS Users Discussion
Can you fix this by treating it as a standard non-spatial data de-duplication problem?   I.e. use the spatial query to assign a polygon ID to each point, and then keep only a single attribute record for each point.

Lots of examples of this out there, e.g.

Gustavo Martinez

unread,
Mar 17, 2022, 7:54:35 AM3/17/22
to postgi...@lists.osgeo.org




-------- Mensaje reenviado --------
Asunto: Re: [postgis-users] problem with points in polygon edge
Fecha: Wed, 16 Mar 2022 15:21:31 -0300
De: Gustavo Martinez <gmpu...@yahoo.com.ar>
A: Greg Troxel <g...@lexort.com>


Hi Greg,

I can't share the database. Attached are 2 shapefiles with the data of the following example (consider that the queries run on the server):

- select b.grado, b.cuarto, count(idvms) from monitoreo.vms_2021 a, datos.cuadricula_05 b where a.idprts='20215019326' and st_within(a.geom,b.geom) group by 1,2;

Gives:

grado    cuarto    count
3857    NE    13
3857    NO    49
3857    SO    6
3858    SE    13
3958    NE    16
3958    SO    16
4058    NO    15
4059    NE    3
4059    SE    19
4159    NE    11
4159    NO    6
4159    SO    18
4259    NO    18
4259    SO    10
4260    SE    6
4360    NE    17
4360    SE    14
4459    NO    453
4460    NE    307


- select b.grado, b.cuarto, count(idvms) from monitoreo.vms_2021 a, datos.cuadricula_05 b where a.idprts='20215019326' and st_intersects(a.geom,b.geom) group by 1,2;

Gives:

grado    cuarto    count
3857    NE    13
3857    NO    49
3857    SO    6
3858    SE    13
3958    NE    16
3958    SO    16
4058    NO    15
4059    NE    3
4059    SE    19
4159    NE    11
4159    NO    6
4159    SO    18
4259    NO    18
4259    SO    10
4260    SE    6
4360    NE    17
4360    SE    14
4459    NO    461
4460    NE    315


- select b.grado, b.cuarto, count(idvms) from monitoreo.vms_2021 a, datos.cuadricula_05 b where a.idprts='20215019326' and monitoreo.dentro(a.geom,b.geom) group by 1,2;

Gives:

grado    cuarto    count
3958    SO    16
4259    NO    18
4059    NE    3
4159    NO    6
4460    NE    315
3857    NE    13
4159    SO    18
3958    NE    16
3857    SO    6
3857    NO    49
4360    NE    17
4260    SE    6
4259    SO    10
4459    NO    453
4058    NO    15
3858    SE    13
4360    SE    14
4059    SE    19
4159    NE    11


- select count(idvms) from monitoreo.vms_2021 where idprts='20215019326'

Gives:

count
1018

You can see that st_intersects adds points and st_within gives a lower count that the real one.


El 16/3/22 a las 14:30, Greg Troxel escribió:

You still haven't explained what you are trying to do, and haven't
posted your queries.  That's ok, but I have not further useful
suggestions for you.

example.zip
Reply all
Reply to author
Forward
0 new messages