alternatives to simplify() for reducing points in polygons


Steve Howell

Jun 8, 2009, 7:51:04 PM
to geodjango
Hi, I have some shape files from the census site that have far more
granularity than I really need for my application, and I want to
reduce the number of points in the polygons to save storage, make
queries more efficient, etc.

The simplify() method seems to fit the bill for me, but instead of
specifying a distance epsilon, I just want to set a maximum number of
points in the polygon that I keep.

The underlying algorithm looks straightforward enough, so I'm thinking
of writing my own variation in Python, and I'm just wondering if
anybody has already done the same thing and wouldn't mind sharing the
code. Basically, the idea is to do something like this:

1) Find all points' orthogonal distances from segment defined by
neighbors.
2) Discard the point that is closest to its neighbors' segment.
3) Recalculate the two neighbors' distance values.
4) Repeat steps 2 and 3.
5) Stop when I'm down to the number of points that I desire to keep.
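[A minimal sketch of those five steps in plain Python, no GeoDjango involved; the helper names are made up, and for brevity it recomputes every distance on each pass rather than only the two neighbors as in step 3:]

```python
import math

def perp_distance(p, a, b):
    """Orthogonal distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0:
        return math.hypot(px - ax, py - ay)
    # Distance from p to the infinite line through a and b.
    return abs(dx * (ay - py) - dy * (ax - px)) / seg_len

def simplify_to_n(points, n):
    """Greedily drop the point closest to its neighbors' segment
    until only n points remain; endpoints are never dropped."""
    pts = list(points)
    while len(pts) > max(n, 2):
        # Steps 1-2: find the cheapest interior point.
        dists = [perp_distance(pts[i], pts[i - 1], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        cheapest = dists.index(min(dists)) + 1
        # Steps 3-4: deleting it re-pairs its neighbors on the next pass.
        del pts[cheapest]
    return pts
```

[As written this is O(n^2); recomputing only the two neighbors per step 3 and keeping candidates in a heap would bring the cost down.]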

Another approach that I've thought of:

1) Figure out a good way to calibrate the initial epsilon, such as by
taking the diagonal of the extent of the polygon and applying a
percentage.
2) Keep bumping up the epsilon until I get to a desired number of
points.
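[The epsilon-bumping idea can be sketched the same way; here a tiny pure-Python Douglas-Peucker stands in for the built-in simplify(), the starting epsilon is seeded at an assumed 1% of the bounding-box diagonal, and epsilon doubles until the point budget is met:]

```python
import math

def rdp(points, eps):
    """Minimal Ramer-Douglas-Peucker, standing in for simplify()."""
    if len(points) < 3:
        return list(points)
    (ax, ay), (bx, by) = points[0], points[-1]
    dx, dy = bx - ax, by - ay
    seg = math.hypot(dx, dy) or 1.0
    # Distance of each interior point from the chord between endpoints.
    dists = [abs(dx * (ay - py) - dy * (ax - px)) / seg
             for px, py in points[1:-1]]
    i = dists.index(max(dists)) + 1
    if dists[i - 1] <= eps:
        return [points[0], points[-1]]
    return rdp(points[:i + 1], eps)[:-1] + rdp(points[i:], eps)

def simplify_to_budget(points, max_points, start_frac=0.01):
    """Keep doubling epsilon until the result fits the point budget.
    Epsilon is seeded from the diagonal of the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    eps = diag * start_frac
    result = rdp(points, eps)
    while len(result) > max_points:
        eps *= 2
        result = rdp(points, eps)
    return result
```

[Doubling overshoots the budget less gracefully than a bisection on epsilon would, but it converges in a handful of iterations.]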

Any advice will be appreciated.

Also, I am wondering if anybody has ever calibrated the algorithm to
look at relative distances instead of absolute distances. For
example, it seems like a point that is 100 feet from the rest of the
polygon adds more value if the closest segment is only 1000 feet vs.
the scenario where the closest segment is maybe a mile long.

Also, it seems like you might want to grant immunity to any point that
is not "between" its neighbors, but I'm wondering if that really comes
up in the real world, and if so, how you would define "between" in a
rigorous manner.

One final question--does anybody have a handy online link to the code
for simplify()?

Cheers,

Steve

Josh Livni

Jun 8, 2009, 8:09:23 PM
to geod...@googlegroups.com
Hi Steve,

There are a number of different polyline simplification algorithms; the one used by ST_Simplify in PostGIS is Douglas-Peucker (http://postgis.refractions.net/documentation/manual-svn/ST_Simplify.html).

Here's some python code for that algorithm (courtesy of Schuyler Erle) which should get you started  http://mappinghacks.com/code/dp.py.txt

Mapshaper.org, I think, does a really nice job showing some different algorithms which you might find interesting.


 -Josh

Steve Howell

Jun 8, 2009, 9:37:56 PM
to geod...@googlegroups.com

--- On Mon, 6/8/09, Josh Livni <jo...@umbrellaconsulting.com> wrote:
>
> Here's some python code for that algorithm
> (courtesy of Schuyler Erle) which should get you started
>  http://mappinghacks.com/code/dp.py.txt
>
> Mapshaper.org, I think, does a really nice job
> showing some different algorithms which you might find
> interesting.
>
>

Thanks, all good stuff. I am leaning toward the VW algorithm now.

So now for a more basic question--are there examples anywhere of how to instantiate a MultiPolygonField from a simple list of points? I am guessing it has a MultiPolygon class as its back end, but how do I tell whether it's the one from gdal.geometries or geos.collections?
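[One answer for later readers: GeoDjango geometry fields accept WKT strings on assignment (a GEOSGeometry is built behind the scenes), so a dependency-free route from raw point lists is to build the WKT yourself. This helper is a sketch of mine, not a GeoDjango API:]

```python
def multipolygon_wkt(polygons):
    """Build a MULTIPOLYGON WKT string from a list of polygons,
    where each polygon is a list of rings and each ring is a list
    of (x, y) tuples.  Rings are closed automatically if needed."""
    def ring(points):
        pts = list(points)
        if pts[0] != pts[-1]:
            pts.append(pts[0])  # WKT rings must be explicitly closed
        return "(" + ", ".join("%g %g" % (x, y) for x, y in pts) + ")"
    polys = ", ".join(
        "(" + ", ".join(ring(r) for r in rings) + ")" for rings in polygons)
    return "MULTIPOLYGON (%s)" % polys
```

[Assigning the resulting string to a MultiPolygonField attribute, or passing it to GEOSGeometry(), should then work without caring whether gdal.geometries or geos.collections is the backing class.]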

Steve Howell

Jun 9, 2009, 4:43:08 PM
to geod...@googlegroups.com

Hi, just to follow up on my own recent posting, folks may find this snippet useful as an alternative to the batteries-included simplify():

http://www.djangosnippets.org/snippets/1559/

Thanks to Josh Livni for helpful pointers.

Comments are welcome, of course, either here or on the snippets page.

Thanks,

Steve


Dane Springmeyer

Jun 10, 2009, 12:13:39 AM
to geod...@googlegroups.com

On Jun 8, 2009, at 6:37 PM, Steve Howell wrote:

> Thanks, all good stuff. I am leaning toward the VW algorithm now.


VW rocks. I'd love to see that implemented in python.

http://users.cs.cf.ac.uk/C.B.Jones/Zhou_JonesSDH04.pdf

Dane

Steve Howell

Jun 10, 2009, 12:40:16 PM
to geod...@googlegroups.com

--- On Tue, 6/9/09, Dane Springmeyer <dane.spr...@gmail.com> wrote:
>
> VW rocks. I'd love to see that implemented in python.
>
> http://users.cs.cf.ac.uk/C.B.Jones/Zhou_JonesSDH04.pdf
>

I didn't read the paper carefully, but this Python code at least captures some of the concepts.

http://www.djangosnippets.org/snippets/1559/

The code removes vertices by area, and recalculates triangles as vertices are removed. It does a straight area calculation, though, i.e. there is no tuning for flatness, convexity, etc.

Also, I didn't quite understand the section about removing all points and then restoring them in order, so I may be missing something, but my interpretation of the gist of the algorithm seemed to produce good results on our dataset.

If anybody gets a chance to play with the code I posted above, please let us know how you modified it.

Thanks,

Steve


Josh Livni

Jun 10, 2009, 12:56:24 PM
to geod...@googlegroups.com
I only skimmed the paper, but I think the bit about restoring points after the calculation is the key to the overall VW method: after calculating the 'importance' of a vertex (using whatever measure), you don't remove it from future calculations just because it's 'unimportant' -- when you calculate the next vertex, the potentially unimportant one you previously calculated will still be there (because you put it back). So their method is something like:
 1) give every single vertex an 'importance' (eg based on effective area of triangle)
 2) after doing that for every vertex, only then choose which ones to remove (eg the least important 40%)

whereas I assume your code (which I haven't even looked at) does something like
 1) walk along the line, removing 'unimportant' vertices as you go along, so perhaps previously removed vertices are not included in current calculations about what to remove ...

One advantage of their method, as I see it, is once the original calculation is done you don't need to recalculate importance for new simplification, you just display those vertices where importance > x.

But then again I only skimmed it, so I could be wrong.
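[For what it's worth, that two-phase reading can be sketched like so; this is my own interpretation, using effective triangle area as the importance measure, with endpoints given infinite importance so they always survive:]

```python
def triangle_area(a, b, c):
    """Effective area of the triangle a-b-c (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def vw_importance(points):
    """Phase 1: give every vertex an importance, computed once."""
    INF = float("inf")
    idx = list(range(len(points)))       # indices still 'present'
    importance = [INF] * len(points)     # endpoints are never dropped
    last = 0.0
    while len(idx) > 2:
        # Effective area of each interior vertex among the survivors.
        areas = [triangle_area(points[idx[i - 1]], points[idx[i]],
                               points[idx[i + 1]])
                 for i in range(1, len(idx) - 1)]
        i = areas.index(min(areas)) + 1
        # Force importance to be non-decreasing along the removal
        # order, so thresholding stays consistent.
        last = max(last, areas[i - 1])
        importance[idx[i]] = last
        del idx[i]
    return importance

def simplify_by_threshold(points, importance, x):
    """Phase 2: display only vertices where importance > x."""
    return [p for p, imp in zip(points, importance) if imp > x]
```

[Phase 2 is just a filter, which is exactly the advantage described above: one expensive pass, then any level of simplification for free.]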

 -Josh

Steve Howell

Jun 10, 2009, 2:20:19 PM
to geod...@googlegroups.com

--- On Wed, 6/10/09, Josh Livni <jo...@umbrellaconsulting.com> wrote:

>
> whereas I assume your code (which I haven't
> even looked at) does something like 1) walk
> along the line, removing 'unimportant' vertices as
> you go along, so perhaps previously removed vertices are not
> included in current calculations about what to remove

Just to be clear, after I remove any point, I do recalculate the effective areas for its neighbors, as you tend to want to keep a point more after its neighbor has been removed. So the algorithm is smart that way. But it *is* greedy, i.e. every iteration removes the least effective point, and there is no "restore" process.

Here is the main piece of code.


def simplify_points(points, max, id, name):
    tups = []
    for i, point in enumerate(points):
        if i == 0 or i == len(points) - 1:
            benefit = None  # endpoints are never removed
        else:
            benefit = benefit_function(point, points[i-1], points[i+1])
        tups.append((point, benefit))
    removals = 0
    while len(tups) > max + 1:
        min_benefit = tups[1][1]
        min_i = 1
        for i, (point, benefit) in enumerate(tups):
            # test "is not None" rather than truthiness, so a
            # zero-benefit (collinear) point can still be removed
            if benefit is not None and benefit < min_benefit:
                min_benefit = benefit
                min_i = i
        del tups[min_i]
        removals += 1
        # recalculate the two neighbors of the removed point
        for i in [min_i-1, min_i]:
            point = tups[i][0]
            if i == 0 or i == len(tups) - 1:
                benefit = None
            else:
                benefit = benefit_function(point, tups[i-1][0], tups[i+1][0])
            tups[i] = (point, benefit)

    points = [point for point, benefit in tups]
    return points
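[benefit_function isn't shown in the post; for the VW-style variant described above, a plausible effective-area version -- my guess at it, not necessarily the snippet's actual code -- would be:]

```python
def benefit_function(point, prev, next):
    """Effective area of the triangle the point spans with its two
    neighbors: removing the point changes the shape by this much."""
    (px, py), (ax, ay), (bx, by) = point, prev, next
    return abs((ax - px) * (by - py) - (bx - px) * (ay - py)) / 2.0
```

[A collinear point yields a benefit of exactly zero, which is why the removal loop must not treat zero as "no benefit computed".]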

Elliot Hallmark

Jul 26, 2014, 2:00:14 AM
to geod...@googlegroups.com
Hi all,

I have implemented the Visvalingam-Whyatt algorithm in Python and am happy with how it works.  Once the array of vertex importance is built, filter operations are super fast.  Building the importance index isn't very slow either, and it's faster than the Python Ramer-Douglas-Peucker implementation I found.

https://github.com/Permafacture/Py-Visvalingam-Whyatt/

thanks,
  Elliot

Max Demars

Jul 28, 2014, 2:21:12 PM
to geod...@googlegroups.com
Hi Elliot,

I would like to test your algorithm. If I understand your script on GitHub correctly, the input should be an array of coordinates? Would it be hard for you to add the possibility of taking a GDAL object as input, especially a multipolygon?

Thanks for sharing it; I am quite enthusiastic about it.

-Max Demars

Elliot Hallmark

Jul 31, 2014, 4:45:59 PM
to Elliot Hallmark, geod...@googlegroups.com
Max,

I've added OGRGeometry support. It isn't thoroughly tested, but I was able to compress 21,000 polygons and multipolygons from ftp://ftp.ci.austin.tx.us/GIS-Data/Regional/zoning/zoning.zip

Building the importance lists took 55 seconds on my computer, and getting simplified OGRGeometry objects back took 2.5 seconds.  Of the 2.5 seconds, 1.4 was OGRGeometry object creation time and at least another 0.4 seconds was string manipulation setting up for creating the objects.  I made it so one can return either OGRGeometry objects or arrays of floats.  The array of floats returns in 400 ms.

Let me know if this works for you.

-Elliot




Elliot Hallmark

Jul 31, 2014, 4:45:59 PM
to geod...@googlegroups.com
Hi Max,

Yes I can.  Thanks for asking.  The only thing is that GDAL objects take a while to create and each masking operation would have to create new ones for the result. I'm guesstimating that gdal geom creation time will be a good portion of the run time. A small number of gdal geoms with a large number of points would be fine, but 50,000+ might be sluggish.  I'm sure it'll be fine though. 


Also, do you mean django.contrib.gis.gdal objects?


-Elliot