Hi, I have some shape files from the census site that have far more
granularity than I really need for my application, and I want to
reduce the number of points in the polygons to save storage, make
queries more efficient, etc.
The simplify() method seems to fit the bill for me, but instead of
specifying a distance epsilon, I just want to set a maximum number of
points in the polygon that I keep.
The underlying algorithm looks straightforward enough, so I'm thinking
of writing my own variation in Python, and I'm just wondering if
anybody has already done the same thing and wouldn't mind sharing the
code. Basically, the idea is to do something like this:
1) Find all points' orthogonal distances from segment defined by
neighbors.
2) Discard the point that is closest to its neighbors' segment.
3) Recalculate the two neighbors' distance values.
4) Repeat steps 2 and 3.
5) Stop when I'm down to the number of points that I desire to keep.
Another approach that I've though of:
1) Figure out a good way to calibrate the initial epsilon, such as by
taking the diagonal of the extent of the polygon and applying a
percentage.
2) Keep bumping up the epsilon until I get to a desired number of
points.
Any advice will be appreciated.
Also, I am wondering if anybody has ever calibrated the algorithm to
look at relative distances instead of absolute distances. For
example, it seems like a point that is 100 feet from the rest of the
polygon adds more value if the closest segment is only 1000 feet vs.
the scenario where the closest segment is maybe a mile long.
Also, it seems like you might want to grant immunity to any point that
is not "between" its neighbors, but I'm wondering if that really comes
up in the real world, and if so, how you would define "between" in a
rigorous manner.
One final question--does anybody have a handy online link to the code
for simplify()?
On Mon, Jun 8, 2009 at 4:51 PM, Steve Howell <showel...@yahoo.com> wrote:
> Hi, I have some shape files from the census site that have far more
> granularity than I really need for my application, and I want to
> reduce the number of points in the polygons to save storage, make
> queries more efficient, etc.
> The simplify() method seems to fit the bill for me, but instead of
> specifying a distance epsilon, I just want to set a maximum number of
> points in the polygon that I keep.
> The underlying algorithm looks straightforward enough, so I'm thinking
> of writing my own variation in Python, and I'm just wondering if
> anybody has already done the same thing and wouldn't mind sharing the
> code. Basically, the idea is to do something like this:
> 1) Find all points' orthogonal distances from segment defined by
> neighbors.
> 2) Discard the point that is closest to its neighbors' segment.
> 3) Recalculate the two neighbors' distance values.
> 4) Repeat steps 2 and 3.
> 5) Stop when I'm down to the number of points that I desire to keep.
> Another approach that I've though of:
> 1) Figure out a good way to calibrate the initial epsilon, such as by
> taking the diagonal of the extent of the polygon and applying a
> percentage.
> 2) Keep bumping up the epsilon until I get to a desired number of
> points.
> Any advice will be appreciated.
> Also, I am wondering if anybody has ever calibrated the algorithm to
> look at relative distances instead of absolute distances. For
> example, it seems like a point that is 100 feet from the rest of the
> polygon adds more value if the closest segment is only 1000 feet vs.
> the scenario where the closest segment is maybe a mile long.
> Also, it seems like you might want to grant immunity to any point that
> is not "between" its neighbors, but I'm wondering if that really comes
> up in the real world, and if so, how you would define "between" in a
> rigorous manner.
> One final question--does anybody have a handy online link to the code
> for simplify()?
> Mapshaper.org, I think, does a really nice job > showing some different algorithms which you might find > interesting.
Thanks, all good stuff. I am leaning toward the VW algorithm now.
So now for a more basic question--are there examples anywhere of how to instantiate a MultiPolygonField from a simple list of points? I am guessing it has a MultiPolygon class as its back end, but how do I tell whether it's the one from gdal.geometries or geos.collections?
The code removes vertices by area, and recalculates triangles as vertices are removed. It does a straight area calculation, though, i.e. there is no tuning for flatness, convexity, etc.
Also, I didn't quite understand the section about removing all points and then restoring them in order, so I may be missing something, but my interpretation of the gist of the algorithm seemed to produce good results on our dataset.
If anybody gets a chance to play with the code I posted above, please let us know how you modified it.
I only skimmed the paper -- but I think the bit about restoring points after
the calculation is the key to the overall VW method: after calculating the
'importance' of a vertex (using whatever method), you don't remove it from
future calculations just because it's 'unimportant' -- when you calculate
the next vertex, the potentially unimportant one you previously calculated
will still be there (because you put it back). So their method is
something like: 1) give every single vertex an 'importance' (eg based on
effective area of triangle)
2) after doing that for every vertex, only then choose which ones to remove
(eg the least important 40%)
whereas I assume your code (which I haven't even looked at) does something
like
1) walk along the line, removing 'unimportant' vertices as you go along, so
perhaps previously removed vertices are not included in current calculations
about what to remove ...
One advantage of their method, as I see it, is once the original calculation
is done you don't need to recalculate importance for new simplification, you
just display those vertices where importance > x.
But then again I only skimmed it, so I could be wrong.
> The code removes vertices by area, and recalculates triangles as vertices
> are removed. It does a straight area calculation, though, i.e. there is no
> tuning for flatness, convexity, etc.
> Also, I didn't quite understand the section about removing all points and
> then restoring them in order, so I may be missing something, but my
> interpretation of the gist of the algorithm seemed to produce good results
> on our dataset.
> If anybody gets a chance to play with the code I posted above, please let
> us know how you modified it.
--- On Wed, 6/10/09, Josh Livni <j...@umbrellaconsulting.com> wrote:
> whereas I assume your code (which I haven't > even looked at) does something like 1) walk > along the line, removing 'unimportant' vertices as > you go along, so perhaps previously removed vertices are not > included in current calculations about what to remove
Just to be clear, after I remove any point, I do recalculate the effective areas for its neighbors, as you tend to want to keep a point more after its neighbor has been removed. So the algorithm is smart that way. But it *is* greedy, i.e. every iteration removes the least effective point, and there is no "restore" process.
Here is the main piece of code.
def simplify_points(points, max, id, name): tups = [] for i, point in enumerate(points): if i == 0 or i == len(points) - 1: benefit = None else: benefit = benefit_function(point, points[i-1], points[i+1]) tups.append((point, benefit)) removals = 0 while len(tups) > max + 1: min_benefit = tups[1][1] min_i = 1 for i, (point, benefit) in enumerate(tups): # print i, benefit if benefit and benefit < min_benefit: min_benefit = benefit min_i = i del tups[min_i] removals += 1 for i in [min_i-1, min_i]: point = tups[i][0] if i == 0 or i == len(tups) - 1: benefit = None else: benefit = benefit_function(point, tups[i-1][0], tups[i+1][0]) tups[i] = (point, benefit)
points = [point for point, benefit in tups] return points