Slice operations (django ticket #9877)

7 visualizações
Pular para a primeira mensagem não lida

Aryeh Leib Taurog

não lida,
21 de dez. de 2008, 12:21:3121/12/2008
para geodjango
Since I don't have any IRC experience, I'll be the first to take
advantage of the new group. I'm copying below the text of django
ticket #9877, in which I posted my patch to the LineString and
LinearRing classes to allow get/set/del operations using slices. I
plan to do the same with the geometry collection classes, probably
this week, because it will make my current project much easier for me.

I'd really love to get some feedback regarding the patch in general,
or addressing any of the questions below in particular.

Thanks in advance
Aryeh Leib Taurog

--------------
from http://code.djangoproject.com/ticket/9877

Experimenting with geodjango, I found myself immediately wanting
easier, more Pythonic ways to manipulate the data in the GEOS geometry
objects:

geometryObj[i:j] = ((33.42,-70.89),(34.819,-67.61))

So I have extended the LineString and LinearRing classes' __getitem__
and __setitem__ methods to allow slicing, added a __delitem__ while I
was at it, and threw in most of the non-special, standard list methods
as well.

I'd like to do this for the geometry collections next, but I'm posting
this here (is there a better place?) to solicit feedback on these mods
before I continue.

Questions
1. Is there some really good reason not to do this that I'm missing,
or am I just the first one who wanted it enough to sit down and do it?

2. Did I do this correctly? It seems to work. I tested the correctness
of the mutations themselves, but haven't extensively tested the GEOS
methods on the mutated objects.

3. I've created a situation where

del geometryObj[:] # now works
geometryObj = LinearString([]) # doesn't work
Is there some reason I shouldn't do that?

4. Does it make sense to implement similar mutation methods for Point
objects or Polygon objects?

5. I think it would be nice to make the LinearRing mutations fool-
proof. Meaning since the first and last point must be the same, a
LinearRing object with n points should really behave like a list of n
- 1 coordinates, all the while maintaining the last coordinate as a
mirror of the first so that the geometry won't be invalidated by the
mutations. Comments?

6. I implemented count() which is redundant with __len__() and, in
this case, num_points(). Overkill?

7. Also I'm not sure if index(x) and remove(x) made sense in this
context, but I could imagine cases where they might be useful and so I
erred on the side of implementing as much of the standard list
interface as I thought reasonable. Comments?

8. The sort() and reverse() methods don't seem applicable here. These
are the only standard list methods which I did not implement. However,
I think that they might apply to the geometry collections. Comments?

Justin Bronn

não lida,
31 de dez. de 2008, 14:41:1831/12/2008
para geod...@googlegroups.com
Aryeh Leib Taurog wrote:
> So I have extended the LineString and LinearRing classes' __getitem__
> and __setitem__ methods to allow slicing, added a __delitem__ while I
> was at it, and threw in most of the non-special, standard list methods
> as well.
>
> I'd like to do this for the geometry collections next, but I'm posting
> this here (is there a better place?) to solicit feedback on these mods
> before I continue.
>
> Questions
> 1. Is there some really good reason not to do this that I'm missing,
> or am I just the first one who wanted it enough to sit down and do it?
>

When I first created the GEOS interface, I wanted to have fully mutable
geometries. In fact, I tried to make the objects more memory efficient
by re-using geometries (e.g., keeping previous references to internal
rings if shell is replaced). This did not turn out well -- recycling
`const` pointers is a bad idea, leading to unpredictable behavior and
crashes.

From there I moved to a non-modification policy, with any geometry
modifications resulting in a new cloned geometry. This has turned out
to be a solid approach thus far.

The moral of the story is that as much as we want these things to behave
like Python objects, we have to remember it's really a thin layer over
the GEOS C++ objects -- and have to abide by its limitations or face
very serious consequences when things go awry.

> 2. Did I do this correctly? It seems to work. I tested the correctness
> of the mutations themselves, but haven't extensively tested the GEOS
> methods on the mutated objects.
>

Let me save any implementation commentary until I've applied and tested
myself. I've skimmed the patch, and it "looks" to be right (the
expansive test suite is appreciated and necessary for such a growth of
the API).

What I want to make sure of is that these methods don't create any
adverse side effects. Specifically, I want to avoid segmentation
faults. Unfortunately, these type of problems occur a lot with the type
of operations you're doing, e.g., copying a geom to a modified state and
deleting the original reference.

I'm not saying these bugs are present, but they can manifest themselves
in subtle ways, especially outside common usage scenarios. Thus, I'd
want very robust tests of these features, especially deletion, of corner
cases and strange user hijinks to lessen the risk of unanticipated crashes.

> 3. I've created a situation where
>
> del geometryObj[:] # now works
> geometryObj = LinearString([]) # doesn't work
> Is there some reason I shouldn't do that?
>

In the first statement, `del` is deleting what is retrieved in the slice
operation, in this case it would be clones -- so `geometryObj` itself
isn't actually deleted, and persists. In the current implementation, I
don't support 'empty' geometries (although the OGRGeometry
implementation does). This is a reflection of the underlying libraries
themselves as (if IIRC) GEOS doesn't let you create empty geometries and
OGR (GDAL) does.

> 4. Does it make sense to implement similar mutation methods for Point
> objects or Polygon objects?
>

I'm not sure -- as it is, you can just use `geom.coords`, get an array,
mutate as necessary and then use the modified arrays in the geometry
constructors.

> 5. I think it would be nice to make the LinearRing mutations fool-
> proof. Meaning since the first and last point must be the same, a
> LinearRing object with n points should really behave like a list of n
> - 1 coordinates, all the while maintaining the last coordinate as a
> mirror of the first so that the geometry won't be invalidated by the
> mutations. Comments?

This type of coordinate management is done at the GEOS level, and I'm
not sure about this level of implementation without a robust use case.

>
> 6. I implemented count() which is redundant with __len__() and, in
> this case, num_points(). Overkill?
>

Possibly -- I'm trying to keep the API parred down, especially since I
already have a lot of redundant methods (e.g., `exterior_ring`, `shell`).

> 7. Also I'm not sure if index(x) and remove(x) made sense in this
> context, but I could imagine cases where they might be useful and so I
> erred on the side of implementing as much of the standard list
> interface as I thought reasonable. Comments?
>
> 8. The sort() and reverse() methods don't seem applicable here. These
> are the only standard list methods which I did not implement. However,
> I think that they might apply to the geometry collections. Comments?

I think the `index` routine would be useful; but I'm hesitant about the
in-place modification routines (sort, reverse, remove) for the
aforementioned reasons about geometry modification.

Anyone else with comments?

-Justin

Aryeh Leib Taurog

não lida,
1 de jan. de 2009, 08:37:4501/01/2009
para geodjango

On Dec 31 2008, 9:41 pm, Justin Bronn <jbr...@gmail.com> wrote:
> Aryeh Leib Taurog wrote:
> > So I have extended the LineString and LinearRing classes' __getitem__
> >
> > Questions
> > 1. Is there some really good reason not to do this that I'm missing,
> > or am I just the first one who wanted it enough to sit down and do it?
>
> When I first created the GEOS interface, I wanted to have fully mutable
> geometries.  In fact, I tried to make the objects more memory efficient
> by re-using geometries (e.g., keeping previous references to internal
> rings if shell is replaced).  This did not turn out well -- recycling
> `const` pointers is a bad idea, leading to unpredictable behavior and
> crashes.
>
>  From there I moved to a non-modification policy, with any geometry
> modifications resulting in a new cloned geometry.  This has turned out
> to be a solid approach thus far.

This makes a lot of sense to me. If I understand correctly, then,
it's not the behavior which is problematic, but the initial
implementation. I have tried to follow suit. Namely:

A. When modifying a LineString or LinearRing object, if num_coords
remains constant, changes are made only to specific coordinates in the
sequence. If num_coords changes, however, the whole underlying object
is created from scratch.

B. When modifying a GeometryCollection, the whole thing is always
rebuilt from scratch using clones of the original (and newly assigned)
geometries.

> > 3. I've created a situation where
>
> > del geometryObj[:]  # now works
> > geometryObj = LinearString([]) # doesn't work
> > Is there some reason I shouldn't do that?
>
> In the first statement, `del` is deleting what is retrieved in the slice
> operation, in this case it would be clones -- so `geometryObj` itself
> isn't actually deleted, and persists.  In the current implementation, I
> don't support 'empty' geometries (although the OGRGeometry
> implementation does).  This is a reflection of the underlying libraries
> themselves as (if IIRC) GEOS doesn't let you create empty geometries and
> OGR (GDAL) does.

Unless I missed something, the usual meaning of 'del obj[a:b]' in
Python is that obj should modify itself. This is what I've done to
the geometry objects. In the case above, geometryObj deletes the
original underlying object and creates a new, empty one. As far as I
can tell (I admit I didn't look carefully), the GEOS library seems to
support this, at least as far as the instantiation is concerned, but
the Python wrapper doesn't permit it. Now I've gone and created a
workaround.

> > 4. Does it make sense to implement similar mutation methods for Point
> > objects or Polygon objects?
>
> I'm not sure -- as it is, you can just use `geom.coords`, get an array,
> mutate as necessary and then use the modified arrays in the geometry
> constructors.

Yes. I could have done the same for LineString and MultiPoint, but
that quickly become painfully awkward. Hence the patch.


I should mention two other compatibility issues that I didn't consider
AT ALL in writing the patch: NumPy and earlier versions of Python. I
am using Python 2.5 and I'm not so well-versed in Python or its
history, so I have probably used some idioms not supported in earlier
versions. Also I didn't consider NumPy arrays as possible parameters
to LineString.__setitem__, which I suppose should be considered for
consistency.

Thanks and best regards,
Aryeh Leib

Justin Bronn

não lida,
2 de jan. de 2009, 09:07:0802/01/2009
para geod...@googlegroups.com
Aryeh,

I have a mercurial branch for the GIS features that will appear in
Django 1.1. The repository is: http://geodjango.org/hg/gis-1.1/

One of the things I'm doing for 1.1 is refactoring the GEOS and GDAL
interfaces a bit. For example, I've moved the Point, LineString, and
Polygon classes to their own modules and I've moved the GEOS tests to be
within the `geos` module (e.g., `from django.contrib.gis.geos import
tests; tests.run()`).

I think the changes of #9877 should be a part of this refactor -- thus,
can you restructure the patch so it'll apply to what's in the gis-1.1
mercurial repo?

Contact me off-list if you need any help installing/using mercurial.

Thanks,
-Justin

Aryeh Leib Taurog

não lida,
12 de jan. de 2009, 14:42:0812/01/2009
para geodjango
I've checked out the 1.1 branch and made the appropriate
modifications.
Now I've found the clincher bug, and I'd love suggestions/votes on how
to deal with it.

I augmented the functionality of the geometries which I modified by
adding a new base class, ListMixin, to their inheritance lists. I was
relying on the ListMixin's __init__ function being called. Now I see
that when a geometry is instantiated using from_str, from_wkt, etc. it
is instantiated as a GEOSGeometry object and subsequently made into an
instance of the specific subclass by reassigning to self.__class__ and
therefore the __init__ methods of the subclass and of my new base
class are never called. And so the mutations don't work (at least
some of them).

1. I guess an easy/stupid solution would be to not assume that
__init__ gets called, put the initialization code in a separate
function, and then call it before performing any dependent
operations. I don't like that though, because it either means some
fancy footwork or checking a flag every time I want to perform
specific operations.

2. Is there some way that from_wkt, etc. could be made to instantiate
objects of the correct subclass to begin with? It seems to me this
could be done using a factory function which would perform the class
lookup and then instantiate the class directly. I think that is a
much more desirable approach.

3. The third option, which also seems pretty reasonable to me, would
be to make GEOSGeometry a subclass of ListMixin and implement the
necessary functions for the remaining two geometries. I think it
would be quite appropriate for Polygon and occasionally convenient (if
a bit overkill) for Point.

4. Or perhaps (if we really want to be twisted) the _post_init
function could invoke __init__ on the object's base classes once it
has changed self.__class__.

Comments?

Aryeh Leib

Justin Bronn

não lida,
31 de jan. de 2009, 16:28:5631/01/2009
para geod...@googlegroups.com
Aryeh Leib Taurog wrote:
> 3. The third option, which also seems pretty reasonable to me, would
> be to make GEOSGeometry a subclass of ListMixin and implement the
> necessary functions for the remaining two geometries. I think it
> would be quite appropriate for Polygon and occasionally convenient (if
> a bit overkill) for Point.

I merged in your patch, and went with this approach. [1] Also, I got
rid of __init__() on ListMixin because all it was doing was setting
attributes/functions that we can set directly on the class definitions.
With no need to call __init__ it solves the problem.

The additional routines for Point/Polygon still need to be implemented
(a TODO). But, as is, all tests pass, including the original and new
mutability suite.

Best Regards,
-Justin

[1] http://geodjango.org/hg/gis-1.1/rev/466bece04a15

Aryeh Leib Taurog

não lida,
1 de fev. de 2009, 08:47:3001/02/2009
para geod...@googlegroups.com
Justin Bronn wrote:
> Aryeh Leib Taurog wrote:
>> 3. The third option, which also seems pretty reasonable to me,
>> would be to make GEOSGeometry a subclass of ListMixin and
>> implement the necessary functions for the remaining two
>> geometries. I think it would be quite appropriate for Polygon
>> and occasionally convenient (if a bit overkill) for Point.
>
> I merged in your patch, and went with this approach. [1]

Okay. I did almost the same thing. I also implemented Point and
Polygon. I thought I'd finish sooner, but I had a bout of pneumonia
and then some family events which interfered with my coding schedule.

> Also, I got rid of __init__() on ListMixin because all it was doing
> was setting attributes/functions that we can set directly on the
> class definitions. With no need to call __init__ it solves the
> problem.

Agreed, but it violates encapsulation--or at least convenience and
elegance. If GEOSGeometry inherits from ListMixin, I see no reason
not to use __init__

> The additional routines for Point/Polygon still need to be
> implemented (a TODO). But, as is, all tests pass, including the
> original and new mutability suite.

I will try to finish my version of the patch with Point and Polygon
mutations implemented and send you by mid-week.

Aryeh Leib

> [1] http://geodjango.org/hg/gis-1.1/rev/466bece04a15

Responder a todos
Responder ao autor
Encaminhar
0 nova mensagem