Field.contribute_to_class and contrib fields

337 views
Skip to first unread message

Jeremy Dunck

unread,
May 15, 2007, 6:09:58 PM5/15/07
to django-d...@googlegroups.com
In the gis work, we'd like to contribute a fair bit of additional
functionality to GIS fields and models with GIS fields.

So far, field contributions seem to be largely done by adding
attributes to the model, such as FileField's get_%s_url.

I think this approach may just be legacy from the old codegen days,
and I'm not sure this is a good approach when the number of
contributions or additional attributes is large.

As an example, I'd like to add GEOS attributes to GIS fields, which
includes things like dimensions, area, intersection, boundary, convex
hull, etc.

It seems to me that keeping such things as attributes of the field
would be desirable since they're related and since doing so would
namespace them. Namespaces are good. :)

Am I missing a complication in the implementation? Of course I have
some reluctance since this would make the interfaces uneven depending
on whether fields pre-existed magic-removal...

Malcolm Tredinnick

unread,
May 15, 2007, 6:16:29 PM5/15/07
to django-d...@googlegroups.com
On Tue, 2007-05-15 at 17:09 -0500, Jeremy Dunck wrote:
> In the gis work, we'd like to contribute a fair bit of additional
> functionality to GIS fields and models with GIS fields.
>
> So far, field contributions seem to be largely done by adding
> attributes to the model, such as FileField's get_%s_url.
>
> I think this approach may just be legacy from the old codegen days,
> and I'm not sure this is a good approach when the number of
> contributions or additional attributes is large.
>
> As an example, I'd like to add GEOS attributes to GIS fields, which
> includes things like dimensions, area, intersection, boundary, convex
> hull, etc.
>
> It seems to me that keeping such things as attributes of the field
> would be desirable since they're related and since doing so would
> namespace them. Namespaces are good. :)

I agree with all of this. Once the model instance is created, the field
attributes are just Python objects anyway, so you might as well use the
full power and make your utility methods be methods on the field itself.

Regards,
Malcolm

Gulopine

unread,
May 15, 2007, 7:48:38 PM5/15/07
to Django developers
Yeah, this then goes along with the topic of coercion on model fields,
since the current structure only allows basic Python types to be
created once the model is done. While working with DurationField, I
did manage to come up with a fairly simple way to do this, and it
seems to pass all the test I've thrown at it so far. I don't expect
it's complete, but it's attached to ticket #3982. I'll be updating
DurationField shortly to use this, so #2443 will have an example of
this in action soon.

Let me know if I'm on the right track.

-Gul

On May 15, 6:16 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Jason Davies

unread,
May 16, 2007, 5:51:58 AM5/16/07
to Django developers
On May 15, 11:09 pm, "Jeremy Dunck" <jdu...@gmail.com> wrote:

> It seems to me that keeping such things as attributes of the field
> would be desirable since they're related and since doing so would
> namespace them. Namespaces are good. :)

Agreed. This did cross my mind some time ago actually. It makes more
sense to have model_instance.file_field.url instead of
model_instance.get_file_field_url().

Jason

Robert Coup

unread,
May 16, 2007, 9:52:05 PM5/16/07
to django-d...@googlegroups.com
Jeremy Dunck wrote:
> In the gis work, we'd like to contribute a fair bit of additional
> functionality to GIS fields and models with GIS fields.
>
> So far, field contributions seem to be largely done by adding
> attributes to the model, such as FileField's get_%s_url.
>
> I think this approach may just be legacy from the old codegen days,
> and I'm not sure this is a good approach when the number of
> contributions or additional attributes is large.
>
> As an example, I'd like to add GEOS attributes to GIS fields, which
> includes things like dimensions, area, intersection, boundary, convex
> hull, etc.
>
If we store the geometry /as/ a GEOSGeometry object, then it already has
all those attributes/methods on it :)

I have done a first stab at lazy-instantiated geometries, which is
nearly working how I expect it to. I want to look at Gulopine's coercion
work first, but I'll attach it to a ticket by the end of the weekend so
you can look at it and see if thats the way we want to head.

Rob :)

Robert Coup

unread,
May 17, 2007, 1:41:31 AM5/17/07
to django-d...@googlegroups.com
Ok, my lazy-instantiated geometries patch is in there now (#4322)

It uses a python descriptor to store an internal GEOSGeometry object,
and creates a GEOS object when you assign a text/hex string to it. The
GEOS object is only created on first access, before that it lives as the
string retrieved from the db.

Because model.my_geometry_field now returns a GEOS object, doing area,
centroid, etc is as simple (and clear) as: model.my_geometry_field.area

Not sure if it will inspire other creative solutions, but have a look.

Rob :)

--
One Track Mind Ltd.
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Mobile +64-21-572 632
Web http://www.onetrackmind.co.nz


Marty Alchin

unread,
May 17, 2007, 11:02:40 AM5/17/07
to Django developers
I like that approach to this rather tricky situation. With all my
recent work on DurationField, I took a chance at trying to make your
code into something generic enough to work for both needs, and
possibly others. I'm imagining a FileField that can act as a file-type
object, lazily opening the file the first time something calls
obj.file_field.read() or obj.file_field.write(). Just an example of
what's possible.

Anyway, I noticed that I'm having some trouble with it. First, it
looks like it's not truly lazy, since the object is instantiated
during __set__, rather than __get__. Since Model.__init__ uses
setattr(self, field.attname, val) to assign values to the instance,
the descriptor's __set__ method gets triggered as soon as the data is
retrieved from the database. At least, that's one it's doing for me.

The other issue I'm having is with that of GEOSGeometrys srid
attribute. I don't have GeoDjango installed, but I can see what you're
trying to do with srid there, and that makes it difficult to make into
a generic descriptor. It's easy to pass in an initialization function,
but the trick is knowing whether an object has already been
initialized within the context of that field. Currently I'm having to
call it every time __get__ is called, which can't be a good thing.
That's not a problem with the patch as is, of course, just a
difficulty I'm having in getting it to work with other Field
subclasses.

-Marty

On May 17, 1:41 am, Robert Coup <robert.c...@onetrackmind.co.nz>
wrote:

Robert Coup

unread,
May 17, 2007, 6:04:44 PM5/17/07
to django-d...@googlegroups.com
Marty Alchin wrote:
> I like that approach to this rather tricky situation. With all my
> recent work on DurationField, I took a chance at trying to make your
> code into something generic enough to work for both needs, and
> possibly others. I'm imagining a FileField that can act as a file-type
> object, lazily opening the file the first time something calls
> obj.file_field.read() or obj.file_field.write(). Just an example of
> what's possible.
>
Yeah, I just wonder whether for files/images its worth having a python
object that provides the methods, since there is other metadata besides
the python file object (like its name!)

eg:
my_model.attachment.file <- python file object
my_model.attachment.filename <- filename as stored in db
my_model.attachment.url <- url to file


> Anyway, I noticed that I'm having some trouble with it. First, it
> looks like it's not truly lazy, since the object is instantiated
> during __set__, rather than __get__. Since Model.__init__ uses
> setattr(self, field.attname, val) to assign values to the instance,
> the descriptor's __set__ method gets triggered as soon as the data is
> retrieved from the database. At least, that's one it's doing for me.
>

hmm. yes, i think that was a side-effect of the number of ways I worked
on it before finding something clean. But that was fairly dumb by all
accounts :P (Fixed now, new patch on the ticket).

Side effect is now that it's always lazy - if i pass in the text
"BANANA" it doesn't figure out its invalid before it's used later.

Ideal case:
- it's lazy when loaded from the DB
- it's not lazy the rest of the time, so if i set an invalid value i
find out straight away.

> The other issue I'm having is with that of GEOSGeometrys srid
> attribute. I don't have GeoDjango installed, but I can see what you're
> trying to do with srid there, and that makes it difficult to make into
> a generic descriptor. It's easy to pass in an initialization function,
> but the trick is knowing whether an object has already been
> initialized within the context of that field. Currently I'm having to
> call it every time __get__ is called, which can't be a good thing.
> That's not a problem with the patch as is, of course, just a
> difficulty I'm having in getting it to work with other Field
> subclasses.
>

I think in a few cases one might need to check whether things are valid,
or do some extra magic.

I changed it so that if __set__ gets a valid object (eg. just
instantiated from __get__) then it does the srid-init. Maybe __get__
could call out when its instantiating to a method on the field, which
can do extra initialisation.

Marty Alchin

unread,
May 17, 2007, 8:21:30 PM5/17/07
to django-d...@googlegroups.com
On 5/17/07, Robert Coup <rober...@onetrackmind.co.nz> wrote:
> Yeah, I just wonder whether for files/images its worth having a python
> object that provides the methods, since there is other metadata besides
> the python file object (like its name!)
>
> eg:
> my_model.attachment.file <- python file object
> my_model.attachment.filename <- filename as stored in db
> my_model.attachment.url <- url to file

Well yeah, I didn't mean have it be just a file object with no other
details. I was meaning it'd be a custom object that would have those
attributes, but also had standard file methods that lazily called file
system operations as necessary. Something like this:

my_model.attachment.filename
my_model.attachment.url
my_model.attachment.read()
my_model.attachment.write()

Such that when .read() or .write() (or whatever else) are called, it
open()s the file if it's not already open. That way, you could even do
things like write out PDFs to disk, using ReportLab.

p = canvas.Canvas(my_model.attachment)

I suppose adding .file wouldn't be the end of the world, but it'd be
more impressive without it, anyway.

> Side effect is now that it's always lazy - if i pass in the text
> "BANANA" it doesn't figure out its invalid before it's used later.
>
> Ideal case:
> - it's lazy when loaded from the DB
> - it's not lazy the rest of the time, so if i set an invalid value i
> find out straight away.

Well, I don't know what all formats can be used to instantiate
GEOSGeometry, but you might be able to at least put together a regex
to validate that it fits the format(s). But again, I don't know all
you're doing, so I can't say whether that would suit your needs well
enough.

> I think in a few cases one might need to check whether things are valid,
> or do some extra magic.
>
> I changed it so that if __set__ gets a valid object (eg. just
> instantiated from __get__) then it does the srid-init. Maybe __get__
> could call out when its instantiating to a method on the field, which
> can do extra initialisation.

Hrm, very interesting. So the srid magic happens every time a new
object is instantiated with __get__, as well as every time a new
object is set directly, but never when just retrieving a previously
cached object. I'll work with that a bit and see if that will do the
trick. Looks very slick though.

-Marty

Robert Coup

unread,
May 18, 2007, 9:44:01 PM5/18/07
to django-d...@googlegroups.com
Marty Alchin wrote:
> Well yeah, I didn't mean have it be just a file object with no other
> details. I was meaning it'd be a custom object that would have those
> attributes, but also had standard file methods that lazily called file
> system operations as necessary. Something like this:
>
> my_model.attachment.filename
> my_model.attachment.url
> my_model.attachment.read()
> my_model.attachment.write()
>
> Such that when .read() or .write() (or whatever else) are called, it
> open()s the file if it's not already open. That way, you could even do
> things like write out PDFs to disk, using ReportLab.
>
> p = canvas.Canvas(my_model.attachment)
>
> I suppose adding .file wouldn't be the end of the world, but it'd be
> more impressive without it, anyway.
>
Nah, yours sounds better, thinking it through. Clean and nice is a good
thing :)

I think eventually, eliminating all the get_XXX_something() methods for
fields (perhaps with the exception of display) would be an ideal case.


>> Side effect is now that it's always lazy - if i pass in the text
>> "BANANA" it doesn't figure out its invalid before it's used later.
>>
>> Ideal case:
>> - it's lazy when loaded from the DB
>> - it's not lazy the rest of the time, so if i set an invalid value i
>> find out straight away.
>>
>
> Well, I don't know what all formats can be used to instantiate
> GEOSGeometry, but you might be able to at least put together a regex
> to validate that it fits the format(s). But again, I don't know all
> you're doing, so I can't say whether that would suit your needs well
> enough.
>

To know whether its valid in WKB format I think its probably easier to
just try and parse it. Certainly WKT is regex'able to some extent.

What about having a _first_set attribute on the proxy object that only
allows laziness for the first time the value is set (ie. when it is set
from the DB row)? Then any later set()s get live validation and coercion
to geos objects.

The key goal is to avoid instantiating 300 geos objects when 300 rows
are retrieved from the database and the geometry field is never even
looked at. If a view is interacting with a geometry field, then its ok
to take the performance hit in return for all the benefits you get of
using a geos object.

Rob :)

Marty Alchin

unread,
May 18, 2007, 10:35:08 PM5/18/07
to django-d...@googlegroups.com
On 5/18/07, Robert Coup <rober...@onetrackmind.co.nz> wrote:
> I think eventually, eliminating all the get_XXX_something() methods for
> fields (perhaps with the exception of display) would be an ideal case.

Yeah, get_FIELD_display would be fine, since that's not really
specific to any field type.

> What about having a _first_set attribute on the proxy object that only
> allows laziness for the first time the value is set (ie. when it is set
> from the DB row)? Then any later set()s get live validation and coercion
> to geos objects.

Well, I did try something like that at first, but the problem is that
the proxy object is instantiated once, when assigned to the class. So
ultimately, all instances of the Model use the same instance of the
Proxy, which means that any attribute we set for one instance's field
would be set for all instances of that model. I haven't figured out a
way around that one yet.

-Marty

Jeremy Dunck

unread,
May 18, 2007, 11:04:16 PM5/18/07
to django-d...@googlegroups.com
On 5/18/07, Marty Alchin <gulo...@gmail.com> wrote:

> Well, I did try something like that at first, but the problem is that
> the proxy object is instantiated once, when assigned to the class. So
> ultimately, all instances of the Model use the same instance of the
> Proxy, which means that any attribute we set for one instance's field
> would be set for all instances of that model. I haven't figured out a
> way around that one yet.

You want Field.get_cache_name. Check out
fields.related.ReverseSingleRelatedObjectDescriptor for example usage.

Basically, each field gets a bag to store things in; it's owned by the
field. In ReverseSingleRelatedObjectDescriptor, it's just a related
model instance reference, but it could be a dict or whatever else is
useful.

Marty Alchin

unread,
May 28, 2007, 3:27:54 PM5/28/07
to django-d...@googlegroups.com
On 5/18/07, Robert Coup <rober...@onetrackmind.co.nz> wrote:
> What about having a _first_set attribute on the proxy object that only
> allows laziness for the first time the value is set (ie. when it is set
> from the DB row)? Then any later set()s get live validation and coercion
> to geos objects.

I submitted a new patch on #3982 that does just this, incorporating a
bit of your work on lazy instantiation, and making it more generic.
Basically, fields would have a simple way to create and manage lazy
attributes. I've already adapted DurationField to use this new
technique (#2443), and I've started work on getting FileField and
ImageField to use it as well. It'll be a bit of time before I can say
whether or not those will work properly, but it's looking promising.

The generic system I put in on #3982 allows fields to specify a class
that would be instantiated, as well as a separate function that would
handle the instantiation, and a function to be called after the object
is instantiated. The separate "creation" function is necessary in
order to have a helper funciton perform some extra trickery, but still
return an instance of the provided class. DurationField uses this
helper function to pass the value as seconds to timedelta, since
seconds are the second positional argument, not the first.

For your GeometryField, both the creation function and the init
function would be specified, since you need special handling for each
case. From what I read in your most recent patch, here's the basics of
how it could work for you.

class GeometryField(Field):
...
def contribute_to_class(self, cls, name):
super(GeometryField, self).contribute_to_class(cls, name)
setattr(cls, name, self.lazy_attribute(GEOSGeometry,
self.create, self.set_srid)

def create(self, value):
try:
return GEOSGeometry(value, "hex")
except GEOSException:
return GEOSGeometry(value, "wkt")

def set_srid(self, geom):
if geom.srid is None and self.srid is not None:
geom.set_srid(self.srid)

As you can see, it would greatly simplify the code necessary to
implement lazy instantiation of your field attributes.

I'm not tied to the self.lazy_attribute(cls, ...) syntax. It would be
just as easy to use AttributeProxy(self, cls, ...), it would just mean
that modules that will be using it would have to import AttributeProxy
as well as Field. Let the discussion ensue.

-Gul

Reply all
Reply to author
Forward
0 new messages