In the gis work, we'd like to contribute a fair bit of additional functionality to GIS fields and models with GIS fields.
So far, field contributions seem to be largely done by adding attributes to the model, such as FileField's get_%s_url.
I think this approach may just be legacy from the old codegen days, and I'm not sure this is a good approach when the number of contributions or additional attributes is large.
As an example, I'd like to add GEOS attributes to GIS fields, which includes things like dimensions, area, intersection, boundary, convex hull, etc.
It seems to me that keeping such things as attributes of the field would be desirable since they're related and since doing so would namespace them. Namespaces are good. :)
Am I missing a complication in the implementation? Of course I have some reluctance since this would make the interfaces uneven depending on whether fields pre-existed magic-removal...
On Tue, 2007-05-15 at 17:09 -0500, Jeremy Dunck wrote: > In the gis work, we'd like to contribute a fair bit of additional > functionality to GIS fields and models with GIS fields.
> So far, field contributions seem to be largely done by adding > attributes to the model, such as FileField's get_%s_url.
> I think this approach may just be legacy from the old codegen days, > and I'm not sure this is a good approach when the number of > contributions or additional attributes is large.
> As an example, I'd like to add GEOS attributes to GIS fields, which > includes things like dimensions, area, intersection, boundary, convex > hull, etc.
> It seems to me that keeping such things as attributes of the field > would be desirable since they're related and since doing so would > namespace them. Namespaces are good. :)
I agree with all of this. Once the model instance is created, the field attributes are just Python objects anyway, so you might as well use the full power and make your utility methods be methods on the field itself.
Yeah, this then goes along with the topic of coercion on model fields, since the current structure only allows basic Python types to be created once the model is done. While working with DurationField, I did manage to come up with a fairly simple way to do this, and it seems to pass all the test I've thrown at it so far. I don't expect it's complete, but it's attached to ticket #3982. I'll be updating DurationField shortly to use this, so #2443 will have an example of this in action soon.
Let me know if I'm on the right track.
-Gul
On May 15, 6:16 pm, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> On Tue, 2007-05-15 at 17:09 -0500, Jeremy Dunck wrote: > > In the gis work, we'd like to contribute a fair bit of additional > > functionality to GIS fields and models with GIS fields.
> > So far, field contributions seem to be largely done by adding > > attributes to the model, such as FileField's get_%s_url.
> > I think this approach may just be legacy from the old codegen days, > > and I'm not sure this is a good approach when the number of > > contributions or additional attributes is large.
> > As an example, I'd like to add GEOS attributes to GIS fields, which > > includes things like dimensions, area, intersection, boundary, convex > > hull, etc.
> > It seems to me that keeping such things as attributes of the field > > would be desirable since they're related and since doing so would > > namespace them. Namespaces are good. :)
> I agree with all of this. Once the model instance is created, the field > attributes are just Python objects anyway, so you might as well use the > full power and make your utility methods be methods on the field itself.
On May 15, 11:09 pm, "Jeremy Dunck" <jdu...@gmail.com> wrote:
> It seems to me that keeping such things as attributes of the field > would be desirable since they're related and since doing so would > namespace them. Namespaces are good. :)
Agreed. This did cross my mind some time ago actually. It makes more sense to have model_instance.file_field.url instead of model_instance.get_file_field_url().
Jeremy Dunck wrote: > In the gis work, we'd like to contribute a fair bit of additional > functionality to GIS fields and models with GIS fields.
> So far, field contributions seem to be largely done by adding > attributes to the model, such as FileField's get_%s_url.
> I think this approach may just be legacy from the old codegen days, > and I'm not sure this is a good approach when the number of > contributions or additional attributes is large.
> As an example, I'd like to add GEOS attributes to GIS fields, which > includes things like dimensions, area, intersection, boundary, convex > hull, etc.
If we store the geometry /as/ a GEOSGeometry object, then it already has all those attributes/methods on it :)
I have done a first stab at lazy-instantiated geometries, which is nearly working how I expect it to. I want to look at Gulopine's coercion work first, but I'll attach it to a ticket by the end of the weekend so you can look at it and see if thats the way we want to head.
>> In the gis work, we'd like to contribute a fair bit of additional >> functionality to GIS fields and models with GIS fields.
>> So far, field contributions seem to be largely done by adding >> attributes to the model, such as FileField's get_%s_url.
>> I think this approach may just be legacy from the old codegen days, >> and I'm not sure this is a good approach when the number of >> contributions or additional attributes is large.
>> As an example, I'd like to add GEOS attributes to GIS fields, which >> includes things like dimensions, area, intersection, boundary, convex >> hull, etc.
> If we store the geometry /as/ a GEOSGeometry object, then it already has > all those attributes/methods on it :)
> I have done a first stab at lazy-instantiated geometries, which is > nearly working how I expect it to. I want to look at Gulopine's coercion > work first, but I'll attach it to a ticket by the end of the weekend so > you can look at it and see if thats the way we want to head.
Ok, my lazy-instantiated geometries patch is in there now (#4322)
It uses a python descriptor to store an internal GEOSGeometry object, and creates a GEOS object when you assign a text/hex string to it. The GEOS object is only created on first access, before that it lives as the string retrieved from the db.
Because model.my_geometry_field now returns a GEOS object, doing area, centroid, etc is as simple (and clear) as: model.my_geometry_field.area
Not sure if it will inspire other creative solutions, but have a look.
Rob :)
-- One Track Mind Ltd. PO Box 1604, Shortland St, Auckland, New Zealand Phone +64-9-966 0433 Mobile +64-21-572 632 Web http://www.onetrackmind.co.nz
I like that approach to this rather tricky situation. With all my recent work on DurationField, I took a chance at trying to make your code into something generic enough to work for both needs, and possibly others. I'm imagining a FileField that can act as a file-type object, lazily opening the file the first time something calls obj.file_field.read() or obj.file_field.write(). Just an example of what's possible.
Anyway, I noticed that I'm having some trouble with it. First, it looks like it's not truly lazy, since the object is instantiated during __set__, rather than __get__. Since Model.__init__ uses setattr(self, field.attname, val) to assign values to the instance, the descriptor's __set__ method gets triggered as soon as the data is retrieved from the database. At least, that's one it's doing for me.
The other issue I'm having is with that of GEOSGeometrys srid attribute. I don't have GeoDjango installed, but I can see what you're trying to do with srid there, and that makes it difficult to make into a generic descriptor. It's easy to pass in an initialization function, but the trick is knowing whether an object has already been initialized within the context of that field. Currently I'm having to call it every time __get__ is called, which can't be a good thing. That's not a problem with the patch as is, of course, just a difficulty I'm having in getting it to work with other Field subclasses.
-Marty
On May 17, 1:41 am, Robert Coup <robert.c...@onetrackmind.co.nz> wrote:
> >> In the gis work, we'd like to contribute a fair bit of additional > >> functionality to GIS fields and models with GIS fields.
> >> So far, field contributions seem to be largely done by adding > >> attributes to the model, such as FileField's get_%s_url.
> >> I think this approach may just be legacy from the old codegen days, > >> and I'm not sure this is a good approach when the number of > >> contributions or additional attributes is large.
> >> As an example, I'd like to add GEOS attributes to GIS fields, which > >> includes things like dimensions, area, intersection, boundary, convex > >> hull, etc.
> > If we store the geometry /as/ a GEOSGeometry object, then it already has > > all those attributes/methods on it :)
> > I have done a first stab at lazy-instantiated geometries, which is > > nearly working how I expect it to. I want to look at Gulopine's coercion > > work first, but I'll attach it to a ticket by the end of the weekend so > > you can look at it and see if thats the way we want to head.
> Ok, my lazy-instantiated geometries patch is in there now (#4322)
> It uses a python descriptor to store an internal GEOSGeometry object, > and creates a GEOS object when you assign a text/hex string to it. The > GEOS object is only created on first access, before that it lives as the > string retrieved from the db.
> Because model.my_geometry_field now returns a GEOS object, doing area, > centroid, etc is as simple (and clear) as: model.my_geometry_field.area
> Not sure if it will inspire other creative solutions, but have a look.
> Rob :)
> -- > One Track Mind Ltd. > PO Box 1604, Shortland St, Auckland, New Zealand > Phone +64-9-966 0433 Mobile +64-21-572 632 > Webhttp://www.onetrackmind.co.nz
Marty Alchin wrote: > I like that approach to this rather tricky situation. With all my > recent work on DurationField, I took a chance at trying to make your > code into something generic enough to work for both needs, and > possibly others. I'm imagining a FileField that can act as a file-type > object, lazily opening the file the first time something calls > obj.file_field.read() or obj.file_field.write(). Just an example of > what's possible.
Yeah, I just wonder whether for files/images its worth having a python object that provides the methods, since there is other metadata besides the python file object (like its name!)
eg: my_model.attachment.file <- python file object my_model.attachment.filename <- filename as stored in db my_model.attachment.url <- url to file
> Anyway, I noticed that I'm having some trouble with it. First, it > looks like it's not truly lazy, since the object is instantiated > during __set__, rather than __get__. Since Model.__init__ uses > setattr(self, field.attname, val) to assign values to the instance, > the descriptor's __set__ method gets triggered as soon as the data is > retrieved from the database. At least, that's one it's doing for me.
hmm. yes, i think that was a side-effect of the number of ways I worked on it before finding something clean. But that was fairly dumb by all accounts :P (Fixed now, new patch on the ticket).
Side effect is now that it's always lazy - if i pass in the text "BANANA" it doesn't figure out its invalid before it's used later.
Ideal case: - it's lazy when loaded from the DB - it's not lazy the rest of the time, so if i set an invalid value i find out straight away.
> The other issue I'm having is with that of GEOSGeometrys srid > attribute. I don't have GeoDjango installed, but I can see what you're > trying to do with srid there, and that makes it difficult to make into > a generic descriptor. It's easy to pass in an initialization function, > but the trick is knowing whether an object has already been > initialized within the context of that field. Currently I'm having to > call it every time __get__ is called, which can't be a good thing. > That's not a problem with the patch as is, of course, just a > difficulty I'm having in getting it to work with other Field > subclasses.
I think in a few cases one might need to check whether things are valid, or do some extra magic.
I changed it so that if __set__ gets a valid object (eg. just instantiated from __get__) then it does the srid-init. Maybe __get__ could call out when its instantiating to a method on the field, which can do extra initialisation.
Rob :)
-- One Track Mind Ltd. PO Box 1604, Shortland St, Auckland, New Zealand Phone +64-9-966 0433 Mobile +64-21-572 632 Web http://www.onetrackmind.co.nz
On 5/17/07, Robert Coup <robert.c...@onetrackmind.co.nz> wrote:
> Yeah, I just wonder whether for files/images its worth having a python > object that provides the methods, since there is other metadata besides > the python file object (like its name!)
> eg: > my_model.attachment.file <- python file object > my_model.attachment.filename <- filename as stored in db > my_model.attachment.url <- url to file
Well yeah, I didn't mean have it be just a file object with no other details. I was meaning it'd be a custom object that would have those attributes, but also had standard file methods that lazily called file system operations as necessary. Something like this:
Such that when .read() or .write() (or whatever else) are called, it open()s the file if it's not already open. That way, you could even do things like write out PDFs to disk, using ReportLab.
p = canvas.Canvas(my_model.attachment)
I suppose adding .file wouldn't be the end of the world, but it'd be more impressive without it, anyway.
> Side effect is now that it's always lazy - if i pass in the text > "BANANA" it doesn't figure out its invalid before it's used later.
> Ideal case: > - it's lazy when loaded from the DB > - it's not lazy the rest of the time, so if i set an invalid value i > find out straight away.
Well, I don't know what all formats can be used to instantiate GEOSGeometry, but you might be able to at least put together a regex to validate that it fits the format(s). But again, I don't know all you're doing, so I can't say whether that would suit your needs well enough.
> I think in a few cases one might need to check whether things are valid, > or do some extra magic.
> I changed it so that if __set__ gets a valid object (eg. just > instantiated from __get__) then it does the srid-init. Maybe __get__ > could call out when its instantiating to a method on the field, which > can do extra initialisation.
Hrm, very interesting. So the srid magic happens every time a new object is instantiated with __get__, as well as every time a new object is set directly, but never when just retrieving a previously cached object. I'll work with that a bit and see if that will do the trick. Looks very slick though.
Marty Alchin wrote: > Well yeah, I didn't mean have it be just a file object with no other > details. I was meaning it'd be a custom object that would have those > attributes, but also had standard file methods that lazily called file > system operations as necessary. Something like this:
> Such that when .read() or .write() (or whatever else) are called, it > open()s the file if it's not already open. That way, you could even do > things like write out PDFs to disk, using ReportLab.
> p = canvas.Canvas(my_model.attachment)
> I suppose adding .file wouldn't be the end of the world, but it'd be > more impressive without it, anyway.
Nah, yours sounds better, thinking it through. Clean and nice is a good thing :)
I think eventually, eliminating all the get_XXX_something() methods for fields (perhaps with the exception of display) would be an ideal case.
>> Side effect is now that it's always lazy - if i pass in the text >> "BANANA" it doesn't figure out its invalid before it's used later.
>> Ideal case: >> - it's lazy when loaded from the DB >> - it's not lazy the rest of the time, so if i set an invalid value i >> find out straight away.
> Well, I don't know what all formats can be used to instantiate > GEOSGeometry, but you might be able to at least put together a regex > to validate that it fits the format(s). But again, I don't know all > you're doing, so I can't say whether that would suit your needs well > enough.
To know whether its valid in WKB format I think its probably easier to just try and parse it. Certainly WKT is regex'able to some extent.
What about having a _first_set attribute on the proxy object that only allows laziness for the first time the value is set (ie. when it is set from the DB row)? Then any later set()s get live validation and coercion to geos objects.
The key goal is to avoid instantiating 300 geos objects when 300 rows are retrieved from the database and the geometry field is never even looked at. If a view is interacting with a geometry field, then its ok to take the performance hit in return for all the benefits you get of using a geos object.
On 5/18/07, Robert Coup <robert.c...@onetrackmind.co.nz> wrote:
> I think eventually, eliminating all the get_XXX_something() methods for > fields (perhaps with the exception of display) would be an ideal case.
Yeah, get_FIELD_display would be fine, since that's not really specific to any field type.
> What about having a _first_set attribute on the proxy object that only > allows laziness for the first time the value is set (ie. when it is set > from the DB row)? Then any later set()s get live validation and coercion > to geos objects.
Well, I did try something like that at first, but the problem is that the proxy object is instantiated once, when assigned to the class. So ultimately, all instances of the Model use the same instance of the Proxy, which means that any attribute we set for one instance's field would be set for all instances of that model. I haven't figured out a way around that one yet.
On 5/18/07, Marty Alchin <gulop...@gmail.com> wrote:
> Well, I did try something like that at first, but the problem is that > the proxy object is instantiated once, when assigned to the class. So > ultimately, all instances of the Model use the same instance of the > Proxy, which means that any attribute we set for one instance's field > would be set for all instances of that model. I haven't figured out a > way around that one yet.
You want Field.get_cache_name. Check out fields.related.ReverseSingleRelatedObjectDescriptor for example usage.
Basically, each field gets a bag to store things in; it's owned by the field. In ReverseSingleRelatedObjectDescriptor, it's just a related model instance reference, but it could be a dict or whatever else is useful.
On 5/18/07, Robert Coup <robert.c...@onetrackmind.co.nz> wrote:
> What about having a _first_set attribute on the proxy object that only > allows laziness for the first time the value is set (ie. when it is set > from the DB row)? Then any later set()s get live validation and coercion > to geos objects.
I submitted a new patch on #3982 that does just this, incorporating a bit of your work on lazy instantiation, and making it more generic. Basically, fields would have a simple way to create and manage lazy attributes. I've already adapted DurationField to use this new technique (#2443), and I've started work on getting FileField and ImageField to use it as well. It'll be a bit of time before I can say whether or not those will work properly, but it's looking promising.
The generic system I put in on #3982 allows fields to specify a class that would be instantiated, as well as a separate function that would handle the instantiation, and a function to be called after the object is instantiated. The separate "creation" function is necessary in order to have a helper funciton perform some extra trickery, but still return an instance of the provided class. DurationField uses this helper function to pass the value as seconds to timedelta, since seconds are the second positional argument, not the first.
For your GeometryField, both the creation function and the init function would be specified, since you need special handling for each case. From what I read in your most recent patch, here's the basics of how it could work for you.
def set_srid(self, geom): if geom.srid is None and self.srid is not None: geom.set_srid(self.srid)
As you can see, it would greatly simplify the code necessary to implement lazy instantiation of your field attributes.
I'm not tied to the self.lazy_attribute(cls, ...) syntax. It would be just as easy to use AttributeProxy(self, cls, ...), it would just mean that modules that will be using it would have to import AttributeProxy as well as Field. Let the discussion ensue.