I had submitted a patch recently (#3982) that forces Django to use the to_python method of individual fields when retrieving objects from a database. I'm not being impatient, but I'd like to get some additional feedback on this, in case I'm going about it all wrong, or if I'm missing something important.
While I haven't traced through all the backend database driver code, Django's lack of to_python references leads me to believe that type coercion is handled entirely through the database driver. This works well for most data types, as there's a clear conversion between SQL data types and those of Python.
However, the problem comes in patches like #2443, where a new field type uses SQL and Python data types that don't have an appropriate implicit correlation. In that example, the field uses FloatField for its get_internal_type (for compatibility with all backends), but returns a native Python type of timedelta. Current code results in the field being coerced solely by its SQL data type, even though it defines its own to_python which would create a proper object.
My main questions are these:
* Am I right in thinking that Django's to_python method should be used for type coercion, rather than relying solely on backend drivers? * Are there other situations than database access where this type of coercion might not be happening correctly? * Are there other code locations that would need to be updated to use this functionality? * Are there deficiencies in the to_python methods of existing fields that are preventing us from using them?
As a temporary solution, the validate method of Django models does use to_python as part of its validation, and thus coerces the proper field types. However, it seems silly to require developers to use object.validate() immediately after loading an object from the database (before any modifications are made), just to get proper data types.
> * Am I right in thinking that Django's to_python method should be used > for type coercion, rather than relying solely on backend drivers?
Historically, to_python existed for the manipulator framework to convert text-based field input into a Python representation suitable for use on a backend.
However, manipulators are going away, and conversion of this type has been deferred to the field itself. The serializers have found a new use for to_python (converting serialized strings into Python values), but the domain remains the same - converting strings into data values.
Although some of the to_python methods are designed to convert non-str input into valid Python output, I don't believe it has ever been the intention that to_python be used for coercion. AFAIK, the database backends _should_ be providing data in a valid and useful Python format.
Coercion of the type you describe is not required for any field currently supported. I would be hesitant to include coercion as a core part of the framework unless there is a generic need to coerce database values into Python values in some way other than that performed by the database backend. Can you provide any use cases other than your DurationField that require this sort of functionality?
> * Are there other situations than database access where this type of > coercion might not be happening correctly?
The MySQL backend already contains code to coerce some date fields (and on mysql_old, boolean fields, too). However, this is handled as a data-correction interface on the MySQL backend, rather than a field-level fix.
On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com> wrote:
> Coercion of the type you describe is not required for any field > currently supported. I would be hesitant to include coercion as a core > part of the framework unless there is a generic need to coerce > database values into Python values in some way other than that > performed by the database backend. Can you provide any use cases other > than your DurationField that require this sort of functionality?
At the moment, DurationField is the only thing I know of that isn't well-served by existing functionality, but I think the point is that DurationField proves that there are potentially new field types that don't line up 1-to-1 with database column names. I don't know what other field types there might be in the future, but it seems like a stretch to think that it's unnecessary just because I can't anticipate other specific uses. That may just be me, though.
I wouldn't be opposed to having this functionality somewhere outside the core, but I can't imagine how that would be possible. I don't see any other way to trigger field-specific code, except perhaps to have DurationField's contribute_to_class create a wrapper around the model's __init__, which would then coerce the fields after creating the object. I can't imagine you would consider that a more elegant solution than simply using to_python to do the same.
On the other hand, some completely separate framework could be used to do coercion, but that framework would still have to be called within the core, and would have ultimately the same functionality as to_python anyway. Even if to_python wasn't originally intended for this, it seems perfectly suited for this task.
> The MySQL backend already contains code to coerce some date fields > (and on mysql_old, boolean fields, too). However, this is handled as a > data-correction interface on the MySQL backend, rather than a > field-level fix.
And that works just fine in all situations where the expected Python type is easily identifiable from the the database column type. Unfortunately, DurationField proves this isn't always the case, and it seems strange to assume that nobody will ever want to do interesting things.
On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com> wrote:
> Coercion of the type you describe is not required for any field > currently supported. I would be hesitant to include coercion as a core > part of the framework unless there is a generic need to coerce > database values into Python values in some way other than that > performed by the database backend. Can you provide any use cases other > than your DurationField that require this sort of functionality?
I did think of one other potential use case that's considerably less general, but still illustrates the potential use for using to_python. For one of my projects, I created a new field type I called SubclassField. Basically, it provided a way to store a reference to a particular class inside a model, given that the class was a subclass of some given parent class.
The real key here is that the available choices for this field are all subclasses of this base class. It essentially allows me to define a single field that can automatically update its available choices when I add a new subclass to the project. I know this is somewhat vague, but it's a highly specialized use, and it's a bit difficult to understand without knowing the full specifications of the project.
I'll try to simplify it a little bit to provide an example. The short story is that users are allowed to have an inventory of items, in a similar manner to how video game RPGs keep track of a player's possessions. In addition, each item has a set of attributes that are specific to that instance of the item. The real trick is that items consist of Python code, not arbitrary data, so the item classes have to exist in Python, not as a separate database table somewhere.
class Item(object): __metaclass__ = ItemBase # Items have declarative syntax like Django models
class Inventory(models.Model): user = models.ForeignKey(User, related_name='inventory') item_class = SubclassField(Item)
class ItemSettings(models.Model): item = models.ForeignKey(Inventory, related_name='settings') name = models.CharField(maxlength=255) value = models.CharField(maxlength=255)
class Sword(Item): pass # This would actually have lots of code
class Potion(Item): pass # This would actually have lots of code
This all works well in part, using the ItemBase metaclass to keep track of new Item subclasses, and adding them to the choices for the SubclassField. And it does work quite well for displaying a dropdown list in forms and saving objects, even directly in the admin itself. Unfortunately, when retrieving a record from the database, I'd like to be able to have Inventory.item_class be an actual class object that can be instantiated using the parameters found in the related ItemSettings model.
user = User.objects.get(username='Gulopine') for item in user.inventory: item = item.item_class(**dict((s.name, s.value) for s in item.settings))
Of course, in this example, I'm providing all this functionality in the __init__ of Inventory, so I'm able to retrieve the class, instantiate it with the available settings, and have item_class actually be a fully instantiated item instead, immediately upon creation of the class (which would also occur upon retrieval from the database). However, that approach is only feasible when I have control over a model's __init__ method, which isn't going to be the case with fields like DurationField (and perhaps others) that are meant for more general use.
I know that was a very contrived example that really doesn't apply outside of my own project, but it does illustrate another potential use of to_python to coerce fields, since it'd be more logical in my code to have item_class be made into a proper class as part of the field's code, rather than in the __init__ of the model. My to_python patch would allow Inventory's __init__ to have a fully formed Python class from the moment it begins executing.
I hope at least some of this made some sense to somebody. If not, I apologize.
A little more research shows that I could also accomplish what I'm looking for by tying into the pre_init or post_init signal, but then I'd need a customized function for each field that needs processing, on each model. This could be done, of course, and it would certainly keep this coercion out of the core, but I can't imagine it would be considered elegant (or even preferable) in any way. Someone prove me wrong.
I guess I'll sum up my thoughts on this another way. I don't see to_python as explicitly converting strings to other data types, but rather as converting a storage data type to a usable data type. Of course, that's not the way it is, but rather the way I think it should be.
> Coercion of the type you describe is not required for any field > currently supported. I would be hesitant to include coercion as a core > part of the framework unless there is a generic need to coerce > database values into Python values in some way other than that > performed by the database backend.
The GIS branch (GeoDjango) is interested in type coercion. Specifically, PostGIS returns geometries as hex strings to the client -- it would be preferable to have this come into the model as either a different type of string like WKT ("POLYGON ( .. )"), KML ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For example, say we have a 'ZipCode' model with a PolygonField ('poly'):
As it stands right now, to get the centroid of a polygon you'd have to do the following:
z = ZipCode.objects.get(code=77002) centroid = z.get_poly_geos().centroid
Instead, with type coercion, it could be:
z = ZipCode.objects.get(code=77002) centroid = z.poly.centroid
The latter seems more intuitive to me. Further, it would eliminate the need for the 'mixin' class completely, since those routines are for casting the hex as a geometry object and pulling out the needed value (e.g. get_GEOM_wkt, get_GEOM_area, etc.). While getting the KML or WKT can be obtained from PostGIS through an extra SELECT parameter, getting GEOSGeometry() python objects in the QuerySet would require some sort of type coercion.
In summary, use cases outside of Gulopine's problem would benefit from a method to perform type coercion. I'm open to suggestions if there's a better way to do this...
> On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com> > wrote: > I don't know what > other field types there might be in the future, but it seems like a > stretch to think that it's unnecessary just because I can't anticipate > other specific uses. That may just be me, though.
The reason is YAGNI (You Ain't Gonna Need It). Unless there is actually a need to do something, you don't rearchitect your system to allow flexibility that _might_ be useful.
I just need to be convinced that it's actually a problem.
> The GIS branch (GeoDjango) is interested in type coercion. > Specifically, PostGIS returns geometries as hex strings to the client > -- it would be preferable to have this come into the model as either a > different type of string like WKT ("POLYGON ( .. )"), KML > ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For > example, say we have a 'ZipCode' model with a PolygonField ('poly'):
Ok. That's the sort of use case I was looking for.
I haven't looked at the GIS branch, and my GIS knowledge from university is getting _very_ rusty, so you'll need to help me out a little.
How do you handle data input of GIS types? What format is used during assignment?
Are you using to_python at all in the GIS branch?
To date, the database backend maintains the philosophy that whatever you put in (as a python type) is what you get back. If you put a boolean in, you get a boolean out; you put a DateTime in, you get a DateTime out. Is this still appropriate in the GIS setting?
Is there a use case for being able to get back data of more than one type, or is it possible to get a WKT or KML representation from a GEOSGeometry object?
Russell Keith-Magee wrote: > On 4/19/07, jbronn <jbr...@gmail.com> wrote:
>> The GIS branch (GeoDjango) is interested in type coercion. >> Specifically, PostGIS returns geometries as hex strings to the client >> -- it would be preferable to have this come into the model as either a >> different type of string like WKT ("POLYGON ( .. )"), KML >> ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For >> example, say we have a 'ZipCode' model with a PolygonField ('poly'):
> Ok. That's the sort of use case I was looking for.
> I haven't looked at the GIS branch, and my GIS knowledge from > university is getting _very_ rusty, so you'll need to help me out a > little.
> How do you handle data input of GIS types? What format is used during > assignment?
It'd be nice to be able to assign a GEOSGeometry object, well known text, and well known binary formats (and maybe KML/GML). The database natively takes well-known-binary (WKB) as a hex string and well-known-text (WKT) formats in assignment to a geometry field.
> To date, the database backend maintains the philosophy that whatever > you put in (as a python type) is what you get back. If you put a > boolean in, you get a boolean out; you put a DateTime in, you get a > DateTime out. Is this still appropriate in the GIS setting?
My idea for the internal representation of a model field would be a lazily-instantiated GEOSGeometry object, which would fill Justin's requirements. That would also allow conversion to WKT/GML/KML/etc via the GEOS library. And if you assigned WKT to a field it'd convert that to a GEOSGeometry object.
However, I don't think we'd want to instantiate a GEOSGeometry object via to_python for every model instance on the offchance the geometry field(s) are accessed, but I haven't done any benchmarking. Justin?
> How do you handle data input of GIS types? What format is used during > assignment?
During assignment, PostGIS returns in a format called HEXEWKB (hereinafter "HEX"), one of its 'canonical forms'. Essentially it's a hexadecimal string of the binary format (WKB, an OGC standard). For input you can input in either HEX or WKT (another OGC standard) since PostGIS accepts both.
> Are you using to_python at all in the GIS branch?
Not yet, like you said it's used with the forms and serialization, both of which I haven't gotten to yet. I tried it out before thinking it did coercion and discovered it had no effect as to what was returned from the database.
> Is there a use case for being able to get back data of more than one? > type, or is it possible to get a WKT or KML representation from a > GEOSGeometry object? >From GEOSGeometry it is simple to retrieve WKT and HEX, e.g., 'g.wkt'
or 'g.hex'. However, KML is implemented by PostGIS and can only be retrieved if you do "SELECT AsKML(the_geom)" on the database end. Further, GML (ancestor to KML, another OGC standard) and SVG can also be retrieved with the AsGML() and AsSVG(), respectively.
> My idea for the internal representation of a model field would be a > lazily-instantiated GEOSGeometry object, which would fill Justin's > requirements. That would also allow conversion to WKT/GML/KML/etc via > the GEOS library. And if you assigned WKT to a field it'd convert that > to a GEOSGeometry object.
+1 for lazy-instantiated model fields
> I haven't done any benchmarking. Justin?
Nothing scientific here, but I ran through the 150 state house legislature districts of Texas (4MB in SHP file) in the following manner
geoms = [h.get_poly_geos() for h in House.objects.all()]
That completed in 1.38 seconds. The database query itself took .21, making the overhead to be around 1.17s, significantly more than the database. But not too bad for processing a set of 'real-world' geometries covering a very large area. Definitely fast enough for lazy-instantiation.
Justin Bronn wrote: > >From GEOSGeometry it is simple to retrieve WKT and HEX, e.g., 'g.wkt' > or 'g.hex'. However, KML is implemented by PostGIS and can only be > retrieved if you do "SELECT AsKML(the_geom)" on the database end. > Further, GML (ancestor to KML, another OGC standard) and SVG can also > be retrieved with the AsGML() and AsSVG(), respectively.
Maybe we need to provide some query helpers around those cases? Another use case I can think of is simplifying geometries.
> Nothing scientific here, but I ran through the 150 state house > legislature districts of Texas (4MB in SHP file) in the following > manner
> geoms = [h.get_poly_geos() for h in House.objects.all()]
> That completed in 1.38 seconds. The database query itself took .21, > making the overhead to be around 1.17s, significantly more than the > database. But not too bad for processing a set of 'real-world' > geometries covering a very large area. Definitely fast enough for > lazy-instantiation.
But far too slow for always-on GEOSGeometry instantiation.
For non-GIS people that sounds really slow, but remember that often districts are based off natural features and have tens of thousands of points defining their boundaries. Coastlines are very good at upping the point count!
Rob :)
-- One Track Mind Ltd. PO Box 1604, Shortland St, Auckland, New Zealand Phone +64-9-966 0433 Mobile +64-21-572 632 Web http://www.onetrackmind.co.nz
I think this discussion could use a bit of revival, as it seems obvious to me that there are groups that could benefit from some sort of field-level coercion. In addition to my DurationField and the various possibilities available with GeoDjango, there are other things that would be useful in various contexts.
Consider a ColorField that could store its value as an INTEGER for efficiency, but allows Python to access it as a 3-tuple of (red, green, blue). This could be done for both assignment and retrieval, with the field dealing with everything in between using to_python and get_db_prep_save. In fact, something like this is almost possible already, with get_db_prep_save handling transmission into the database, and get_db_prep_lookup dealing with lookup types. In theory (with a good bit of work, I expect) it should be possible to search on such a color field with something like Model.objects.filter(color__red_gt=200) to get objects with a lot of red in their color. That's probably a bad example given the SQL that would be necessary to look up on part of an INTEGER while ignoring the rest of it, but it shows what kind of out-of-the-box thinking would be possible if the one missing component (to_python) were in place to handle coercion into a proper Python type.
Another example would be a variant of XMLField that gives its contents to Python as a minidom.Document object.
Now, I'll gladly admit that I'm throwing out possibilities that I, myself, may not ever need, and I also know that there are ways to work around this behavior outside the core of Django (like the mixin examples provided for GeoDjango), and that Django is about solving problems people actually encounter in the real world. However, for DurationField at least, I am indeed experiencing this problem in the real world. Maybe I'm just the first to consider storing audio information in Django (though I doubt it), but it's a real situation with real needs. YAGNI certainly doesn't apply here, as I definitely am Gonna Need It.
I would also argue that while Django's focus on real-world problem- solving is sound and just, it's getting a lot of criticism from people who think it's not powerful enough for complex applications. While in general I disagree with that criticism, things like this do demonstrate its refusal to deal with what might seem like a very simple, straightforward feature. I find it very odd that Django has the capability to add custom field types, while (by design, apparently) restricting their usefulness.
-Gul
On Apr 18, 8:26 pm, "Russell Keith-Magee" <freakboy3...@gmail.com> wrote:
> > The GIS branch (GeoDjango) is interested in type coercion. > > Specifically, PostGIS returns geometries as hex strings to the client > > -- it would be preferable to have this come into the model as either a > > different type of string like WKT ("POLYGON ( .. )"), KML > > ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For > > example, say we have a 'ZipCode' model with a PolygonField ('poly'):
> Ok. That's the sort of use case I was looking for.
> I haven't looked at the GIS branch, and my GIS knowledge from > university is getting _very_ rusty, so you'll need to help me out a > little.
> How do you handle data input of GIS types? What format is used during > assignment?
> Are you using to_python at all in the GIS branch?
> To date, the database backend maintains the philosophy that whatever > you put in (as a python type) is what you get back. If you put a > boolean in, you get a boolean out; you put a DateTime in, you get a > DateTime out. Is this still appropriate in the GIS setting?
> Is there a use case for being able to get back data of more than one > type, or is it possible to get a WKT or KML representation from a > GEOSGeometry object?
On Thu, 2007-05-10 at 08:14 -0700, Gulopine wrote: > I think this discussion could use a bit of revival, as it seems > obvious to me that there are groups that could benefit from some sort > of field-level coercion. In addition to my DurationField and the > various possibilities available with GeoDjango, there are other things > that would be useful in various contexts.
> Consider a ColorField that could store its value as an INTEGER for > efficiency, but allows Python to access it as a 3-tuple of (red, > green, blue). This could be done for both assignment and retrieval, > with the field dealing with everything in between using to_python and > get_db_prep_save. In fact, something like this is almost possible > already, with get_db_prep_save handling transmission into the > database, and get_db_prep_lookup dealing with lookup types. In theory > (with a good bit of work, I expect) it should be possible to search on > such a color field with something like > Model.objects.filter(color__red_gt=200) to get objects with a lot of > red in their color. That's probably a bad example given the SQL that > would be necessary to look up on part of an INTEGER while ignoring the > rest of it, but it shows what kind of out-of-the-box thinking would be > possible if the one missing component (to_python) were in place to > handle coercion into a proper Python type.
> Another example would be a variant of XMLField that gives its contents > to Python as a minidom.Document object.
Something your explanation hasn't addressed so far is actually creating the right Python types in the models and this surprises me a little. If you're doing all this thinking about how to map from database fields to special types, surely the Model.__init__ changes need to come into the picture so that accessing MyModel.foo returns the right Python type, too.
Luckily, we've thought about this before. Most of the core changes needed to add extra field types, particularly those that subclass existing database types, is essentially. It's about three lines of code (there was a patch published on this list or django-users last year, from memory). You basically just have to change the place in Model.__init__() that assigns the database-retrieved value to a Python instance attribute with a call to an init()-like method that acts like you are describing for to_python() -- if such a function exists.
Normally I would point you to the exact thread, but I can't seem to search it out. In any case, the changes are really trivial. I think there was some reason you could just use the __init__ method on the class as the "converter to Python instance", but I may be misremembering
The only vaguely tricky part we didn't solve last time was getting this to work with serialisation in a reasonably transparent fashion.
The idea was to push pretty much all the work off to the sub-classes. So the sub-classes special method (be it called to_python() or whatever) would be given the database value and return an instance that proxies for that value.
In short, I think your assessment of the situation as far as creating new data types that map to database fields is broadly correct.
I haven't looked at the patch you mentioned at the start of this thread in detail and won't have time to do so for quite a while yet. I am completely buried in Django work in my spare moments and have slightly less than zero free time. At a glance, I think the patch needs some thinking about whether it's in the right place, since it's clearly doing a bit more work each time than the current implementation and function calls and pop() calls are not free. Not saying it's wrong, per se, just that it's not a no-brainer, either.
I'll be the first to admit that I don't know all the internals of Django, and I especially don't know why certain decisions were made in the past. But I appreciate knowing that you understand my reasoning, and from some your rephrasings, it's clear to me that you really do. I'm also glad to hear I'm at least somewhat on the right track.
As for Model.__init__, the third question in my original post was whether there were other places that would need updating, so I'm not surprised to realize that I missed something important. And yes, the more I look into this topic, the more I see many inherent flaws with the patches I've submitted so far.
As for serialization, I haven't worked much with that, but as I work more on DurationField, I'll look more into it to see if I can come up with something that'll work for that as well. I guess I still have to get used to the notion of working with things I don't personally have a need for, if I want to start suggesting changes that will impact larger groups of people.
Given the resistance I've had so far with the coercion patch, I've starting working on a signal-based approach instead, to see if I can possibly get DurationField working without any patches to other bits of Django's core. Just to make sure I test everything I should be, I see the following situations that need to be tested (please let me know if there are more situations I should be aware of):
* Instantiating an object directly in Python: obj = MyModel(...) * Storing the object in the database: obj.save() * Retrieving the object from the database: MyMymodel.objects.get(...) * Serializing the objects: data = serializers.serialize('xml', Mymodel.objects.all()) * Deserializing the objects: objs = serializers.serialize('xml', data)
Even as I sit here going back and forth between this post and my DurationField code, I find that a signal-based approach does in fact satisfy all those situations, without the coercion patch. At least, it does for the XML serializer. The JSON one breaks, but I'm using old Django code here at work, so I'll test it again when I get home before I submit another patch.
Anyway, I guess I'll just finish by apologizing for being quite as heated as I was when I sent my last, and now that I found that the post_init signal seems to do a good job (after all, it pretty much extends Model.__init__, just like you mentioned), I'll be much less aggressive on the coercion front. Once I get it tested with Django SVN at home, I'll put it up so the GeoDjango guys can look at my technique and see if it would work for their field types as well.
-Gul
On May 10, 10:25 pm, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> On Thu, 2007-05-10 at 08:14 -0700, Gulopine wrote: > > I think this discussion could use a bit of revival, as it seems > > obvious to me that there are groups that could benefit from some sort > > of field-level coercion. In addition to my DurationField and the > > various possibilities available with GeoDjango, there are other things > > that would be useful in various contexts.
> > Consider a ColorField that could store its value as an INTEGER for > > efficiency, but allows Python to access it as a 3-tuple of (red, > > green, blue). This could be done for both assignment and retrieval, > > with the field dealing with everything in between using to_python and > > get_db_prep_save. In fact, something like this is almost possible > > already, with get_db_prep_save handling transmission into the > > database, and get_db_prep_lookup dealing with lookup types. In theory > > (with a good bit of work, I expect) it should be possible to search on > > such a color field with something like > > Model.objects.filter(color__red_gt=200) to get objects with a lot of > > red in their color. That's probably a bad example given the SQL that > > would be necessary to look up on part of an INTEGER while ignoring the > > rest of it, but it shows what kind of out-of-the-box thinking would be > > possible if the one missing component (to_python) were in place to > > handle coercion into a proper Python type.
> > Another example would be a variant of XMLField that gives its contents > > to Python as a minidom.Document object.
> Something your explanation hasn't addressed so far is actually creating > the right Python types in the models and this surprises me a little. If > you're doing all this thinking about how to map from database fields to > special types, surely the Model.__init__ changes need to come into the > picture so that accessing MyModel.foo returns the right Python type, > too.
> Luckily, we've thought about this before. Most of the core changes > needed to add extra field types, particularly those that subclass > existing database types, is essentially. It's about three lines of code > (there was a patch published on this list or django-users last year, > from memory). You basically just have to change the place in > Model.__init__() that assigns the database-retrieved value to a Python > instance attribute with a call to an init()-like method that acts like > you are describing for to_python() -- if such a function exists.
> Normally I would point you to the exact thread, but I can't seem to > search it out. In any case, the changes are really trivial. I think > there was some reason you could just use the __init__ method on the > class as the "converter to Python instance", but I may be misremembering
> The only vaguely tricky part we didn't solve last time was getting this > to work with serialisation in a reasonably transparent fashion.
> The idea was to push pretty much all the work off to the sub-classes. So > the sub-classes special method (be it called to_python() or whatever) > would be given the database value and return an instance that proxies > for that value.
> In short, I think your assessment of the situation as far as creating > new data types that map to database fields is broadly correct.
> I haven't looked at the patch you mentioned at the start of this thread > in detail and won't have time to do so for quite a while yet. I am > completely buried in Django work in my spare moments and have slightly > less than zero free time. At a glance, I think the patch needs some > thinking about whether it's in the right place, since it's clearly doing > a bit more work each time than the current implementation and function > calls and pop() calls are not free. Not saying it's wrong, per se, just > that it's not a no-brainer, either.
On Fri, 2007-05-11 at 06:46 -0700, Gulopine wrote:
[...]
> Given the resistance I've had so far with the coercion patch, I've > starting working on a signal-based approach instead, to see if I can > possibly get DurationField working without any patches to other bits > of Django's core.
My understanding of this whole thread is that the initial resistance was because we really do want to understand the use-cases before making any architecture change. I was thinking along very similar lines to Russell (we're going to have surgery to have the mind meld broken). If I'd answered before he did, the conversation would have gone along very similar lines. As the use cases came out, it started to sound reasonable. Then there's the question of whether the implementation is the right approach. As Russell mentioned (and I think I hinted at it, too), we don't want to bog down the common cases just to benefit the corners. So you're not home yet, but it's an idea worth playing around with some more.
For me, it's maybe slightly easier to think about the impact, because I've thought about how to add new Field sub-classes before. The sort of "standard example" I keep in my head is an old one of Jacob Kaplan-Moss's: what if you want to store a Sudoku game in the database? You can store a string of 81 characters, but that's not very useful when you're working in the admin interface or wanting to use it in a Python program. It's also not a field we're ever going to add to core. So if we can make that use-case possible without negatively impacting other code, I'm pretty sure most other cases (like your DurationField) become possible, too.
> Just to make sure I test everything I should be, I > see the following situations that need to be tested (please let me > know if there are more situations I should be aware of):
> * Instantiating an object directly in Python: obj = MyModel(...) > * Storing the object in the database: obj.save() > * Retrieving the object from the database: MyMymodel.objects.get(...) > * Serializing the objects: data = serializers.serialize('xml', > Mymodel.objects.all()) > * Deserializing the objects: objs = serializers.serialize('xml', data)
To the list, I would add: using the object with newforms and whether or not it will be possible to use the object in the admin interface (and I would concentrate on newforms-admin here, since we aren't adding new functionality to the existing admin).
> Even as I sit here going back and forth between this post and my > DurationField code, I find that a signal-based approach does in fact > satisfy all those situations, without the coercion patch. At least, it > does for the XML serializer. The JSON one breaks, but I'm using old > Django code here at work, so I'll test it again when I get home before > I submit another patch.
> Anyway, I guess I'll just finish by apologizing for being quite as > heated as I was when I sent my last, and now that I found that the > post_init signal seems to do a good job (after all, it pretty much > extends Model.__init__, just like you mentioned), I'll be much less > aggressive on the coercion front. Once I get it tested with Django SVN > at home, I'll put it up so the GeoDjango guys can look at my technique > and see if it would work for their field types as well.
My usual advice in these sorts of situations is "take a deep breath. Repeat." Your tone wasn't horribly inappropriate or anything (at least, not to my eyes). You are working on one patch. A few of us are trying to juggle multiple threads of development as well as make sure every reasonable request on the mailing list gets a fair hearing. So things may move a little slower than you like sometimes.
However, it definitey wasn't crazy to bring up the "am I thinking along the right lines here?" thread initially, because, as it turned out, there were some questions that had to be asked and answered. That is what code- and design-review is all about. If you can't find flaws in your own design, you haven't thought about it hard enough (and we'll help you), but if you can defend a design or implementation against all the questions and improve it when necessary, it makes life much easier for everybody.
Seriously, I doubt anybody in the thread was really too worried by your posts. So long as you realise the response was almost a standard process that every proposed design feature goes through it -- either explicitly on the mailing list or in our heads as we design things.
On May 11, 10:06 am, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> My understanding of this whole thread is that the initial resistance was > because we really do want to understand the use-cases before making any > architecture change. [...] So if we > can make that use-case possible without negatively impacting other code, > I'm pretty sure most other cases (like your DurationField) become > possible, too.
And that's really all I meant when I mentioned the resistance. I wasn't offended by the logic or anything, it just spurred me to take a different approach what would only add overhead in the cases that use it. By using a post_init hook in the field's contribute_to_class, it gets processed for all models that need it, but none of the ones that don't. Again, I'll put up a new DurationField patch for review when I get home this afternoon.
One option to avoid field authors from having to use signals themselves would be to add another special method name for these types of coercion functions, and have ModelBase register those with post_init if it finds any during the class creation process. It ends up only being a a couple extra lines in the field (overriding contribute_to_class and adding the signal hook), but if a particular name can be used to identify these hooks, authors wouldn't have to worry about it, and the overhead would only be in class creation and the models that actually use those fields. I don't know of signals are the best approach yet, but I do agree that it's best to keep that out of QuerySet iterators where I have it in the current patch.
If I had to pick a name for a special method for that purpose, I suppose I'd probably go with "coerce" as it seems most appropriate for what it's doing, and it doesn't have any current meaning to Django. I'll look into how much code it would take to get something like that working, as a proof-of-concept anyway.
> To the list, I would add: using the object with newforms and whether or > not it will be possible to use the object in the admin interface (and I > would concentrate on newforms-admin here, since we aren't adding new > functionality to the existing admin).
I haven't worked with newforms-admin yet, but I'll make sure it satisfies those requirements as well. Once we do get things finalized for this, I'll try write up a wiki article describing the process as well as these tests, to make sure other field authors (which there are bound to be, once it's easier to do) go through the appropriate cases.
> My usual advice in these sorts of situations is "take a deep breath. > Repeat." Your tone wasn't horribly inappropriate or anything (at least, > not to my eyes). You are working on one patch. A few of us are trying to > juggle multiple threads of development as well as make sure every > reasonable request on the mailing list gets a fair hearing. So things > may move a little slower than you like sometimes.
Well, I'm not really concerned with speed, to be honest. I fully understand that you guys have a lot to deal with (and thank you for doing it!), and I'm really just trying to do my part to contribute as well. It can be tough to juggle my desire to help relieve you guys of some work, while still trying to avoid adding the extra work explaining these types of things to me. I guess I just took some of Russell's words to mean that there wasn't any interest in even looking for more use cases, so thanks for setting me straight.