Type coercion in Django

154 views
Skip to first unread message

Gulopine

unread,
Apr 17, 2007, 10:23:30 AM4/17/07
to Django developers
Hi all,

I had submitted a patch recently (#3982) that forces Django to use the
to_python method of individual fields when retrieving objects from a
database. I'm not being impatient, but I'd like to get some additional
feedback on this, in case I'm going about it all wrong, or if I'm
missing something important.

While I haven't traced through all the backend database driver code,
Django's lack of to_python references leads me to believe that type
coercion is handled entirely through the database driver. This works
well for most data types, as there's a clear conversion between SQL
data types and those of Python.

However, the problem comes in patches like #2443, where a new field
type uses SQL and Python data types that don't have an appropriate
implicit correlation. In that example, the field uses FloatField for
its get_internal_type (for compatibility with all backends), but
returns a native Python type of timedelta. Current code results in the
field being coerced solely by its SQL data type, even though it
defines its own to_python which would create a proper object.

My main questions are these:

* Am I right in thinking that Django's to_python method should be used
for type coercion, rather than relying solely on backend drivers?
* Are there other situations than database access where this type of
coercion might not be happening correctly?
* Are there other code locations that would need to be updated to use
this functionality?
* Are there deficiencies in the to_python methods of existing fields
that are preventing us from using them?

As a temporary solution, the validate method of Django models does use
to_python as part of its validation, and thus coerces the proper field
types. However, it seems silly to require developers to use
object.validate() immediately after loading an object from the
database (before any modifications are made), just to get proper data
types.

Thoughts?

-Gul

Russell Keith-Magee

unread,
Apr 17, 2007, 9:03:23 PM4/17/07
to django-d...@googlegroups.com
On 4/17/07, Gulopine <gulo...@gmail.com> wrote:

> My main questions are these:
>
> * Am I right in thinking that Django's to_python method should be used
> for type coercion, rather than relying solely on backend drivers?

Historically, to_python existed for the manipulator framework to
convert text-based field input into a Python representation suitable
for use on a backend.

However, manipulators are going away, and conversion of this type has
been deferred to the field itself. The serializers have found a new
use for to_python (converting serialized strings into Python values),
but the domain remains the same - converting strings into data values.

Although some of the to_python methods are designed to convert non-str
input into valid Python output, I don't believe it has ever been the
intention that to_python be used for coercion. AFAIK, the database
backends _should_ be providing data in a valid and useful Python
format.

Coercion of the type you describe is not required for any field
currently supported. I would be hesitant to include coercion as a core
part of the framework unless there is a generic need to coerce
database values into Python values in some way other than that
performed by the database backend. Can you provide any use cases other
than your DurationField that require this sort of functionality?

> * Are there other situations than database access where this type of
> coercion might not be happening correctly?

The MySQL backend already contains code to coerce some date fields
(and on mysql_old, boolean fields, too). However, this is handled as a
data-correction interface on the MySQL backend, rather than a
field-level fix.

Yours,
Russ Magee %-)

Gulopine

unread,
Apr 18, 2007, 7:15:07 AM4/18/07
to Django developers
On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

> Coercion of the type you describe is not required for any field
> currently supported. I would be hesitant to include coercion as a core
> part of the framework unless there is a generic need to coerce
> database values into Python values in some way other than that
> performed by the database backend. Can you provide any use cases other
> than your DurationField that require this sort of functionality?

At the moment, DurationField is the only thing I know of that isn't
well-served by existing functionality, but I think the point is that
DurationField proves that there are potentially new field types that
don't line up 1-to-1 with database column names. I don't know what
other field types there might be in the future, but it seems like a
stretch to think that it's unnecessary just because I can't anticipate
other specific uses. That may just be me, though.

I wouldn't be opposed to having this functionality somewhere outside
the core, but I can't imagine how that would be possible. I don't see
any other way to trigger field-specific code, except perhaps to have
DurationField's contribute_to_class create a wrapper around the
model's __init__, which would then coerce the fields after creating
the object. I can't imagine you would consider that a more elegant
solution than simply using to_python to do the same.

On the other hand, some completely separate framework could be used to
do coercion, but that framework would still have to be called within
the core, and would have ultimately the same functionality as
to_python anyway. Even if to_python wasn't originally intended for
this, it seems perfectly suited for this task.

> The MySQL backend already contains code to coerce some date fields
> (and on mysql_old, boolean fields, too). However, this is handled as a
> data-correction interface on the MySQL backend, rather than a
> field-level fix.

And that works just fine in all situations where the expected Python
type is easily identifiable from the the database column type.
Unfortunately, DurationField proves this isn't always the case, and it
seems strange to assume that nobody will ever want to do interesting
things.

-Gul

Gulopine

unread,
Apr 18, 2007, 10:46:03 AM4/18/07
to Django developers
On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:
> Coercion of the type you describe is not required for any field
> currently supported. I would be hesitant to include coercion as a core
> part of the framework unless there is a generic need to coerce
> database values into Python values in some way other than that
> performed by the database backend. Can you provide any use cases other
> than your DurationField that require this sort of functionality?

I did think of one other potential use case that's considerably less
general, but still illustrates the potential use for using to_python.
For one of my projects, I created a new field type I called
SubclassField. Basically, it provided a way to store a reference to a
particular class inside a model, given that the class was a subclass
of some given parent class.

The real key here is that the available choices for this field are all
subclasses of this base class. It essentially allows me to define a
single field that can automatically update its available choices when
I add a new subclass to the project. I know this is somewhat vague,
but it's a highly specialized use, and it's a bit difficult to
understand without knowing the full specifications of the project.

I'll try to simplify it a little bit to provide an example. The short
story is that users are allowed to have an inventory of items, in a
similar manner to how video game RPGs keep track of a player's
possessions. In addition, each item has a set of attributes that are
specific to that instance of the item. The real trick is that items
consist of Python code, not arbitrary data, so the item classes have
to exist in Python, not as a separate database table somewhere.

class Item(object):
__metaclass__ = ItemBase # Items have declarative syntax like
Django models

class Inventory(models.Model):
user = models.ForeignKey(User, related_name='inventory')
item_class = SubclassField(Item)

class ItemSettings(models.Model):
item = models.ForeignKey(Inventory, related_name='settings')
name = models.CharField(maxlength=255)
value = models.CharField(maxlength=255)

class Sword(Item):
pass # This would actually have lots of code

class Potion(Item):
pass # This would actually have lots of code

This all works well in part, using the ItemBase metaclass to keep
track of new Item subclasses, and adding them to the choices for the
SubclassField. And it does work quite well for displaying a dropdown
list in forms and saving objects, even directly in the admin itself.
Unfortunately, when retrieving a record from the database, I'd like to
be able to have Inventory.item_class be an actual class object that
can be instantiated using the parameters found in the related
ItemSettings model.

user = User.objects.get(username='Gulopine')
for item in user.inventory:
item = item.item_class(**dict((s.name, s.value) for s in
item.settings))

Of course, in this example, I'm providing all this functionality in
the __init__ of Inventory, so I'm able to retrieve the class,
instantiate it with the available settings, and have item_class
actually be a fully instantiated item instead, immediately upon
creation of the class (which would also occur upon retrieval from the
database). However, that approach is only feasible when I have control
over a model's __init__ method, which isn't going to be the case with
fields like DurationField (and perhaps others) that are meant for more
general use.

I know that was a very contrived example that really doesn't apply
outside of my own project, but it does illustrate another potential
use of to_python to coerce fields, since it'd be more logical in my
code to have item_class be made into a proper class as part of the
field's code, rather than in the __init__ of the model. My to_python
patch would allow Inventory's __init__ to have a fully formed Python
class from the moment it begins executing.

I hope at least some of this made some sense to somebody. If not, I
apologize.

-Gul

Gulopine

unread,
Apr 18, 2007, 1:24:29 PM4/18/07
to Django developers
A little more research shows that I could also accomplish what I'm
looking for by tying into the pre_init or post_init signal, but then
I'd need a customized function for each field that needs processing,
on each model. This could be done, of course, and it would certainly
keep this coercion out of the core, but I can't imagine it would be
considered elegant (or even preferable) in any way. Someone prove me
wrong.

I guess I'll sum up my thoughts on this another way. I don't see
to_python as explicitly converting strings to other data types, but
rather as converting a storage data type to a usable data type. Of
course, that's not the way it is, but rather the way I think it should
be.

-Gul

jbronn

unread,
Apr 18, 2007, 6:11:30 PM4/18/07
to Django developers
> Coercion of the type you describe is not required for any field
> currently supported. I would be hesitant to include coercion as a core
> part of the framework unless there is a generic need to coerce
> database values into Python values in some way other than that
> performed by the database backend.

The GIS branch (GeoDjango) is interested in type coercion.
Specifically, PostGIS returns geometries as hex strings to the client
-- it would be preferable to have this come into the model as either a
different type of string like WKT ("POLYGON ( .. )"), KML
("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For
example, say we have a 'ZipCode' model with a PolygonField ('poly'):

from django.contrib.gis.db import models

class ZipCode(models.Model, models.GeoMixin):
code = models.IntegerField()
poly = models.PolygonField()
objects = models.GeoManager()

As it stands right now, to get the centroid of a polygon you'd have to
do the following:

z = ZipCode.objects.get(code=77002)
centroid = z.get_poly_geos().centroid

Instead, with type coercion, it could be:

z = ZipCode.objects.get(code=77002)
centroid = z.poly.centroid

The latter seems more intuitive to me. Further, it would eliminate
the need for the 'mixin' class completely, since those routines are
for casting the hex as a geometry object and pulling out the needed
value (e.g. get_GEOM_wkt, get_GEOM_area, etc.). While getting the KML
or WKT can be obtained from PostGIS through an extra SELECT parameter,
getting GEOSGeometry() python objects in the QuerySet would require
some sort of type coercion.

In summary, use cases outside of Gulopine's problem would benefit from
a method to perform type coercion. I'm open to suggestions if there's
a better way to do this...

Regards,
-Justin Bronn

Russell Keith-Magee

unread,
Apr 18, 2007, 8:02:39 PM4/18/07
to django-d...@googlegroups.com
On 4/18/07, Gulopine <gulo...@gmail.com> wrote:
>
> On Apr 17, 9:03 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
> wrote:
> I don't know what
> other field types there might be in the future, but it seems like a
> stretch to think that it's unnecessary just because I can't anticipate
> other specific uses. That may just be me, though.

The reason is YAGNI (You Ain't Gonna Need It). Unless there is
actually a need to do something, you don't rearchitect your system to
allow flexibility that _might_ be useful.

I just need to be convinced that it's actually a problem.

Yours,
Russ Magee %-)

Russell Keith-Magee

unread,
Apr 18, 2007, 8:26:28 PM4/18/07
to django-d...@googlegroups.com
On 4/19/07, jbronn <jbr...@gmail.com> wrote:
>
> The GIS branch (GeoDjango) is interested in type coercion.
> Specifically, PostGIS returns geometries as hex strings to the client
> -- it would be preferable to have this come into the model as either a
> different type of string like WKT ("POLYGON ( .. )"), KML
> ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For
> example, say we have a 'ZipCode' model with a PolygonField ('poly'):

Ok. That's the sort of use case I was looking for.

I haven't looked at the GIS branch, and my GIS knowledge from
university is getting _very_ rusty, so you'll need to help me out a
little.

How do you handle data input of GIS types? What format is used during
assignment?

Are you using to_python at all in the GIS branch?

To date, the database backend maintains the philosophy that whatever
you put in (as a python type) is what you get back. If you put a
boolean in, you get a boolean out; you put a DateTime in, you get a
DateTime out. Is this still appropriate in the GIS setting?

Is there a use case for being able to get back data of more than one
type, or is it possible to get a WKT or KML representation from a
GEOSGeometry object?

Yours,
Russ Magee %-)

Robert Coup

unread,
Apr 18, 2007, 8:48:36 PM4/18/07
to django-d...@googlegroups.com
Russell Keith-Magee wrote:
> On 4/19/07, jbronn <jbr...@gmail.com> wrote:
>
>> The GIS branch (GeoDjango) is interested in type coercion.
>> Specifically, PostGIS returns geometries as hex strings to the client
>> -- it would be preferable to have this come into the model as either a
>> different type of string like WKT ("POLYGON ( .. )"), KML
>> ("<Polygon>...</Polygon>") or as a GEOSGeometry() object. For
>> example, say we have a 'ZipCode' model with a PolygonField ('poly'):
>>
>
> Ok. That's the sort of use case I was looking for.
>
> I haven't looked at the GIS branch, and my GIS knowledge from
> university is getting _very_ rusty, so you'll need to help me out a
> little.
>
> How do you handle data input of GIS types? What format is used during
> assignment?
>
It'd be nice to be able to assign a GEOSGeometry object, well known
text, and well known binary formats (and maybe KML/GML). The database
natively takes well-known-binary (WKB) as a hex string and
well-known-text (WKT) formats in assignment to a geometry field.

> To date, the database backend maintains the philosophy that whatever
> you put in (as a python type) is what you get back. If you put a
> boolean in, you get a boolean out; you put a DateTime in, you get a
> DateTime out. Is this still appropriate in the GIS setting?
>
My idea for the internal representation of a model field would be a
lazily-instantiated GEOSGeometry object, which would fill Justin's
requirements. That would also allow conversion to WKT/GML/KML/etc via
the GEOS library. And if you assigned WKT to a field it'd convert that
to a GEOSGeometry object.

However, I don't think we'd want to instantiate a GEOSGeometry object
via to_python for every model instance on the offchance the geometry
field(s) are accessed, but I haven't done any benchmarking. Justin?

Rob :)

Justin Bronn

unread,
Apr 18, 2007, 11:12:27 PM4/18/07
to Django developers
> How do you handle data input of GIS types? What format is used during
> assignment?

During assignment, PostGIS returns in a format called HEXEWKB
(hereinafter "HEX"), one of its 'canonical forms'. Essentially it's a
hexadecimal string of the binary format (WKB, an OGC standard). For
input you can input in either HEX or WKT (another OGC standard) since
PostGIS accepts both.

> Are you using to_python at all in the GIS branch?

Not yet, like you said it's used with the forms and serialization,
both of which I haven't gotten to yet. I tried it out before thinking
it did coercion and discovered it had no effect as to what was
returned from the database.

> Is there a use case for being able to get back data of more than one?


> type, or is it possible to get a WKT or KML representation from a
> GEOSGeometry object?

>From GEOSGeometry it is simple to retrieve WKT and HEX, e.g., 'g.wkt'
or 'g.hex'. However, KML is implemented by PostGIS and can only be
retrieved if you do "SELECT AsKML(the_geom)" on the database end.
Further, GML (ancestor to KML, another OGC standard) and SVG can also
be retrieved with the AsGML() and AsSVG(), respectively.

> My idea for the internal representation of a model field would be a
> lazily-instantiated GEOSGeometry object, which would fill Justin's
> requirements. That would also allow conversion to WKT/GML/KML/etc via
> the GEOS library. And if you assigned WKT to a field it'd convert that
> to a GEOSGeometry object.

+1 for lazy-instantiated model fields

> I haven't done any benchmarking. Justin?

Nothing scientific here, but I ran through the 150 state house
legislature districts of Texas (4MB in SHP file) in the following
manner

geoms = [h.get_poly_geos() for h in House.objects.all()]

That completed in 1.38 seconds. The database query itself took .21,
making the overhead to be around 1.17s, significantly more than the
database. But not too bad for processing a set of 'real-world'
geometries covering a very large area. Definitely fast enough for
lazy-instantiation.

Robert Coup

unread,
Apr 18, 2007, 11:37:00 PM4/18/07
to django-d...@googlegroups.com
Justin Bronn wrote:
> >From GEOSGeometry it is simple to retrieve WKT and HEX, e.g., 'g.wkt'
> or 'g.hex'. However, KML is implemented by PostGIS and can only be
> retrieved if you do "SELECT AsKML(the_geom)" on the database end.
> Further, GML (ancestor to KML, another OGC standard) and SVG can also
> be retrieved with the AsGML() and AsSVG(), respectively.
>
>
Maybe we need to provide some query helpers around those cases? Another
use case I can think of is simplifying geometries.

I've done it before via extra manager methods:

def with_kml_geometry(self, qs):
return self.get_query_set().extra(select={
'geom_kml': 'AsKML(geom)'
})


>> I haven't done any benchmarking. Justin?
>>
> Nothing scientific here, but I ran through the 150 state house
> legislature districts of Texas (4MB in SHP file) in the following
> manner
>
> geoms = [h.get_poly_geos() for h in House.objects.all()]
>
> That completed in 1.38 seconds. The database query itself took .21,
> making the overhead to be around 1.17s, significantly more than the
> database. But not too bad for processing a set of 'real-world'
> geometries covering a very large area. Definitely fast enough for
> lazy-instantiation.
>

But far too slow for always-on GEOSGeometry instantiation.

For non-GIS people that sounds really slow, but remember that often
districts are based off natural features and have tens of thousands of
points defining their boundaries. Coastlines are very good at upping the
point count!

Rob :)

--
One Track Mind Ltd.
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Mobile +64-21-572 632
Web http://www.onetrackmind.co.nz


Gulopine

unread,
May 10, 2007, 11:14:12 AM5/10/07
to Django developers
I think this discussion could use a bit of revival, as it seems
obvious to me that there are groups that could benefit from some sort
of field-level coercion. In addition to my DurationField and the
various possibilities available with GeoDjango, there are other things
that would be useful in various contexts.

Consider a ColorField that could store its value as an INTEGER for
efficiency, but allows Python to access it as a 3-tuple of (red,
green, blue). This could be done for both assignment and retrieval,
with the field dealing with everything in between using to_python and
get_db_prep_save. In fact, something like this is almost possible
already, with get_db_prep_save handling transmission into the
database, and get_db_prep_lookup dealing with lookup types. In theory
(with a good bit of work, I expect) it should be possible to search on
such a color field with something like
Model.objects.filter(color__red_gt=200) to get objects with a lot of
red in their color. That's probably a bad example given the SQL that
would be necessary to look up on part of an INTEGER while ignoring the
rest of it, but it shows what kind of out-of-the-box thinking would be
possible if the one missing component (to_python) were in place to
handle coercion into a proper Python type.

Another example would be a variant of XMLField that gives its contents
to Python as a minidom.Document object.

Now, I'll gladly admit that I'm throwing out possibilities that I,
myself, may not ever need, and I also know that there are ways to work
around this behavior outside the core of Django (like the mixin
examples provided for GeoDjango), and that Django is about solving
problems people actually encounter in the real world. However, for
DurationField at least, I am indeed experiencing this problem in the
real world. Maybe I'm just the first to consider storing audio
information in Django (though I doubt it), but it's a real situation
with real needs. YAGNI certainly doesn't apply here, as I definitely
am Gonna Need It.

I would also argue that while Django's focus on real-world problem-
solving is sound and just, it's getting a lot of criticism from people
who think it's not powerful enough for complex applications. While in
general I disagree with that criticism, things like this do
demonstrate its refusal to deal with what might seem like a very
simple, straightforward feature. I find it very odd that Django has
the capability to add custom field types, while (by design,
apparently) restricting their usefulness.

-Gul

On Apr 18, 8:26 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

Malcolm Tredinnick

unread,
May 10, 2007, 10:25:15 PM5/10/07
to django-d...@googlegroups.com
On Thu, 2007-05-10 at 08:14 -0700, Gulopine wrote:
> I think this discussion could use a bit of revival, as it seems
> obvious to me that there are groups that could benefit from some sort
> of field-level coercion. In addition to my DurationField and the
> various possibilities available with GeoDjango, there are other things
> that would be useful in various contexts.
>
> Consider a ColorField that could store its value as an INTEGER for
> efficiency, but allows Python to access it as a 3-tuple of (red,
> green, blue). This could be done for both assignment and retrieval,
> with the field dealing with everything in between using to_python and
> get_db_prep_save. In fact, something like this is almost possible
> already, with get_db_prep_save handling transmission into the
> database, and get_db_prep_lookup dealing with lookup types. In theory
> (with a good bit of work, I expect) it should be possible to search on
> such a color field with something like
> Model.objects.filter(color__red_gt=200) to get objects with a lot of
> red in their color. That's probably a bad example given the SQL that
> would be necessary to look up on part of an INTEGER while ignoring the
> rest of it, but it shows what kind of out-of-the-box thinking would be
> possible if the one missing component (to_python) were in place to
> handle coercion into a proper Python type.
>
> Another example would be a variant of XMLField that gives its contents
> to Python as a minidom.Document object.

Something your explanation hasn't addressed so far is actually creating
the right Python types in the models and this surprises me a little. If
you're doing all this thinking about how to map from database fields to
special types, surely the Model.__init__ changes need to come into the
picture so that accessing MyModel.foo returns the right Python type,
too.

Luckily, we've thought about this before. Most of the core changes
needed to add extra field types, particularly those that subclass
existing database types, is essentially. It's about three lines of code
(there was a patch published on this list or django-users last year,
from memory). You basically just have to change the place in
Model.__init__() that assigns the database-retrieved value to a Python
instance attribute with a call to an init()-like method that acts like
you are describing for to_python() -- if such a function exists.

Normally I would point you to the exact thread, but I can't seem to
search it out. In any case, the changes are really trivial. I think
there was some reason you could just use the __init__ method on the
class as the "converter to Python instance", but I may be misremembering

The only vaguely tricky part we didn't solve last time was getting this
to work with serialisation in a reasonably transparent fashion.

The idea was to push pretty much all the work off to the sub-classes. So
the sub-classes special method (be it called to_python() or whatever)
would be given the database value and return an instance that proxies
for that value.

In short, I think your assessment of the situation as far as creating
new data types that map to database fields is broadly correct.

I haven't looked at the patch you mentioned at the start of this thread
in detail and won't have time to do so for quite a while yet. I am
completely buried in Django work in my spare moments and have slightly
less than zero free time. At a glance, I think the patch needs some
thinking about whether it's in the right place, since it's clearly doing
a bit more work each time than the current implementation and function
calls and pop() calls are not free. Not saying it's wrong, per se, just
that it's not a no-brainer, either.

Regards,
Malcolm

Gulopine

unread,
May 11, 2007, 9:46:16 AM5/11/07
to Django developers
I'll be the first to admit that I don't know all the internals of
Django, and I especially don't know why certain decisions were made in
the past. But I appreciate knowing that you understand my reasoning,
and from some your rephrasings, it's clear to me that you really do.
I'm also glad to hear I'm at least somewhat on the right track.

As for Model.__init__, the third question in my original post was
whether there were other places that would need updating, so I'm not
surprised to realize that I missed something important. And yes, the
more I look into this topic, the more I see many inherent flaws with
the patches I've submitted so far.

As for serialization, I haven't worked much with that, but as I work
more on DurationField, I'll look more into it to see if I can come up
with something that'll work for that as well. I guess I still have to
get used to the notion of working with things I don't personally have
a need for, if I want to start suggesting changes that will impact
larger groups of people.

Given the resistance I've had so far with the coercion patch, I've
starting working on a signal-based approach instead, to see if I can
possibly get DurationField working without any patches to other bits
of Django's core. Just to make sure I test everything I should be, I
see the following situations that need to be tested (please let me
know if there are more situations I should be aware of):

* Instantiating an object directly in Python: obj = MyModel(...)
* Storing the object in the database: obj.save()
* Retrieving the object from the database: MyMymodel.objects.get(...)
* Serializing the objects: data = serializers.serialize('xml',
Mymodel.objects.all())
* Deserializing the objects: objs = serializers.serialize('xml', data)

Even as I sit here going back and forth between this post and my
DurationField code, I find that a signal-based approach does in fact
satisfy all those situations, without the coercion patch. At least, it
does for the XML serializer. The JSON one breaks, but I'm using old
Django code here at work, so I'll test it again when I get home before
I submit another patch.

Anyway, I guess I'll just finish by apologizing for being quite as
heated as I was when I sent my last, and now that I found that the
post_init signal seems to do a good job (after all, it pretty much
extends Model.__init__, just like you mentioned), I'll be much less
aggressive on the coercion front. Once I get it tested with Django SVN
at home, I'll put it up so the GeoDjango guys can look at my technique
and see if it would work for their field types as well.

-Gul

On May 10, 10:25 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Malcolm Tredinnick

unread,
May 11, 2007, 10:06:23 AM5/11/07
to django-d...@googlegroups.com
On Fri, 2007-05-11 at 06:46 -0700, Gulopine wrote:
[...]

> Given the resistance I've had so far with the coercion patch, I've
> starting working on a signal-based approach instead, to see if I can
> possibly get DurationField working without any patches to other bits
> of Django's core.

My understanding of this whole thread is that the initial resistance was
because we really do want to understand the use-cases before making any
architecture change. I was thinking along very similar lines to Russell
(we're going to have surgery to have the mind meld broken). If I'd
answered before he did, the conversation would have gone along very
similar lines. As the use cases came out, it started to sound
reasonable. Then there's the question of whether the implementation is
the right approach. As Russell mentioned (and I think I hinted at it,
too), we don't want to bog down the common cases just to benefit the
corners. So you're not home yet, but it's an idea worth playing around
with some more.

For me, it's maybe slightly easier to think about the impact, because
I've thought about how to add new Field sub-classes before. The sort of
"standard example" I keep in my head is an old one of Jacob
Kaplan-Moss's: what if you want to store a Sudoku game in the database?
You can store a string of 81 characters, but that's not very useful when
you're working in the admin interface or wanting to use it in a Python
program. It's also not a field we're ever going to add to core. So if we
can make that use-case possible without negatively impacting other code,
I'm pretty sure most other cases (like your DurationField) become
possible, too.

> Just to make sure I test everything I should be, I
> see the following situations that need to be tested (please let me
> know if there are more situations I should be aware of):
>
> * Instantiating an object directly in Python: obj = MyModel(...)
> * Storing the object in the database: obj.save()
> * Retrieving the object from the database: MyMymodel.objects.get(...)
> * Serializing the objects: data = serializers.serialize('xml',
> Mymodel.objects.all())
> * Deserializing the objects: objs = serializers.serialize('xml', data)

To the list, I would add: using the object with newforms and whether or
not it will be possible to use the object in the admin interface (and I
would concentrate on newforms-admin here, since we aren't adding new
functionality to the existing admin).

> Even as I sit here going back and forth between this post and my
> DurationField code, I find that a signal-based approach does in fact
> satisfy all those situations, without the coercion patch. At least, it
> does for the XML serializer. The JSON one breaks, but I'm using old
> Django code here at work, so I'll test it again when I get home before
> I submit another patch.
>
> Anyway, I guess I'll just finish by apologizing for being quite as
> heated as I was when I sent my last, and now that I found that the
> post_init signal seems to do a good job (after all, it pretty much
> extends Model.__init__, just like you mentioned), I'll be much less
> aggressive on the coercion front. Once I get it tested with Django SVN
> at home, I'll put it up so the GeoDjango guys can look at my technique
> and see if it would work for their field types as well.

My usual advice in these sorts of situations is "take a deep breath.
Repeat." Your tone wasn't horribly inappropriate or anything (at least,
not to my eyes). You are working on one patch. A few of us are trying to
juggle multiple threads of development as well as make sure every
reasonable request on the mailing list gets a fair hearing. So things
may move a little slower than you like sometimes.

However, it definitey wasn't crazy to bring up the "am I thinking along
the right lines here?" thread initially, because, as it turned out,
there were some questions that had to be asked and answered. That is
what code- and design-review is all about. If you can't find flaws in
your own design, you haven't thought about it hard enough (and we'll
help you), but if you can defend a design or implementation against all
the questions and improve it when necessary, it makes life much easier
for everybody.

Seriously, I doubt anybody in the thread was really too worried by your
posts. So long as you realise the response was almost a standard process
that every proposed design feature goes through it -- either explicitly
on the mailing list or in our heads as we design things.

Best wishes,
Malcolm.


Gulopine

unread,
May 11, 2007, 11:04:41 AM5/11/07
to Django developers
On May 11, 10:06 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

> My understanding of this whole thread is that the initial resistance was
> because we really do want to understand the use-cases before making any
> architecture change. [...] So if we

> can make that use-case possible without negatively impacting other code,
> I'm pretty sure most other cases (like your DurationField) become
> possible, too.

And that's really all I meant when I mentioned the resistance. I
wasn't offended by the logic or anything, it just spurred me to take a
different approach what would only add overhead in the cases that use
it. By using a post_init hook in the field's contribute_to_class, it
gets processed for all models that need it, but none of the ones that
don't. Again, I'll put up a new DurationField patch for review when I
get home this afternoon.

One option to avoid field authors from having to use signals
themselves would be to add another special method name for these types
of coercion functions, and have ModelBase register those with
post_init if it finds any during the class creation process. It ends
up only being a a couple extra lines in the field (overriding
contribute_to_class and adding the signal hook), but if a particular
name can be used to identify these hooks, authors wouldn't have to
worry about it, and the overhead would only be in class creation and
the models that actually use those fields. I don't know of signals are
the best approach yet, but I do agree that it's best to keep that out
of QuerySet iterators where I have it in the current patch.

If I had to pick a name for a special method for that purpose, I
suppose I'd probably go with "coerce" as it seems most appropriate for
what it's doing, and it doesn't have any current meaning to Django.
I'll look into how much code it would take to get something like that
working, as a proof-of-concept anyway.

> To the list, I would add: using the object with newforms and whether or
> not it will be possible to use the object in the admin interface (and I
> would concentrate on newforms-admin here, since we aren't adding new
> functionality to the existing admin).

I haven't worked with newforms-admin yet, but I'll make sure it
satisfies those requirements as well. Once we do get things finalized
for this, I'll try write up a wiki article describing the process as
well as these tests, to make sure other field authors (which there are
bound to be, once it's easier to do) go through the appropriate cases.

> My usual advice in these sorts of situations is "take a deep breath.
> Repeat." Your tone wasn't horribly inappropriate or anything (at least,
> not to my eyes). You are working on one patch. A few of us are trying to
> juggle multiple threads of development as well as make sure every
> reasonable request on the mailing list gets a fair hearing. So things
> may move a little slower than you like sometimes.

Well, I'm not really concerned with speed, to be honest. I fully
understand that you guys have a lot to deal with (and thank you for
doing it!), and I'm really just trying to do my part to contribute as
well. It can be tough to juggle my desire to help relieve you guys of
some work, while still trying to avoid adding the extra work
explaining these types of things to me. I guess I just took some of
Russell's words to mean that there wasn't any interest in even looking
for more use cases, so thanks for setting me straight.

-Gul

Reply all
Reply to author
Forward
0 new messages