Changes to database-to-python conversions

368 views
Skip to first unread message

Marc Tamlyn

unread,
Aug 26, 2014, 4:35:52 AM8/26/14
to django-d...@googlegroups.com
Hi all,

This thread relates to proposed (and nearly RFC) changes in https://github.com/django/django/pull/3047. It contains backwards incompatible changes relevant to third party database backend authors, and to anyone using SubfieldBase in custom fields.

tl;dr
Deprecation of SubfieldBase, removal of resolve_columns and convert_values in favour of a more general converter based approach and public API field.from_db_value. Now works seamlessly with aggregation, .values() and raw queries.

Long version
Database backend story
Not all database drivers return all field types in the same format. Psycopg2 is generally pretty well behaved, but mysql, oracle and sqlite all have some idiosyncrasies about certain fields, usually relating to dates, decimals or booleans. Furthermore, sometimes these only happened when aggregates were involved. There were two hooks provided to deal with these, firstly Query.convert_values calling DatabaseOperations.convert_values, which was called by aggregates code, and secondly the optional method SQLCompiler.resolve_columns which was called during normal queryset evaluation if present. On oracle and in gis, SQLCompiler.resolve_columns invoked Query.convert_values which in turn invoked DatabaseOperations.convert_values.

This change based on whether SQLCompiler.resolve_columns existed or not caused at least https://code.djangoproject.com/ticket/21565

Custom field story
Django provides a metaclass called SubfieldBase, which basically means that that field will have it's to_python() method called whenever a value is assigned to the model, including when it is loaded from the database. This really is an abuse of to_python() whose primary purpose is to convert strings from serialisation and forms into the relevant python object. It also provided no way to change the behaviour based on the backend, but crucially was not called by aggregation or values() calls. (https://code.djangoproject.com/ticket/14462)

Proposed changes
The new proposed code allows backends and fields to provide converter functions to transform the value from the database to the model layer. DatabaseBackend converters are run first and for internal types will normalise the values. A custom field can then convert the resulting object again if needs be. This code is run in an efficient manner and the same way in all parts of the ORM - queries, aggregates, .values(), .dates() etc. Due to changes in the signatures and for performance reasons, all the hooks have changed. The new API is summaries below:

"Private" API
SQLCompiler.get_converters(fields)
SQLCompiler.apply_converters(row, converters)

"Semi-Private" API
DatabaseOperations.get_db_converters(internal_type) - returns a list of backend converter functions convert(value, field)
Field.get_db_converters(connection) - returns a list of field converter functions convert(value, field)

Public API
Field.from_db_value(value, connection) - public documented hook to replace SubfieldBase.

A note on gis
This has a 147 line negative diff in contrib.gis, removing a bunch of duplicated code, an entire compiler module for mysql, GeoValuesQuerySet, and custom code in SQLDateCompiler. Overall it is a pretty substantial cleanup and the proposed API is nicely overridable by the gis compiler.

Status of patch
I'm still ironing out a few minor issues with the CI but Anssi is happy with the general format of the patch and it has no serious performance regressions. All being well, I hope to merge the patch by this time next week. It will be needed by two upcoming new cross-database fields to come out of the django.contrib.postgres project - UUIDField and DurationField as these field types will need real work done in non-postgres backend converters.

Comments welcome!

Marc

Anssi Kääriäinen

unread,
Aug 26, 2014, 4:55:38 AM8/26/14
to django-d...@googlegroups.com
The way database values are converted to Python values really needs to
be cleaned up. Currently there are multiple different ways to do value
conversions, but for example for .values() calls none of the ways apply.
There are somewhat complicated hacks to make gis and date queries work
correctly.

The proposed way is simple yet powerful enough to work for all the cases
mentioned. There is only very minor overhead if a given query doesn't
need conversions. When conversions are needed, the conversions are done
in an efficient manner.

In short: big +1 from me. My only worry is what 3rd party backend
maintainers think of this change.

- Anssi

Josh Smeaton

unread,
Aug 26, 2014, 8:43:08 AM8/26/14
to django-d...@googlegroups.com
I'm a big fan of the resolve_columns and convert_values cleanup. The large swaths of removed code in GIS is also very welcome. Great stuff!

I suspect I'll be able to make a few improvements in the aggregation/annotation patch I've been working on once this lands.

Aymeric Augustin

unread,
Aug 26, 2014, 5:04:17 PM8/26/14
to django-d...@googlegroups.com
I support deleting django.contrib.gis.db.models.sql.compiler.SQLDateTimeCompiler. There’s some code in Django I’m really ashamed of; this class ranks high on the list.

I’ve been frustrated by inconsistencies in this area in the past but never investigated. Overall, your approach looks reasonable and the new API is useful. Two questions:

- Do we now have a consistent set of conversion methods and a clear view of which one does what?

- Did you consider adding an example of switching from SubfieldBase to from_db_value in the docs?

--
Aymeric.

Marc Tamlyn

unread,
Aug 26, 2014, 6:02:47 PM8/26/14
to django-d...@googlegroups.com
> Do we now have a consistent set of conversion methods and a clear view of which one does what?

This does need further investigation and testing. In particular I know we are being a bit heavy handed at the moment with a consequential performance hit. The issue is that some backends require converters only in certain cases - most notably on aggregation or in .dates(). By always running the code, we are getting more consistency but at a hit of performance. I intend to work through the built in converters and ensure we have a proper test suite for them at some point, but there are two things to consider - one is upcoming changes to aggregates and the other is that the diff is already 1k lines.


> Did you consider adding an example of switching from SubfieldBase to from_db_value in the docs?

It's "remove the metaclass, add from_db_value which does a similar thing to to_python but just handles the db case". Someone can always diff 1.7 and 1.8 docs ;)



--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/071D2D94-60FF-44C4-90CB-F9CCEB1252C8%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.

Shai Berger

unread,
Aug 26, 2014, 8:38:52 PM8/26/14
to django-d...@googlegroups.com
Big +1.

I suspect this also opens the door to a significant speed improvement for
Oracle's number column handling, but I'll have to investigate further (ask me
about numbers_as_strings if you're interested in details).

Thanks for the great work,

Shai.
Reply all
Reply to author
Forward
0 new messages