Hi all,
This thread relates to proposed (and nearly RFC) changes in
https://github.com/django/django/pull/3047. It contains backwards incompatible changes relevant to third party database backend authors, and to anyone using SubfieldBase in custom fields.
tl;dr
Deprecation of SubfieldBase, and removal of resolve_columns and convert_values in favour of a more general converter-based approach and a public API, Field.from_db_value. Conversion now works seamlessly with aggregation, .values() and raw queries.
Long version
Database backend story
Not all database drivers return all field types in the same format. Psycopg2 is generally pretty well behaved, but MySQL, Oracle and SQLite all have some idiosyncrasies with certain fields, usually relating to dates, decimals or booleans. Furthermore, some of these only showed up when aggregates were involved. Two hooks were provided to deal with this: firstly Query.convert_values, which called DatabaseOperations.convert_values and was used by the aggregates code, and secondly the optional method SQLCompiler.resolve_columns, which was called during normal queryset evaluation if present. On Oracle and in gis, SQLCompiler.resolve_columns invoked Query.convert_values, which in turn invoked DatabaseOperations.convert_values.
Custom field story
Django provides a metaclass called SubfieldBase, which basically means that the field will have its to_python() method called whenever a value is assigned to the model, including when it is loaded from the database. This really is an abuse of to_python(), whose primary purpose is to convert strings from serialisation and forms into the relevant Python object. It also provided no way to change the behaviour based on the backend and, crucially, was not called by aggregation or values() calls (https://code.djangoproject.com/ticket/14462).
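To make the SubfieldBase behaviour concrete, here is a rough, Django-free sketch of the idea: a descriptor that pushes every assignment through to_python(). The names ToPythonOnAssign, CommaListField and Article are made up for illustration and this is a simplification, not the actual Django code.

```python
class ToPythonOnAssign:
    """Hypothetical descriptor: every assignment goes through to_python()."""

    def __init__(self, field):
        self.field = field

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.field.name]

    def __set__(self, obj, value):
        # Called on *any* assignment, whether the value came from a form,
        # a deserialiser, or a row freshly loaded from the database.
        obj.__dict__[self.field.name] = self.field.to_python(value)


class CommaListField:
    """Hypothetical custom field storing a list as a comma-separated string."""

    name = "tags"

    def to_python(self, value):
        if isinstance(value, list):
            return value
        return value.split(",") if value else []


class Article:
    """Stand-in for a model; not a real Django model."""

    tags = ToPythonOnAssign(CommaListField())

    def __init__(self, tags):
        self.tags = tags


article = Article("django,orm")   # as if loaded from a database row
raw_row = ("django,orm",)         # what a values()-style query would hand back
```

In this sketch article.tags ends up as a list, while raw_row still holds the unconverted string. That is the gap described above: the conversion hangs off attribute assignment, so any code path that never builds a model instance skips it.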
Proposed changes
The new proposed code allows backends and fields to provide converter functions to transform the value from the database to the model layer. Backend converters are run first and, for internal types, will normalise the values. A custom field can then convert the resulting object again if need be. This code is run efficiently and in the same way in all parts of the ORM: queries, aggregates, .values(), .dates() etc. Due to changes in the signatures and for performance reasons, all the hooks have changed. The new API is summarised below:
"Private" API
SQLCompiler.get_converters(fields)
SQLCompiler.apply_converters(row, converters)
"Semi-Private" API
DatabaseOperations.get_db_converters(internal_type) - returns a list of backend converter functions convert(value, field)
Field.get_db_converters(connection) - returns a list of field converter functions convert(value, field)
Public API
Field.from_db_value(value, connection) - public documented hook to replace SubfieldBase.
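As a rough illustration of how these hooks fit together, here is a Django-free sketch. The method names and the convert(value, field) signature follow the summary above. Everything else is an assumption made for the example: the Sketch* classes, the Money type, the module-level get_converters/apply_converters functions with an extra connection argument, and the way from_db_value is wrapped as a field converter. The real patch may differ.

```python
from collections import namedtuple
from decimal import Decimal

Money = namedtuple("Money", "amount currency")  # hypothetical value type


def convert_boolean(value, field):
    # Some drivers hand back 0/1 instead of True/False.
    return bool(value) if value in (0, 1) else value


def convert_decimal(value, field):
    # Some drivers hand back strings or floats for decimal columns.
    return value if value is None else Decimal(str(value))


class SketchOperations:
    """Hypothetical stand-in for DatabaseOperations."""

    def get_db_converters(self, internal_type):
        return {
            "BooleanField": [convert_boolean],
            "DecimalField": [convert_decimal],
        }.get(internal_type, [])


class SketchConnection:
    ops = SketchOperations()


class SketchField:
    internal_type = None

    def get_db_converters(self, connection):
        # Assumption: a field exposes from_db_value as its converter.
        if hasattr(self, "from_db_value"):
            return [lambda value, field: field.from_db_value(value, connection)]
        return []


class BooleanField(SketchField):
    internal_type = "BooleanField"


class MoneyField(SketchField):
    internal_type = "DecimalField"

    def from_db_value(self, value, connection):
        return None if value is None else Money(value, "EUR")


def get_converters(fields, connection):
    # Built once per query: backend converters first, then field converters.
    converters = {}
    for index, field in enumerate(fields):
        funcs = (connection.ops.get_db_converters(field.internal_type)
                 + field.get_db_converters(connection))
        if funcs:
            converters[index] = (funcs, field)
    return converters


def apply_converters(row, converters):
    # Applied per row; columns without converters are left untouched.
    row = list(row)
    for index, (funcs, field) in converters.items():
        value = row[index]
        for func in funcs:
            value = func(value, field)
        row[index] = value
    return tuple(row)


fields = [BooleanField(), MoneyField(), SketchField()]
converters = get_converters(fields, SketchConnection())
converted = apply_converters((1, "12.50", "untouched"), converters)
```

The point of the split is that the converter lists are worked out once per query, so the per-row work is only a loop over the columns that actually need converting. The MoneyField converter receives a Decimal rather than the raw string because the backend converter has already run.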
A note on gis
The patch has a 147-line negative diff in contrib.gis, removing a bunch of duplicated code, an entire compiler module for MySQL, GeoValuesQuerySet, and custom code in SQLDateCompiler. Overall it is a pretty substantial cleanup and the proposed API is nicely overridable by the gis compiler.
Status of patch
I'm still ironing out a few minor issues with the CI, but Anssi is happy with the general format of the patch and it has no serious performance regressions. All being well, I hope to merge the patch by this time next week. It will be needed by two upcoming cross-database fields coming out of the django.contrib.postgres project, UUIDField and DurationField, as these field types will need real work done in the converters of the non-postgres backends.
Comments welcome!
Marc