[Django] #18757: Examine ways to get rid of DB backend convert_values()

17 views
Skip to first unread message

Django

unread,
Aug 13, 2012, 2:37:12 AM8/13/12
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
----------------------------------------------+--------------------
Reporter: akaariai | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
The convert_values() method is responsible for converting database values
to Python values. The conversion is used at least for aggregates, GIS and
Oracle's query compiler.

The problem with current implementation is two-fold:
1. The model fields have no way of specifying new converters. This is
problematic for 3rd party fields. They would need to do some kind of a
dynamic patch to the backend's convert_values() method if they need
convert_values() support.
2. The different backends have slightly differing implementation for the
same cases.

In #13844 it was suggested that Field.to_python() method could be reused.
It seems it does mostly the correct thing, but it is meant for validation,
not converting DB values to Python. So, going that route would need at
least some more investigation.

One possible solution is something like this: A field has convert_values()
method. The field can (and likely will) call a backend specific converter
method. So, a custom field could do the following:
1. Add converter methods to backends it supports:
{{{
def psycopg2_converter(self, value):
return value.lower()
try:
from django.db.backends.postgresql_psycopg2.operations import
DatabaseOperations
DatabaseOperations.convert_myfield_value = psycopg2_converter
# Not sure if this exact syntax works.
except ImportError:
pass
# Or, if there is need for just a generic converter:

from django.db.backends import BaseDatabaseOperations
BaseDatabaseOperations.convert_myfield_value = my_generic_converter
}}}
2. The convert_values method of the field itself would be:
{{{
def convert_values(self, connection, value):
return connection.convert_myfield_value(value)
}}}

A little complicated, but it is general. I think it would be fast enough.

Most fields could just do the conversion directly in
field.convert_values() without messing with the !DatabaseOperations at
all. Core fields would not need to do any dynamic installation, as we
could of course just install the converter into the source code.

The above proposal is just one way to achieve what is needed. I do think
there is some need for improvement here, but I am not at all sure the
suggestion in this post is the way forward.

--
Ticket URL: <https://code.djangoproject.com/ticket/18757>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 5, 2012, 10:58:43 AM11/5/12
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: nobody
Type: | Status: new
Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 0
Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):

* needs_better_patch: => 0
* needs_docs: => 0
* needs_tests: => 0
* stage: Unreviewed => Accepted


Comment:

I am going to mark this accepted. Not necessarily what I suggested in the
description, but some way of getting rid of the convert_values seems like
a good idea. This is one area where non-core fields and core-fields are
not equal.

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:1>

Django

unread,
Aug 12, 2014, 6:50:49 AM8/12/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: assigned

Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 0
Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by mjtamlyn):

* owner: nobody => mjtamlyn
* status: new => assigned


Comment:

`convert_values` is currently run under the following conditions:

- When aggregates are run. This means sqlite, oracle, mssql, and gis all
have special cases, and the default backend has handling for floats and
IntegerFields (which is not always called in a `super()` for sqlite and
oracle...)
- When the compiler defines `resolve_columns` and this calls
`convert_values`. This means oracle, all gis.

It is worth noting that the work done in `resolve_columns` on mysql
probably should be in `convert_values` (make boolean fields into a boolean
value).

----

Like with many things, we need to support both third party backends and
third party fields appropriately here. So we need to make sure that
mssql's use of `convert_values` is still possible to handle SQLServer's
weird datetime formatting, and also support a third party (or builtin)
field which has its own storage. There are two possible steps - one is
normalisation of the returned data so all backends operate the same, and
the second is further conversion from the "backend" version to the full
model version (this would solve #14462). Both steps should be possible at
both levels in my opinion - so the field level method should still receive
the connection object.

On top of this, all data conversion at the moment is done using
`get_internal_type` so for backwards compatibility we need to make sure we
respect any field with that set to a real internal type. This means we
cannot move all of the conversion currently done to field level (at least
outside of gis I think).
----

A possible plan of action:

- make `SQLCompiler.results_iter` always call
`SQLCompiler.resolve_columns`, and make sure subclasses (which are still
needed) call the `super()` as needed.
- For each field and row call `field.from_db_value(value, connection)`. By
default this would look like:
{{{
def field.from_db_value(value, connection):
internal_type = self.get_internal_type()
converter_name = 'convert_%s_value' % internal_type.lower()
func = getattr(connection.ops, converter_name, None)
if func:
value = func(value)
return value
}}}

An individual field can override `from_db_value` if it wants to do custom
conversions, and backends add `convert_myfield_value` methods rather than
`convert_values` with a stupid number of if statements in it. Individual
`convert_myfield_value` methods should be coded defensively in case
conversion is already done - there are cases where you get a different
value depending on whether it is an aggregate or a direct value.

I think this will allow me to remove some complexity from gis and mysql at
least, and remove `convert_values` as it is all together. It would be
backwards incompat for third party backends using `convert_values`,
although this could easily be included with a deprecation warning in the
default `from_db_value` if needed (if hasattr....). Given how mssql's
`convert_values` is written anyway, I think this will be an improvement
anyway.

POC coming up soon hopefully!

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:2>

Django

unread,
Aug 12, 2014, 7:34:25 AM8/12/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: assigned
Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 0
Has patch: 0 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

I have tested a bit in this area and we really can't call
field.from_db_value for every field. It is just plain too expensive. We
are talking about 2-3x performance slowdown for models with ten fields. A
single method call is somewhere around 5% slowdown in model initialization
speed. For 10 fields the above field.from_db_value() definition will cause
3 calls per field, so that is 5% * 3 * 10 = 150% slowdown.

Instead we need to do the following:
1) Collect converters for each field before we start iterating the
result set.
2) Not all fields define a converter - in fact almost all fields in core
do not need a converter at all.
3) At the same time we also collect converters from the database
backend.
4) When we have collected the converters, start iterating through the
result set. Pass the values through the found converters.

This is needed so that if a field doesn't define any converter, then there
is no overhead for that field. For this reason I would also try to avoid
doing backend specific conversion in the field's from_db method - this
means that if any backend needs from_db support, then all backends need to
pay the overhead of calling the method for no benefit at all.

I don't think the new way needs to be backwards compatible to the old
convert_values way. We just need to ensure the old way works for backwards
compatibility period.

So, alternate proposal:
1) Add field.get_db_value_converters(connection) method. The method
returns a list of converters or None if no converters are needed.
2) Add backend.get_value_converter(field) method. By default this method
returns [backend.convert_values] if that is defined for the backend,
otherwise it returns None.
3) Collect all converters before iterating results.
4) When iterating through the results, the converters will be called
with just value as argument (we could likely also add connection here)
5) Deprecate SubfieldBase, deprecate backend.convert_values.

As for how to do the convert_values collection, look for
https://github.com/akaariai/django/tree/custom_lookups for one
implementation. Unfortunately that implementation is mixed with unrelated
changes.

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:3>

Django

unread,
Aug 12, 2014, 8:23:37 AM8/12/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: assigned
Cleanup/optimization | Version: master
Component: Database layer | Resolution:
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 1
Has patch: 1 | UI/UX: 0

Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by mjtamlyn):

* needs_better_patch: 0 => 1
* has_patch: 0 => 1


Comment:

POC at https://github.com/django/django/pull/3047

I agree with your comments akaariai - will try to update to use that
approach. Most of the code is currently in removing `convert_values`
rather than in the details of what happens afterwards. If I get the POC
all green, the performance *should* be easier.

For my own notes, I believe #21565 should get fixed by this as well, and
so I should write a failing test for it. At least if I can get a working
spatialite setup anyway...

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:4>

Django

unread,
Sep 3, 2014, 3:36:23 PM9/3/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: closed
Cleanup/optimization | Version: master
Component: Database layer | Resolution: fixed

(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 1
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by Marc Tamlyn <marc.tamlyn@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"e9103402c0fa873aea58a6a11dba510cd308cb84"]:
{{{
#!CommitTicketReference repository=""
revision="e9103402c0fa873aea58a6a11dba510cd308cb84"
Fixed #18757, #14462, #21565 -- Reworked database-python type conversions

Complete rework of translating data values from database

Deprecation of SubfieldBase, removal of resolve_columns and
convert_values in favour of a more general converter based approach and
public API Field.from_db_value(). Now works seamlessly with aggregation,
.values() and raw queries.

Thanks to akaariai in particular for extensive advice and inspiration,
also to shaib, manfre and timograham for their reviews.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:5>

Django

unread,
Sep 19, 2014, 8:17:41 AM9/19/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: closed
Cleanup/optimization | Version: master
Component: Database layer | Resolution: fixed
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 1
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by Tim Graham <timograham@…>):

In [changeset:"185ab9ffefcf81378d7da02dddca0a59487b9613"]:
{{{
#!CommitTicketReference repository=""
revision="185ab9ffefcf81378d7da02dddca0a59487b9613"
Fixed Oracle GIS failures introduced by e9103402c0; refs #18757.

Thanks Marc Tamlyn for the patch.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:6>

Django

unread,
Sep 22, 2014, 8:10:22 AM9/22/14
to django-...@googlegroups.com
#18757: Examine ways to get rid of DB backend convert_values()
-------------------------------------+-------------------------------------
Reporter: akaariai | Owner: mjtamlyn
Type: | Status: closed
Cleanup/optimization | Version: master
Component: Database layer | Resolution: fixed
(models, ORM) | Triage Stage: Accepted
Severity: Normal | Needs documentation: 0
Keywords: | Patch needs improvement: 1
Has patch: 1 | UI/UX: 0
Needs tests: 0 |
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by Marc Tamlyn <marc.tamlyn@…>):

In [changeset:"c6fd1e904cb15da1a627843c79b89b19beabe2a1"]:
{{{
#!CommitTicketReference repository=""
revision="c6fd1e904cb15da1a627843c79b89b19beabe2a1"
Fixed Oracle GIS gml() test failure introduced by e910340; refs #18757.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18757#comment:7>

Reply all
Reply to author
Forward
0 new messages