Removing global database converters and adapters for datetimes

183 views
Skip to first unread message

Aymeric Augustin

unread,
Apr 13, 2015, 2:13:51 PM4/13/15
to django-d...@googlegroups.com
Hello,

On databases that don't support time zones -- SQLite, MySQL, Oracle (*) -- the
implementation of time zone support I added in Django 1.4 defines:

1. A global converter, e.g. `parse_datetime_with_timezone_support` for SQLite,
   which processes values returned by the database for datetime columns and
   returns `datetime.datetime` values. When USE_TZ is True, these are aware
   datetimes with a UTC timezone.

   This was part of the original patch:

2. A global adapter, e.g. `adapt_datetime_with_timezone_support` for SQLite,
   which processes `datetime.datetime` values sent to the database. When
   USE_TZ is True, these are converted to UTC, then the tzinfo attribute is
   stripped.

   This was introduced to fix #17755:

(*) I believe that Oracle could support time zones like PostgreSQL but Django
    doesn't take advantage of this capability at this time.

This is a poor implementation because converters and adapters are global in
the sqlite3 and MySQLdb modules. Other packages such as iPython are affected

I'll keep cx_Oracle outside of this discussion because it has a per-connection
`outputtypehandler` instead of global converters and because Django's Oracle
backend wraps all parameters instead of relying on global adapters.

I think it would be more appropriate to handle all conversions in the ORM.

With the introduction of DatabaseOperations.get_db_converters in Django 1.8,
it seems possible to use per-connection converters instead of global
converters.

With this change, `cursor.execute()` will return naive datetimes where it used
to return aware datetimes. That's backwards incompatible, but that's the goal:
Django shouldn't alter globally the behavior of sqlite3 or MySQLdb cursors.

Replacing adapters is a bit more tricky. Removing them causes the two
regression tests for #17755 to fail. I haven't determined how I can process
parameters passed to QuerySet.raw(), but since that's part of the ORM, I'm
optimistic.

With this change, `cursor.execute()` will ignore the time zone of datetimes in
parameters. This looks acceptable for naive datetimes because Django has been
raising a warning for five versions in that case. However it may result in
silent data corruption for aware datetimes with a time zone other than UTC.
Once again that's inherent to the change. Besides documenting it, I don't
have anything to suggest.

In addition to making Django a better citizen in the Python ecosystem, these
changes will make it possible to interact with third-party databases where
data isn't stored in UTC when USE_TZ is True, a big flaw in the current
support for time zones: https://code.djangoproject.com/ticket/23820. This will
be a trivial patch once the global converters and adapters are gone.

Do you think the backwards-incompatibilities are acceptable?

-- 
Aymeric.



Carl Meyer

unread,
Apr 13, 2015, 3:11:24 PM4/13/15
to django-d...@googlegroups.com
On 04/13/2015 12:13 PM, Aymeric Augustin wrote:
[snip]
> Do you think the backwards-incompatibilities are acceptable?

I think so, yes. It really isn't good behavior for Django to be
automatically installing such global adapters and converters.

Perhaps the release notes documenting this backwards-incompatible change
could also provide sample code for users to install these
adapters/converters themselves, in order to restore the previous
behavior, if they are using datetimes in raw SQL queries with
cursor.execute()?

It would also be possible to put the silent-data-loss aspect of this
change through a deprecation path, by having the existing adapters raise
a deprecation warning on receiving any non-UTC aware datetime for a
couple releases (presuming the ORM is updated to always send only UTC,
the adapter should only receive non-UTC on direct cursor.execute()).

Carl

signature.asc

Marc Tamlyn

unread,
Apr 14, 2015, 4:01:16 AM4/14/15
to django-d...@googlegroups.com
I think this is likely a good plan. There are a number of other places where we globally register converters/adapters (at least in postgres land) rather than explicitly registering them on each connection. I'm not sure if they're an issue in the same way though.

Marc


--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/552C14C9.90200%40oddbird.net.
For more options, visit https://groups.google.com/d/optout.

Aymeric Augustin

unread,
May 2, 2015, 3:06:57 PM5/2/15
to django-d...@googlegroups.com
On 13 avr. 2015, at 20:13, Aymeric Augustin <aymeric....@polytechnique.org> wrote:

> Replacing adapters is a bit more tricky. Removing them causes the two
> regression tests for #17755 to fail. I haven't determined how I can process
> parameters passed to QuerySet.raw(), but since that's part of the ORM, I'm
> optimistic.

Well… I was optimistic :-) I need help.

I’m looking for a way to fix https://code.djangoproject.com/ticket/17755
without registering adapters at the level of the DB-API module (sqlite3,
MySQLdb, etc.)

In a regular QuerySet / Query, a parameter is tied to a field (or equivalent,
like an expression, which has an output field). The field's get_db_prep_value
method is used to transform the parameter to an appropriate value for the
DB-API module. In general, datetime objects are handled by a DateTimeField,
whose get_db_prep_value method calls connection.ops.value_to_db_datetime, and
everything is fine.

Unfortunately, a RawQuerySet / RawQuery cannot use the same logic because its
parameters aren't tied to fields. For lack of a better idea, I'm proposing to
add a best-effort conversion function to BaseDatabaseOperations:

def value_to_db(self, value):
"""
Transforms a value to something compatible with the backend driver.

This method only depends on the type of the value. It's designed for
cases where the target type isn't known. As a consequence it may not
work perfectly in all circumstances.
"""
if isinstance(value, datetime.datetime): # must be before date
return self.value_to_db_datetime(value)
elif isinstance(value, datetime.date):
return self.value_to_db_date(value)
elif isinstance(value, datetime.time):
return self.value_to_db_time(value)
elif isinstance(value, decimal.Decimal):
return self.value_to_db_decimal(value)
else:
return value

Then I will invoke this method on all parameters, most likely in
RawQuery._execute_query. This is the last step in the call chain:

Model.objects.raw
-> RawQuerySet.__iter__
-> RawQuery.__iter__
-> RawQuery._execute_query
-> cursor.execute

Does that make sense?

I know that I'm duplicating the functionality of database adapters, but I
don't have much choice if I don't want to touch the default global adapters.

--
Aymeric.




Aymeric Augustin

unread,
May 3, 2015, 8:07:08 AM5/3/15
to django-d...@googlegroups.com
Hi Carl,

2015-04-13 21:11 GMT+02:00 Carl Meyer <ca...@oddbird.net>:
It would also be possible to put the silent-data-loss aspect of this
change through a deprecation path, by having the existing adapters raise
a deprecation warning on receiving any non-UTC aware datetime for a
couple releases (presuming the ORM is updated to always send only UTC,
the adapter should only receive non-UTC on direct cursor.execute()).

I have implemented this. It works and I think that's the best compromise for
backwards compatibility.


There are a few side effects for adapters — the Python => DB side — but I find
them acceptable considering the small cross section of people who will be
affected by this change.

From 1.4 to 1.8, on SQLite and MySQL, Django has raised a warning when
receiving a naive datetime. 1.9 and 2.0 will raise a warning when receiving an
aware datetime. That isn't very nice to our users.

Writing backend-independent code becomes a bit tricky. In the case we're
discussing here, passing naive datetimes in UTC regardless of whether the
backend supports time zones should work. As far as I can tell, the PostgreSQL
backend won't complain, even though it's designed for aware datetimes.

If users take advantage of the ability to configure the time zone for a given
database, the shim won't work correctly. Since it's global, it doesn't know
about the current connection. It's hardcoded to use UTC.

Overall I find it logical that connection.cursor() be unaffected by Django's
time zone management. If we follow this line of thought, we can ask why the
PostgreSQL backend adds the UTC tzinfo at the level of the cursor. Is that
worth changing? I've brough up that question in the pull request comments.

--
Aymeric.
Reply all
Reply to author
Forward
0 new messages