The GSoC proposal "Multiple timezone support for datetime representation" wasn't picked up in 2011 and 2010. Although I'm not a student and the summer is over, I'd like to tackle this problem, and I would appreciate it very much if a core developer accepted to mentor me during this work, GSoC-style.
Here is my proposal, following the GSoC guidelines. I apologize for the wall of text; this has been discussed many times in the past 4 years and I've tried to address as many concerns and objections as possible.
Definition of success
---------------------
The goal is to resolve ticket #2626 in Django 1.4 or 1.5 (depending on when 1.4 is released).
Design specification
--------------------
Some background on timezones in Django and Python
.................................................
Currently, Django stores datetime objects in local time in the database, local time being defined by the TIME_ZONE setting. It retrieves them as naive datetime objects. As a consequence, developers work with naive datetime objects in local time.
This approach sort of works when all the users are in the same timezone and don't care about data loss (inconsistencies) when DST kicks in or out. Unfortunately, these assumptions aren't true for many Django projects: for instance, one may want to log sessions (login/logout) for security purposes: that's a 24/7 flow of important data. Read tickets #2626 and #10587 for more details.
Python's standard library provides limited support for timezones, but this gap is filled by pytz <http://pytz.sourceforge.net/>. If you aren't familiar with the topic, strongly recommend reading this page before my proposal. It explains the problems of working in local time and the limitations of Python's APIs. It has a lot of examples, too.
Django should use timezone-aware UTC datetimes internally
.........................................................
Example : datetime.datetime(2011, 09, 23, 8, 34, 12, tzinfo=pytz.utc)
In my opinion, the problem of local time is strikingly similar to the problem character encodings. Django uses only unicode internally and converts at the borders (HTTP requests/responses and database). I propose a similar solution: Django should always use UTC internally, and conversion should happen at the borders, i.e. when rendering the templates and processing POST data (in form fields/widgets). I'll discuss the database in the next section.
Quoting pytz' docs: "The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans." I think we can trust pytz' developers on this topic.
Note that a timezone-aware UTC datetime is different from a naive datetime. If we were using naive datetimes, and assuming we're using pytz, a developer could write:
mytimezone.localize(datetime_django_gave_me)
which is incorrect, because it will interpret the naive datetime as local time in "mytimezone". With timezone-aware UTC datetime, this kind of errors can't happen, and the equivalent code is:
datetime_django_gave_me.astimezone(mytimezone)
Django should store datetimes in UTC in the database
....................................................
This horse has been beaten to death on this mailing-list so many times that I'll keep the argumentation short. If Django handles everything as UTC internally, it isn't useful to convert to anything else for storage, and re-convert to UTC at retrieval.
In order to make the database portable and interoperable:
- in databases that support timezones (at least PostgreSQL), the timezone should be set to UTC, so that the data is unambiguous;
- in databases that don't (at least SQLite), storing data in UTC is the most reasonable choice: if there's a "default timezone", that's UTC.
I don't intend to change the storage format of datetimes. It has been proposed on this mailing-list to store datetimes with original timezone information. However, I suspect that in many cases, datetimes don't have a significant "original timezone" by themselves. Furthermore, there are many different ways to implemented this outside of Django's core. One is to store a local date + a local time + a place or timezone + is_dst flag and skip datetime entirely. Another is to store an UTC datetime + a place or timezone. In the end, since there's no obvious and consensual way to implement this idea, I've chosen to exclude it from my proposal. See the "Timezone-aware storage of DateTime" thread on this mailing list for a long and non-conclusive discussion of this idea.
I'm expecting to take some flak because of this choice :) Indeed, if you're writing a multi-timezone calendaring application, my work isn't going to resolve all your problems — but it won't hurt either. It may even provide a saner foundation to build upon. Once again, there's more than one way to solve this problem, and I'm afraid that choosing one would offend some people sufficiently to get the entire proposal rejected.
Django should convert between UTC and local time in the templates and forms
...........................................................................
I regard the problem of local time (in which time zone is my user?) as very similar to internationalization (which language does my user read?), and even more to localization (in which country does my user live?), because localization happens both on output and on input.
I want controllable conversion to local time when rendering a datetime in a template. I will introduce:
- a template tag, {% localtime on|off %}, that works exactly like {% localize on|off %}; it will be available with {% load tz %};
- two template filters, {{ datetime|localtime }} and {{ datetime|utctime }}, that work exactly like {{ value|localize }} and {{ value|unlocalize }}.
I will convert datetimes to local time when rendering a DateTimeInput widget, and also handle SplitDateTimeWidget and SplitHiddenDateTimeWidget which are more complicated.
Finally, I will convert datetimes entered by end-users in forms to UTC. I can't think of cases where you'd want an interface in local time but user input in UTC. As a consequence, I don't plan to introduce the equivalent of the `localize` keyword argument in form fields, unless someone brings up a sufficiently general use case.
How to set each user's timezone
...............................
Internationalization and localization are based on the LANGUAGES setting. There's a widely accepted standard to select automatically the proper language and country, the Accept-Language header.
Unfortunately, some countries like the USA have more than one timezone, so country information isn't enough to select a timezone. To the best of my knowledge, there isn't a widely accepted way to determine the timezones of the end users on the web.
I intend to use the TIME_ZONE setting by default and to provide an equivalent of `translation.activate()` for setting the timezone. With this feature, developers can implement their own middleware to set the timezone for each user, for instance they may want to use <http://pytz.sourceforge.net/#country-information>.
This means I'll have to introduce another thread local. I know this is frowned upon. I'd be very interested if someone has a better idea.
It might be no longer necessary to set os.environ['TZ'] and run time.tzset() at all. That would avoid a number of problems and make Windows as well supported as Unix-based OSes — there's a bunch of tickets in Trac about this.
I'm less familiar with this part of the project and I'm interested in advice about how to implement it properly.
Backwards compatibility
.......................
Most previous attempts to resolve have stumbled upon this problem.
I propose to introduce a USE_TZ settings (yes, I know, yet another setting) that works exactly like USE_L10N. If set to False, the default, you will get the legacy (current) behavior. Thus, existing websites won't be affected. If set to True, you will get the new behavior described above.
I will also explain in the release notes how to migrate a database — which means shifting all datetimes to UTC. I will attempt to develop a script to automate this task.
Dependency on pytz
..................
I plan to make pytz a mandatory dependency when USE_TZ is True. This would be similar to the dependency on on gettext when USE_I18N is True.
pytz gets a new release every time the Olson database is updated. For this reason, it's better not to copy it in Django, unlike simplejson and unittest2.
It was split from Zope some time ago. It's a small amount of clean code and it could be maintained within Django if it was abandoned (however unlikely that sounds).
Miscellaneous
.............
The following items have caused bugs in the past and should be checked carefully:
- caching: add timezone to cache key? See #5691.
- functions that use LocalTimezone: naturaltime, timesince, timeuntil, dateformat.
- os.environ['TZ']. See #14264.
- time.tzset() isn't supported on Windows. See #7062.
Finally, my proposal shares some ideas with https://github.com/brosner/django-timezones; I didn't find any documentation, but I intend to review the code.
About me
--------
I've been working with Django since 2008. I'm doing a lot of triage in Trac, I've written some patches (notably r16349, r16539, r16548, also some documentation improvements and bug fixes), and I've helped to set up continuous integration (especially for Oracle). In my day job, I'm producing enterprise software based on Django with a team of ten developers.
Work plan
---------
Besides the research that's about 50% done, and discussion that's going to take place now, I expect the implementation and tests to take me around 80h. Given how much free time I can devote to Django, this means three to six months.
Here's an overview of my work plan:
- Implement the USE_TZ flag and database support — this requires checking the capabilities of each supported database in terms of datetime types and time zone support. Write tests, especially to ensure backwards compatibility. Write docs. (20h)
- Implement timezone localization in templates. Write tests. Write docs. (10h)
- Implement timezone localization in widgets and forms. Check the admin thoroughly. Write tests. Write docs. (15h)
- Implement the utilities to set the user's timezone. Write tests. Write docs. (15h)
- Reviews, etc. (20h)
What's next?
------------
Constructive criticism, obviously :) Remember that the main problems here are backwards-compatibility and keeping things simple.
Best regards,
--
Aymeric.
Annex: Research notes
---------------------
Wiki
....
Relevant tickets
................
#2626: canonical ticket for this issue
#2447: dupe, an alternative solution
#8953: dupe, not much info
#10587: dupe, a fairly complete proposal, but doesn't address backwards compatibility for existing data
Relevant related tickets
........................
#14253: how should "now" behave in the admin when "client time" != "server time"?
Irrelevant related tickets
..........................
#11385: make it possible to enter data in a different timezone in DateTimeField
#12666: timezone in the 'Date:' headers of outgoing emails - independant resolution
Relevant threads
................
2011-05-31 Timezone-aware storage of DateTime
http://groups.google.com/group/django-developers/browse_thread/thread/76e2b486d561ab79
2010-08-16 Datetimes with timezones for mysql
https://groups.google.com/group/django-developers/browse_thread/thread/5e220687b7af26f5
2009-03-23 Django internal datetime handling
https://groups.google.com/group/django-developers/browse_thread/thread/ca023360ab457b91
2008-06-25 Proposal: PostgreSQL backends should *stop* using settings.TIME_ZONE
http://groups.google.com/group/django-developers/browse_thread/thread/b8c885389374c040
2007-12-02 Timezone aware datetimes and MySQL (ticket #5304)
https://groups.google.com/group/django-developers/browse_thread/thread/a9d765f83f552fa4
Relevant related threads
........................
2009-11-24 Why not datetime.utcnow() in auto_now/auto_now_add
http://groups.google.com/group/django-developers/browse_thread/thread/4ca560ef33c88bf3
Irrelevant related threads
..........................
2011-07-25 "c" date formating and Internet usage
https://groups.google.com/group/django-developers/browse_thread/thread/61296125a4774291
2011-02-10 GSoC 2011 student contribution
https://groups.google.com/group/django-developers/browse_thread/thread/0596b562cdaeac97/585ce1b04632198a?#585ce1b04632198a
2010-11-04 Changing settings per test
https://groups.google.com/group/django-developers/browse_thread/thread/65aabb45687e572e
2009-09-15 What is the status of auto_now and auto_now_add?
https://groups.google.com/group/django-developers/browse_thread/thread/cd1a76bca6055179
2009-03-09 TimeField broken in Oracle
https://groups.google.com/group/django-developers/browse_thread/thread/bba2f80a2ca9b068
2009-01-12 Rolling back tests -- status and open issues
https://groups.google.com/group/django-developers/browse_thread/thread/1e4f4c840b180895
2008-08-05 Transactional testsuite
https://groups.google.com/group/django-developers/browse_thread/thread/49aa551ad41fb919
First, thanks for all your triage work on track, it's very much
appreciated. This is a large proposal, but I think it covers the issue
pretty completely.
I agree with your approach of using UTC internally and converting on output.
My main concern with your proposal is that it may require two
different code paths for a lot of code. I have no objection to the
dependency on pytz, but I don't want this to result in pytz-aware code
that diverges significantly from the non-pytz default path.
I would like this new code to eventually become the default setting,
but acknowledge that there doesn't seem to be a good way to do that
and maintain our current backwards compatibility (and external
dependency) policy.
This isn't my area of expertise, so I'm not a good mentor candidate,
but I'd be excited to help you with this project where I can.
I'm wholeheartedly +1 on this.
Best,
-Paul
Here are some answers to your remarks.
On 3 sept. 2011, at 20:19, Paul McMillan wrote:
> My main concern with your proposal is that it may require two
> different code paths for a lot of code.
Currently, my best guess is:
- limited changes in the models layer
- some new code in the forms and template layers (if USE_TZ: …)
- most of the duplication will be in the tests, because I have to test everything with and without USE_TZ.
Of course, I can't be sure before actually writing the code :)
> I would like this new code to eventually become the default setting,
> but acknowledge that there doesn't seem to be a good way to do that
> and maintain our current backwards compatibility (and external
> dependency) policy.
I'm going to use the same trick used by USE_L10N and explained by Anssi: set USE_TZ to False in global_settings.py and to True in the template of settings.py. This preserves backwards compatibility but the new code is the default for new projects.
On 3 sept. 2011, at 21:19, Anssi Kääriäinen wrote:
> - Performance: The USE_L10N setting made rendering tables consisting
> of numbers much slower (see #14290 for example). It would be good if
> there wasn't any major regressions due to this.
With USE_TZ, rendering a datetime will be slower because of the conversion to local time. I have no idea of the performance of pytz. I'll do some benchmarks.
> - Concurrency: My quick reading of the documentation of pytz
> suggests that this should not be a problem. But your implementation
> would be thread safe, right? The main problem here is localizing the
> timestamps.
Certainly, the implementation must be thread safe, because requests may run concurrently in different threads for users in different timezones. I'm pretty sure it will work out of the box, and it will be even more isolated if I can avoid setting os.environ['TZ'] and calling time.tzset().
> - I want to save the following information in the database: The
> meeting starts at 08:00 localtime 2011-09-04. Localtime is based on
> some other information, maybe the location of the meeting room.
> (...) it would be very valuable if there were a way to save just a
> plain datetime information in the database if the user so chooses.
In my opinion, and more importantly in PostgreSQL's implementation, a DateTime represents a "point in time", in an absolute sense — ignoring relativistic effects :)
For your example, I support Daniel's suggestion: use a DateField and a TimeField, plus a way to obtain the time zone (like a foreign key to the meeting room). That's exactly the information you said you wanted to store :)
I've hesitated a lot to add a "naive=False/True" keyword argument to DateTimeField. Right now, I'm still convinced that mixing naive and aware DateTimes in the same application a recipe for disaster and that Django shouldn't facilitate it.
On 4 sept. 2011, at 00:48, Mikhail Korobov wrote:
> I have one concern though. Can you please explain why is USE_TZ option better than introducing e.g. UtcDateTimeField?
In short, Django shouldn't provide two different ways to represent a "point in time". Quoting the Zen of Python: "There should be one-- and preferably only one --obvious way to do it."
> USE_TZ=True will break all existing code (including external apps) which relies on django 1.3 documented DateTimeField behavior, this can be scary and will introduce a lot of "if getattr(settings, USE_TZ, False): #..." statements in external apps for backwards compatibility.
This is an interesting question. For backwards compatibility, if a DateTimeField is set to a naive datetime (like datetime.datetime.now()) and USE_TZ is True, the value should be interpreted in local time and a warning should be raised.
With this rule, I believe that most code should work just fine with or without USE_TZ, thanks to duck typing. Basically, what works on a naive datetime also works on an aware datetime, including timedelta arithmetic.
Of course, applications dedicated to dealing with timezones may break. If you're using one, you should probably stick with USE_TZ = False.
> Good UtcDateTimeField implementation can be released as a separate package (and then eventually included in django itself). This way existing django projects will be able to use it without waiting for a release and backwards compatibility won't be broken. Are there obstacles in django itself that prevent this option?
In short, my proposal can't be implemented outside of core because it requires a modification of the template engine.
If it could, I'm sure someone would have done it already — ticket #2626 has been open for five years.
In fact, I tried to implement these ideas without modifying Django in my current project. The result is way too messy for publication, let alone for eventual inclusion in Django. My entry point in the template engine is django.utils.formats.localize, and I'm monkey patching it to convert datetimes to local time. This means I'm mixing localization and timezones. I have to set TIME_ZONE = 'UTC' to get UTC in the database and introduce another setting, LOCAL_TIME_ZONE, for the local time of the project. Finally my developers must never forget to use custom DateTimeFields in models and forms, as well as a custom ModelAdmin base class, or UTC data shows up in the admin. It's totally unwieldy.
On 4 sept. 2011, at 12:34, Daniel Swarbrick wrote:
> re storing UTC in Postgres: this is somewhat moot,
> since Postgres stores timestamps internally as a UTC Julian date
> regardless of which timezone the client connection uses
Under PostgreSQL, I assume I have to use a TIMESTAMP WITH TIME ZONE and set the connection's timezone to UTC if I want non-Django applications using the same database to obtain correct results, regardless of their connection's timezone. To be honest, this is pure speculation; I must check this for each database engine.
Best regards,
--
Aymeric Augustin.
This isn't gonna work because your new code will have a hard dependency on pytz.
-Paul
Django uses PIL for ImageField, but it isn't a hard dependency because
it's only imported within functions that actually use it. We can do
the same for pytz if it's only used in a few places.
If a module uses pytz in many functions, I suggest this pattern:
from django.conf import settings
if settings.USE_TZ:
import pytz
And it's a bug to hit code that depends on pytz when USE_TZ is False.
If we want to make some functions (like timezone conversion helpers)
available to anyone who has pytz, regardless of the value of USE_TZ,
we could use this pattern:
try:
import pytz
except ImportError:
if settings.USE_TZ:
raise
--
Aymeric Augustin.
Given the positive feedback received here and on IRC, I've started the implementation.
Being most familiar with mercurial, I've forked the Bitbucket mirror. This page that compares my branch to trunk:
https://bitbucket.org/aaugustin/django/compare/..django/django
I've read a lot of code in django.db, and also the documentation of PostgreSQL, MySQL and SQLite regarding date/time types.
I've written some tests that validate the current behavior of Django. Their goal is to guarantee backwards-compatibility when USE_TZ = False.
At first they failed because runtests.py doesn't set os.environ['TZ'] and doesn't call time.tzset() , so the tests ran with my system local time. I fixed that in setUp and tearDown. Maybe we should call them in runtests.py too for consistency?
By the way, since everything is supposed to be in UTC internally when USE_TZ is True, it is theoretically to get rid of os.environ['TZ'] and time.tzset(). They are only useful to make timezone-dependant functions respect the TIME_ZONE setting. However, for backwards compatibility (in particular with third-party apps), it's better to keep them and interpret naive datetimes in the timezone defined by settings.TIME_ZONE (instead of rejecting them outright). For this reason, I've decided to keep os.environ['TZ'] and time.tzset() even when USE_TZ is True.
Best regards,
--
Aymeric Augustin.
This week, I've gathered all the information I need about how the database engines and adapters supported by Django handle datetime objects. I'm attaching my findings.
The good news is that the database representations currently used by Django are already optimal for my proposal. I'll store data in UTC:
- with an explicit timezone on PostgreSQL,
- without timezone on SQLite and MySQL because the database engine doesn't support it,
- without timezone on Oracle because the database adapter doesn't support it.
Currently, Django sets the "supports_timezones feature" to True for SQLite. I'm skeptical about this choice. Indeed, the time zone is stored: SQLite just saves the output of "<datetime>.isoformat(), which includes the UTC offset for aware datetime objects. However, the timezone information is ignored when reading the data back from the database, thus yielding incorrect data when it's different from the local time defined by settings.TIME_ZONE.
As far as I can tell, the "supports_timezones" and the "needs_datetime_string_cast" database features are incompatible, at least with the current implementation of "typecast_timestamp". There's a comment about this problem that dates back to the merge of magic-removal, possibly before:
https://code.djangoproject.com/browser/django/trunk/django/db/backends/util.py?annotate=blame#L79
SQLite is the only engine who has these two flags set to True. I think "supports_timezones" should be False. Does anyone know why it's True? Is it just an historical artifact?
Finally, I have read the document that describes "to_python", "value_to_string", and r"get_(db_)?prep_(value|save|lookup)". The next step is to adjust these functions in DateFieldField, depending on the value of settings.USE_TZ.
This week, I've been working on a related topic that I had missed entirely in my initial proposal: serialization.
Developers will obtain aware datetimes from Django when USE_TZ = True. We must ensure that they serialize correctly.
Currently, the serialization code isn't very consistent with datetimes:
- JSON: the serializer uses the '%Y-%m-%d %H:%M:%S' format, losing microseconds and timezone information. This dates back to the initial commit at r3237. See also #10201.
- XML: the serializer delegates to DateTimeField.value_to_string, who also uses the '%Y-%m-%d %H:%M:%S' format.
- YAML: the serializer handles datetimes natively, and it includes microseconds and UTC offset in the output.
I've hesitated between converting datetimes to UTC or rendering them as-is with an UTC offset. The former would be more consistent with the database and it's recommended in YAML. But the latter avoids modifying the data: not only is it faster, but it's also more predictable. Serialization isn't just about storing the data for further retrieval, it can be used to print arbitrary data in a different format. Finally, when the data comes straight from the database (the common case), it will be in UTC anyway.
Eventually, I've decided to serialize aware datetimes without conversion. The implementation is here:
https://bitbucket.org/aaugustin/django/compare/..django/django
Here are the new serialization formats for datetimes:
- JSON: as described in the specification at http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf > 15.9.1.15 Date Time String Format.
- XML: as produced by datetime.isoformat(), ISO8601.
- YAML: unchanged, compatible with http://yaml.org/type/timestamp.html — the canonical representation uses 'T' as separator and is in UTC, but it's also acceptable to use a space and include an offset like pyyaml does.
These formats follow the best practices described in http://www.w3.org/TR/NOTE-datetime.
This fix is backwards-incompatible for the JSON and XML serializers: it includes fractional seconds and timezone information, and it uses the normalized separator, 'T', between the date and time parts. However, I've made sure that existing fixtures will load properly with the new code. I'll mention all this in the release notes.
Unrelatedly, I have switched the SQLite backend to supports_timezones = False, because it really doesn't make sense to write the UTC offset but ignore it when reading back the data.
Best regards,
--
Aymeric Augustin.
> <DATABASE-NOTES.html>
>
> On 11 sept. 2011, at 23:18, Aymeric Augustin wrote:
>
>> Hello,
>>
>> Given the positive feedback received here and on IRC, I've started the implementation.
>>
>> Being most familiar with mercurial, I've forked the Bitbucket mirror. This page that compares my branch to trunk:
>> https://bitbucket.org/aaugustin/django/compare/..django/django
>>
>> I've read a lot of code in django.db, and also the documentation of PostgreSQL, MySQL and SQLite regarding date/time types.
>>
>> I've written some tests that validate the current behavior of Django. Their goal is to guarantee backwards-compatibility when USE_TZ = False.
>>
>> At first they failed because runtests.py doesn't set os.environ['TZ'] and doesn't call time.tzset() , so the tests ran with my system local time. I fixed that in setUp and tearDown. Maybe we should call them in runtests.py too for consistency?
>>
>> By the way, since everything is supposed to be in UTC internally when USE_TZ is True, it is theoretically to get rid of os.environ['TZ'] and time.tzset(). They are only useful to make timezone-dependant functions respect the TIME_ZONE setting. However, for backwards compatibility (in particular with third-party apps), it's better to keep them and interpret naive datetimes in the timezone defined by settings.TIME_ZONE (instead of rejecting them outright). For this reason, I've decided to keep os.environ['TZ'] and time.tzset() even when USE_TZ is True.
>>
>> Best regards,
>>
>> --
>> Aymeric Augustin.
>>
>>
This week, I've finished the work on serialization by making the deserializers capable of handling UTC offsets. I had to rewrite DateTimeField.to_python to extract and interpret timezone offsets. Still, deserialization of aware datetimes doesn't work with PyYAML: http://pyyaml.org/ticket/202
I also implemented the storage and retrieval of aware datetime objects in PostgreSQL, MySQL and Oracle. Conversions happen:
- on storage, in `connection.ops.value_to_db_datetime`, called from `get_db_prep_value`;
- on retrieval, in the database adapter's conversion functions.
The code is rather straightforward. When USE_TZ is True, naive datetimes are interpreted as local time in TIME_ZONE, for backwards compatibility with existing applications.
SQLite is more tricky because it uses `django.db.backends.util.typecast_timestamp` to convert string to datetimes. However:
- this function is used elsewhere of Django, in combination with the `needs_datetime_string_cast` flag.
- it performs essentially the same operations as `DateTimeField.to_python`.
I'll review the history of this code, and I'll try to refactor it.
Besides adding support for SQLite, I still have to:
- check that datetimes behave correctly when they're used as query arguments, in aggregation functions, etc.
- optimize django.utils.tzinfo: fix #16899, use pytz.utc as the UTC timezone class when pytz is available, etc.
I won't have much time for this project next week. See you in two weeks for the next check-in!
Best regards,
--
Aymeric Augustin.
I've implemented the storage and retrieval of aware datetime objects in SQLite. This involved some refactoring, because SQLite returns datetimes as strings, and I didn't want to duplicate the parsing code. All database backends are now covered.
Although it isn't strictly necessary for my project, I suspect the django.db.backends.util.typecast_* functions could use further refactoring. They date back to r3, and the database adapters have become better at converting data to Python objects since then. This deserves more research.
I've also committed a patch for #16906 that I wanted to merge in my branch before further work, and fixed some related tickets (#16899, #16923, #4076).
I've had other priorities since my last check-in; I hope I'll have more time for this project now.
Best regards,
--
Aymeric Augustin.
I was getting weary of archeology in database adaptation code, so I've been working on the template layer this week.
It is now possible to control the localization of datetimes in templates. I also introduced a "current time zone" (before someone asks: yes, it's a thread local) and the possibility to change it in Python and template code. This works with and without pytz. The next step in this area is to handle conversions is forms and widgets.
I still have to figure out where it's better to use the "default time zone" (settings.TIME_ZONE) and where it's better to use "current time zone" to interpret naive datetimes. Depending on the answer, I may have to make the model layer aware of the "current time zone".
As usual, if you're curious: https://bitbucket.org/aaugustin/django/compare/..django/django
Best regards,
--
Aymeric Augustin.
> On Oct 16, 11:48 am, Aymeric Augustin
> <aymeric.augus...@polytechnique.org> wrote:
>> Hello,
>>
>> I've implemented the storage and retrieval of aware datetime objects in SQLite. This involved some refactoring, because SQLite returns datetimes as strings, and I didn't want to duplicate the parsing code. All database backends are now covered.
>
> Is it worth looking at registering a custom converter for SQLite
> (http://docs.python.org/library/sqlite3.html#converting-sqlite-values-
> to-custom-python-types) to handle the date objects?
My sentence wasn't totally accurate: Django's currently using converters for datetimes, and I changed them to return aware datetimes.
> Converters can be
> registered per-column - the SELECT query would need to be tweaked to
> hint to the SQLite libs to call the registered converter. I'm not sure
> if that would be too intrusive to the existing Django query builder.
I didn't know that. It could be useful to convert results of aggregation. At the moment, I don't know how they are converted to Python types.
This check-in is highly topical, as clocks rolled back one hour in most parts of Europe this morning :)
I've implemented conversions in forms and widgets, and in the admin too. I've made template filters convert their input to the current time zone when appropriate. I've added a context processor and a template tag that give access to the current time zone in templates.
I've written docs.
As a sanity check, I created a demo app: https://bitbucket.org/aaugustin/django-tz-demo
This means I've reached the point where the branch is usable. Try it! Clone https://bitbucket.org/aaugustin/django, put it in PYTHONPATH, set USE_TZ=True, and you're good to go. If you have questions, I'm mYk in #django-dev on Freenode.
I still have a few details to iron out before the branch is ready for merging:
- flesh out the docs with some code samples
- write the release notes
- check how values used as parameters and computed values behave in the ORM
- check how the timesince and timeuntil template filters behave
- ensure the cache depends on the current timezone
- investigate this comment in django/contrib/gis/gdal/field.py: # TODO: Adapt timezone information.
- fix bugs found by reviewers :)
Additional work could include:
- performance tests
- refactoring django.utils.timezone and django.utils.tzinfo -- tzinfo supports functions such as naturaltime, timesince, timeuntil, dateformat, but it wasn't compatible with the features I needed for the time zone support, so I wrote a new module.
- refactoring datetime parsing / formatting utilities throughout Django, and in particular django.db.backends.util.typecast_* -- I keep finding bugs such as #17134
Initially, I considered writing a script to automatically shift timestamps in the database backends that need it, to ease the migration of existing projects to USE_TZ=True. Upon further thought, I'm reluctant to mess with people's databases, even with a big fat disclaimer. If I write such a script, I'll upload it to djangosnippets.
Best regards,
--
Aymeric Augustin.
On 30 oct. 2011, at 17:07, ptone wrote:
> I do think a better label than "value" could be used for the
> thread.local for the current timezone, as this is really a thread
> global - something less ambiguous and conflict prone would be better
> IMO.
I used this name for consistency with django.utils.translation.trans_real.
Thread locals store data as attributes, and there is only one value to store in timezone._active, hence the generic name. Since it's purely internal to django.utils.timezone, there's no risk of conflicts.
I think it's fine as is.
> Also an explicit link to the activate command from the section
> where you introduce the current time zone concept.
Good point. I've added a link.
> Perhaps with a simple code example of how you would set
> current time per user in a view.
Yes, I was precisely adding an example to "Selecting the current time zone" :) I'll push it tonight.
> Was a threading.local the only way to do this - I'm sure you've given
> it some thought - there are relatively few uses of thread locals in
> Django - translation being one. If the timezone were stored on the
> current request, would that not be available to most places in django
> code that need TZ within the current thread?
One of the design principles of Django is loose coupling [1]. For instance, this means that "the template system knows nothing about Web requests" -- thanks to whoever wrote the exact quote I needed :)
The template and the form layers use the current time zone. (I believe nothing else does.) So, storing the current timezone in the request doesn't really help, since those two layers don't have access to the request.
More generally, I agree that global state should be avoided. Thread locals are just a symptom: if you want thread-safe global state, you have to use a thread local. [2]
Here, I'm using a small bit of global state to store the current time zone. It's stored exactly like the current locale. I'm not making the situation significantly worse; if we refactor the storage of the locale, it'll be trivial to refactor the storage of the timezone in the same way.
Finally, one of the problems of thread locals is that they don't provide strong isolation between threads (everything is in the same memory space after all). From this perspective, storing the current timezone in a thread local isn't nearly as bad as storing the entire request object -- think request.POST['password'].
> I think a very brief outline of the steps required for migration may
> be worthwhile in the release notes.
I agree. I've added some instructions at the bottom of the documentation page, and I'll point there from the release notes.
Thanks for your feedback!
--
Aymeric.
[1] https://docs.djangoproject.com/en/1.3/misc/design-philosophies/#loose-coupling
[2] http://groups.google.com/group/django-users/msg/7e7e75325bd086a1
I'm glad to announce that the time zone support branch is ready for review.
Review the changes: https://bitbucket.org/aaugustin/django/compare/..django/django
Clone the branch: hg clone https://bitbucket.org/aaugustin/django
Play with the demo app: hg clone https://bitbucket.org/aaugustin/django-tz-demo
The docs are supposed to explain everything you need, and I described many design decisions in my weekly check-ins, so I'm purposefully keeping this email short on details :)
Finally here's a teaser: http://myks.org/stuff/django-tz-demo-2.png
----
During this project, most things went according to the initial plan.
There is only one outstanding issue that I know of. `QuerySet.dates()` operates in the database timezone, ie. UTC, while an end user would expect it to take into account its current time zone. This affects `date_hierarchy` in the admin and the date based generic views. The same problem also exists for the `year`, `month`, `day`, and `week_day` lookups.
Under PostgreSQL, provided the current time zone has a valid name (e.g. it's provided by pytz), we could temporarily set the connection's timezone to this time zone and restore it to UTC afterwards. With other backends, I can't think of a simpler solution than fetching all values and doing the aggregation in Python. Neither of these thrills me, so for the time being, I've documented this as a known limitation.
Best regards,
--
Aymeric.
Finally got around to this! My review follows. I didn't follow the
original discussions very closely, so forgive any questions that have
been answered already. On the other hand, the code and comments probably
ought to answer the kind of questions I have, without needing to search
mailing list archives, so it might be useful that I haven't been
following that closely.
The code looks in very good shape. I've got quite a lot of things listed
below, but the vast majority of my comments are small or very small nits
that can be easily corrected, with the possible exception of my
questions regarding django.utils.timezone.LocalTimezone
== django/contrib/admin/util.py ==
> elif isinstance(field, models.DateField):
> return formats.localize(timezone.aslocaltime(value))
> elif isinstance(field, models.DateField) or isinstance(field,
models.TimeField):
> return formats.localize(value)
Isn't the first clause in the second elif rendered useless by the
previous elif?
== django/db/backends/mysql/base.py ==
> # Finally, MySQLdb always returns naive datetime objects.
Is it possible that a future version of MySQLdb could change this? If
so, would it be better to not make this assumption, and future proof the
code? We could perhaps just add an assertion for the assumption we make,
so that we don't end up silently corrupting data with a future version
of MySQLdb that acts differently, because at the moment the function
datetime_or_None_with_timezone_support just wipes out any existing
tzinfo field.
There is a similar issue in the Oracle backend I think.
> raise ValueError("MySQL backend does not support timezone-aware
datetimes.")
This seems to be misleading - shouldn't the message add the caveat "if
USE_TZ == False" or something?
== django/db/backends/oracle/base.py ==
> raise ValueError("Oracle backend does not support timezone-aware
datetimes.")
As above, should add "if USE_TZ == False"
> def output_type_handler(cursor, name, defaultType,
> size, precision, scale):
tiny nit - is there a reason for camelCase just for the defaultType
variable?
== django/db/backends/sqlite3/base.py ==
> supports_timezones = False
What exactly is meant by this? The timezone support in the sqlite
backend appears to be very similar to that in MySQL.
If SQLite genuinely does not support time zones in some important way,
this should probably be noted in the release notes, and definitely in
docs/ref/databases.txt
== django/utils/timezone.py ==
> import time as _time
There doesn't seem to be any clash to avoid, so I don't see the need for
importing like this.
However, my biggest question is about the LocalTimezone class. We now
have two implementations of this, one in django.utils.tzinfo, which
seems to have more extensive tests, and a new one in
django.utils.timezone. The old one is still in use, both by old code
e.g. in django/contrib/syndication/views.py and new code e.g.
django/utils/dateformat.py under DateFormat.__init__
The new one is used in other places.
There is no explanation of the need for these two, or when they should
be used, which to me seems like a minimum requirement.
Perhaps we just need to combine the two LocalTimezone implementations in
tzinfo.py? If we can't do that for some backward compatibility reasons,
can we have some explanation, and preferably a deprecation path for the
older code? Finally, do we need to address code that uses the old
LocalTimezone in some way - should it be unconditionally using our own
LocalTimezone instead of something from pytz when available?
> # This function exists for consistency with get_default_timezone_name
> def get_default_timezone_name():
I'm guessing the comment should refer to get_current_timezone_name instead?
> def is_aware()
It is probably worth adding a comment indicating that the logic is taken
straight from Python docs, with a link -
http://docs.python.org/library/datetime.html#available-types
== docs/ ==
Some instances of 'time-zone' can be found, instead of 'time zone'
== docs/howto/custom-template-tags.txt ==
> Filters and time zones
I think this section needs a 'versionadded' directive.
> to to your filter
^ typo
== docs/ref/models/querysets.txt ==
> When :doc:`time zone support </topics/i18n/timezones>` is enabled, the
> database stores datetimes in UTC, which means the aggregation is
> performed in UTC. This is a known limitation of the current
> implementation.
This implies that when time zone support is not enabled, datetimes are
not stored in UTC - which is false at least for Postgres. Perhaps this
could be reworded in some way?
Similarly with the next warning about 'year', 'month' etc. lookups.
> - It includes the time zone for aware datetime objects. It raises
> an exception for aware time objects.
First instance of 'aware' should be 'naive'. Second sentence should end
'aware datetime or time objects' for completeness.
== docs/ref/utils.txt ==
Needs a versionadded directive for django.utils.dateparse and
django.utils.timezone
== docs/topics/i18n/timezones.txt ==
> Provided the current time zone raises an exception for datetimes that
> don't exist or are ambiguous because they fall in a DST transition
> (pytz_ does), such datestimes will be reported as invalid values.
There is a typo - datestimes - but I found the paragraph confusing due
it starting with 'Providing'. Perhaps this would be better:
> If the current time zone raises an exception for datetimes that
> don't exist or are ambiguous because they fall in a DST transition
> (the timezones provided by pytz_ do this), such datetimes will be
> reported as invalid values.
== tests ==
I haven't actually reviewed any of the test code - I ran out of energy,
and test code is so dull, sorry!
I did run coverage for the tests and found:
- without pytz installed, the new LocalTimezone class
> During this project, most things went according to the initial plan.
>
> There is only one outstanding issue that I know of.
> `QuerySet.dates()` operates in the database timezone, ie. UTC, while
> an end user would expect it to take into account its current time
> zone. This affects `date_hierarchy` in the admin and the date based
> generic views. The same problem also exists for the `year`, `month`,
> `day`, and `week_day` lookups.
>
> Under PostgreSQL, provided the current time zone has a valid name
> (e.g. it's provided by pytz), we could temporarily set the
> connection's timezone to this time zone and restore it to UTC
> afterwards. With other backends, I can't think of a simpler solution
> than fetching all values and doing the aggregation in Python. Neither
> of these thrills me, so for the time being, I've documented this as
> a known limitation.
You could argue a case for saying that the behaviour has some
advantages: if one admin user filters a list of objects by date, and
sends the link to a colleague, they would probably expect the colleague
to end up with the same list of objects.
But even if that isn't very convincing, I'm happy with this limitation
given the implementation difficulties of fixing it.
Great work, BTW.
Luke
--
"Trouble: Luck can't last a lifetime, unless you die young."
(despair.com)
Luke Plant || http://lukeplant.me.uk/
Thanks for the review! I've updated the branch with your suggestions.
Here are a few comments -- when I have something to say other than
"fixed, thanks".
2011/11/8 Luke Plant <L.Pla...@cantab.net>:
> == django/contrib/admin/util.py ==
>
>> elif isinstance(field, models.DateField):
>> return formats.localize(timezone.aslocaltime(value))
>> elif isinstance(field, models.DateField) or isinstance(field,
> models.TimeField):
>> return formats.localize(value)
>
> Isn't the first clause in the second elif rendered useless by the
> previous elif?
It's a bug: the first elif should test for models.DateTimeField. The
problem didn't appear in the tests because DateTimeField is a subclass
of DateField.
> == django/db/backends/mysql/base.py ==
>
>> # Finally, MySQLdb always returns naive datetime objects.
>
> Is it possible that a future version of MySQLdb could change this? If
> so, would it be better to not make this assumption, and future proof the
> code? We could perhaps just add an assertion for the assumption we make,
> so that we don't end up silently corrupting data with a future version
> of MySQLdb that acts differently, because at the moment the function
> datetime_or_None_with_timezone_support just wipes out any existing
> tzinfo field.
I'm now checking if the value is naive before overwriting its tzinfo.
If the database adapter returns an aware datetime (in a future
version), it's passed through unchanged.
> There is a similar issue in the Oracle backend I think.
The issue existed in SQLite, MySQL and Oracle.
>> raise ValueError("MySQL backend does not support timezone-aware
> datetimes.")
>
> This seems to be misleading - shouldn't the message add the caveat "if
> USE_TZ == False" or something?
I've updated the message for SQLite, MySQL and Oracle.
> == django/db/backends/sqlite3/base.py ==
>
>> supports_timezones = False
>
> What exactly is meant by this? The timezone support in the sqlite
> backend appears to be very similar to that in MySQL.
>
> If SQLite genuinely does not support time zones in some important way,
> this should probably be noted in the release notes, and definitely in
> docs/ref/databases.txt
All backends but PostgreSQL have supports_timezones = False. It means
that the database engine (SQLite, MySQL) or its adapter (cx_Oracle)
doesn't support timezones. This flag existed before my branch, and it
was True for SQLite. I switched it to False because SQLite doesn't
actually support timezones. This shouldn't affect anything besides
Django's own test suite -- database features aren't part of the public
API (AFAIK).
I've added a paragraph about this change to the release notes.
> == django/utils/timezone.py ==
>
>> import time as _time
>
> There doesn't seem to be any clash to avoid, so I don't see the need for
> importing like this.
This code is taken straight from Python's docs, and I wanted to keep
it as close as possible to the original for clarity and ease of
maintenance -- see #16899 for example.
I can't use the original code as is, because the timezone isn't known
at compile time, only after the settings have been loaded. But I still
think it's worth minimizing the diff.
> However, my biggest question is about the LocalTimezone class. We now
> have two implementations of this, one in django.utils.tzinfo, which
> seems to have more extensive tests, and a new one in
> django.utils.timezone. The old one is still in use, both by old code
> e.g. in django/contrib/syndication/views.py and new code e.g.
> django/utils/dateformat.py under DateFormat.__init__
>
> The new one is used in other places.
Yes. I spent a lot of time on this problem, and I eventually decided
to add a second implementation of the LocalTimezone, because there are
too many issues in the "old" implementation to build more code upon
it. The "new" implementation focuses on correctness and simplicity.
Here are the problems of the "old" implementation.
Its __init__ method takes an argument, which is problematic for
pickling and deepcopying. From Python's docs:
> Special requirement for pickling: A tzinfo subclass must have an __init__() method that can be called with no arguments, else it can be pickled but possibly not unpickled again. This is a technical requirement that may be relaxed in the future.
Adding a __getinitargs__ method resolved this problem, but it's magic,
and we're still doing something that's officially discouraged.
Also, I don't like the hack in the "old" implementation to support
post-2037 dates. Sure, it doesn't crash, but it can return wrong data,
because DST doesn't apply during the same period every year. So much
for refusing the temptation to guess in the face of ambiguity :)
This inaccuracy is irrelevant for syndication (dates are in the past)
and acceptable for display purposes in dateformat (naturaltime will
return "in X years", an offset of 1 hour doesn't matter). But when
we're saving data in the database, we can't take the risk to corrupt
it.
> Perhaps we just need to combine the two LocalTimezone implementations in
> tzinfo.py? If we can't do that for some backward compatibility reasons,
> can we have some explanation, and preferably a deprecation path for the
> older code? Finally, do we need to address code that uses the old
> LocalTimezone in some way - should it be unconditionally using our own
> LocalTimezone instead of something from pytz when available?
There are two problems in deprecating the "old" implementation:
- the output of self.tzname() would change — it depends on the
argument taken by __init__;
- post-2037 dates would no longer be supported.
To sum up:
- the "old" implementation isn't suitable for the needs of my branch;
- there's a lot of history, and we can't deprecate it without creating
regressions: https://code.djangoproject.com/log/django/trunk/django/utils/tzinfo.py
> There is no explanation of the need for these two, or when they should
> be used, which to me seems like a minimum requirement.
I've added a comment.
>> - It includes the time zone for aware datetime objects. It raises
>> an exception for aware time objects.
>
> First instance of 'aware' should be 'naive'. Second sentence should end
> 'aware datetime or time objects' for completeness.
The current wording describes what my branch does:
- naive datetime => output without TZ info
- aware datetime => output with TZ info
- naive time => output without TZ info
- aware time => exception
> == tests ==
>
> I haven't actually reviewed any of the test code - I ran out of energy,
> and test code is so dull, sorry!
>
> I did run coverage for the tests and found:
>
> - without pytz installed, the new LocalTimezone class
You apparently ran out of energy mid-sentence :) Do you remember what
you were going to say?
Thanks again for the review!
--
Aymeric.
>> I haven't actually reviewed any of the test code - I ran out of energy,
>> and test code is so dull, sorry!
>>
>> I did run coverage for the tests and found:
>>
>> - without pytz installed, the new LocalTimezone class
>
> You apparently ran out of energy mid-sentence :) Do you remember what
> you were going to say?
At that point I realised that a previous comment in my email was wrong
(I thought I'd found a bug in the new version of LocalTimezone), went
back and fixed it, but never deleted that line! Please ignore that half
finished sentence :-)
Luke
--
"Where a person wishes to attract, they should always be ignorant."
(Jane Austen)
Luke Plant || http://lukeplant.me.uk/
Luke
On 10/11/11 07:46, Aymeric Augustin wrote:
> Hi Luke,
>
> Thanks for the review! I've updated the branch with your suggestions.
>
> Here are a few comments -- when I have something to say other than
> "fixed, thanks".
--
"In your presence there is fullness of joy; at your right hand are
pleasures forevermore" Psalm 16:11
Luke Plant || http://lukeplant.me.uk/
So, at least according to this simple test there is nothing to
worry about performance-wise.
This part of the documentation raises one question:"""
Unfortunately, during DST transitions, some datetimes don't exist or
are
ambiguous. In such situations, pytz_ raises an exception. Other
:class:`~datetime.tzinfo` implementations, such as the local time zone
used as
a fallback when pytz_ isn't installed, may raise an exception or
return
inaccurate results. That's why you should always create aware datetime
objects
when time zone support is enabled.
"""
Won't this lead to web applications where errors suddenly start
popping up all over the place, but only twice (or once?) a year?
A setting "ON_NAIVE_DATETIMES=[do_nothing|log|error]" could be nice,
at least for testing. This way you could spot the errors early on. I
don't know if that would be too expensive performance wise, though.
One idea is to use "SELECT col AT TIME ZONE 'GMT+2' as col". This way
the session time zone doesn't need to be changed. For example:
> SET TIME ZONE 'UTC';
> create table testt(dt timestamptz);
> insert into testt values(now());
> select dt at time zone 'GMT+2' as dt from testt;
2011-11-18 07:06:47.834771
> select dt at time zone 'Europe/Helsinki' as dt from testt;
2011-11-18 11:06:47.834771
Oracle seems to have a similar feature [1]. I don't know if other
databases offer similar functionality. MySQL has probably support for
something similar when using TIMESTAMP columns.
But as said, I don't see this as a major issue. It would be nice to
support this some day.
I hope to see this in trunk soon.