Timezone-aware storage of DateTime

2,406 views
Skip to first unread message

Daniel Swarbrick

unread,
May 31, 2011, 9:23:35 AM5/31/11
to Django developers
I can almost hear the collective sigh as this topic once again rears
up ;-)

I am currently developing an app with Django that uses the SQLite
backend, and I noticed that Django stores DateTime fields as naive
(eg. non TZ-aware), local timestamps, making these databases non-
portable to servers running in different timezones.

I know this problem has been discussed several times before, but there
seems to have been some confusion in the past about how Postgres
stores TIMESTAMP values. Postgres's documentation states:

"For TIMESTAMP WITH TIME ZONE, the internally stored value is always
in UTC... An input value that has an explicit time zone specified is
converted to UTC using the appropriate offset for that time zone. If
no time zone is stated in the input string, then it is assumed to be
in the time zone indicated by the system's [or client connection's]
timezone parameter, and is converted to UTC using the offset for the
timezone zone."

It goes on:

"When a TIMESTAMP WITH TIME ZONE value is output, it is always
converted from UTC to the current timezone zone, and displayed as
local time in that zone."

Ok, so we've established that although Postgres TIMESTAMP WITH TIME
ZONE is TZ-aware, the internal storage is UTC. This pretty much
follows the rule for filesystem timestamps too - convert everything
back to UTC for internal storage.

MySQL has two column types capable of storing a DateTime - the
DATETIME and TIMESTAMP types. The DATETIME type can be likened to a
Python naive datetime (or a Postgres TIMESTAMP (without time zone)),
whereas the MySQL TIMESTAMP type once again stores values internally
as UTC, and when values are retrieved (SELECTed), they are displayed
as a local time for the client's current timezone.

Let me demonstrate. This was done on a server running the UTC+12
timezone.

mysql> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM | SYSTEM |
+--------------------+---------------------+

mysql> create table foo ( col_dt datetime, col_ts timestamp );

mysql> insert into foo values (now(), now());

mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-06-01 00:19:43 |
+---------------------+---------------------+

Now, if I change the client session's timezone...

mysql> set session time_zone = "+02:00";

mysql> select * from foo;
+---------------------+---------------------+
| col_dt | col_ts |
+---------------------+---------------------+
| 2011-06-01 00:19:43 | 2011-05-31 14:19:43 |
+---------------------+---------------------+

The naive DATETIME value is unchanged, and technically now incorrect
for the client session's timezone. The TIMESTAMP value has been
correctly adjusted however, to represent the same point in time, but
in the client session's new timezone.

In django.db.backends.postgresql_psycopg2.creation, we can see that a
DateTime field type is mapped to a TZ-aware "timestamp with time zone"
Postgres column type. However, in django.db.backends.mysql.creation,
we can see that the DateTime field type is mapped to a naive
"datetime" MySQL column type. Why is this?

Ok, the MySQL documentation states that a DATETIME column type can
store dates in the range '1000-01-01' to '9999-12-31'. However the
TIMESTAMP column type can only store dates in the range '1970-01-01
00:00:01' UTC to '2038-01-19 03:14:07' UTC, like a true Unix epoch
timestamp (seemingly regardless whether it's on 32 bit or 64 bit
platform).

Postgres on the other hand (even when using a "timestamp with time
zone") has a broader range of 4713 BC to 294276 AD. Ok, so it's fairly
obvious, MySQL's timestamp support is inferior to that of Postgres.
Let's not get sidetracked though.

Coming back to the original topic of timestamps in SQLite... One only
has to Google for "sqlite timestamp timezone", to see how often
developers are bitten by naive timestamp storage - even on iPhones! It
seems that a lot of apps store datetimes in SQLite as local, naive
values, assuming that that DB is never going to be moved to a
different timezone.

The way I see it, there are a few options for storage of timestamps in
SQLite (whose docs clearly acknowledge that it does not officially
support a timestamp column type).

1. Store timestamps as a Unix epoch value (integer). This is fast, and
will ensure correct date sorting. On the other hand, it's not really
human-friendly if manually browsing table data in SQLite.

2. Store timestamps as a string, but converted to UTC, eg. "2011-05-31
12:19:43". Since the TZ is not obvious from that string, Django would
need to be aware that it was UTC, and apply the appropriate offset to
convert to a local time. This means adding a dependency like pytz to
Django.

3. Store timestamps as a string, but append a timezone, eg.
"2011-06-01 00:19:43 +12:00". This would appear to be the safest,
least disruptive solution, since apps that don't use multiple
timezones would not notice any difference. The Python datetime values
would still be in the local timezone, but would no longer be naive.
Apps that needed to work with multiple timezones would now have the
information required to convert these local times back to a UTC value,
which is a lot safer for doing date arithmetic (especially when
crossing daylight saving transitions). One caveat however - SQL
sorting of timestamps that used a variety of UTC offsets would not
necessarily be correct, since it will be a simply string-sorting
algorithm.

Incidentally, "2011-05-31 12:19:43 +02:00" is not necessarily the same
as "2011-05-31 12:19:43" in "Europe/Berlin" timezone. The Olson/tzdata
timezone names are more than just a UTC offset - they also infer
whether daylight saving is active.

In the absence of SQLite handling the conversion and internal storage
as UTC itself automatically, I think it's important that Django add
that crucial UTC offset information to the timestamp-strings, or store
all timestamps as UTC. At least then the information is normalized,
and individual app developers can decide if they want to add full
timezone conversion machinery to their apps.

VernonCole

unread,
Jun 1, 2011, 9:16:13 AM6/1/11
to Django developers
On May 31, 7:23 am, Daniel Swarbrick <daniel.swarbr...@gmail.com>
wrote:
> I can almost hear the collective sigh as this topic once again rears
> up ;-)
>
Thanks for bringing it up. I lurk both here and on db-sig, and intend
to use your well done research to once again attempt an update of the
db-api PEP, which also does not address time zone aware time. I like
the idea of standardizing a cross-platform TZ aware data column.
[...]
> The way I see it, there are a few options for storage of timestamps in
> SQLite (whose docs clearly acknowledge that it does not officially
> support a timestamp column type).
>
> 1. Store timestamps as a Unix epoch value (integer). This is fast, and
> will ensure correct date sorting. On the other hand, it's not really
> human-friendly if manually browsing table data in SQLite.

Not only is the epoch limited in range, I would be wary of using any
storage based on the python time module (as opposed to datetime). We
have discovered enough subtle bugs in the Windows and IronPython
implementations that we gave up trying to get an accurate test of the
python time option on adodbapi. Things that worked correctly for me in
North America would fail for Mark Hammond in Australia. It is a maze
of twisty little passages. We simply recommend that everyone use
datetime.

On the other hand, I found it necessary to convert date-time values to
ISO format strings in order to feed them to Microsoft ACCESS databases
in some cases, and that works well. But again, as with SQLite, the DB
has no concept of time zones. In the absence of actual time zone
support in the database, any action taken by django is going to
inconvenience somebody, and will likely not be compatible with non-
django use of the same database. Perhaps it would work to store the
pickled tz-aware datetime on brain-damaged databases. But, is it
reasonable to use an application-specific method to extend the
capability of a database engine?
--
Vernon Cole

Stephen Burrows

unread,
Jun 2, 2011, 12:34:54 AM6/2/11
to Django developers
Django actually already adds support for some capabilities to certain
database backends, if I'm not mistaken - such as cascades through GFKs
on delete.

In terms of time zones, could django "assume"/ensure that the datetime
stored in the backend is UTC, then handle the conversion to the local
timezone itself?

Daniel Swarbrick

unread,
Jun 4, 2011, 4:54:58 PM6/4/11
to Django developers
On Jun 1, 3:16 pm, VernonCole <vernondc...@gmail.com> wrote:
> On the other hand, I found it necessary to convert date-time values to
> ISO format strings in order to feed them to Microsoft ACCESS databases
> in some cases, and that works well. But again, as with SQLite, the DB
> has no concept of time zones. In the absence of actual time zone
> support in the database, any action taken by django is going to
> inconvenience somebody, and will likely not be compatible with non-
> django use of the same database.  Perhaps it would work to store the
> pickled tz-aware datetime on brain-damaged databases.  But, is it
> reasonable to use an application-specific method to extend the
> capability of a database engine?

The problem with storing pickled datetime objects is that they will
most likely not be SQL-sortable. Consider the case of needing to sort
a queryset by a timestamp field. This has to be achievable by the SQL
backend, especially if it is a large dataset. Requiring Python to
unpickle (or otherwise deserialize) a field for each row in a dataset,
before doing a list sort, will not be efficient. It would also make it
near impossible to do a "delete from foo where timestamp_field <
'2011-01-01 12:34:56'"

Whilst the idea of storing well formatted timestamp strings that
include a UTC offset, it also fails when it comes to doing an SQL
"order by". For example:

"2011-06-03 01:01:00 +02"
"2011-06-03 01:02:00 +08"
"2011-06-03 01:03:00 +12"

Using an ascending alphabetical sort, the timestamps will be listed in
the order above. However, if we convert them to UTC, it becomes clear
that the last value in the list is actually the earliest time, when
normalized to UTC.

"2011-06-03 01:01:00 +02" (2011-06-02 23:01:00 UTC)
"2011-06-03 01:02:00 +08" (2011-06-02 17:02:00 UTC)
"2011-06-03 01:03:00 +12" (2011-06-02 13:03:00 UTC)

So whilst appending the UTC offset on the end is a convenient way of
storing it in the same field as the timestamp itself, it leads to
sorting errors. I think for this reason, the best option for SQLite
may be to always store timestamps as UTC, and require Django to
convert them to a local time (based on settings.TIME_ZONE perhaps as a
default). That would make the timestamps in the above example sort
correctly. That particular example is perhaps a bit extreme, with
wildly different UTC offsets in the same table, but consider daylight
saving transitions, where, for example, the offset may change from UTC
+01 to UTC+02. That would mean that timestamps close to midnight could
potentially be incorrectly sorted. Even worse, if you were grouping by
a date_trunc function, grouping by month for example, you could end up
with records being grouped into the wrong month. Then there is the fun
and games when daylight saving finishes, and there is a one hour
period where naive, local-TZ timestamps actually repeat, as wall clock
jumps back one hour.

At least with UTC timestamps, such phenomenon don't occur - same UTC+0
offset, all year round.

Daniel Swarbrick

unread,
Jun 4, 2011, 5:35:49 PM6/4/11
to Django developers
An alternative to perhaps consider is an expanded range Unix epoch-
like timestamp. The Postgres timestamp type is stored as a 64 bit
integer, with 1 microsecond resolution.

SQLite supports storing integers with up to 8 bytes, so it would be
feasible to mimic the internal storage of a Postgres timestamp value
in SQLite, and it's only a question of mathematics to convert that to
a Python datetime.

Considering that the SQLite shell is fairly limited, and not that nice
to use, I don't think it's terribly important how the values look in a
raw SQL select (eg. whether they're human readable or not).... so long
as they sort correctly, are easy to convert to/from Python types, and
can also be handled by SQLite's date/time functions (and general math
operations as needed).

I suspect that Postgres's timestamp implementation is based on the
Julian day, fit into a 64 bit integer. The Julian day is is a measure
of "the interval of time in days and fractions of a day since January
1, 4713 BC Greenwich noon" - the same earliest possible date in
Postgres.

(http://en.wikipedia.org/wiki/Julian_day)

Daniel Greenfeld

unread,
Jun 5, 2011, 5:16:46 PM6/5/11
to django-d...@googlegroups.com
I think this topic is not realistic.

If you store DateTime in another format then what the database is designed to deliver, then you don't just lose sorting and search capabilities, you also lose the ability for that data to be read and understood trivially by other tools besides Django. Which means the task of moving data from one database to another, or applying non-Django tools to the data suddenly has become much more complicated - to the point that non-Django people looking at the database will rightly wonder why the framework is obfuscating data.

Daniel Greenfeld

Daniel Swarbrick

unread,
Jun 5, 2011, 7:06:38 PM6/5/11
to Django developers
On Jun 5, 11:16 pm, Daniel Greenfeld <pyda...@gmail.com> wrote:
> If you store DateTime in another format then what the database is designed
> to deliver, then you don't just lose sorting and search capabilities, you
> also lose the ability for that data to be read and understood trivially by
> other tools besides Django.

The problem in the case of SQLite is that it is not _designed_ to
deliver any kind of timestamp. It simply gets stored as a completely
unintelligent ascii string. Sorting of such fields is alphanumeric,
not chronological, and any date/time functions in SQLite are
essentially doing a strptime() on the string before processing it. I
have no idea why the developers of SQLite chose to omit such an
important field type.

Julian dates are supported by many platforms, including SQLite:

sqlite> select julianday("2011-06-04 12:45:55");
2455717.03188657

...which can easily be stored in SQLite's REAL column type. An IEEE
float can give you about 1 µs resolution for Julian dates. And Julian
dates can be converted back to ISO timestamps:

sqlite> select datetime(2455717.03188657);
2011-06-04 12:45:55

Any DB client that did not have its own Julian date functions can
simply include the conversion in the select query, eg. "select col1,
col2, datetime(col3) from foo".

I realize it's a pretty tall order to expect Django to start storing
DateTime fields as Julian dates in SQLite, because of the backwards
compatibility implications. The cool thing is however, that SQLite's
datetime() function seems to handle a variety of input formats anyway:

sqlite> select datetime(2455717.03188657);
2011-06-04 12:45:55
sqlite> select datetime("2455717.03188657");
2011-06-04 12:45:55
sqlite> select datetime("2011-06-04 12:45:55");
2011-06-04 12:45:55

As a slight digression, I did some experimenting with Python's sqlite3
API to see how it would handle a non-naive datetime object. First, I
inserted a naive datetime, then two non-naive datetimes, two seconds
apart, but in different timezones. The unsorted order was:

sqlite> select * from test;
2011-06-05 00:20:02.788742
2011-06-05 00:20:02.788742+02:00
2011-06-04 17:20:04.793494-05:00

And here we can see the alphanumeric sorting incorrectly place the
chronologically-last timestamp at the start of the list:

sqlite> select * from test order by col1;
2011-06-04 17:20:04.793494-05:00
2011-06-05 00:20:02.788742
2011-06-05 00:20:02.788742+02:00

I think this clearly indicates that datetime objects *must* be
normalized to a single timezone for SQLite, and that timezone should
ideally be UTC. If all timestamps are in the same timezone, they'll at
least sort correctly (albeit still using an alphanumeric sort).
Alternatively, get true numeric/chronological sorting by using a REAL
column type and storing Julian dates. It'll be more storage-efficient
than an ascii string too.

Luke Plant

unread,
Jun 7, 2011, 10:30:23 AM6/7/11
to django-d...@googlegroups.com
On 05/06/11 22:16, Daniel Greenfeld wrote:
> I think this topic is not realistic.

Exactly!

There seems to be the assumption in other messages in this thread that
Django 'owns' the database. That is not the philosophy Django takes - it
interfaces with the DB. In some cases, Django 'owns' the data and would
have the liberty of choosing/changing the format e.g. in tables provided
by contrib apps, such as the session data which stores pickled Python
objects, or the format of the password in the auth.User table. But
globally changing the format of how datetimes are stored in SQLite is
just not an option (at least not until Django 2.0), unless we can
guarantee backwards compatibility with existing data and other apps use
of the data.

Regards,

Luke

--
"We may not return the affection of those who like us, but we
always respect their good judgement." -- Libbie Fudim

Luke Plant || http://lukeplant.me.uk/

Daniel Swarbrick

unread,
Jun 8, 2011, 5:17:52 AM6/8/11
to Django developers
On Jun 7, 4:30 pm, Luke Plant <L.Plant...@cantab.net> wrote:
> There seems to be the assumption in other messages in this thread that
> Django 'owns' the database. That is not the philosophy Django takes - it

In the case of SQLite, it just plain sucks, because the DB is too
stupid to support a true timestamp data type. On the other hand, this
should have been foreseen when designing the Python SQLite API. I've
not read anything about SQLite supporting true timestamps any time
soon, so IMHO, something has to take ownership of this problem.

Normally I would use Postgres for my Django apps, but I'm developing
one in particular that will run on embedded Linux boxes, and they
don't have the resource to run Postgres.

> globally changing the format of how datetimes are stored in SQLite is
> just not an option (at least not until Django 2.0), unless we can
> guarantee backwards compatibility with existing data and other apps use
> of the data.

Backwards compatibility is always going to be a thorny issue for
things that were not designed properly in the first place...

As I suspected would be the case, I will probably roll my own custom
field that handles this correctly, and maybe in a few years time
somebody else will raise this issue... again.

Sam Bull

unread,
Jun 27, 2011, 3:12:52 PM6/27/11
to django-d...@googlegroups.com
On 2011-06-02, at 12:34 AM, Stephen Burrows wrote:

> Django actually already adds support for some capabilities to certain
> database backends, if I'm not mistaken - such as cascades through GFKs
> on delete.
>
> In terms of time zones, could django "assume"/ensure that the datetime
> stored in the backend is UTC, then handle the conversion to the local
> timezone itself?

Isn't this a viable solution? It's the universal, cross-timezone-friendly datetime objects that we care about, not the timezones they were in, right?

- If the DB supports timezone-aware datetime fields, just pass the datetime object through as is.

- If the DB doesn't support timezone-aware datetime fields, convert to UTC and store the datetime as UTC

- If the datetime has tz info, convert the time to UTC using that

- If the datetime doesn't have tz info, convert using the default timezone specified in the settings

When retrieving the data from a non-timezone-aware db, convert them from UTC to the project's timezone.

Sam

Stephen Burrows

unread,
Jun 28, 2011, 9:26:23 AM6/28/11
to Django developers
I agree that it would be nice to be able to store tz-aware datetimes -
but if django were going to do so, it ought to store the datetimes
with the timezone information intact, rather than converting
everything to the project's timezone. So, if a conversion to UTC were
to take place, there would need to be a separate field in the database
to store the timezone.

Sam Bull

unread,
Jun 28, 2011, 11:01:33 AM6/28/11
to django-d...@googlegroups.com
I figured that for most cases you wouldn't actually care what the timezone was. You would either always be displaying things in a timezone relative to the user, or you'd use some other external information (ex. coordinates of person or thing attached to the datetime object) to derive a contextually appropriate timezone.

For cases where you do want to retain the timezone, how about allowing the TZ-aware DateTime field to store the TZ data to an optional secondary field? That field could be a regular CharField. It wouldn't be needed for any of the date-based sorting or filtering. There's precedent for this sort of behaviour with the GenericForeignKey, and I think it would be fine, especially if the secondary field was optional.

What do you think?

Sam

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>

Michael Blume

unread,
Jun 28, 2011, 11:41:58 AM6/28/11
to django-d...@googlegroups.com

This seems like probably the correct solution to me. I'd like to suggest that the time itself still be stored in UTC so that reading the timezone field is (somewhat) optional

Daniel Swarbrick

unread,
Jun 29, 2011, 5:27:28 AM6/29/11
to Django developers
On Jun 28, 3:26 pm, Stephen Burrows <stephen.r.burr...@gmail.com>
wrote:
> I agree that it would be nice to be able to store tz-aware datetimes -
> but if django were going to do so, it ought to store the datetimes
> with the timezone information intact, rather than converting
> everything to the project's timezone. So, if a conversion to UTC were
> to take place, there would need to be a separate field in the database
> to store the timezone.

As I've already illustrated earlier in this thread, storing the TZ
with the timestamp in a DB that will treat it as a text string (such
as SQLite) is going to break chronological SQL sorting. Perhaps
contrary to popular misconception, Postgres does not store the TZ in
"timestamp with time zone" fields. They are normalized to UTC and
stored as Julian dates. The TZ is interpreted on input of a timestamp
value, if the entered value is being specified in a TZ other than the
client connection's current TZ. The TZ is also displayed in a SELECT
of a timestamp field - but the TZ is that of the client connection,
not that of the originally inserted value.

Storing the TZ in a separate field is going to get pretty nasty for
sorting as well, as the query will have to somehow concatenate or
compute together the two fields before applying the sorting algorithm.

I think the best way to handle multi-TZ timestamps is the way that
Unix and other multi-user, multi-timezone OSes have been doing it for
decades (mostly wrt to filesystems) - respect the TZ of the client
inputting the data, normalize to a common TZ (usually UTC) and store
as such. Perform all sorting and timestamp comparison/arithmetic in
that single, normalized TZ. Upon output, render the value in the
client's TZ.

...and that is essentially what Postgres does.

Ryan McIntosh

unread,
Jun 29, 2011, 6:01:50 AM6/29/11
to django-d...@googlegroups.com
> ... but if django were going to do so, it ought to store the datetimes with the timezone information intact, rather than converting everything to the project's timezone. So, if a conversion to UTC were to take place, there would need to be a separate field in the database to store the timezone.

Is that a crutch of your database?  It isn't something I would ever live with.  All the *sql database servers I hate have an idea of what a timestamp+offset is.  Django should leverage this.  IMHO, no such field should exist in the underlying database but for ticks since epoch if tzs aren't supported...  Everything useful should be derived if necessar...

I apologize for entering tardis y con mas fortis allah.  I have been following for a bit and honestly, if other devs on this thread don't step up, I will.  [Here] be silly dragons....

No pointing fingers though... ;-)

Sam Bull

unread,
Jun 29, 2011, 9:59:49 AM6/29/11
to django-d...@googlegroups.com

On 2011-06-29, at 5:27 AM, Daniel Swarbrick wrote:

> Storing the TZ in a separate field is going to get pretty nasty for
> sorting as well, as the query will have to somehow concatenate or
> compute together the two fields before applying the sorting algorithm.

I don't see why. If the datetime data is all normalized to UTC then the datetime column can be used for sorting on its own and the tz data can be ignored.

> I think the best way to handle multi-TZ timestamps is the way that
> Unix and other multi-user, multi-timezone OSes have been doing it for
> decades (mostly wrt to filesystems) - respect the TZ of the client
> inputting the data, normalize to a common TZ (usually UTC) and store
> as such. Perform all sorting and timestamp comparison/arithmetic in
> that single, normalized TZ. Upon output, render the value in the
> client's TZ.
>
> ...and that is essentially what Postgres does.

I think we agree here. I'm suggesting the separate tz field as a nice-to-have. I think Django has everything it needs to make DateTimeField timezone-aware. It sounds like its already doing the right thing with postgresql. It just needs a graceful fallback for other DBs that don't do timezones as well.

Here's my proposal for adding timezone support to DateTimeField:

1. Assume timezone-less datetimes are in the timezone specified in settings and convert them to timezone-aware datetimes
2. If the db bridge doesn't understand timezone data, normalize the datetimes to UTC before sending them through
3. When retrieving datetimes from the db, assume they are in UTC if they don't come with a timezone already attached
4. Convert the datetimes to the timezone specified in settings

Bonus:
5. Allow the user to specify a field in the DateTimeField declaration to be used for storing timezone data
6. When storing, extract the timezone and record it in this field
7. When retrieving, convert the datetime from UTC to the timezone specified in the field

I could put together a patch for this solution if you folks think it looks promising.

Sam

Daniel Swarbrick

unread,
Jun 30, 2011, 4:51:41 AM6/30/11
to Django developers
On Jun 29, 12:01 pm, Ryan McIntosh <thebigsl...@gmail.com> wrote:
> Is that a crutch of your database?  It isn't something I would ever live
> with.  All the *sql database servers I hate have an idea of what a
> timestamp+offset is.  Django should leverage this.  IMHO, no such field
> should exist in the underlying database but for ticks since epoch if tzs
> aren't supported...  Everything useful should be derived if necessar...

I think there is a misunderstanding of UTC and timezones in general in
this thread. If TZ-aware datetime handling were to be added to Django,
it would almost certainly require the pytz module. That module
contains sufficient information to convert any UTC timestamp to any TZ
in the world, dating back sometimes as far as a couple of hundred
years, and reaching forward in the future (pending any future changes
of DST rules for a particular timezone).

It is redundant to store the original TZ in a clip-on field, if the
original input is properly handled. Take for example the following
three timestamps:

'1969-07-20T20:17:40+00:00'
'1969-07-20T16:17:40-04:00'
'1969-07-20T21:17:40+01:00'

They are all the same moment in time. The fact that they are expressed
as different local times is only relevant to the observer. Which one
will a (half-decent) DB engine store? The UTC one. Which one do I want
to show my users? Well, that depends which timezone they live in...
NOT which timezone the data was input.

PS: The timestamp example I've used, incidentally, is the Apollo 11
moon landing. Which timezone was it on the moon? UTC, like most things
in astronomy. Which timezone would TV news channels have reported?
Probably whichever TZ that news channel was broadcast in... all around
the world.

PPS: The moon landing occurred a few months too early to be able to be
stored in a Unix epoch timestamp. That rules out MySQL's "timestamp"
field....

With pytz, I can convert any non-naive timestamp to any other TZ that
I need to. And if I have users in multiple timezones, then I most
likely will want to render timestamps in multiple timezones. The only
situation I can think of where timestamps are nearly always expressed
in the local TZ are flight departure/arrivals. And since you would
most likely also be storing the departure/arrival *airport*, you can
deduce from there what TZ the timestamp should be displayed as, in
order to be a "local time".

On Jun 29, 3:59 pm, Sam Bull <osir...@gmail.com> wrote:
> I think we agree here. I'm suggesting the separate tz field as a nice-to-have. I think Django has everything it needs to make DateTimeField timezone-aware. It sounds like its already doing the right thing with postgresql. It just needs a graceful fallback for other DBs that don't do timezones as well.

Django simply sets the client connection TZ (from settings.py) when
connecting to Postgres. From that point on, Postgres will assume that
all naive timestamps Django sends it (which at the moment, is all of
them...) should be interpreted as being in that TZ. Postgres still
stores "timestamp with time zone" fields internally as UTC Julian
dates however. It will also by default render them in a SELECT as the
client connection's timezone. You can however explicitly specific the
TZ when doing an INSERT, eg

insert into foo values ('1999-01-08 04:05:06 -8:00');

You can also convert timestamps explicitly to other timezones in the
Postgres backend itself, using the "AS TIME ZONE" operator. It should
be sufficient however for Postgres to simply return the timestamp+TZ
(*any* TZ), for Python to construct a non-naive datetime with tzinfo
struct.

Believe it or not, MySQL also does things much the same way as
Postgres, when using the MySQL "timestamp" field type. MySQL has a
client connection TZ, and will assume any input timestamps that are
not qualified with a TZ to be in that client connection's TZ. It also
implicitly converts the stored UTC timestamp values to the client
connection's TZ. I gave a clear example of that happening in an
earlier post. BUT.... Django does not use MySQL's "timestamp" field
type for storing datetime objects - it instead uses the MySQL
"datetime" field type, which does NOT support any concept of TZ...
that is, they are stored as naive datetimes.

The main difference between MySQL's "timestamp" field type and
PostgreSQL's "timestamp with time zone" field type (and indeed also
MySQL's "datetime" field type) is the date range that they support.
MySQL's "timestamp" field type is modeled on Unix epoch timestamps,
and therefore does not support dates earlier than 1 Jan, 1970 00:00:00
UTC. PostgreSQL's "timestamp with time zone" field type on the other
hand is modeled on Julian dates, and supports dates ranging from 4713
BC to 294276 AD (with 1 µs accuracy, I might add). And MySQL's
"datetime" field type supports some bizarre range of
'0000-00-00' (yes, you really can specify the zero'th day of the
zero'th month... isn't that cool?) to '9999-12-31'. One can only hope
that Oracle's acquisition of MySQL might one day lead to better SQL
compliance.

I think the real issue here is that Django should really start to
think pretty hard about using non-naive datetimes, everywhere. We
already have pretty good l10n and i18n support in Django. So if we
acknowledge the fact that we have users with different locales/
languages, why do ignore the fact that they're often in different
timezones too?

Tom Evans

unread,
Jun 30, 2011, 5:06:49 AM6/30/11
to django-d...@googlegroups.com

That is a domain specific assertion, I don't believe that is true in
all cases. Typically, when we put things into the ORM, we expect to
get the same things out again. If you consider viewing a timeline of a
users activities, we might want to display the datetime of an event in
both the viewing users timezone and the viewees timezone. If that
information is discarded as irrelevant, then we could not do so.

I could certainly see what you are describing as a specific field - a
NormalizedDateTimeField, which explicitly normalizes to UTC for
storage and to settings.TZ for display - but it shouldn't be the only
way.

Cheers

Tom

Daniel Swarbrick

unread,
Jun 30, 2011, 8:14:50 AM6/30/11
to Django developers
On Jun 30, 11:06 am, Tom Evans <tevans...@googlemail.com> wrote:

> That is a domain specific assertion, I don't believe that is true in
> all cases. Typically, when we put things into the ORM, we expect to
> get the same things out again. If you consider viewing a timeline of a
> users activities, we might want to display the datetime of an event in
> both the viewing users timezone and the viewees timezone. If that
> information is discarded as irrelevant, then we could not do so.

I personally find it relatively useless to know what the wall clock
time was when something happened in another timezone. I find it more
relevant to know what time it was for me, when said event occurred. In
most such cases, if the local time of the event was of importance,
then the *location* of the event would also be important, and probably
stored in the same DB model. If pytz is utilized, then we have a full
database of timezones at our disposal, and can derive from a location
what the local TZ was in that location at that particular moment in
time. More accurately, we can ascertain what the UTC offset was, and
whether DST was in effect - since those rules have changed through
history, and will continue to change.

Western Samoa for example, will move to the other side of the
international date line on December 29th this year, effectively
jumping from a UTC-11 offset to a UTC+11 offset (skipping an entire
day in the process). Whilst such events are rare, it illustrates that
it is more useful to store the location of an event, plus have access
to the Olson tzdata DB, than to simply store a -11 or +11 offset
(which does not indicate whether DST was in effect). "Pacific/Samoa"
on the other hand can be looked up in tzdata, which will have an entry
stating that from 4 July 1892 until 29 December 2011, Samoa's UTC
offset was -11, at which point it changes to +11.

For correct chronological sorting of events (which may occur in
multiple different timezones), or calculating deltas between
timestamps, life is a lot easier if you normalize to a common TZ, eg.
UTC, and only apply a local TZ offset when rendering the output to the
user.

> I could certainly see what you are describing as a specific field - a
> NormalizedDateTimeField, which explicitly normalizes to UTC for
> storage and to settings.TZ for display - but it shouldn't be the only
> way.

This is exactly what Postgres is doing internally - normalizing to
UTC. It does not, I repeat, does NOT store the TZ. The client
connection's TZ is merely dynamically added/subtracted to/from the
normalized UTC timestamps coming from the storage backend.

The fact that MySQL's datetime field type doesn't normalize to UTC is
a question for the MySQL developers... maybe they all came from the
same village, and had no need for timezones.

>
> Cheers
>
> Tom

Tom Evans

unread,
Jun 30, 2011, 9:58:51 AM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 1:14 PM, Daniel Swarbrick
<daniel.s...@gmail.com> wrote:
> I personally find it relatively useless to know what the wall clock
> time was when something happened in another timezone.

Right, but this isn't DanielsWebFramework, we need to cover more than
what you find personally useful.

Maybe what I'm suggesting is incredibly esoteric, and should only
belong in TomsWebFramework, but I can definitely see the utility in a
DateTimeField that is TZ-aware and reconstitutes precisely the
TZ-aware datetime that was put into it - so not just the UTC offset,
but the specific TZ.

I'm aware that this isn't how databases handle this internally, this
is why we have an ORM, in order to be clever.

Cheers

Tom

Andy Dustman

unread,
Jun 30, 2011, 11:12:18 AM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 4:51 AM, Daniel Swarbrick
<daniel.s...@gmail.com> wrote:

> The main difference between MySQL's "timestamp" field type and
> PostgreSQL's "timestamp with time zone" field type (and indeed also
> MySQL's "datetime" field type) is the date range that they support.
> MySQL's "timestamp" field type is modeled on Unix epoch timestamps,
> and therefore does not support dates earlier than 1 Jan, 1970 00:00:00
> UTC. PostgreSQL's "timestamp with time zone" field type on the other
> hand is modeled on Julian dates, and supports dates ranging from 4713
> BC to 294276 AD (with 1 µs accuracy, I might add). And MySQL's
> "datetime" field type supports some bizarre range of
> '0000-00-00' (yes, you really can specify the zero'th day of the
> zero'th month... isn't that cool?) to '9999-12-31'. One can only hope
> that Oracle's acquisition of MySQL might one day lead to better SQL
> compliance.

Uh, no. 0000-00-00 is specifically an illegal value in MySQL. Invalid
dates are converted to 0000-00-00 unless you are in strict mode, in
which case they raise error conditions. The actual supported range is
from 1000-01-01 to 9999-12-31.
http://dev.mysql.com/doc/refman/5.0/en/datetime.html
--
Question the answers

Ian Clelland

unread,
Jun 30, 2011, 12:31:22 PM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 5:14 AM, Daniel Swarbrick <daniel.s...@gmail.com> wrote:
On Jun 30, 11:06 am, Tom Evans <tevans...@googlemail.com> wrote:

> That is a domain specific assertion, I don't believe that is true in
> all cases. Typically, when we put things into the ORM, we expect to
> get the same things out again. If you consider viewing a timeline of a
> users activities, we might want to display the datetime of an event in
> both the viewing users timezone and the viewees timezone. If that
> information is discarded as irrelevant, then we could not do so.

I personally find it relatively useless to know what the wall clock
time was when something happened in another timezone. I find it more
relevant to know what time it was for me, when said event occurred. In
most such cases, if the local time of the event was of importance,
then the *location* of the event would also be important, and probably
stored in the same DB model. If pytz is utilized, then we have a full
database of timezones at our disposal, and can derive from a location
what the local TZ was in that location at that particular moment in
time. More accurately, we can ascertain what the UTC offset was, and
whether DST was in effect - since those rules have changed through
history, and will continue to change.

While I agree with most of your points regarding database-storage of timestamps, the argument that wall clock time is 'relatively useless' certainly does not reflect my experience.

As a counterexample, I have needed to store wall-clock time as well as time zones for future events, in scheduling applications. It is exactly because of the unpredictability of future DST changes (and, in very rare cases, wholesale time zone changes) that a one-time UTC conversion simply does not work for timestamps in the future.

When an event is scheduled for a specific time, on a specific date in the future, then users expect that the web service will react at the appropriate wall clock time. An unforseen change in daylight saving time legislation should not have the effect of making the time in the database incorrect.

Storing the time zone and wall clock time in the database allows the system to update the UTC timestamp appropriately whenever the Olson DB is updated.
 

Western Samoa for example, will move to the other side of the
international date line on December 29th this year, effectively
jumping from a UTC-11 offset to a UTC+11 offset (skipping an entire
day in the process). Whilst such events are rare, it illustrates that
it is more useful to store the location of an event, plus have access
to the Olson tzdata DB, than to simply store a -11 or +11 offset
(which does not indicate whether DST was in effect). "Pacific/Samoa"
on the other hand can be looked up in tzdata, which will have an entry
stating that from 4 July 1892 until 29 December 2011, Samoa's UTC
offset was -11, at which point it changes to +11.

For correct chronological sorting of events (which may occur in
multiple different timezones), or calculating deltas between
timestamps, life is a lot easier if you normalize to a common TZ, eg.
UTC, and only apply a local TZ offset when rendering the output to the
user.

Absolutely -- UTC is the only way to go in that situation, but it might be that the UTC timestamps are just a cache, generated from the local-time/TZ combination when needed.
 
--
Regards,
Ian Clelland
<clel...@gmail.com>

Daniel Swarbrick

unread,
Jun 30, 2011, 1:06:33 PM6/30/11
to Django developers
On Jun 30, 6:31 pm, Ian Clelland <clell...@gmail.com> wrote:

> As a counterexample, I have needed to store wall-clock time as well as time
> zones for future events, in scheduling applications. It is exactly because
> of the unpredictability of future DST changes (and, in very rare cases,
> wholesale time zone changes) that a one-time UTC conversion simply does not
> work for timestamps in the future.
>
> When an event is scheduled for a specific time, on a specific date in the
> future, then users expect that the web service will react at the appropriate
> wall clock time. An unforseen change in daylight saving time legislation
> should not have the effect of making the time in the database incorrect.

That is a good example, which I agree with. Although it sounds like
more of a case for storing the date and time as separate fields (which
is already feasible), since Python dates and times on their own have
no tzinfo struct... only the combined datetime does (whether or not it
includes hours/mins/secs components).

Daniel Swarbrick

unread,
Jun 30, 2011, 12:24:05 PM6/30/11
to Django developers
On Jun 30, 5:12 pm, Andy Dustman <farcep...@gmail.com> wrote:

> Uh, no. 0000-00-00 is specifically an illegal value in MySQL. Invalid
> dates are converted to 0000-00-00 unless you are in strict mode, in
> which case they raise error conditions. The actual supported range is
> from 1000-01-01 to 9999-12-31.http://dev.mysql.com/doc/refman/5.0/en/datetime.html

I stand corrected. I stopped used MySQL for anything important back
around version 3.something, after it ate my data once too often.

The fact remains however that Postgres handled a much broader date
range, including dates before the dark ages, and that it properly
understands timezones. Timezones have existed since before the Unix
epoch.

Andy Dustman

unread,
Jun 30, 2011, 1:45:07 PM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 12:24 PM, Daniel Swarbrick
<daniel.s...@gmail.com> wrote:
> On Jun 30, 5:12 pm, Andy Dustman <farcep...@gmail.com> wrote:
>
>> Uh, no. 0000-00-00 is specifically an illegal value in MySQL. Invalid
>> dates are converted to 0000-00-00 unless you are in strict mode, in
>> which case they raise error conditions. The actual supported range is
>> from 1000-01-01 to 9999-12-31.http://dev.mysql.com/doc/refman/5.0/en/datetime.html
>
> I stand corrected. I stopped used MySQL for anything important back
> around version 3.something, after it ate my data once too often.

Since 4.0 came out in the late 1990s, isn't is possible that things
have improved somewhat over your dozen year gap of ignorance?

(I'll stop feeding trolls now)
--
Question the answers

Daniel Swarbrick

unread,
Jun 30, 2011, 2:38:07 PM6/30/11
to Django developers
On Jun 30, 7:45 pm, Andy Dustman <farcep...@gmail.com> wrote:

> Since 4.0 came out in the late 1990s, isn't is possible that things
> have improved somewhat over your dozen year gap of ignorance?
>
> (I'll stop feeding trolls now)

No, it's a perfectly reasonable question that you ask. If I remember
correctly, 4.0 was even scarier than 3.23, and as such we avoided it.
The application in question was a RADIUS server backend, and it would
quite regularly lose about a few hundred thousand rows when the DB got
to over several million rows.

Last year a customer's MySQL 5.1 server (not our choice), running on
OpenSolaris, corrupted itself irreparably when it hit about 75 million
rows of CDR data. When the system was decommissioned a few months ago,
the upstream Postgres DB server was still happily chugging away with
over 96 million rows. We never lost a single record from that server.

Now I don't know whether we were making unreasonable demands of MySQL,
and these kinds of workloads are certainly not your typical blog or
web forum DB. But before somebody proposes a database jihad, let me at
least say that I'm trying to ensure that whatever offshoot arises from
this discussion, is equally well supported on all of Django's current
core DB engines... at least, as equally well as possible, given the
various DB engine characteristics.

I choose not to use MySQL based on past (and recent) experience with
it, but as always, YMMV.

Aymeric Augustin

unread,
Jun 30, 2011, 5:10:24 PM6/30/11
to django-d...@googlegroups.com

Strictly speaking, converting a naive datetime + a time zone to a "point in time" (eg. UTC timestamp) isn't possible, because of DST: one hour doesn't exist, and another one happens twice. In my opinion, the primary goal of the DateTime field is to store a "point in time" and it should do so unambiguously. Of course, we could store a naive datetime plus a time zone plus a DST flag, but that's cumbersome, I'd even say developer-hostile.

I think there's a consensus on the fact that the best representation for DateTime fields is UTC. I'm willing to trust the pytz guys on this topic:
> The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans.

Thus, Django should always use UTC internally and on the connection with the database, and convert to local time in the forms widgets and localization functions. However, this is massively backwards incompatible in all databases that store DateTime as naive objects in local time (all but PostgreSQL). As a consequence, it won't be considered (or maybe in Django 2.0 ?)

For the example given above, I agree that storing a date + a time would be more appropriate. We could introduce a new field that keeps track of the time zone the date was entered in, but that comes after fixing DateTime fields to use UTC.

Finally, it isn't impossible to work around this problem in Django 1.3. I'm currently working on an application where I need to track time intervals exactly, so I must to store datetimes in UTC. I have written LocalDateTime model and form fields that store DateTimes in UTC (with TIME_ZONE = 'utc') but display it in local time in forms and in the admin (according to a custom setting, LOCAL_TIME_ZONE = 'Europe/Paris'). This takes around 50 lines of code. I had to monkey-patch `localize` to do the conversion on display, but otherwise it's very simple. I just have to remember to use `LocalDateTime` instead of `models.DateTime`.

Of course, since the UI is in local time, an user can still enter an ambiguous datetime. That's unavoidable: an UI in UTC isn't acceptable for my end users, and neither is an "is_dst" checkbox. But all datetimes generated internally are correct, and that's the most important for me.

Best regards,

--
Aymeric Augustin.

Ian Kelly

unread,
Jun 30, 2011, 5:25:09 PM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 3:10 PM, Aymeric Augustin
<aymeric....@polytechnique.org> wrote:
> Strictly speaking, converting a naive datetime + a time zone to a "point in time" (eg. UTC timestamp) isn't possible, because of DST: one hour doesn't exist, and another one happens twice. In my opinion, the primary goal of the DateTime field is to store a "point in time" and it should do so unambiguously. Of course, we could store a naive datetime plus a time zone plus a DST flag, but that's cumbersome, I'd even say developer-hostile.

I'm no expert on time zone handling, but shouldn't the two datetimes
already have two different timezones in that case? For example,
01:30:00 CDT, followed by 01:30:00 CST an hour later, if the user is
in the U.S. Central time zone. Both of those times should be
convertible to distinct UTC times. Likewise for the CST to CDT
transition in the spring.

Cheers,
Ian

Ian Clelland

unread,
Jun 30, 2011, 6:13:09 PM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 2:10 PM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:
On 30 juin 2011, at 19:06, Daniel Swarbrick wrote:

> On Jun 30, 6:31 pm, Ian Clelland <clell...@gmail.com> wrote:
>
>> As a counterexample, I have needed to store wall-clock time as well as time
>> zones for future events, in scheduling applications. It is exactly because
>> of the unpredictability of future DST changes (and, in very rare cases,
>> wholesale time zone changes) that a one-time UTC conversion simply does not
>> work for timestamps in the future.
>>
>> When an event is scheduled for a specific time, on a specific date in the
>> future, then users expect that the web service will react at the appropriate
>> wall clock time. An unforseen change in daylight saving time legislation
>> should not have the effect of making the time in the database incorrect.
>
> That is a good example, which I agree with. Although it sounds like
> more of a case for storing the date and time as separate fields (which
> is already feasible), since Python dates and times on their own have
> no tzinfo struct... only the combined datetime does (whether or not it
> includes hours/mins/secs components).

Strictly speaking, converting a naive datetime + a time zone to a "point in time" (eg. UTC timestamp) isn't possible, because of DST: one hour doesn't exist, and another one happens twice.


By the same standards, arithmetic division isn't possible, because the quotient is undefined when the denominator is zero. :)

Converting local time + time zone to UTC is absolutely possible, except for roughly one hour out of every 4000 or so, and those hours are probably chosen to be the least-likely to affect anyone working in that particular time zone. The fact that there can be non-existent wall-clock times, or ambiguous wall-clock times, is why we have exception handling (and why pytz raises exceptions in all the right circumstances). It's not a reason to throw one's hands up and claim "this isn't possible; computers can't deal with this".

UTC is brilliant, almost a requirement, for doing any sort of sane calculations involving actual times. The fact remains, though, that users think in local time, events get scheduled in local time, and when a user's government steps in and declares "daylight savings time is going to begin a week early this year," the user's 9:00AM meeting still happens at 9:00AM, wall clock time, and not at the UTC 'point in time' that the web service computed two months ago when the meeting was first scheduled.
 
In my opinion, the primary goal of the DateTime field is to store a "point in time" and it should do so unambiguously. Of course, we could store a naive datetime plus a time zone plus a DST flag, but that's cumbersome, I'd even say developer-hostile.

This would be exactly equivalent to storing a naive datetime and a UTC offset, which isn't much different than storing a UTC timestamp, except that, as you say, it's more cumbersome.

The point, for scheduling applications, of storing a semantic time zone, is that you can't necessarily predict the UTC 'point in time' that an event needs to occur at a certain time in that time zone.

Perhaps this does run contrary to the purpose of a DateTIme field; perhaps DateTime fields are only supposed to be used for events in the past :) My point, in bringing up the use case initially, was that different applications have different uses for time zones, and that simply declaring that timezone data was an irrelevant detail, able to be discarded on storage and relegated to a presentation-layer transformation at rendering time, represents a limited view of real-world problems.

I think there's a consensus on the fact that the best representation for DateTime fields is UTC. I'm willing to trust the pytz guys on this topic:
> The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans.

Thus, Django should always use UTC internally and on the connection with the database, and convert to local time in the forms widgets and localization functions. However, this is massively backwards incompatible in all databases that store DateTime as naive objects in local time (all but PostgreSQL). As a consequence, it won't be considered (or maybe in Django 2.0 ?)

This isn't necessarily a backwards incompatibility. Django could deal with UTC internally, and an optional argument to the models.DateTimeField could provide a hint as to what time zone the data in the database is using, for both storage and retrieval. Immediately, it would have to default to settings.TIME_ZONE, but there could be a migration path towards changing that default in a future release.

Jacob Kaplan-Moss

unread,
Jun 30, 2011, 10:23:46 PM6/30/11
to django-d...@googlegroups.com
On Thu, Jun 30, 2011 at 1:38 PM, Daniel Swarbrick
<daniel.s...@gmail.com> wrote:
> Last year a customer's MySQL 5.1 server [...]

I doubt it was your intention, but this kind of argument can *so*
easily spiral into a flamewar. Certain things -- text editors,
monospace fonts, operating systems, database engines -- just can't be
successfully discussed. Luckily, they're also off-topic for this list,
so let's bring things back to the discussion at hand and steer clear
of religious territory, OK?

Jacob

Reply all
Reply to author
Forward
0 new messages