Now that PEP 495
(the fold attribute, added in Python 3.6) and PEP 615
(the zoneinfo module, added in Python 3.9) have been accepted,
there's not much reason to continue using pytz, and its
non-standard interface is a major source of bugs. In fact,
the creator and maintainer of pytz said during the PEP discussions
that he was looking
forward to deprecating pytz. After merging zoneinfo upstream
into CPython, I turned the reference implementation into a backport to Python
3.6+, which I continue to maintain alongside the upstream
zoneinfo module, so it is already possible to migrate Django today.
Right now, Django continues to use pytz as its source of time
zones, even for simple zones like UTC; I would like to propose
starting the migration to zoneinfo now, to enable Django
users to start doing their own migrations, and so there can be
ample time for proper deprecation cycles. I apologize in advance
for the huge wall of text; the TL;DR summary is:
Rationale
Before I get into the detailed migration plan, I'd like to lay
out the case for why this needs to be done. I expect it to
be at least somewhat painful, and I'd prefer it if y'all were
convinced that it's the right thing to do before you hear
about the costs 😉. I am not the kind of person who thinks that
just because something is in the standard library it is
automatically "better" than what's available on PyPI, but in the
case of pytz, I've been recommending people move away from it for
a long time, because most people don't know how to use it
correctly. The issue is that pytz was originally designed to work
around issues with ambiguous and imaginary times that were fixed
by PEP 495 in Python 3.6, and the compromise it made for
correctness is that it is not really compatible with the standard
tzinfo interface. That's why you aren't supposed to directly
attach a timezone with the tzinfo argument, and why datetime
arithmetic requires normalization. Any substantial code base I've
ever seen that uses pytz either has bugs related to this or had a
bunch of bugs related to this until someone noticed the problem
and did some sort of large-scale change to fix all the bugs, then
imposed some sort of linting rules.
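To make the interface difference concrete without depending on pytz or a system time zone database, here is a stdlib-only sketch. `ToyNewYork` is a hypothetical toy zone that models only the 2020-11-01 fall-back transition in New York, and pytz's localized datetimes are imitated with fixed-offset `datetime.timezone` instances (an approximation of what pytz attaches after `localize`, not its real implementation).

```python
from datetime import datetime, timedelta, timezone, tzinfo

EDT = timedelta(hours=-4)
EST = timedelta(hours=-5)


class ToyNewYork(tzinfo):
    """Hypothetical toy zone modelling only the 2020-11-01 EDT -> EST transition."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2020, 11, 1, 1):
            return EDT
        if wall < datetime(2020, 11, 1, 2):
            # 01:00-01:59 happens twice; PEP 495's fold picks the occurrence.
            return EST if dt.fold else EDT
        return EST

    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == EDT else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == EDT else "EST"


NYC = ToyNewYork()

# PEP 495 style: attach the zone with tzinfo=; the offset is recomputed after arithmetic.
later = datetime(2020, 10, 31, 12, tzinfo=NYC) + timedelta(days=1)
print(later.tzname())  # EST

# pytz style, imitated: a fixed offset chosen at "localize" time stays attached,
# so crossing the transition leaves a stale EDT offset until normalize() fixes it.
stale = datetime(2020, 10, 31, 12, tzinfo=timezone(EDT, "EDT")) + timedelta(days=1)
print(stale.tzname())  # EDT

# Ambiguous wall times are disambiguated with fold, not with a different tzinfo.
ambiguous = datetime(2020, 11, 1, 1, 30, tzinfo=NYC)
print(ambiguous.tzname(), ambiguous.replace(fold=1).tzname())  # EDT EST
```

The stale offset in the second case is what `normalize` exists to repair; with a PEP 495 zone there is nothing to repair.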
In addition to the "pytz is hard to use" problems (which,
honestly, should probably be enough), pytz also has a few
additional issues that the maintainer has said will not be fixed
(unless pytz becomes a thin wrapper around zoneinfo, which is at
best just using a slower version of zoneinfo). The biggest issue,
which is going to become increasingly relevant in the coming
years, is that it only supports the Version 1 spec in TZif files,
which (among other issues) does not support datetimes after 2038
(or before 1902) — this was deprecated 15 years ago, and it is
unlikely that you will find modern TZif files that don't have
Version 2+ data. pytz also does not support sub-minute offsets,
which is mostly relevant only in historical time zones. And, of
course, pytz is not compatible with PEP 495 and in some ways
really cannot be made compatible with PEP 495 (at least not
easily).
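The 2038/1902 bounds come from Version 1 TZif data storing transition times as signed 32-bit seconds since the Unix epoch. This stdlib-only arithmetic shows where those bounds fall, and that `datetime.timezone` itself accepts a sub-minute offset (allowed since Python 3.7), using New York's local mean time before 1883, -4:56:02, as the example offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Version 1 TZif data stores transition times as signed 32-bit integers.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

earliest = EPOCH + timedelta(seconds=INT32_MIN)
latest = EPOCH + timedelta(seconds=INT32_MAX)
print(earliest)  # 1901-12-13 20:45:52+00:00
print(latest)    # 2038-01-19 03:14:07+00:00

# A sub-minute UTC offset, like the historical LMT offsets found in old zones.
lmt = timezone(-timedelta(hours=4, minutes=56, seconds=2), "LMT")
print(datetime(1880, 1, 1, tzinfo=lmt).isoformat())  # 1880-01-01T00:00:00-04:56:02
```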
Of course, one reasonable objection here is that Django has a huge
install base with very long release cycles, and it doesn't make
sense for Django to be an experimental "early adopter" of the new
library. This is a reasonable response, but it's actually because
it has a huge install base and long release cycles that it's
important for Django to migrate early, because it can't "turn on a
dime" like that. The long release cycles mean that changes made now
won't be universal for many years, and it's important for users to
have a long notice period that change is coming (particularly
since, as time goes on, Year 2038 bugs will become more and more
common). The huge install base means that at a minimum,
zoneinfo should be supported, to allow users to start
their own migrations today.
Migration Plan
I am fairly certain this is going to be a tricky migration and
will inevitably come with some user pain. I don't think
this will be Python 2 → 3 style pain, but some users who have been
doing the "right thing" with pytz will need to make changes to
their code in the long run, which is unfortunate. I think there
are several stages of "support" for zoneinfo, not all of which are
mandatory.
I am not sufficiently familiar with Django's development process
to know if feature flags are frequently used. I recommend at a
minimum doing #1 immediately and adding tests for
compatibility with the zoneinfo module. Preferably that would be
accompanied with #2 as well.
I have created a Proof of
Concept PR using the deprecation shims (#3), but with no
feature flags, which I think is the fastest you should
move on this. It was a fairly minimal change, which was
encouraging, but a great many of the tests still have an explicit
dependency on pytz. This will break some users, as is
probably evident from the fact that I needed to make changes other
than the constructors and fixes to warnings. Encouragingly,
the only test I needed to touch was to fix a warning, not an
error; to the extent that the Django test suite describes Django's
public interface, this is a "non-breaking change", but as I
mention in the pytz-deprecation-shim
migration guide, there are some semantic differences that
you could easily encounter if you are counting on
`django.utils.timezone.get_current_timezone()` returning a pytz
time zone instead of a shim class. The majority of people won't
encounter these, but... Hyrum's Law.
One remaining question here is what to do with the pytz-specific
tests. Until Django gets to stage 5 (pytz zones aren't even supported),
there should be at least some tests for pytz support, but
presumably most of the tests that explicitly use pytz are only
doing so incidentally (e.g. almost all uses of `pytz.utc`). In
stage 3, I would think that most pytz-specific tests should either
be parameterized to test with both pytz and zoneinfo or tested
only with zoneinfo.
I am not terribly familiar with the release schedule or backwards
compatibility guarantees that Django makes in point versions, etc.
I read the Django documentation on stability, but it suffers from
the same problems that SemVer in general suffers from (and that
there's no avoiding, really), which is that breaking changes are
in the eye of the beholder. I leave it to y'all to decide the
roll-out schedule for this stuff (assuming it's accepted at all),
but I'm happy to offer what advice I can on the matter.
For those of you who have made it this far, thank you for
indulging me.
Best,
Paul
[0] There's an additional step between 2 and 3 that can be taken,
which is the adoption of pytz-deprecation-shim (or an equivalent
thereof), but only for UTC. Most (all?) of the semantic
differences between pytz and the shims don't come up with
fixed-offset zones or UTC, and a shim around UTC to provide
localize and normalize + warning can be a wrapper around
datetime.timezone.utc rather than the much newer zoneinfo.
[1] I created pytz-deprecation-shim specifically to help migrate
big code bases and popular libraries like Django and pandas off
pytz. If you are concerned about taking on an additional
dependency, it's not terribly difficult to extract the core of the
library directly into Django (for example, Django doesn't need any
of the Python 2 compatibility code). The downside is that whatever
support exists for a Django fork would lag behind the upstream
library.
[2] Unless this is a build-time flag, there's no way for Django to
keep pytz as a required dependency by default while also offering
an option not to depend on it.
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b6e8e75c-dddf-da65-3af5-43f1b5b23eca%40ganssle.io.
The point about the arithmetic semantics is a good one. I'm
curious to know how often this is actually a problem. I think it
will happen strictly less frequently than the localize case, which
is handled in a way that is both backward- and forward-compatible,
and I hate to throw the baby out with the bathwater here.
I'm hesitant to say that it's a good idea to jump directly to
using zoneinfo with no warning, though. I see two possible
ways to get a reasonable warning here:
1. Configure localize and normalize separately, such that localize
works as expected, but normalize raises an exception (possibly we
can go more granular with this so that, for example, UTC.normalize
is just a warning). This could even be achieved with separate
warning classes for localize and normalize, with the normalize
warning configured as an error by default.
2. Add a feature flag that allows switching directly to zoneinfo
that will eventually default to True, and replace pytz with a shim
around pytz that raises warnings whenever you use something
pytz-specific. That will give people time to opt in to using
zoneinfo with, hopefully, zero changes in the interim.
(Unfortunately, though, it doesn't allow for any incremental
migration like pytz_deprecation_shim does).
I do think a simple shim similar to pytz_deprecation_shim would be
appropriate for the UTC object exposed in Django, though. That
could be done immediately with no impact on semantics and
allow for incremental migration, since even pytz's UTC object can
be directly attached to datetimes.
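A minimal sketch of such a UTC shim, assuming a hypothetical class name `PytzCompatibleUTC`: offsets and names are delegated to `datetime.timezone.utc`, while pytz-style `localize`/`normalize` are kept only so existing callers keep working, each with a `DeprecationWarning`. The error messages and checks are illustrative choices, not pytz's or pytz-deprecation-shim's.

```python
import warnings
from datetime import datetime, timedelta, timezone, tzinfo


class PytzCompatibleUTC(tzinfo):
    """Hypothetical UTC shim: a thin wrapper around datetime.timezone.utc that
    also accepts pytz-style localize()/normalize() calls, with a warning."""

    def utcoffset(self, dt):
        return timezone.utc.utcoffset(dt)

    def dst(self, dt):
        # timezone.utc.dst() returns None, but tzinfo.fromutc() (used by
        # astimezone) needs a real value, so report zero DST explicitly.
        return timedelta(0)

    def tzname(self, dt):
        return timezone.utc.tzname(dt)

    def __repr__(self):
        return "<PytzCompatibleUTC>"

    def localize(self, dt, is_dst=False):
        warnings.warn("localize() is deprecated; use dt.replace(tzinfo=UTC)",
                      DeprecationWarning, stacklevel=2)
        if dt.tzinfo is not None:
            raise ValueError("expected a naive datetime")
        return dt.replace(tzinfo=self)

    def normalize(self, dt):
        warnings.warn("normalize() is deprecated and never needed for UTC",
                      DeprecationWarning, stacklevel=2)
        if dt.tzinfo is None:
            raise ValueError("expected an aware datetime")
        return dt.astimezone(self)


UTC = PytzCompatibleUTC()

# Old pytz-style code keeps working, but now warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = UTC.localize(datetime(2020, 1, 1, 12))
print(legacy.isoformat(), caught[0].category.__name__)
# 2020-01-01T12:00:00+00:00 DeprecationWarning

# PEP 495-style code works with no warning and agrees with timezone.utc.
modern = datetime(2020, 1, 1, 12, tzinfo=UTC)
print(modern == datetime(2020, 1, 1, 12, tzinfo=timezone.utc))  # True
```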
Right now, I think the obvious first step is to add support
for deliberately using zoneinfo / datetime.timezone. This can be
done in a perfectly backwards compatible way, so there's no point
in delaying.
Best,
Paul
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/c3a3a788-2300-4361-ab06-0d89523424ecn%40googlegroups.com.
This sounds like a reasonable timeline to me. I think the
breakage will be relatively small because I suspect many end-users
don't really even know to use `normalize` in the first place, and
when introducing the shim into a fundamental library at work I did
not get a huge number of breakages, but I am still convinced that
it is reasonably categorized as a breaking change.
I do think that there's one additional stage that we need to add
here (and we chatted about this on Twitter a bit), which is a
stage that is fully backwards compatible where Django supports
using non-pytz zones for users who bring their own time zone. I
suspect that will help ease any breaking pain between 3.2 and 4.0,
because no one would be forced to make any changes, but end users
could proactively migrate to zoneinfo for a smoother transition.
I think most of what needs to be done is already in my original
PR, it just needs a little conditional logic to handle pytz as
well as the shim.
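A rough sketch of that conditional logic. `make_aware` and `_is_pytz_zone` here are hypothetical stand-ins, not Django's actual `django.utils.timezone` code; the pytz check is an `isinstance` test against `pytz.tzinfo.BaseTzInfo`, done only when pytz happens to be installed.

```python
from datetime import datetime, timedelta, timezone

try:
    from pytz.tzinfo import BaseTzInfo  # only available if pytz is installed
except ImportError:
    BaseTzInfo = None


def _is_pytz_zone(tz):
    """True for pytz's own zone classes (those deriving from BaseTzInfo)."""
    return BaseTzInfo is not None and isinstance(tz, BaseTzInfo)


def make_aware(value, tz, is_dst=None):
    """Hypothetical helper: attach `tz` to the naive datetime `value`."""
    if value.utcoffset() is not None:
        raise ValueError(f"make_aware expects a naive datetime, got {value!r}")
    if _is_pytz_zone(tz):
        # Legacy pytz zones must go through localize() to get the right offset.
        return tz.localize(value, is_dst=is_dst)
    # zoneinfo, datetime.timezone and PEP 495-compatible shims: attach directly.
    return value.replace(tzinfo=tz)


aware = make_aware(datetime(2020, 6, 1, 9), timezone(timedelta(hours=2)))
print(aware.isoformat())  # 2020-06-01T09:00:00+02:00
```

One design choice worth noting: in this sketch, shim zones take the `replace(tzinfo=...)` branch rather than `localize`, so internal calls like this one would not trigger the shims' deprecation warnings; only user code that calls the pytz API would.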
I am not sure how you feel about feature flags, but as a "nice to
have", I imagine it would also be possible to add a feature flag
that opts you in to `zoneinfo` as time zone provider even in 3.2,
so that people can jump straight to the 5.0 behavior if they are
ready for it.
I should be able to devote some time to at least the first part —
making Django compatible with zoneinfo even if not actively using
it — but likely not for a few weeks at minimum. If anyone wants to
jump on either of these ahead of me, I don't mind at all; feel
free to ping me for review.
Best,
Paul
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ce04a6b7-4409-4b20-ba30-4cd64dc0cabfn%40googlegroups.com.
I think either jumping straight to zoneinfo in 5.0 or using the
shim in 4.0 would be fine, though I will say that it is very
likely that if you don't want to change your code twice you won't
have to, particularly if we add a feature flag to opt-in to
zoneinfo even in 3.2.
The shim time zones already work the same as `zoneinfo` zones, but
they also expose the pytz interface (with the one semantic
difference) and start raising deprecation warnings. Once your code
works with the shims without throwing deprecation
warnings, it will also automatically work with zoneinfo zones.
In the hopefully conservative time scale I'm envisioning, there
would be a feature flag for 3.2 and 4.x where you can opt in to
having Django use `zoneinfo`, so you don't even need to do
anything to get rid of the shims if you don't want to be passing
around shim objects (if you have a mix of shim and non-shim time
zones, datetime comparisons and arithmetic will have slightly
different semantics).
Another thing to note: if you want any kind of warning,
you need some kind of shim, because there are many ways to use
pytz time zones that are not a problem. You need to target the
deprecation warnings for code that actually uses pytz's API, which
means wrapping pytz's API. Without support for zoneinfo,
though, users can't do anything about the warning,
so they would be forced to change all their time zone logic at the
same time as updating their Django version. My
pytz-deprecation-shim module exposes time zones that don't raise
any warnings if you use them like zoneinfo objects, so people have
the option to gradually move their code into a state where
flipping the switch from pytz to zoneinfo would have very little
effect.
Of course, this is all very conservative and assumes that the
ability to gradually migrate is broadly desirable. It may be that
Django users by and large prefer doing major overhauls all at once
rather than a protracted period of gradually upgrading. It also
may be that the shims have some disadvantages I haven't found yet
that make them much worse than a sudden break. It may also be
that y'all would prefer a little extra pain now to supporting the
shim chimera for the lifetime of the 4.x branch.
In my opinion, the strongest argument in favor of a sudden
breaking change in 4.0 would be that the sudden change will be
much louder because stuff will just break. With the shims,
you'll get a bunch of deprecation warnings (which many people may
not see because they are off by default), and the one
backwards-incompatible change is a fairly subtle difference in
arithmetic that only applies in certain situations. It would be a
lot easier to not notice the change until you see a bug caused by
it showing up in production. On the other hand, people not testing
for this adequately may not realize that the semantics are
different than most people think, so they might have the same bug
anyway.
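Since `DeprecationWarning` is ignored by default unless it is triggered from `__main__`, people who want the shims' warnings to be loud have to opt in, for example by running their tests under `python -W error::DeprecationWarning`. The same idea, applied programmatically around a single call, looks roughly like this; `call_with_deprecations_as_errors` and `legacy_call` are hypothetical names.

```python
import warnings


def call_with_deprecations_as_errors(func, *args, **kwargs):
    """Hypothetical helper: run func with every DeprecationWarning raised as an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_call():
    """Hypothetical stand-in for code that still uses the pytz API via a shim."""
    warnings.warn("localize() is deprecated", DeprecationWarning, stacklevel=2)
    return "done"


try:
    call_with_deprecations_as_errors(legacy_call)
except DeprecationWarning as exc:
    print("caught:", exc)  # caught: localize() is deprecated
```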
Best,
Paul
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/be81c1a9-cefa-4f02-a9d6-17e6d1a93c2dn%40googlegroups.com.
I would definitely be in favor of an opt-in: it would give developers time to move to the new system at their convenience.
Example: we're about to try and tackle the TZ issue in our apps
and we want to do it "globally" with one definitive solution. I'd
much rather do it on a library that is currently favoured, but not
yet default than on a deprecated one, even if it's not yet
officially deprecated. We do have some "import pytz", but
currently they are few. Once we have a proper approach to handling
timezone stuff, there's likely going to be more of them... or
less, depending on the solution ;-)
LP,
Jure
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/e13e8ae2-5d43-e550-48a4-cb7ad6e699f6%40ganssle.io.
Before looking at alternatives, I wonder if we can just change the shims package to make it fully backwards compatible? Right now the shims version of normalize() is essentially a noop. Paul, couldn't it actually attempt to adjust the time the way pytz does? Perhaps by wrapping pytz itself, and calling its normalize() from the corresponding pytz timezone; or by simply replicating its time-changing logic? Apologies if that's a naive question.
Note that the two endpoints are identical, despite the fact that
one of them spans a DST transition and the other one doesn't.
Since the input to `normalize` is just a datetime and it's assumed
that this path-dependence would show up as an inconsistency in the
offset, there's nothing we can do here other than to actually have
all the same problems as pytz.
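A small stdlib-only sketch of that path dependence. `ToyNewYork` is a hypothetical toy zone that models only the 2020-11-01 New York fall-back, and pytz's localized datetimes are imitated with fixed-offset `datetime.timezone` objects. Two computations that end at the same wall time produce indistinguishable PEP 495 datetimes, whereas the pytz-style results still carry the offset each computation started with.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EDT = timedelta(hours=-4)
EST = timedelta(hours=-5)


class ToyNewYork(tzinfo):
    """Hypothetical toy zone modelling only the 2020-11-01 EDT -> EST transition."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2020, 11, 1, 1):
            return EDT
        if wall < datetime(2020, 11, 1, 2):
            return EST if dt.fold else EDT  # repeated hour: fold picks the side
        return EST

    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == EDT else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == EDT else "EST"


NYC = ToyNewYork()

# Path A crosses the transition; path B starts after it.
path_a = datetime(2020, 10, 31, 12, tzinfo=NYC) + timedelta(days=1)
path_b = datetime(2020, 11, 1, 11, tzinfo=NYC) + timedelta(hours=1)

# A normalize() that only sees the result gets identical inputs either way.
same_fields = path_a.replace(tzinfo=None) == path_b.replace(tzinfo=None)
print(same_fields, path_a.fold == path_b.fold, path_a.tzinfo is path_b.tzinfo)  # True True True

# pytz style, imitated with fixed offsets: each path keeps the offset it was
# "localized" with, and converting through UTC (roughly what pytz's normalize()
# does) gives different wall times for the two paths.
pytz_a = datetime(2020, 10, 31, 12, tzinfo=timezone(EDT, "EDT")) + timedelta(days=1)
pytz_b = datetime(2020, 11, 1, 11, tzinfo=timezone(EST, "EST")) + timedelta(hours=1)
est = timezone(EST, "EST")
print(pytz_a.astimezone(est).hour, pytz_b.astimezone(est).hour)  # 11 12
```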
Of course, there is another option: rather than adopting a wrapper
around zoneinfo, adopt a wrapper around pytz that does not follow
PEP 495, but instead just deprecates `pytz`'s API and tells people
to turn on the "use zoneinfo"
feature flag. It has the upside of being fully
backwards-compatible, but the downside of prolonging dependence on
pytz.
Another option is to modify the shims so that `normalize` always
raises an exception instead of a warning (or maybe it raises an
exception for anything except UTC and fixed offsets). In that
case, version 4.0 will mostly just work and start raising
deprecation warnings, but there will be a hard breakage for anyone
who would be negatively affected by the change in semantics. This
would still leave a possible problem in the other
direction, though:
>>> from datetime import datetime, timedelta
>>> from zoneinfo import ZoneInfo
>>> import pytz
>>> NYC_p = pytz.timezone("America/New_York")
>>> NYC = ZoneInfo("America/New_York")
>>> dtp_0 = NYC_p.localize(datetime(2020, 10, 31, 12))
>>> dtp_1 = NYC_p.localize(datetime(2020, 11, 1, 12))
>>> (dtp_1 - dtp_0) / timedelta(hours=1)
25.0
>>> dtz_0 = datetime(2020, 10, 31, 12, tzinfo=NYC)
>>> dtz_1 = datetime(2020, 11, 1, 12, tzinfo=NYC)
>>> (dtz_1 - dtz_0) / timedelta(hours=1)
24.0
This occurs because localized pytz zones are different tzinfo
objects, and as such comparisons and subtraction use inter-zone
semantics. Of course, you'll have this same problem even with a
"hard break", since unlike invocation of `normalize` and
`localize`, subtraction operations will succeed if you swap out
the attached tzinfo for a zoneinfo tzinfo.
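The same intra- versus inter-zone rule can be reproduced with the standard library alone. In this sketch `ToyNewYork` is a hypothetical toy zone (only the 2020-11-01 transition is modelled), and fixed-offset `datetime.timezone` objects stand in for pytz's per-offset tzinfos. The last case uses two distinct instances of the same zone, which is the situation you get when, for example, a shim zone and a plain zoneinfo zone for the same key are mixed.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EDT = timedelta(hours=-4)
EST = timedelta(hours=-5)
HOUR = timedelta(hours=1)


class ToyNewYork(tzinfo):
    """Hypothetical toy zone modelling only the 2020-11-01 EDT -> EST transition."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2020, 11, 1, 1):
            return EDT
        if wall < datetime(2020, 11, 1, 2):
            return EST if dt.fold else EDT  # repeated hour: fold picks the side
        return EST

    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == EDT else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == EDT else "EST"


# pytz style: each localized datetime carries its own fixed-offset tzinfo,
# so subtraction is inter-zone (both operands are converted to UTC first).
dtp_0 = datetime(2020, 10, 31, 12, tzinfo=timezone(EDT, "EDT"))
dtp_1 = datetime(2020, 11, 1, 12, tzinfo=timezone(EST, "EST"))
print((dtp_1 - dtp_0) / HOUR)  # 25.0

# PEP 495 style: one shared tzinfo object, so subtraction is intra-zone
# (wall-clock difference, offsets ignored).
NYC = ToyNewYork()
dtz_0 = datetime(2020, 10, 31, 12, tzinfo=NYC)
dtz_1 = datetime(2020, 11, 1, 12, tzinfo=NYC)
print((dtz_1 - dtz_0) / HOUR)  # 24.0

# Same rules, different tzinfo objects: inter-zone semantics again.
other_nyc = ToyNewYork()
print((dtz_1.replace(tzinfo=other_nyc) - dtz_0) / HOUR)  # 25.0
```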
If we go with any variation of using shim-around-zoneinfo like
pytz-deprecation-shim, I'd say those shims need to be introduced
as a breaking change in Django 4.0. If we go with
shim-around-pytz, I think that can safely be introduced in 3.2
(though that would require simultaneously adding support
for using zoneinfo, and even then it might mostly force
people to either do the migration in a single huge step or to
involve some wrapper functions for handling the period of time
where the time zone type is not consistent throughout the
application).
Best,
Paul
I think that the simplest approach—the one that would result in the least amount of total work for both Django and its users—would be to adopt Nick's suggestion and just switch to zoneinfo in 4.0. The problem is that it's very hard to square that with Django's stability policy: "We’ll only break backwards compatibility of these APIs without a deprecation process if a bug or security hole makes it completely unavoidable."
If we're going to follow the deprecation process, then there needs to be some overlap where both ways of doing things are possible. The shims package is a promising approach, but the fact that it's not actually backwards compatible with pytz is a serious problem. Adopting it directly as Carlton proposes also seems to violate the stability policy, albeit in a less severe way.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b18754a4-c308-492d-b547-6b3c7cdc1442n%40googlegroups.com.
The deprecation route using pytz_deprecation_shim in 4.0 changing to zoneinfo in 5.0
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/393fa2b7-6fca-fd2c-f36e-9942c6dc0104%40ganssle.io.
> It is not really possible to make the shims work the same way because there's not enough information available to determine whether an adjustment needs to be made.
But since you're shimming pytz, don't you, by definition, have access to all the same information that it has?
No, for two reasons:
1. The shims wrap zoneinfo (and dateutil, though that is not
relevant in this case), they do not wrap pytz (and in fact
do not have a dependency on pytz, though there is some magic that
allows them to play nicely with things namespaced in pytz).
2. pytz's mechanism for attaching time zones is incompatible with
PEP 495. It would not really be possible to have the shims work
both as pytz zones and as PEP 495 zones in all cases.
You are right that it would be possible to make it so that
`shim_zone.localize()` basically does what pytz does, attaching a
tzinfo specific to the offset that applies at that time, and for
`shim_zone.normalize()` to make sure that the one attached is the
right one, while also allowing `tzinfo=shim_zone` to work as a PEP
495 zone. That would make it very difficult to reason about the
system, though. It would make it so that depending on how you
attached your time zone, you would get different semantics for
comparisons and arithmetic. Sometimes normalize would work and
sometimes it wouldn't. So, for example, imagine you have:
def f(dt):
    return shim_zone.localize(dt)

x = shim_zone.normalize(f(datetime(2020, 10, 31, 12)) + timedelta(days=1))
If someone changes `f` to instead use
`dt.replace(tzinfo=shim_zone)`, the value for `x` changes, because
some function you have is no longer using `localize`. Similarly,
if we have, say, `datetime.now(shim_zone)` return a non-localized
datetime, you have differences in semantics between
`shim_zone.localize(datetime.now())` and `datetime.now(shim_zone)`
(both of which are valid with pytz). If we have it return a localized
datetime, subtraction and comparison semantics would be affected,
because now times localized to one or the other offset will be
inter- rather than intra-zone comparisons.
Unfortunately there's simply no way to make it fully backwards
and forwards compatible. The best options I see are either a shim
around pytz in 3.2 that just adds warnings and doesn't do anything
else, followed by a hard break in 4.0; or pytz-deprecation-shim in
4.0, followed by a hard break in 5.0.
I think both are fine plans. I suspect that the slower plan will
get people upgrading to 4.0 much faster, but it does have the
disadvantage that some of the breakage is subtle and won't raise
big errors (which is also the case, though to a lesser extent,
with the faster plan).
In any case, it seems uncontroversial that 3.2 should support
"bring your own zoneinfo", and I think most people agree that a
feature flag in 3.2 is also a good idea, so to the extent that I
have time to work on this, I'll work on those things.
Best,
Paul
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/043100f2-fd50-458a-9b31-c52128a534cbn%40googlegroups.com.
Sorry guys for asking a really stupid question, but:
I just made a search for pytz in Django master and found 17 occurrences in 5 files. More in docs and tests though. But still.
Isn't what we're debating here moot since Django itself doesn't
really depend on pytz all that heavily? I mean, I realise the
difference between the libraries bears grave consequences, but not
in Django itself, AFAIR.
Seems like changing the implementation such that it would be able
to use either approach (e.g. via a setting & a common import
wrapper) shouldn't be too much of a hassle anyway.
Or am I missing something really obvious here?
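A minimal sketch of the "setting plus common import wrapper" idea. `USE_ZONEINFO` and `get_timezone_factory` are hypothetical names; pytz is imported lazily so that it is only needed when the legacy provider is selected, and the `backports.zoneinfo` fallback covers Python versions before 3.9.

```python
import importlib

try:
    import zoneinfo  # Python 3.9+
except ImportError:  # Python 3.6-3.8: the backports.zoneinfo package
    from backports import zoneinfo

# Hypothetical setting name, for illustration only.
USE_ZONEINFO = True


def get_timezone_factory(use_zoneinfo=USE_ZONEINFO):
    """Hypothetical import wrapper: return a callable mapping an IANA key
    (e.g. "America/New_York") to a tzinfo object from the chosen provider."""
    if use_zoneinfo:
        return zoneinfo.ZoneInfo
    # Import pytz lazily so it is only required when explicitly selected.
    return importlib.import_module("pytz").timezone


get_timezone = get_timezone_factory()
# get_timezone("America/New_York") would now return a zoneinfo.ZoneInfo.
```

This only unifies how zones are constructed. As the rest of the thread points out, code that attaches zones or does arithmetic with them still sees pytz's `localize`/`normalize` semantics on one branch and PEP 495 semantics on the other, which is the part the migration plan is mostly about.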
LP,
Jure
P.S., but totally irrelevant to the discussion: I always found
having to import pytz to handle TZ-related stuff wasn't optimal. I
would have preferred having access to the necessary API from
Django's framework.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/4b473c2b-f4e1-2d25-4f35-4695815e4c25%40ganssle.io.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/1AA84619-74BA-4A7D-A5A8-DC7210885BE9%40polytechnique.org.
IIUC-at-first-pass: your suggestion differs in jumping straight to the end-point with a fallback for those who need it. (In contrast to getting folks to opt-in early if they want it.)
what did you think about allowing an early opt-in for Django v3.2?
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAJwKpyQrd7ueMfpuM-HkPUWCMjjpn2hTFpfz5%3D5ckVzZx-JMyg%40mail.gmail.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/ef569f26-b5d5-4f4a-bf1b-daa7460876d6n%40googlegroups.com.