Proposal: Drop dependency on pytz in favor of zoneinfo


Paul Ganssle

Jun 17, 2020, 10:32:46 AM
to django-d...@googlegroups.com
Greetings all,

Now that PEP 495 (the fold attribute, added in Python 3.6) and PEP 615 (the zoneinfo module, added in Python 3.9) have been accepted, there's not much reason to continue using pytz, and its non-standard interface is a major source of bugs. In fact, the creator and maintainer of pytz said during the PEP discussions that he was looking forward to deprecating pytz. After merging zoneinfo upstream into CPython, I turned the reference implementation into a backport to Python 3.6+, which I continue to maintain alongside the upstream zoneinfo module, so it is already possible to migrate Django today.

Right now, Django continues to use pytz as its source of time zones, even for simple zones like UTC. I would like to propose starting the migration to zoneinfo now, both to let Django users start their own migrations and to leave ample time for proper deprecation cycles. I apologize in advance for the huge wall of text; the TL;DR summary is:

  • It's important to start this migration as soon as possible, because it will take a long time and some of the bugs it fixes get worse as time goes on; right now Django's assumption that all time zones are from pytz will likely block Django users from migrating before Django does.
  • The combination of pytz's non-standard interface and Hyrum's Law makes this less straightforward than one would hope. Likely the best solution is to adopt "zoneinfo" immediately with a wrapper that deprecates the pytz-specific interface. I have provided a library to do this.
  • There are some open questions (which may only be open because I don't know Django's process well enough) in the migration plan:
    • Should this change be put under a feature flag of some sort? If so, what should be the default values for coming releases?
    • Should all the pytz-specific tests be kept with zoneinfo tests added, or should they be migrated and pytz tests confined to a smaller subset?

Rationale

Before I get into the detailed migration plan, I'd like to lay out the case for why this needs to be done. I expect it to be at least somewhat painful, and I'd prefer it if y'all were convinced that it's the right thing to do before you hear about the costs 😉. I am not the kind of person who thinks that just because something is in the standard library it is automatically "better" than what's available on PyPI, but in the case of pytz, I've been recommending people move away from it for a long time, because most people don't know how to use it correctly. The issue is that pytz was originally designed to work around problems with ambiguous and imaginary times that were fixed by PEP 495 in Python 3.6, and the compromise it made for correctness is that it is not really compatible with the standard tzinfo interface. That's why you aren't supposed to attach a pytz time zone directly with the tzinfo argument, and why datetime arithmetic requires normalization. Every substantial code base I've seen that uses pytz either has bugs related to this, or had a bunch of them until someone noticed the problem, made some sort of large-scale change to fix them all, and then imposed linting rules.
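For contrast, a minimal sketch of the standard-interface behaviour that pytz gives up, assuming Python 3.9+ (or `backports.zoneinfo`) and an available IANA time zone database. With zoneinfo, passing the zone to the constructor is the correct way to attach it, and arithmetic needs no `normalize()` step:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+; backports.zoneinfo on 3.6-3.8

NYC = ZoneInfo("America/New_York")

# Attaching the zone via tzinfo= is correct with zoneinfo. The pytz
# equivalent, tzinfo=pytz.timezone("America/New_York"), picks up the zone's
# first entry (local mean time, -04:56), which is why pytz needs localize().
winter = datetime(2020, 1, 1, 12, tzinfo=NYC)
summer = datetime(2020, 7, 1, 12, tzinfo=NYC)

# Arithmetic keeps the same tzinfo object and the offset is recomputed
# on demand, so there is nothing to normalize afterwards.
later = winter + timedelta(days=182)  # 2020 is a leap year: lands on July 1

print(winter.utcoffset(), summer.utcoffset(), later.utcoffset())
```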

In addition to the "pytz is hard to use" problems (which, honestly, should probably be enough), pytz has a few other issues that the maintainer has said will not be fixed (unless pytz becomes a thin wrapper around zoneinfo, which at best amounts to a slower version of zoneinfo). The biggest issue, which will become increasingly relevant in the coming years, is that it only supports the Version 1 spec for TZif files, which (among other limitations) does not support datetimes after 2038 or before 1902. Version 1 was deprecated 15 years ago, and you are unlikely to find modern TZif files that don't have Version 2+ data. pytz also does not support sub-minute offsets, which are mostly relevant only in historical time zones. And, of course, pytz is not compatible with PEP 495, and in some ways it really cannot be made compatible with PEP 495 (at least not easily).
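A quick way to check the post-2038 behaviour is to look up offsets in years beyond the 32-bit limit. This sketch assumes Python 3.9+ and a system time zone database built from Version 2+ TZif files, which is the norm today:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")

# 2038-01-19 03:14:07 UTC is the last instant a signed 32-bit transition
# time (as used by Version 1 TZif data) can represent. zoneinfo reads the
# 64-bit Version 2+ data and the POSIX TZ rule string at the end of the
# file, so the DST rules keep applying after that date.
abbreviations = {}
for year in (2037, 2040, 2100):
    winter = datetime(year, 1, 15, 12, tzinfo=NYC)
    summer = datetime(year, 7, 15, 12, tzinfo=NYC)
    abbreviations[year] = (winter.tzname(), summer.tzname())

print(abbreviations)
```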

Of course, one reasonable objection here is that Django has a huge install base with very long release cycles, so it doesn't make sense for Django to be an experimental "early adopter" of the new library. That is a fair point, but it's precisely because Django has a huge install base and long release cycles that it's important for it to migrate early: it can't "turn on a dime". The long release cycles mean that changes made now won't be universal for many years, and it's important for users to have a long notice period that change is coming (particularly since, as time goes on, Year 2038 bugs will become more and more common). The huge install base means that, at a minimum, zoneinfo should be supported, to allow users to start their own migrations today.

Migration Plan

I am fairly certain this is going to be a tricky migration and will inevitably come with some user pain. I don't think this will be Python 2 → 3 style pain, but some users who have been doing the "right thing" with pytz will need to make changes to their code in the long run, which is unfortunate. I think there are several stages of "support" for zoneinfo, not all of which are mandatory.

  1. Drop all requirements that time zones support pytz-specific interfaces internally — make it so that end users have the option to use zoneinfo and datetime.timezone.
  2. Document that users should use zoneinfo, even if it's not the default.
  3. Convert all uses of pytz to using a shim that supports pytz's interface, but raises warnings whenever something pytz-specific is used. [0] I have already created a library for this purpose. [1] This has three sub-stages:
    1. Make this optional, enabled by a feature flag. [2]
    2. Make this optional, disabled by a feature flag.
    3. Remove all feature flags related to this, make it mandatory.
  4. Remove the shims, making zoneinfo the default for all options, but maintaining support for user-supplied pytz zones (again this can be rolled out under feature flags if desired).
  5. Remove support for pytz entirely.
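Stage 1 essentially means that internal code paths stop assuming every zone has pytz's methods. A minimal sketch of what such a code path could look like; `make_aware_compat` is a hypothetical name, not an existing Django function, and `is_dst`/`fold` handling is left out:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def make_aware_compat(value, tz):
    """Attach tz to the naive datetime value, whatever kind of zone tz is.

    Hypothetical helper for illustration only: zones exposing localize()
    (pytz zones, or pytz-compatible shims) go through it; any other tzinfo,
    such as a ZoneInfo or datetime.timezone, is attached directly.
    """
    if value.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    localize = getattr(tz, "localize", None)
    if callable(localize):
        return localize(value)
    return value.replace(tzinfo=tz)


# Non-pytz zones work without pytz being involved at all.
nyc = make_aware_compat(datetime(2020, 7, 1, 12), ZoneInfo("America/New_York"))
utc = make_aware_compat(datetime(2020, 7, 1, 16), timezone.utc)
print(nyc, utc, nyc - utc == timedelta(0))
```

Duck-typing on `localize` keeps user-supplied pytz zones working while letting zoneinfo and `datetime.timezone` through untouched.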

I am not sufficiently familiar with Django's development process to know if feature flags are frequently used. I recommend at a minimum doing #1 immediately and adding tests for compatibility with the zoneinfo module. Preferably that would be accompanied with #2 as well.

I have created a Proof of Concept PR using the deprecation shims (#3), but with no feature flags, which I think is the fastest you should move on this. It was a fairly minimal change, which was encouraging, but a great many of the tests still have an explicit dependency on pytz. This will break some users, as is probably evident from the fact that I needed to make changes other than the constructors and fixes to warnings. Encouragingly, the only test I needed to touch was to fix a warning, not an error; to the extent that the Django test suite describes Django's public interface, this is a "non-breaking change", but as I mention in the pytz-deprecation-shim migration guide, there are some semantic differences that you could easily encounter if you are counting on `django.utils.timezone.get_current_timezone()` returning a pytz time zone instead of a shim class. The majority of people won't encounter these, but... Hyrum's Law.

One remaining question here is what to do with the pytz-specific tests. Until Django gets to stage 5 (pytz zones aren't even supported), there should be at least some tests for pytz support, but presumably most of the tests that explicitly use pytz are only doing so incidentally (e.g. almost all uses of `pytz.utc`). In stage 3, I would think that most pytz-specific tests should either be parameterized to test with both pytz and zoneinfo or tested only with zoneinfo.

I am not terribly familiar with the release schedule or backwards compatibility guarantees that Django makes in point versions, etc. I read the Django documentation on stability, but it suffers from the same problems that SemVer in general suffers from (and that there's no avoiding, really), which is that breaking changes are in the eye of the beholder. I leave it to y'all to decide the roll-out schedule for this stuff (assuming it's accepted at all), but I'm happy to offer what advice I can on the matter.

For those of you who have made it this far, thank you for indulging me.

Best,
Paul


[0] There's an additional step between 2 and 3 that can be taken, which is the adoption of pytz-deprecation-shim (or an equivalent thereof), but only for UTC. Most (all?) of the semantic differences between pytz and the shims don't come up with fixed-offset zones or UTC, and a shim around UTC to provide localize and normalize + warning can be a wrapper around datetime.timezone.utc rather than the much newer zoneinfo.

[1] I created pytz-deprecation-shim specifically to help migrate big code bases and popular libraries like Django and pandas off pytz, but if you are concerned with taking on an additional dependency, it's not terribly difficult to extract the core of the library directly into Django (for example, Django doesn't need any of the stuff for Python 2 compatibility), but the downside there is that what support exists for a Django fork would lag behind the upstream library.

[2] Unless this is a build-time flag, there's no way to have Django defaulting to having pytz as a required dependency with an option to not depend on it.
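The UTC-only shim described in footnote [0] could look roughly like the following sketch. It is not pytz-deprecation-shim's actual code; `UTCShim` is a hypothetical class that delegates to `datetime.timezone.utc` and keeps `localize()`/`normalize()` only to emit a `DeprecationWarning`:

```python
import warnings
from datetime import datetime, timedelta, timezone, tzinfo


class UTCShim(tzinfo):
    """Hypothetical UTC zone: a plain tzinfo equivalent to timezone.utc,
    plus pytz-style localize()/normalize() that warn when used."""

    # Standard tzinfo interface, delegated to the standard library's UTC.
    def utcoffset(self, dt):
        return timezone.utc.utcoffset(dt)

    def dst(self, dt):
        # timezone.utc.dst() returns None, but tzinfo.fromutc() (used by
        # astimezone) requires a non-None dst(), so return zero explicitly.
        return timedelta(0)

    def tzname(self, dt):
        return timezone.utc.tzname(dt)

    def __repr__(self):
        return "<UTCShim>"

    # pytz-specific interface, kept only for the deprecation period.
    def localize(self, dt, is_dst=False):
        warnings.warn("localize() is deprecated; use dt.replace(tzinfo=UTC)",
                      DeprecationWarning, stacklevel=2)
        if dt.tzinfo is not None:
            raise ValueError("Not naive datetime (tzinfo is already set)")
        return dt.replace(tzinfo=self)

    def normalize(self, dt):
        warnings.warn("normalize() is deprecated and unnecessary for UTC",
                      DeprecationWarning, stacklevel=2)
        if dt.tzinfo is None:
            raise ValueError("Naive time - no tzinfo set")
        return dt.astimezone(self)


UTC = UTCShim()
```

Because it is a fixed zero offset, attaching it with `tzinfo=UTC` behaves exactly like `timezone.utc`, so existing pytz-style calls keep working while warning.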



Ryan Hiebert

Jun 17, 2020, 3:26:04 PM
to django-d...@googlegroups.com
I'm almost exclusively a lurker on this list, but a constant user, and in the past I have been a keen observer of the datetime discussions.

I think you've made your case well. I'd be happy for this migration from Django to be as aggressive as the maintainers are comfortable. I agree that doing step 1 at a minimum seems like the right thing to me. I'm very interested in using zoneinfo over pytz for my projects, and I'm willing to shoulder more risk in order to do so. I hope Django will allow me to make that choice.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/b6e8e75c-dddf-da65-3af5-43f1b5b23eca%40ganssle.io.

Kevin Henry

Jun 20, 2020, 3:34:09 AM
to Django developers (Contributions to Django itself)
Thanks to Paul for this proposal, and for working to put proper timezone usage into Python itself. PEP 495 and 615 definitely make Python better, and it seems inevitable that everyone will sooner or later switch over. I'm all for getting this process going in Django.

I have doubts over whether it's a good idea to use the shim, though. Briefly: 1) The shim is not backward compatible with pytz; 2) To avoid the subtle bugs resulting from that, the developer must review pretty much the whole scope of their datetime usage when they adopt the shim; therefore 3) it will probably be easier on everyone (both developers and Django itself) to simply switch directly to zoneinfo.

1) I'm referring to the change in semantics around datetime arithmetic (see here for the description in the shim documentation, and here for a simple demonstration). Basically, if you're saying anything like "give me the datetime one day after this one", the value (both the wall-clock time and the actual point-in-time) could change once you stop using pytz.

2) Because of the above, the developer can't simply drop in the shim as a replacement for pytz. They need to review their codebase to see if they're doing datetime arithmetic anywhere. If they are, and if the change could affect the resulting values, they need to think about whether they want to use the old value or the new value. If they want the old value, they need to add some more code to recreate it. If they want the new value, they need to think about how to reconcile that change with previous behavior (and previous values stored in the database).

3) That's the hardest part of the transition, I think, so I'm not sure it's actually helpful to delay the rest. Removing calls to localize() and normalize() is comparatively simple. The difference between is_dst and fold does require some thought, but it's pretty much the same thought you need to do arithmetic right, and developers will benefit from making those changes at the same time.
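To make the `is_dst` versus `fold` point concrete, here is a sketch using an ambiguous wall time in New York (assuming Python 3.9+ and tz data). `fold_from_is_dst` is a hypothetical helper, not part of pytz, zoneinfo, or Django, and it only covers ambiguous times correctly; during a spring-forward gap it still returns a wall time that never existed:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")

# 01:30 on 2020-11-01 happens twice in New York: clocks fall back from
# 02:00 EDT to 01:00 EST. PEP 495's fold attribute picks the occurrence.
first = datetime(2020, 11, 1, 1, 30, tzinfo=NYC)           # fold=0 -> EDT
second = datetime(2020, 11, 1, 1, 30, fold=1, tzinfo=NYC)  # fold=1 -> EST


def fold_from_is_dst(naive, tz, is_dst):
    """Hypothetical translation of a pytz-style is_dst flag into fold.

    Returns the fold whose dst() matches is_dst; for unambiguous times
    both folds agree, so fold=0 is returned.
    """
    for fold in (0, 1):
        candidate = naive.replace(tzinfo=tz, fold=fold)
        if bool(candidate.dst()) == is_dst:
            return candidate
    return naive.replace(tzinfo=tz)


naive = datetime(2020, 11, 1, 1, 30)
print(first.utcoffset(), second.utcoffset())
print(fold_from_is_dst(naive, NYC, is_dst=True))
print(fold_from_is_dst(naive, NYC, is_dst=False))
```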

In essence, the shim creates another, third way of handling datetimes, neither backwards compatible with pytz (due to the change in semantics) nor forward compatible with zoneinfo (due to the use of normalize(), etc.). And developers might end up in this twilight zone for years given typical Django deprecation cycles.

So my thought is simply to not use the shim; developers should either be doing things the pytz way or the Python (zoneinfo) way. I don't have strong feelings about how that should happen (e.g. a feature flag), or when it should happen.

Cheers,
Kevin

Paul Ganssle

Jun 22, 2020, 9:48:52 AM
to django-d...@googlegroups.com

The point about the arithmetic semantics is a good one. I'm curious to know how often this is actually a problem. I think it will happen strictly less frequently than the localize case, which is handled in both a backwards and forward-compatible way, and I hate to throw the baby out with the bathwater here.

I'm hesitant to say that it's a good idea to jump directly to using zoneinfo with no warning, though. I see two possible ways to get a reasonable warning here:

1. Configure localize and normalize separately, such that localize works as expected, but normalize raises an exception (possibly we can go more granular with this so that, for example, UTC.normalize is just a warning). This could even be achieved with separate warning classes for localize and normalize, with the normalize warning configured as an error by default.

2. Add a feature flag that allows switching directly to zoneinfo that will eventually default to True, and replace pytz with a shim around pytz that raises warnings whenever you use something pytz-specific. That will give people time to opt in to using zoneinfo with, hopefully, zero changes in the intermediate. (Unfortunately, though, it doesn't allow for any incremental migration like pytz_deprecation_shim does).

I do think a simple shim similar to pytz_deprecation_shim would be appropriate for the UTC object exposed in Django, though. That could be done immediately with no impact on semantics and allow for incremental migration, since even pytz's UTC object can be directly attached to datetimes.

Right now, I think the obvious first step is to add support for deliberately using zoneinfo / datetime.timezone. This can be done in a perfectly backwards compatible way, so there's no point in delaying.

Best,
Paul


Kevin Henry

Jun 25, 2020, 7:55:43 AM
to Django developers (Contributions to Django itself)

> The point about the arithmetic semantics is a good one. I'm curious to know how often this is actually a problem.

That’s an interesting question. I think that doing localized datetime arithmetic in Django is idiomatic, if not necessarily common. Let’s say we have a calendar application that allows users to create events and set reminders. A typical way to handle an event creation request would be to activate() the user’s timezone and then read a localized datetime from the Form. Now, to create reminders, you add a timedelta of a day, a month, etc. and then store that result in the database. This datetime could be different under the new semantics.
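A sketch of that reminder scenario, assuming Python 3.9+ and tz data. `NYC` stands in for the user's activated time zone (Django's `activate()` and form handling are not modelled), and `subtract_absolute` is a hypothetical helper that recreates the old pytz-plus-`normalize()` value by doing the arithmetic in UTC:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")  # stand-in for the user's active zone

# An event on the morning DST ends, with a "one day before" reminder.
event = datetime(2020, 11, 1, 9, 0, tzinfo=NYC)

# PEP 495 / zoneinfo semantics: timedelta arithmetic is wall-clock arithmetic.
reminder_wall = event - timedelta(days=1)


def subtract_absolute(dt, delta):
    """Hypothetical helper reproducing the pytz + normalize() result:
    subtract in UTC (elapsed time), then convert back to dt's zone."""
    return (dt.astimezone(timezone.utc) - delta).astimezone(dt.tzinfo)


reminder_abs = subtract_absolute(event, timedelta(days=1))

# The values that would be stored in the database (as UTC) differ by an hour.
print(reminder_wall, reminder_wall.astimezone(timezone.utc))
print(reminder_abs, reminder_abs.astimezone(timezone.utc))
```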


> 2. Add a feature flag that allows switching directly to zoneinfo that will eventually default to True, and replace pytz with a shim around pytz that raises warnings whenever you use something pytz-specific. That will give people time to opt in to using zoneinfo with, hopefully, zero changes in the intermediate. (Unfortunately, though, it doesn't allow for any incremental migration like pytz_deprecation_shim does).


That is approximately what I was thinking. For the sake of example, I’ll present an opt-in approach modeled after the New Middleware transition (though I haven’t given the opt-in mechanism much thought):

- In version X, Django deprecates usage of pytz and introduces the TIMEZONE setting. It has the same meaning as TIME_ZONE, but the use of it signals that the user is opting into zoneinfo. If it’s set, Django does not use or assume pytz.
- If it’s not set, Django uses pytz and gives the user deprecation warnings. Those could come from a system check; or when Django datetime utilities are used; or via a shim around timezone objects; or something else.
- In version Y, pytz and TIME_ZONE usage is removed.
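The opt-in could be as small as one branch wherever Django builds a zone from a name. A sketch under stated assumptions: `load_zone` is hypothetical, and its `use_zoneinfo` argument stands in for whatever the proposed TIMEZONE setting (or another flag) would resolve to; it is not a real Django API:

```python
import warnings
from zoneinfo import ZoneInfo


def load_zone(name, use_zoneinfo):
    """Hypothetical: return a tzinfo for name, honouring an opt-in flag."""
    if use_zoneinfo:
        # Opted in: Django would neither use nor assume pytz.
        return ZoneInfo(name)
    # Not opted in: keep the legacy behaviour but tell the user it is going away.
    warnings.warn("pytz time zones are deprecated; opt in to zoneinfo",
                  DeprecationWarning, stacklevel=2)
    import pytz  # only needed on the legacy path

    return pytz.timezone(name)
```

Keeping the branch in one place is what lets version Y simply delete the legacy path.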

As you say, the main difference from your proposal is that there’s no way to mix pytz and zoneinfo. Trying to allow that would slow down and complicate the transition, and I'm just not seeing much of a benefit to outweigh that. (Just one perspective, of course...)

Cheers,
Kevin

Carlton Gibson

Oct 7, 2020, 10:48:37 AM
to Django developers (Contributions to Django itself)
Hi Paul. 

Thanks for the input here, and for your patience.

> I am fairly certain this is going to be a tricky migration and will inevitably come with some user pain. I don't think this will be Python 2 → 3 style pain, but some users who have been doing the "right thing" with pytz will need to make changes to their code in the long run, which is unfortunate.

Looking at all the docs, your migration guide on pytz_deprecation_shim, the example Kevin gave, where we do some arithmetic in a local timezone, and call `normalize()` in case we crossed a DST boundary, there's no way we can do this without forcing a breaking change somewhere.

So, probably, I've been staring at this too long today, but I think we should introduce the shim in Django 4.0. Django 3.2, the next major release, will be an LTS. If we hold off introducing the change until 4.0, we can flag it as a breaking change in the 4.0 release notes, with big warnings, allowing folks extra time to hang out on the previous LTS if they need it.

What I wouldn't want to do is to bring the breaking change in in Django 3.2, because we'll have a whole load of folks updating from the 2.2 LTS at about the time when it goes End of Life, and with no warning, that'd be a hard breaking change to throw on top of their other issues. 

We'd keep the shim in place for the entire 4.x series, removing in Django 5.0 as per the deprecation policy.

I think the advantages of doing it this way are two-fold: 

* We allow people to focus on the semantic breaking change (in folds) separately from the code changes per se — the logic may have changed slightly in these cases, but it'll still run. 
* It looks easier to migrate Django's code vs branching on a new setting. (I didn't think through exactly what that might look like, so happy to see a PoC from anyone.)

I'm more attached to the timeline (i.e. making the change after the next LTS) than whether we use the deprecation shim or not, but can I ask others to give this their thought too?

Thanks again! 

Kind Regards,

Carlton


Paul Ganssle

Oct 7, 2020, 11:26:21 AM
to django-d...@googlegroups.com

This sounds like a reasonable timeline to me. I think the breakage will be relatively small because I suspect many end-users don't really even know to use `normalize` in the first place, and when introducing the shim into a fundamental library at work I did not get a huge number of breakages, but I am still convinced that it is reasonably categorized as a breaking change.

I do think that there's one additional stage that we need to add here (and we chatted about this on twitter a bit), which is a stage that is fully backwards compatible where Django supports using non-pytz zones for users who bring their own time zone. I suspect that will help ease any breaking pain between 3.2 and 4.0, because no one would be forced to make any changes, but end users could proactively migrate to zoneinfo for a smoother transition.

I think most of what needs to be done is already in my original PR, it just needs a little conditional logic to handle pytz as well as the shim.

I am not sure how you feel about feature flags, but as a "nice to have", I imagine it would also be possible to add a feature flag that opts you in to `zoneinfo` as time zone provider even in 3.2, so that people can jump straight to the 5.0 behavior if they are ready for it.

I should be able to devote some time to at least the first part — making Django compatible with zoneinfo even if not actively using it — but likely not for a few weeks at minimum. If anyone wants to jump on either of these ahead of me I don't mind at all and feel free to ping me for review.

Best,
Paul


Nick Pope

Oct 7, 2020, 11:52:49 AM
to Django developers (Contributions to Django itself)
Hi Carlton,

Thanks for coming back to this.

Your reasoning makes a lot of sense. I too think it'll be good not to land this in 3.2 LTS and focus on making any changes in 4.0.

The question is then which path we choose to take:
  1. The deprecation route using pytz_deprecation_shim in 4.0 changing to zoneinfo in 5.0
  2. Just make a hard break to zoneinfo in 4.0 replacing the pytz dependency with backports.zoneinfo;python_version<"3.9"
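With option 2, the requirement `backports.zoneinfo; python_version < "3.9"` means code would pick the module up from one of two places. A common import pattern for that, as a sketch:

```python
# Use the standard library module on Python 3.9+, and the backports.zoneinfo
# package (installed only on older interpreters via the environment marker)
# otherwise. Both expose the same ZoneInfo class.
try:
    import zoneinfo
except ImportError:  # Python < 3.9
    from backports import zoneinfo

ZoneInfo = zoneinfo.ZoneInfo

print(ZoneInfo("UTC"))
```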
If I'm honest, I actually prefer the second option for the following reasons:
  • It avoids having to go through the rigmarole of updating this stuff twice. One release with big shouty warnings is better than two.
  • People jumping from 3.2 LTS to 4.2 LTS to 5.2 LTS will still have two "consecutive" releases fiddling with this.
  • When Django 5.0 is released in January 2024, we'll still need to use the backport as Python 3.8 isn't retired until October 2024.
  • We don't need to introduce, maintain, deprecate, and remove any extra settings for choosing the pytz vs zoneinfo.
  • As Paul mentioned, most people are probably not using pytz's .normalize() properly anyway...
The disadvantage is that users have to make a major change. But they're going to have to sooner or later anyway.
Using the shim doesn't stop them from having to rewrite their code a second time to switch over to zoneinfo.
With good documentation and examples on how to migrate, I think it would be a better approach.

On a final note, I'd like to say thank you to Paul for fixing this major timezone handling wart. I'm looking forward to life becoming much easier!

Kind regards,

Nick

Paul Ganssle

Oct 7, 2020, 6:34:47 PM
to django-d...@googlegroups.com

I think either jumping straight to zoneinfo in 5.0 or using the shim in 4.0 would be fine, though I will say that it is very likely that if you don't want to change your code twice you won't have to, particularly if we add a feature flag to opt-in to zoneinfo even in 3.2.

The shim time zones already work the same as `zoneinfo` zones, but they also expose the pytz interface (with the one semantic difference) and start raising deprecation warnings. Once your code works with the shims without throwing deprecation warnings, it will also automatically work with zoneinfo zones.

In the hopefully conservative time scale I'm envisioning, there would be a feature flag for 3.2 and 4.x where you can opt in to having Django use `zoneinfo`, so you don't even need to do anything to get rid of the shims if you don't want to be passing around shim objects (if you have a mix of shim and non-shim time zones, datetime comparisons and arithmetic will have slightly different semantics).

Another thing to note: if you want any kind of warning, you need some kind of shim, because there are many ways to use pytz time zones that are not a problem. You need to target the deprecation warnings for code that actually uses pytz's API, which means wrapping pytz's API. Without actually supporting zoneinfo, though, users can't actually do anything with the warning, so they would be forced to change all their time zone logic at the same time as updating their Django version. My pytz-deprecation-shim module exposes time zones that don't raise any warnings if you use them like zoneinfo objects, so people have the option to gradually move their code into a state where flipping the switch from pytz to zoneinfo would have very little effect.

Of course, this is all very conservative and assumes that the ability to gradually migrate is broadly desirable. It may be that Django users by and large prefer doing major overhauls all at once rather than a protracted period of gradually upgrading. It also may be that the shims have some disadvantages I haven't found yet that makes them much worse than a sudden break. It may also be that y'all would prefer a little extra pain now to supporting the shim chimera for the lifetime of the 4.x branch.

In my opinion, the strongest argument in favor of a sudden breaking change in 4.0 would be that the sudden change will be much louder, because stuff will just break. With the shims, you'll get a bunch of DeprecationWarnings (which many people may not see, because they are hidden by default), and the one backwards-incompatible change is a fairly subtle difference in arithmetic that only applies in certain situations. It would be a lot easier not to notice the change until a bug caused by it shows up in production. On the other hand, people who aren't testing for this adequately may not realize that the semantics are different from what most people think, so they might have the same bug anyway.
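Since Python hides `DeprecationWarning` by default outside `__main__`, a project that wants the shim's warnings to be loud can escalate them, for example in a test settings module, or with `python -W error::DeprecationWarning` on the command line. A sketch; `legacy_call` is a hypothetical stand-in for any pytz-specific call a shim would flag:

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for a pytz-specific call flagged by a shim.
    warnings.warn("normalize() is deprecated", DeprecationWarning, stacklevel=2)


# Escalate deprecation warnings to errors inside this block only.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy_call()
        caught = None
    except DeprecationWarning as exc:
        caught = str(exc)

print(caught)
```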

Best,

Paul


Jure Erznožnik

Oct 8, 2020, 4:08:50 AM
to django-d...@googlegroups.com

I would definitely be in favor of an opt-in: it would give developers time to move to the new system at their convenience.

Example: we're about to try and tackle the TZ issue in our apps and we want to do it "globally" with one definitive solution. I'd much rather do it on a library that is currently favoured, but not yet default than on a deprecated one, even if it's not yet officially deprecated. We do have some "import pytz", but currently they are few. Once we have a proper approach to handling timezone stuff, there's likely going to be more of them... or less, depending on the solution ;-)

LP,
Jure

smi...@gmail.com

Oct 9, 2020, 2:35:21 AM
to Django developers (Contributions to Django itself)
Hi All,

While I understand the desire to have an early opt-in for some I think the important question here is the deprecation warnings. The recent URL() change showed that no matter how long there is a new way some?/many? folk won't change until they need to. 

Nick -- if we introduced a breaking change in 4.0, would that not have the same impact on folk upgrading to 4.2 LTS from 3.2 LTS as the one Carlton is concerned about (3.2 from 2.2), albeit a few years further into the future?


David

Kevin Henry

Oct 9, 2020, 9:31:53 AM
to Django developers (Contributions to Django itself)
I think that the simplest approach—the one that would result in the least amount of total work for both Django and its users—would be to adopt Nick's suggestion and just switch to zoneinfo in 4.0. The problem is that it's very hard to square that with Django's stability policy: "We’ll only break backwards compatibility of these APIs without a deprecation process if a bug or security hole makes it completely unavoidable."

If we're going to follow the deprecation process, then there needs to be some overlap where both ways of doing things are possible. The shims package is a promising approach, but the fact that it's not actually backwards compatible with pytz is a serious problem. Adopting it directly as Carlton proposes also seems to violate the stability policy, albeit in a less severe way.

Before looking at alternatives, I wonder if we can just change the shims package to make it fully backwards compatible? Right now the shims version of normalize() is essentially a noop. Paul, couldn't it actually attempt to adjust the time the way pytz does? Perhaps by wrapping pytz itself, and calling its normalize() from the corresponding pytz timezone; or by simply replicating its time-changing logic? Apologies if that's a naive question.

Cheers,
Kevin

Carlton Gibson

Oct 9, 2020, 9:55:27 AM
to Django developers (Contributions to Django itself)
The reason I suggested going with Paul's shim is that it allows folks to update without having to stop there and then to make code changes. Yes, they'll need to review the deprecation warnings, and there's the issue with datetimes over offset changes, but we can call that out in the release notes, and doing it after the LTS allows people to hold off. I have sympathy for Nick's suggestion but think it'll result in people not updating, which is something in recent years we've managed to avoid.

As Paul points out, users will only have to update their code once. That we do all we can to avoid hard breakages is one of Django's biggest pluses. For me, for us to need to remove the shim at 5.0 is a cost that's worth paying. (It's not a big burden.)

Happy to see a PoC of an alternative, but I think the additional complexity of creating a shim that's 100% seamless, or of having an opt-in setting where we branch code for an extended time, will not be worth the price of admission. It's unfortunate that there's a breaking change here, but, as has long been noted in the pytz docs, it's inevitable.

Paul Ganssle

Oct 9, 2020, 11:06:49 AM
to django-d...@googlegroups.com
> Before looking at alternatives, I wonder if we can just change the shims package to make it fully backwards compatible? Right now the shims version of normalize() is essentially a noop. Paul, couldn't it actually attempt to adjust the time the way pytz does? Perhaps by wrapping pytz itself, and calling its normalize() from the corresponding pytz timezone; or by simply replicating its time-changing logic? Apologies if that's a naive question.

It is not really possible to make the shims work the same way because there's not enough information available to determine whether an adjustment needs to be made. The reason that `normalize` works is that pytz attaches different `tzinfo` objects representing fixed offsets (with a reference to the time zone they represent) to the datetime. If arithmetic creates an invalid datetime (i.e. a datetime in mid-June 2020 with EST attached), `normalize` corrects this by attaching a `tzinfo` representing the correct offset — and it does that by assuming that the UTC datetime represented by the erroneous fixed offset is correct. With PEP 495-style zones, you never create those datetimes with erroneous offsets, so there's no way to tell whether a correction is required.

For example:
>>> from datetime import datetime, timedelta
>>> from zoneinfo import ZoneInfo

>>> NYC = ZoneInfo("America/New_York")
>>> dt0 = datetime(2020, 1, 1, tzinfo=NYC)
>>> dt1 = datetime(2020, 7, 1, tzinfo=NYC)

>>> print(dt0)
2020-01-01 00:00:00-05:00
>>> print(dt1)
2020-07-01 00:00:00-04:00

>>> print(dt0 + timedelta(days=183))
2020-07-02 00:00:00-04:00
>>> print(dt1 + timedelta(days=1))
2020-07-02 00:00:00-04:00

Note that the two endpoints are identical, despite the fact that one of them spans a DST transition and the other one doesn't. Since the input to `normalize` is just a datetime, and `normalize` assumes that this path-dependence will show up as an inconsistency in the offset, there's nothing we can do here short of reintroducing all of pytz's problems.
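For contrast, here is a minimal sketch of the signal `normalize` depends on, using only the standard library. It assumes a fixed-offset `datetime.timezone` is a fair stand-in for the per-offset tzinfo that pytz's `localize()` attaches, and `normalize_like_pytz` is a hypothetical helper that mirrors the idea (trust the UTC instant, recompute the offset), not pytz's actual code:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")

# Fixed-offset tzinfo standing in for the per-offset object pytz's
# localize() attaches; it knows nothing about DST transitions.
EST = timezone(timedelta(hours=-5), "EST")


def normalize_like_pytz(dt, zone):
    """Hypothetical helper: keep the UTC instant, re-derive the offset."""
    return dt.astimezone(zone)


# Arithmetic on a fixed-offset datetime keeps the (now wrong) -05:00 offset.
stale = datetime(2020, 1, 1, tzinfo=EST) + timedelta(days=183)
print(stale)                            # 2020-07-02 00:00:00-05:00
print(normalize_like_pytz(stale, NYC))  # 2020-07-02 01:00:00-04:00

# With a PEP 495 zone the offset is already consistent, so there is no
# wrong offset to detect and "normalizing" changes nothing.
pep495 = datetime(2020, 1, 1, tzinfo=NYC) + timedelta(days=183)
print(pep495)                            # 2020-07-02 00:00:00-04:00
print(normalize_like_pytz(pep495, NYC))  # 2020-07-02 00:00:00-04:00
```

The stale -05:00 offset is the only thing telling the helper that a correction is needed; the zoneinfo result never carries one, so the two paths end at different wall times.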

Of course, there is another option, which is to, rather than adopting a wrapper around zoneinfo, adopt a wrapper around pytz that does not follow PEP 495, but instead just deprecates `pytz`'s API and tells people to turn on the "use zoneinfo" feature flag. It has the upside of being fully backwards-compatible, but the downside of prolonging dependence on pytz.

Another option is to modify the shims so that `normalize` always raises an exception instead of a warning (or maybe it raises an exception for anything except UTC and fixed offsets). In that case, version 4.0 will mostly just work and start raising deprecation warnings, but there will be a hard breakage for anyone who would be negatively affected by the change in semantics. This would still leave a possible problem in the other direction, though:

>>> from datetime import datetime, timedelta
>>> from zoneinfo import ZoneInfo
>>> import pytz
>>> NYC_p = pytz.timezone("America/New_York")
>>> NYC = ZoneInfo("America/New_York")

>>> dtp_0 = NYC_p.localize(datetime(2020, 10, 31, 12))
>>> dtp_1 = NYC_p.localize(datetime(2020, 11, 1, 12))
>>> (dtp_1 - dtp_0) / timedelta(hours=1)
25.0

>>> dtz_0 = datetime(2020, 10, 31, 12, tzinfo=NYC)
>>> dtz_1 = datetime(2020, 11, 1, 12, tzinfo=NYC)
>>> (dtz_1 - dtz_0) / timedelta(hours=1)
24.0

This occurs because localized pytz zones are different tzinfo objects, and as such comparisons and subtraction use inter-zone semantics. Of course, you'll have this same problem even with a "hard break", since unlike invocation of `normalize` and `localize`, subtraction operations will succeed if you swap out the attached tzinfo for a zoneinfo tzinfo.
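Code being migrated that relied on pytz's elapsed-time answer can ask for it explicitly. A minimal zoneinfo-only sketch, assuming that converting both operands to UTC is an acceptable way to force inter-zone semantics:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")
start = datetime(2020, 10, 31, 12, tzinfo=NYC)  # EDT, UTC-4
end = datetime(2020, 11, 1, 12, tzinfo=NYC)     # EST, UTC-5

# Same tzinfo on both sides: intra-zone (wall-clock) semantics.
wall_hours = (end - start) / timedelta(hours=1)
print(wall_hours)     # 24.0

# Converting to UTC first forces inter-zone (elapsed-time) semantics,
# matching what subtracting two pytz-localized datetimes returns.
elapsed_hours = (
    end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
) / timedelta(hours=1)
print(elapsed_hours)  # 25.0
```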

If we go with any variation of using shim-around-zoneinfo like pytz-deprecation-shim, I'd say those shims need to be introduced as a breaking change in Django 4.0. If we go with shim-around-pytz, I think that can safely be introduced in 3.2 (though that would require simultaneously adding support for using zoneinfo, and even then it might mostly force people to either do the migration in a single huge step or to involve some wrapper functions for handling the period of time where the time zone type is not consistent throughout the application).

Best,
Paul

On 10/9/20 9:31 AM, Kevin Henry wrote:
I think that the simplest approach—the one that would result in the least amount of total work for both Django and its users—would be to adopt Nick's suggestion and just switch to zoneinfo in 4.0. The problem is that it's very hard to square that with Django's stability policy: "We’ll only break backwards compatibility of these APIs without a deprecation process if a bug or security hole makes it completely unavoidable."

If we're going to follow the deprecation process, then there needs to be some overlap where both ways of doing things are possible. The shims package is a promising approach, but the fact that it's not actually backwards compatible with pytz is a serious problem. Adopting it directly as Carlton proposes also seems to violate the stability policy, albeit in a less severe way.



Adam Johnson

Oct 9, 2020, 1:10:29 PM
to django-d...@googlegroups.com
> The deprecation route using pytz_deprecation_shim in 4.0 changing to zoneinfo in 5.0

I'm in favour of this plan, with the feature flag to use zoneinfo. As Carlton wrote, a hard change that requires modification of code will stop users from upgrading, and that's not great.

> Another option is to modify the shims so that `normalize` always raises an exception instead of a warning (or maybe it raises an exception for anything except UTC and fixed offsets).

It's possible to elevate a warning to an error using warnings.simplefilter. I think it'd be better to document how to do this rather than always raising an exception, so the exceptions become opt-in.
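A minimal sketch of that opt-in, assuming the shim emits a `DeprecationWarning` subclass; `ShimUsageWarning` and this `normalize` are hypothetical stand-ins, not the shim's real names:

```python
import warnings


class ShimUsageWarning(DeprecationWarning):
    """Hypothetical stand-in for the warning class the shim emits."""


def normalize(dt):
    """Hypothetical shim method: warns, then returns its argument unchanged."""
    warnings.warn("normalize() is a no-op with zoneinfo zones",
                  ShimUsageWarning, stacklevel=2)
    return dt


def normalize_strict(dt):
    """Call normalize() with shim warnings escalated to exceptions."""
    with warnings.catch_warnings():
        # Opt in: inside this block ShimUsageWarning raises instead of warning.
        warnings.simplefilter("error", ShimUsageWarning)
        return normalize(dt)
```

Project-wide, the same effect comes from calling `warnings.simplefilter("error", ...)` at startup or running `python -W error::DeprecationWarning`, which also matches subclasses.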



--
Adam

Kevin Henry

Oct 9, 2020, 2:46:30 PM
to Django developers (Contributions to Django itself)
> It is not really possible to make the shims work the same way because there's not enough information available to determine whether an adjustment needs to be made.

But since you're shimming pytz, don't you, by definition, have access to all the same information that it has?

So, for example, you could wrap pytz and keep a shadow copy of the pytz tzobject inside your tzobject, and use that to determine the correct behavior whenever a pytz-specific call is made. So when localize() is called you call localize on your internal object, and store that pytz EST tzobject. Then when normalize() is called you use that to get pytz's version of the EDT time.



> This occurs because localized pytz zones are different tzinfo objects, and as such comparisons and subtraction use inter-zone semantics.

Thank you for that example, I hadn't considered that. Unfortunately that is another fundamental incompatibility between pytz and any shim-around-zoneinfo (here is a runnable version). I can't think of any way around that one.



> Of course, there is another option, which is to, rather than adopting a wrapper around zoneinfo, adopt a wrapper around pytz that does not follow PEP 495, but instead just deprecates `pytz`'s API and tells people to turn on the "use zoneinfo" feature flag.

Agreed, that is the other main option. It has a few advantages:

- It's backwards-compatible.
- Because it's backwards-compatible, it could be adopted in 3.2, allowing a complete transition to zoneinfo by 4.2, a full two years earlier than the shims approach.
- It only requires users to think about the change once, when they opt in to the new approach. Using the shim means you have to think about this issue at least twice: once when the shim is dropped in and you have to figure out if you're affected by the backwards incompatibilities; and once (or more) when you actually make the change (or a series of changes) over to the native zoneinfo style.

The main disadvantage—and a real one—is that it's more work for Django.


Cheers,
Kevin

Paul Ganssle

Oct 9, 2020, 3:21:38 PM
to django-d...@googlegroups.com


On 10/9/20 2:46 PM, Kevin Henry wrote:
> It is not really possible to make the shims work the same way because there's not enough information available to determine whether an adjustment needs to be made.

> But since you're shimming pytz, don't you, by definition, have access to all the same information that it has?

No, for two reasons:

1. The shims wrap zoneinfo (and dateutil, though that is not relevant in this case), they do not wrap pytz (and in fact do not have a dependency on pytz, though there is some magic that allows them to play nicely with things namespaced in pytz).
2. pytz's mechanism for attaching time zones is incompatible with PEP 495. It would not really be possible to have the shims work both as pytz zones and as PEP 495 zones in all cases.

You are right that it would be possible to make it so that `shim_zone.localize()` basically does what pytz does, attaching a tzinfo specific to the offset that applies at that time, and for shim_zone.normalize() to make sure that the one attached is the right one, while also allowing `tzinfo=shim_zone` to work as a PEP 495 zone. That would make it very difficult to reason about the system, though. It would make it so that depending on how you attached your time zone, you would get different semantics for comparisons and arithmetic. Sometimes normalize would work and sometimes it wouldn't. So, for example, imagine you have:

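from datetime import datetime, timedelta

# shim_zone: a shim time zone (e.g. for "America/New_York"), assumed defined elsewhere
def f(dt):
    return shim_zone.localize(dt)

x = shim_zone.normalize(f(datetime(2020, 10, 31, 12)) + timedelta(days=1))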


If someone changes `f` to instead use `dt.replace(tzinfo=shim_zone)`, the value for `x` changes, because some function you have is no longer using `localize`. Similarly, if we have say `datetime.now(shim_zone)` return a non-localized datetime, you have differences in semantics between `shim_zone.localize(datetime.now())` and `datetime.now(shim_zone)` (both of which are valid with pytz). If we have it return a localized datetime, subtraction and comparison semantics would be affected, because now times localized to one or the other offset will be inter- rather than intra-zone comparisons.
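To make the divergence in `x` concrete without pytz or the shim, here is a zoneinfo-only sketch of the two possible results. It assumes that doing the arithmetic on the UTC instant reproduces what the `localize()`/`normalize()` path yields for this input:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")
start = datetime(2020, 10, 31, 12, tzinfo=NYC)  # 12:00 EDT (UTC-4)

# PEP 495 semantics (tzinfo=zone): add one day of wall-clock time.
wall = start + timedelta(days=1)
print(wall)     # 2020-11-01 12:00:00-05:00

# localize()/normalize() semantics: add 24 elapsed hours, i.e. do the
# arithmetic on the UTC instant and convert back to the zone.
elapsed = (start.astimezone(timezone.utc) + timedelta(days=1)).astimezone(NYC)
print(elapsed)  # 2020-11-01 11:00:00-05:00
```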

Unfortunately there's simply no way to make it fully backwards and forwards compatible. The best options I see are either a shim around pytz in 3.2 that just adds warnings and doesn't do anything else, followed by a hard break in 4.0; or pytz-deprecation-shim in 4.0, followed by a hard break in 5.0.

I think both are fine plans. I suspect that the slower plan will get people upgrading to 4.0 much faster, but it does have the disadvantage that some of the breakage is subtle and won't raise big errors (which is also the case, though to a lesser extent, with the faster plan).

In any case, it seems uncontroversial that 3.2 should support "bring your own zoneinfo", and I think most people agree that a feature flag in 3.2 is also a good idea, so to the extent that I have time to work on this, I'll work on those things.


Best,
Paul



Jure Erznožnik

Oct 10, 2020, 2:58:44 AM
to django-d...@googlegroups.com

Sorry guys for asking a really stupid question, but:

I just made a search for pytz in Django master and found 17 occurrences in 5 files. More in docs and tests though. But still.

Isn't what we're debating here moot since Django itself doesn't really depend on pytz all that heavily? I mean, I realise the difference between the libraries bears grave consequences, but not in Django itself, AFAIR.

Seems like changing the implementation such that it would be able to use either approach (e.g. via a setting & a common import wrapper) shouldn't be too much of a hassle anyway.

Or am I missing something really obvious here?

LP,
Jure

P.S., but totally irrelevant to the discussion: I always found having to import pytz to handle TZ-related stuff wasn't optimal. I would have preferred having access to the necessary API from Django's framework.

Aymeric Augustin

Jan 2, 2021, 4:29:29 AM
to django-d...@googlegroups.com
Hello,

As the original author of support for timezone aware datetimes in Django, I've been meaning to review this... for six months... Better late than never I guess?

In this discussion, we're assuming settings.USE_TZ = True.


The original design, which still matches 80% of the current implementation, was very much based on advice found in the documentation of pytz. This has three consequences:

1. By default, users manipulate aware datetimes in UTC, which limits the breakage to cases where they explicitly switch to another timezone. This is auditable e.g. "look for timezone.activate or timezone.override in your code".

2. Django provides custom wrappers in order to take care of the pitfalls of pytz automatically e.g. timezone.make_aware. Backwards-compatibility can be implemented transparently there.

3. A strategy designed for migrating from pytz to zoneinfo should be applicable to Django.
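Point 2 can be sketched as a single helper that hides which kind of zone it was given. This is a hypothetical illustration, not Django's `make_aware`; it assumes that duck-typing on a `localize` attribute is enough to recognize a pytz-style zone:

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def make_aware_sketch(value, tz):
    """Hypothetical wrapper: attach tz to a naive datetime, whatever the tz type."""
    if value.utcoffset() is not None:
        raise ValueError("make_aware_sketch expects a naive datetime")
    localize = getattr(tz, "localize", None)
    if localize is not None:
        # pytz-style zone: must go through localize() to get the right offset.
        return localize(value)
    # PEP 495 zone (e.g. zoneinfo): attaching via replace() is correct.
    return value.replace(tzinfo=tz)


paris = ZoneInfo("Europe/Paris")
print(make_aware_sketch(datetime(2021, 7, 1, 12), paris))  # 2021-07-01 12:00:00+02:00
```

Callers only ever see the helper, so swapping the zone type underneath does not change their code.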


I'm seeing three areas that need care:

A. APIs returning a tzinfo object, currently a pytz timezone (other than UTC; we switched from Django's custom definition of the UTC tzinfo to pytz's definition in the past without trouble).

B. APIs returning aware datetimes, currently in a pytz timezone (other than UTC).

C. APIs performing datetime conversions in the database, which is typically used for aggregating by day in a given timezone. This depends on the timezone name. I think we're fine on this front since we're keeping the same timezone names.

So the primary concern is leaking pytz tzinfo objects (either directly or via aware datetime objects) to user code that explicitly requested them. I may sound like I'm belaboring the point. However, I think we can make a better backwards-compatibility decision with an accurate estimate of the extent of the breakage.


Django currently references pytz in the following places:

- timezone.get_default/current_timezone => see case A above.
- timezone.get_default/current_timezone_name => shouldn't be an issue since time zone names don't change.
- BaseDatabaseWrapper.timezone => for timezone conversions in Python code via make_aware / make_naive; see case B above.
- BaseDatabaseWrapper.timezone_name => for timezone conversions in the database via database-specific SQL functions; see case C above.
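The `*_name` APIs and case C only need the IANA key, which both libraries keep. A minimal sketch with a hypothetical helper; it relies on pytz zones exposing the name as `.zone` and `ZoneInfo` exposing it as `.key`:

```python
from datetime import timezone
from zoneinfo import ZoneInfo


def timezone_name(tz):
    """Hypothetical helper: return the IANA name for a pytz or zoneinfo zone."""
    # zoneinfo zones expose the key as `.key`; pytz zones as `.zone`.
    for attr in ("key", "zone"):
        name = getattr(tz, attr, None)
        if name:
            return name
    # Fallback for other tzinfo types, e.g. datetime.timezone.utc -> "UTC".
    return str(tz)


print(timezone_name(ZoneInfo("Europe/Paris")))  # Europe/Paris
```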

With SQLite, the SQL conversion functions are implemented in Python and use pytz, but the end result is the same.


My suggestion would be:

- Switch everyone to zoneinfo (or backports.zoneinfo) by default in 4.0
- Provide a temporary, immediately deprecated setting USE_PYTZ_DEPRECATION_SHIM in 4.0 and remove it in 5.0

Why?

- Most users don't use timezone.activate, timezone.override, or the DATABASES.TIME_ZONE setting to get aware datetimes in a timezone other than UTC for the purpose of doing datetime arithmetic in local time. Let's keep the upgrade instructions simple for the majority of users! "If you don't do anything fancy with timezones, you don't need to worry about this!"

- But some users need a more gradual path, especially those with large codebases. Conscious use of pytz_deprecation_shim looks like the best plan for them. Since it isn't easy to monkey-patch, let's instruct them to set USE_PYTZ_DEPRECATION_SHIM = True so they can upgrade now and fix deprecation warnings later. This would be Paul's pull request, but conditional on the setting.

I considered using "can we import pytz_deprecation_shim?" as a signal instead of "is USE_PYTZ_DEPRECATION_SHIM set?" but this wouldn't work well if pytz_deprecation_shim becomes an indirect dependency. Then shims could get accidentally activated and the user wouldn't have a good solution.
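A rough sketch of branching on an explicit setting rather than on importability. `get_timezone` is hypothetical, the keyword argument stands in for reading `settings.USE_PYTZ_DEPRECATION_SHIM`, and `pds.timezone` is assumed to be the shim's constructor (check the package before relying on it):

```python
from zoneinfo import ZoneInfo


def get_timezone(name, use_pytz_deprecation_shim=False):
    """Hypothetical: pick the tzinfo type from an explicit opt-in flag.

    Merely being able to import pytz_deprecation_shim (for instance as an
    indirect dependency) does not change behaviour; only the flag does.
    """
    if use_pytz_deprecation_shim:
        # Imported lazily so only projects that opt in need the shim installed.
        import pytz_deprecation_shim as pds  # assumed API: pds.timezone(name)

        return pds.timezone(name)
    return ZoneInfo(name)


print(get_timezone("Europe/Paris"))  # Europe/Paris
```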


I hope this helps!

-- 
Aymeric.

Carlton Gibson

Jan 3, 2021, 3:13:21 AM
to django-d...@googlegroups.com
Hi Aymeric. 

Thanks for inputting! 

I need to read in-depth what you’ve said here, but IIUC-at-first-pass: your suggestion differs in jumping straight to the end-point with a fallback for those who need it. (In contrast to getting folks to opt-in early if they want it.) This sounds better if we can. 

The Autumn hasn’t allowed time to make progress here, but what did you think about allowing an early opt-in for Django v3.2? (I’m not sure we have capacity to get it in in time now anyway, so it may be moot.)

Kind regards, 
Carlton.




Aymeric Augustin

Jan 3, 2021, 3:43:01 AM
to django-d...@googlegroups.com
Hi Carlton,

> IIUC-at-first-pass: your suggestion differs in jumping straight to the end-point with a fallback for those who need it. (In contrast to getting folks to opt-in early if they want it.)

Indeed. The majority will switch when the default changes; better earlier than later.

> what did you think about allowing an early opt-in for Django v3.2?

The more options we create, the more confusion we may generate e.g. when users try to understand release notes or to ask for support. This is why I tried to keep options to a minimum (1. initial situation, 2. target situation, 3. with shim) and to make as few users as possible go through option 3.

Implementing USE_PYTZ_DEPRECATION_SHIM in Django 3.2 makes the situation a bit more complex, as the setting will mean "use pds instead of pytz" in 3.2 and "use pds instead of zoneinfo" in 4.0. We'd have four options instead of three.

I'm pretty confident that the transition will go smoothly and I don't think the flaws in pytz are bad enough to warrant urgent action. As a consequence, offering opt-in in Django 3.2 doesn't seem all that valuable to me.

I'm OK with starting the migration in one year in Django 4.0. I'm also OK with offering the opt-in in Django 3.2 if others have a different value assessment!

Cheers,

-- 
Aymeric.



William Schwartz

Jan 5, 2021, 7:20:18 AM
to Django developers (Contributions to Django itself)
Just wanted to chime in with a +1 from a user in favor of moving away from pytz. Doing so will be very helpful for frozen Python environments: https://bugs.launchpad.net/pytz/+bug/1834363

Nick Pope

Jan 6, 2021, 10:12:32 AM
to Django developers (Contributions to Django itself)
Hi Carlton,

Sorry I didn't reply on the PR about advancing anything for 3.2. I ran out of capacity and at this late stage it is best to wait until 4.0 anyway.

I see that Aymeric is in favour of forging ahead to use zoneinfo in 4.0 as was my preference, but with the addition of an opt-out falling back to the deprecation shim.

I'm also +1 for this approach, rather than an opt-in. I don't think it is a bad idea to promote this and get people to think about it sooner rather than later.
For those with complex and/or significant use of pytz's API, there is then the option to opt out and have another two or three releases to address the issue.

Cheers,

Nick

Carlton Gibson

Jan 6, 2021, 10:44:43 AM
to Django developers (Contributions to Django itself)
Hey Nick,

Super, no problem. 4.0 is fine. (And Aymeric's option does sound better yes.)

Thanks!
C

Adam Johnson

Jan 6, 2021, 1:45:42 PM
to django-d...@googlegroups.com
I'm also +1 on Aymeric's suggestion.



--
Adam