Howdy howdy. While working on my PEP I stumbled over a lot of behavior by annotations that I found inconsistent and inconvenient. I think there are several problems here that need fixing. This discussion will probably evolve into a PEP, and I'll be happy to steer that process. But I'm less certain about what the right thing to do is. (Although I do know what I'd prefer!) So let's talk about it!
Annotations are represented in Python as a dictionary. They can
be present
on functions, classes, and modules as an attribute called
"__annotations__".
We start with: how do you get the annotations from one of these
objects?
Surely it's as easy as this line from Lib/inspect.py shows us:
return func.__annotations__
And yes, that's best practice for getting an annotation from a
function object.
But consider this line from Lib/functools.py:
ann = getattr(cls, '__annotations__', {})
Huh. Why doesn't it simply look at cls.__annotations__? It's
because the
language declares that __annotations__ on a class or module is
optional.
Since cls.__annotations__ may not be defined, evaluating that
might throw an
exception. Three-argument getattr() is much safer, and I assert
it's best
practice for getting the annotations from a module.
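As a quick sketch (the module name "demo" is invented for illustration), the three-argument getattr() approach never raises, whether or not the module has annotations:

```python
import types

# A synthetic module with no annotations of its own.
mod = types.ModuleType("demo")

# Three-argument getattr() is safe either way: it returns the
# default if __annotations__ was never set.
print(getattr(mod, '__annotations__', {}))    # {}

mod.__annotations__ = {'flag': bool}
print(getattr(mod, '__annotations__', {}))    # {'flag': <class 'bool'>}
```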
But consider this line from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
And a very similar line from Lib/typing.py:
ann = base.__dict__.get('__annotations__', {})
Huh! Why is this code skipping the attribute entirely, and
examining
cls.__dict__? It's because the getattr() approach has a subtle
bug when
dealing with classes. Consider this example:
    class A:
        ax: int = 3

    class B(A):
        pass

    print(getattr(B, '__annotations__', {}))
That's right, B *inherits* A.__annotations__! So this prints
{'ax': int}.
This *can't* be the intended behavior of __annotations__ on classes.
It's only
supposed to contain annotations for the fields of B itself, not
those of one
of its randomly-selected base classes. But that's how it behaves
today--and
people have had to work around this behavior for years. Examining
the class
dict is, sadly, best practice for getting __annotations__ from a
class.
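A sketch of why the class-dict lookup works where getattr() can mislead (class names are illustrative only):

```python
class A:
    ax: int = 3

class B(A):
    pass

# B defines no annotations of its own, so looking only in the
# class's own __dict__ correctly comes up empty for it, while A's
# annotations are still visible through the attribute:
own_b = B.__dict__.get('__annotations__', {})
print(own_b)                 # {}
print(A.__annotations__)     # {'ax': <class 'int'>}
```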
So, already: three different objects can have __annotations__, and
there are
three different best practices for getting their __annotations__.
Let's zoom out for a moment. Here's the list of predefined data
fields
you can find on classes:
__annotations__
__bases__
__class__
__dict__
__doc__
__module__
__mro__
__name__
__qualname__
All of these describe metadata about the class. In every case
*except one*,
the field is mandatory, which also means it's never inherited. And
in every
case *except one*, you cannot delete the field. (Though you *are*
allowed
to overwrite some of them.)
You guessed it: __annotations__ is the exception. It's optional,
and
you're allowed to delete it. And these exceptions are causing
problems.
It seems to me that, if the only way to correctly use a
language-defined
attribute of classes is by rooting around in its __dict__, the
design is
a misfire.
(Much of the above applies to modules, too. The big difference: since modules lack inheritance, you don't need to look in their __dict__.)
Now consider what happens if my "delayed evaluation of annotations using descriptors" PEP is accepted. If that happens, pulling
__annotations__
out of the class dict won't work if they haven't been generated
yet. So
today's "best practice" becomes tomorrow's "this code doesn't
work".
To correctly examine class annotations, code would have to do
something
like this, which should work correctly in any Python 3.x version:
    if (getattr(cls, '__co_annotations__', None)
            or ('__annotations__' in cls.__dict__)):
        ann = cls.__annotations__
    else:
        ann = {}
This is getting ridiculous.
Let's move on to a related topic. For each of the objects that
can
have annotations, what happens if o.__annotations__ is set, and
you
"del o.__annotations__", then you access o.__annotations__? It
depends on
what the object is, because each of them behaves differently.
You already know what happens with classes: if any of the base
classes has
__annotations__ set, you'll get the first one you find in the
MRO. If none
of the bases have __annotations__ set you'll get an
AttributeError.
For a module, if you delete it then try to access it, you'll
always get
an AttributeError.
For a function, if you delete it then try to get it, the function
will
create a new empty dict, store it as its new annotations dict, and
return
that. Why does it do that? I'm not sure. The relevant PEP
(3107) doesn't
specify this behavior.
So, annotations can be set on three different object types, and
each of
those three have a different behavior when you delete the
annotations
then try to get them again.
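The function case can be demonstrated directly (a sketch; the function itself is just a placeholder, deliberately unannotated so nothing interferes with the lazy re-creation):

```python
def f(a, b):
    return a + b

f.__annotations__            # functions lazily create an empty dict
del f.__annotations__        # deleting is permitted on functions...
print(f.__annotations__)     # {} -- ...and a fresh dict quietly appears
```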
As a final topic: what are the permitted types for
__annotations__?
If you say "o.__annotations__ = <x>", what types are and
aren't
allowed for <x>?
For functions, __annotations__ may be assigned to either None or
a dict (an object that passes PyDict_Check). Anything else throws
a
TypeError. For classes and modules, no checking is done
whatsoever,
and you can set __annotations__ on those two to any Python object.
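A sketch of the asymmetry (names are illustrative; the class-side behavior shown is the unchecked assignment described above):

```python
def f():
    pass

# Functions type-check the assignment: only None or a dict passes.
try:
    f.__annotations__ = "surprise"
    rejected = False
except TypeError:
    rejected = True

f.__annotations__ = None     # None is explicitly allowed

# Classes (and modules) do no checking whatsoever:
class C:
    pass

C.__annotations__ = 2 + 3j   # nonsense, but accepted
print(rejected, C.__annotations__)    # True (2+3j)
```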
While "a foolish consistency is the hobgoblin of little minds",
I don't see the benefit of setting a module's __annotations__ to
2+3j.
I think it's long past time that we cleaned up the behavior of
annotations.
They should be simple and consistent across all objects that
support them.
At the very least, I think we should make cls.__annotations__
required rather
than optional, so that it's never inherited. What should its
default value
be? An empty dict would be more compatible, but None would be
cheaper.
Note that creating the empty dict on the fly, the way function
objects do,
wouldn't really help--because current best practice means looking
in
cls.__dict__.
I also think you shouldn't be able to delete __annotations__ on
any of the
three objects (function, class, module). It should always be set,
so that
the best practice for accessing annotations on an object is always
o.__annotations__.
If I could wave my magic wand and do whatever I wanted, I'd change
the
semantics for __annotations__ to the following:
* Functions, classes, and modules always have an __annotations__
member set.
* "del o.__annotations__" always throws a TypeError.
* The language will set __annotations__ to a dict if the object
has
annotations, or None if it has no annotations.
* You may set __annotations__, but you can only set it to either
None or a
dict (passes PyDict_Check).
* You may only access __annotations__ as an attribute, and because
it's
always set, best practice is to use "o.__annotations__" (though
getattr
will always work too).
* How __annotations__ is stored is implementation-specific
behavior;
looking in the relevant __dict__ is unsupported.
This would grant sanity and consistency to __annotations__ in a
way it's
never so far enjoyed. The problem is, it's a breaking change.
But the
existing semantics are kind of terrible, so at this point my goal
is to
break them. I think the best practice needs to stop requiring
examining
cls.__dict__; in fact I'd prefer people stop doing it altogether.
If we change the behavior as part of a new release of Python,
code that examines annotations on classes can do a version check:
    import sys

    if sys.version_info >= (3, 10):
        def get_annotations(o):
            return o.__annotations__ or {}
    else:
        def get_annotations(o):
            # eight or ten lines of complex code goes here
            ...
Or code can just use typing.get_type_hints(), which is tied to the Python version anyway and should always do the right thing.
/arry
_______________________________________________
Python-Dev mailing list -- pytho...@python.org
To unsubscribe send an email to python-d...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/AWKVI3NRCHKPIDPCJYGVLW4HBYTEOQYL/
Code of Conduct: http://python.org/psf/codeofconduct/
At last, a nibble on the other fishing line! ;-)
So the biggest potential breakages are code that:
- directly gets the attribute from __dict__
- relies on __annotations__ being inherited, which would no longer happen
Am I missing anything else?
Those are the big ones, the ones I expect people to actually experience. I can name three more breakages, though these get progressively more obscure:
I have no idea if anybody is depending on these behaviors. The
lesson that years of Python core dev has taught me is: if Python
exhibits a behavior, somebody out there depends on it, and you'll
break their code if you change it. Or, expressed more succinctly,
any change is a breaking change for somebody. So the question is,
is the improvement this brings worth the breakage it also brings?
In this case, I sure hope so!
For issue #2, if the default was `None`, then couldn't that be used as an implicit feature marker that you can't (incorrectly) rely on inheritance to percolate up the annotations of the superclass if the subclass happens to not define any annotations?
Currently Python never sets o.__annotations__ to None on any object. So yes, assuming the user doesn't set it to None themselves, this would be new behavior. If I understand your question correctly, yes, users could write new code that says
    if o.__annotations__ is None:
        # great, we're in Python 3.10+ and no annotation was set on o!
        ...
    else:
        ...
Or they could just look at sys.version_info ;-)
Thanks for your feedback,
/arry
The easiest thing would be just to create an empty `__annotations__` for classes that have no annotated variables, and to hell with the cost.
I assume you'd keep the existing behavior where functions lazy-create an empty dict if they have no annotations too?
That all would work fine and be consistent, but you'd probably have to set the empty __annotations__ dict on modules too. I've noticed that code that examines annotations tends to handle two classes of objects: "functions" and "not-functions". Modules also store their __annotations__ in their __dict__, so the same code path works fine for examining the annotations of both classes and modules.
(I noticed that `__slots__` is missing from your list. Maybe because it follows yet another pattern?)
I forgot about __slots__! Yup, it's optional, and you can even
delete it, though after the class is defined I'm not sure how much
difference that makes.
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Cheers,
/arry
On Mon, Jan 11, 2021 at 5:21 PM Larry Hastings <la...@hastings.org> wrote:
Slots intelligently support inheritance, too. I always kind of wondered why annotations didn't support inheritance--if D is a subclass of C, why doesn't D.__annotations__ contain all C's annotations too? But we're way past reconsidering that behavior now.
Anyway, `__slots__` doesn't behave that way -- seems it behaves similar to `__annotations__`.
__slots__ itself doesn't behave that way, but subclasses do inherit the slots defined on their parent:
    class C:
        __slots__ = ['a']

    class D(C):
        __slots__ = ['b']

    d = D()
    d.a = 5
    d.b = "foo"
    print(f"{d.a=} {d.b=}")
prints
d.a=5 d.b='foo'
That's the inheritance behavior I was referring to.
Cheers,
/arry
On 12/01/21 6:22 am, Larry Hastings wrote:
* The language will set __annotations__ to a dict if the object has
annotations, or None if it has no annotations.
That sounds inconvenient -- it means that any code referencing
__annotations__ has to guard against the possibility of it being
None.
It was a balancing act. Using a 64-byte empty dict per object
with no defined annotations seems so wasteful. And anything short
of an empty dict, you'd have to guard against. Current code
already has to guard against "__annotations__ aren't set" anyway,
so I figured the cost of migrating to checking a different
condition would be small. And None is so cheap, and the guard is
so easy:
if o.__annotations__:
If we're changing things, I'm wondering if the best thing would be
to introduce an annotations() function as the new best practice for
getting an object's annotations. It would know how to handle all
the type-specific peculiarities, and could take care of things such
as manufacturing an empty dict if the object doesn't have any
annotations.
I guess I'm marginally against this, just because it seems like a
needless change. We don't need the flexibility of a function with
optional parameters and such, and with a data descriptor we can
already put code behind __annotations__ (as I have already done).
Plus, the function should probably cache its result--you wouldn't
want code that called it ten times to generate ten fresh dicts,
would you?--and already we're most of the way to what I proposed
in PEP 649.
Cheers,
/arry
On Tue, Jan 12, 2021 at 12:56 PM Larry Hastings <la...@hastings.org> wrote:
It was a balancing act. Using a 64-byte empty dict per object with no defined annotations seems so wasteful. And anything short of an empty dict, you'd have to guard against. Current code already has to guard against "__annotations__ aren't set" anyway, so I figured the cost of migrating to checking a different condition would be small. And None is so cheap, and the guard is so easy: if o.__annotations__:
Does it have to be mutable? If not, maybe there could be a singleton "immutable empty dict-like object", in the same way that an empty tuple can be put anywhere that expects a sequence. That'd be as cheap as None (modulo a once-per-interpreter cost for the additional static object).
Historically, annotations dicts are mutable. I don't know how often people mutate them, but I would assume it's uncommon. So technically this would be a breaking change. But it does seem low-risk.
Cheers,
/arry
On 12/01/21 2:21 pm, Larry Hastings wrote:
Slots intelligently support inheritance, too.
Are you sure about that? My experiments suggest that it has
the same problem as __annotations__:
Python 3.8.2 (default, Mar 23 2020, 11:36:18)
[Clang 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
... __slots__ = ['a', 'b']
...
>>> class D(C):
... __slots__ = ['c', 'd']
...
>>> class E(D):
... pass
...
>>> C.__slots__
['a', 'b']
>>> D.__slots__
['c', 'd']
>>> E.__slots__
['c', 'd']
>>>
Guido said the same thing. I did say "Slots", not "__slots__", though. You'll find that your class D supports attributes "a", "b", "c", and "d", and that's the inheritance I was referring to.
Cheers,
/arry
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on
classes and modules, and not worry about the memory consumption,
that goes a long way to cleaning up the semantics.
And I know you were somewhat joking when you mentioned using sys.version_info, but since this would be behind a __future__ import
Would it?
My original proposal would make breaking changes to how you
examine __annotations__. Let's say we put those behind a from
__future__ import. Now we're gonna write library code that
examines annotations. A user passes in a class and asks us to
examine its annotations. The old semantics might be active on it,
or the new ones. How do we know which set of semantics we need to
use?
It occurs to me that you could take kls.__module__, pull out the module from sys.modules, then look inside to see if it contains the correct "future" object imported from the __future__ module. Is that an approach we would suggest to our users?
Also, very little code ever examines annotations; most code with
annotations merely defines them. So I suspect most annotations
users wouldn't care either way--which also means a "from
__future__ import" that changes the semantics of examining or
modifying annotations isn't going to see a lot of uptake, because
it doesn't really affect them. The change in semantics only
affects people whose code examines annotations, which I suspect is
very few.
So I wasn't really joking when I proposed making these changes without a from __future__ import, and suggested users use a version check. The library code would know based on the Python version number which semantics were active, no peeking in modules to find future object. They could literally write what I suggested:
    if you know you're running python 3.10 or higher:
        examine using the new semantics
    else:
        examine using the old semantics
I realize that's a pretty aggressive approach, which is why I prefaced it with "if I could wave my magic wand". But if we're going to make breaking changes, then whatever we do, it's going to break some people's code until it gets updated to cope with the new semantics. In that light this approach seemed reasonable.
But really this is why I started this thread in the first place.
My idea of what's reasonable is probably all out of whack. So I
wanted to start the conversation, to get feedback on how much
breakage is allowable and how best to mitigate it. If it wasn't a
controversial change, then we wouldn't need to talk about it!
And finally: if we really do set a default of an empty dict on classes and modules, then my other in-theory breaking changes:
- you can't delete __annotations__
- you can only set __annotations__ to a dict or None (this is already true of functions, but not of classes or modules)
will, I expect, in practice break exactly zero code. Who
deletes __annotations__? Who ever sets __annotations__ to
something besides a dict? So if the practical breakage is zero,
why bother gating it with "from __future__ import" at all?
I think it really means people need to rely on typing.get_type_hints() more than they may be doing right now.
What I find frustrating about that answer--and part of what motivated me to work on this in the first place--is that typing.get_type_hints() requires your annotations to be type hints. All type hints are annotations, but not all annotations are type hints, and it's entirely plausible for users to have reasonable uses for non-type-hint annotations that typing.get_type_hints() wouldn't like.
The two things typing.get_type_hints() does, that I know of, that
can impede such non-type-hint annotations are:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
- It regards all string annotations as "forward references", which means they get eval()'d and the result returned as the annotation. typing.get_type_hints() doesn't catch any exceptions here, so if the eval fails, typing.get_type_hints() fails and you can't use it to examine your annotations.
PEP 484 "explicitly does NOT prevent other uses of annotations". But if you force everyone to use typing.get_type_hints() to examine their annotations, then you have de facto prevented any use of annotations that isn't compatible with type hints.
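For the record, the two behaviors I have in mind are easy to observe (a sketch; function names are placeholders): get_type_hints() converts a None annotation into type(None), and it eval()s string annotations, failing outright when the eval fails.

```python
import typing

def takes_none(x: None) -> None:
    pass

# get_type_hints() converts a None annotation to type(None):
hints = typing.get_type_hints(takes_none)
print(hints)    # {'x': <class 'NoneType'>, 'return': <class 'NoneType'>}

def bad_ref(x: "NoSuchName"):
    pass

# String annotations are treated as forward references and eval()'d;
# if the eval fails, so does get_type_hints():
try:
    typing.get_type_hints(bad_ref)
    failed = False
except NameError:
    failed = True
print(failed)   # True
```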
Cheers,
/arry
- It evaluates string annotations as Python expressions. Which means if your annotation is a string that isn't meant to be evaluated--or doesn't evaluate cleanly--get_type_hints() will fail.
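For illustration, a small sketch of the first pitfall (nothing here is library-specific; it's just the stock behavior of typing.get_type_hints()):

```python
import typing

def f(x: None) -> None:
    pass

# The raw annotations keep None exactly as written:
print(f.__annotations__)
# {'x': None, 'return': None}

# typing.get_type_hints() substitutes type(None), erasing the distinction:
print(typing.get_type_hints(f))
# {'x': <class 'NoneType'>, 'return': <class 'NoneType'>}
```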
Huh, I wasn't aware of that.
On Tue, Jan 12, 2021 at 8:00 PM Brett Cannon <br...@python.org> wrote:
- It turns a None annotation into type(None). Which means now you can't tell the difference between "None" and "type(None)".
Huh, I wasn't aware of that.
This has tripped up many people. Maybe we should just bite the bullet and change this?
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
Could you get the best of both worlds by making __annotations__ an auto-populating descriptor on "type", the way it is on functions?
Continue to add a non-empty annotations dict to the class dict eagerly, but only add the empty dict when "cls.__annotations__" is accessed.
I think that'll work though it's a little imprecise. Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
What happens when that current best practice code meets your proposed "lazy-populate the empty dict" approach? Until something accesses cls.__annotations__, the class dict has no '__annotations__' key, so the .get() falls back to its default of {}--the same value lazy population would produce. So the code will continue to work, even though it's arguably a little misguided. If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.
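A quick sketch of why peeking in the class dict behaves well here: ordinary attribute access could walk the MRO and find a base class's annotations, but the class dict only ever holds the class's own.

```python
class A:
    ax: int = 3

class B(A):
    pass

# A's own annotations live in A's class dict:
assert A.__dict__.get('__annotations__', {}) == {'ax': int}

# B defines no annotations of its own, so peeking in B's class dict
# correctly comes up empty:
assert B.__dict__.get('__annotations__', {}) == {}
```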
Two notes about this idea. First, I think most people who use
this best-practices code above use it for modules as well as
classes. (They have two code paths: one for functions, the other
for not-functions.) But everything I said above is true for both
classes and modules.
Second, I think this is only sensible if, at the same time, we make it illegal to delete cls.__annotations__. If we lazy-populate the empty dict, and a user deletes cls.__annotations__, and we don't remember some extra state, we'd just re-"lazy"-create the empty dict the next time they asked for it. That's actually what functions do today--they just lazily repopulate the empty annotations dict every time--and I'm not keen to bring those semantics to classes and modules.
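For reference, that function behavior is easy to observe directly (a minimal sketch):

```python
def f(x: int) -> int:
    return x

assert f.__annotations__ == {'x': int, 'return': int}

# Deleting a function's annotations doesn't remove them for good; the
# very next access lazily re-creates an empty dict:
del f.__annotations__
assert f.__annotations__ == {}
```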
Cheers,
/arry
On 17/01/21 12:31 pm, Larry Hastings wrote:
Consider the best practice for getting class annotations, example here from Lib/dataclasses.py:
cls_annotations = cls.__dict__.get('__annotations__', {})
Isn't that going to get broken anyway? It won't trigger the
calling of __co_annotations__.
I proposed these as two separate conversations, because I wanted
to clean up the semantics of annotations whether or not PEP 649
was accepted. But, yes, if PEP 649 is accepted (in some form),
this current-best-practice would no longer work, and the new best
practice would likely become much more complicated.
Cheers,
/arry
[...] If anybody distinguished between "annotations are unset" and "annotations are set to an empty dict", that code would fail, but I expect nobody ever does that.
I do worry about the best practice getting worse if your PEP 649 is accepted.
A good part of what motivated me to start this second thread
("Let's Fix ...") was how much worse best practice would become if
PEP 649 is accepted. But if we accept PEP 649, and take
steps to fix the semantics of annotations, I think the resulting
best practice will be excellent in the long-run.
Let's assume for a minute that PEP 649 is accepted more-or-less
like it is now. (The name resolution approach is clearly going to
change but that won't affect the discussion here.) And let's
assume that we also change the semantics so annotations are always
defined (you can't delete them) and they're guaranteed to be
either a dict or None. (Permitting __annotations__ to be None
isn't settled yet, but it's most similar to current semantics, so
let's just assume it for now.)
Because the current semantics are kind of a mess, most people who examine annotations already have a function that gets the annotations for them. Given that, I really do think the best approach is to gate the code on version 3.10, like I've described before:
if sys.version_info >= (3, 10):
    def get_annotations(o):
        return o.__annotations__
else:
    def get_annotations(o):
        if isinstance(o, (type, types.ModuleType)):
            return o.__dict__.get("__annotations__", None)
        else:
            return o.__annotations__
This assumes returning None is fine. If it had to always return
a valid dict, I'd add "or {}" to the end of every return
statement.
Given that it already has to be a function, I think this approach is readable and performant. And, at some future time when the code can drop support for Python < 3.10, we can throw away the if statement and the whole else block, keeping just the one-line function. At which point maybe we'd refactor away the function and just use "o.__annotations__" everywhere.
I concede that, in the short term, now we've got nine lines and
two if statements to do something that should be
relatively straightforward--accessing the annotations on an
object. But that's where we find ourselves. Current best
practice is kind of a mess, and unfortunately PEP 649 breaks
current best practice anyway. My goal is to fix the semantics so
that long-term best practice is sensible, easy, and obvious.
Cheers,
/arry
Hm. It's unfortunate that this would break code using what is *currently* the best practice.
I can't figure out how to avoid it. The problem is, current best
practice sidesteps the class and goes straight to the dict. How
do we intercept that and run the code to lazy-calculate the
annotations?
I mean, let's consider something crazy. What if we change
cls.__dict__ from a normal dict to a special dict that handles the
__co_annotations__ machinery? That might work, except, we
literally allow users to supply their own cls.__dict__ via
__prepare__. So we can't rely on our special dict.
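A minimal sketch of that mechanism (the class names are hypothetical, just for illustration): a metaclass's __prepare__ can return any mapping it likes for the class body to populate, so the interpreter can't assume it controls that namespace.

```python
class LoggedNamespace(dict):
    # Hypothetical mapping: logs every name the class body (and the
    # interpreter itself) stores into the namespace.
    def __setitem__(self, key, value):
        print(f"storing {key!r}")
        super().__setitem__(key, value)

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The metaclass, not the interpreter, decides what mapping the
        # class body executes against.
        return LoggedNamespace()

class C(metaclass=Meta):
    x: int = 1

# The namespace was under user control during class creation; type()
# then copies its contents into the real class dict.
assert C.__dict__['x'] == 1
```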
What if we change cls.__dict__ to a getset? The user is allowed
to set cls.__dict__, but when you get __dict__, we wrap the actual
internal dict object with a special object that intercepts
accesses to __annotations__ and handles the __co_annotations__
mechanism. That might work but it's really crazy and
unfortunate. And it's remotely possible that a user might
override __dict__ as a property, in a way that breaks this
mechanism too. So it's not guaranteed to always work.
I'm not suggesting we should do these things, I'm just trying to
illustrate how hard I think the problem is. If someone has a good
idea how we can add the __co_annotations__ machinery without
breaking current best practice I'd love to hear it.
Also, for functions and modules I would recommend `getattr(o, "__annotations__", None)` (perhaps with `or {}` added).
For functions you don't need to bother; fn.__annotations__ is guaranteed to always be set, and be either a dict or None. (Python will only ever set it to a dict, but the user is permitted to set it to None.)
I agree with your suggested best practice for modules as it
stands today.
And actually, let me walk back something I've said before. I believe I've said several times that "people treat classes and modules the same". Actually that's wrong.
- Lib/typing.py treats functions and modules the same; it uses getattr(o, '__annotations__', None). It treats classes separately and uses cls.__dict__.get('__annotations__', {}).
- Lib/dataclasses.py uses fn.__annotations__ for functions and cls.__dict__.get('__annotations__', {}) for classes. It doesn't handle modules at all.
- Lib/inspect.py calls Lib/typing.py to get annotations. Which in retrospect I think is a bug, because annotations and type hints aren't the same thing. (typing.get_type_hints() changes None to type(None), it evaluates strings, etc.)
So, for what it's worth, I literally have zero examples of people treating classes and modules the same when it comes to annotations. Sorry for the confusion!
I would also honestly discount what dataclasses.py and typing.py have to do. But what do 3rd party packages do when they don't want to use get_type_hints() and they want to get it right for classes? That would give an indication of how serious we should take breaking current best practice.
I'm not sure how to figure that out. Off the top of my head, the only current third-party packages I can think of that use annotations are mypy and attrs. I took a quick look at mypy but I can't figure out what it's doing.
attrs does something a little kooky. It accesses __annotations__ using a function called _has_own_attributes(), which detects whether or not the object is inheriting an attribute. But it doesn't peek in __dict__; instead it walks the mro and sees if any of its base classes have the same (non-False) value for that attribute.
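A hypothetical sketch of that MRO-walking idea (not attrs' actual code, and demonstrated here with an ordinary class attribute rather than __annotations__, whose lazy creation on newer Pythons would muddy the result):

```python
_SENTINEL = object()

def has_own_attribute(cls, name):
    # An attribute only counts as the class's own if no base class in the
    # MRO exposes the identical value.
    value = getattr(cls, name, _SENTINEL)
    if value is _SENTINEL:
        return False
    for base in cls.__mro__[1:]:
        if getattr(base, name, _SENTINEL) is value:
            return False
    return True

class A:
    color = "red"

class B(A):
    pass

assert has_own_attribute(A, "color")        # defined on A itself
assert not has_own_attribute(B, "color")    # inherited from A
```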
Happily, that seems like it would continue to work even if PEP 649 is accepted. That's good news!
Cheers,
/arry
On Tue, Jan 12, 2021 at 6:35 PM Larry Hastings <la...@hastings.org> wrote:
On 1/12/21 5:28 PM, Brett Cannon wrote:
The other thing to keep in mind is we are talking about every module, class, and function getting 64 bytes ... which I bet isn't that much.
Actually it's only every module and class. Functions don't have this problem because they've always stored __annotations__ internally--meaning, peeking in their __dict__ doesn't work, and they don't support inheritance anyway. So the number is even smaller than that.
If we can just make __annotations__ default to an empty dict on classes and modules, and not worry about the memory consumption, that goes a long way to cleaning up the semantics.
I would like that very much. And the exception for functions is especially helpful.
First of all, I've proposed a function that should also help a lot:
The function will be called inspect.get_annotations(o). It's like typing.get_type_hints(o) except less opinionated. This function would become the best practice for everybody who wants annotations**, like so:
import inspect

if hasattr(inspect, "get_annotations"):
    how_i_get_annotations = inspect.get_annotations
else:
    # do whatever it was I did in Python 3.9 and before...
** Everybody who specifically wants type hints should
instead call typing.get_type_hints(), and good news!, that
function has existed for several versions now. So they probably
already do call it.
I'd still like to add a default empty __annotations__ dict to all classes and modules for Python 3.10, for everybody who doesn't switch to using this as-yet-unwritten inspect.get_annotations() function. The other changes I propose in that thread (e.g. deleting __annotations__ always throws TypeError) would be nice, but honestly they aren't high priority. They can wait until after Python 3.10. Just these two things (inspect.get_annotations() and always populating __annotations__ for classes and modules) would go a long way to cleaning up how people examine annotations.
Long-term, hopefully we can fold the desirable behaviors of
inspect.get_annotations() into the language itself, at which point
we could probably deprecate the function. That wouldn't be until
a long time from now of course.
Does this need a lot of discussion, or can I just go ahead with
the bpo and PR and such? I mean, I'd JFDI, as Barry always
encourages, but given how much debate we've had over annotations
in the last two weeks, I figured I should first bring it up here.
Happy two-weeks'-notice,
/arry
p.s. I completely forgot about this until just now--sorry. At
least I remembered before Python 3.10b1!
_______________________________________________
Python-Dev mailing list -- pytho...@python.org
To unsubscribe send an email to python-d...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/pytho...@python.org/message/J4LZEIZTYZQWGIM5VZGNMQPWB5ZWVEXP/
Code of Conduct: http://python.org/psf/codeofconduct/
This is happening, right? Adding a default `__annotations__ = {}` to modules and classes. (Though https://bugs.python.org/issue43901 seems temporarily stuck.)
It's happening, and I wouldn't say it's stuck. I'm actively
working on it--currently puzzling my way through some wild unit
test failures. I expect to ship my first PR over the weekend.
Cheers,
/arry
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I just make the change for user classes and leave builtin classes untouched? What do you think?
Cheers,
/arry
On Sat, 24 Apr 2021, 5:53 pm Larry Hastings, <la...@hastings.org> wrote:
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I just make the change for user classes and leave builtin classes untouched? What do you think?
I'd suggest kicking the can down the road: leave builtin classes alone for now, but file a ticket to reconsider the question for 3.11.
In the meantime, inspect.get_annotations can help hide the discrepancy.
The good news: inspect.get_annotations() absolutely can handle it. inspect.get_annotations() is so paranoid about examining the object you pass in, I suspect you could pass in an old boot and it would pull out the annotations--if it had any.
Cheers,
/arry
On 24. 04. 21 9:52, Larry Hastings wrote:
I've hit a conceptual snag in this.
What I thought I needed to do: set __annotations__ = {} in the module dict, and set __annotations__ = {} in user class dicts. The latter was more delicate than the former, but I think I figured out a good spot for both. I have this much working, including fixing the test suite.
But now I realize (*head-slap* here): if *every* class is going to have annotations, does that mean builtin classes too? StructSequence classes like float? Bare-metal type objects like complex? Heck, what about type itself?!
My knee-jerk initial response: yes, those too. Which means adding a new getsetdef to the type object. But that's slightly complicated. The point of doing this is to preserve the existing best-practice of peeking in the class dict for __annotations__, to avoid inheriting it. If I'm to preserve that, the get/set for __annotations__ on a type object would need to get/set it on tp_dict if tp_dict was not NULL, and use internal storage somewhere if there is no tp_dict.
It's worth noticing that builtin types don't currently have __annotations__ set, and you can't set them. (Or, at least, float, complex, and type didn't have them set, and wouldn't let me set annotations on them.) So presumably people using current best practice--peek in the class dict--aren't having problems.
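A quick sketch confirming the second half of that (the annotation value below is purely illustrative):

```python
# Builtin ("static") types won't accept annotations at all; attempting to
# set them raises TypeError on every Python version I'm aware of:
try:
    float.__annotations__ = {'imag': float}
except TypeError as exc:
    print(f"rejected: {exc}")
```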
So I now suspect that my knee-jerk answer is wrong. Am I going too far down the rabbit hole? Should I /just/ make the change for user classes and leave builtin classes untouched? What do you think?
Beware of adding mutable state to built-in (C static) type objects: these are shared across interpreters, so changing them can “pollute” unwanted contexts.
This has been so for a long time [0]. There are some subinterpreter efforts underway that might eventually lead to making __annotations__ on static types easier to add, but while you're certainly welcome to explore the neighboring rabbit hole as well, I do think you're going in too far for now :)
[0] https://mail.python.org/archives/list/pytho...@python.org/message/KLCZIA6FSDY3S34U7A72CPSBYSOMGZG3/
That's a good point! The sort of detail one forgets in the rush of the moment.
Given that the lack of annotations on builtin types already isn't a problem, and given this wrinkle, and generally given the "naw you don't have to" vibe I got from you and Nick (and the lack of "yup you gotta" I got from anybody else), I'm gonna go with not polluting the builtin types for now.
This is not to say that, in the fullness of time, those objects
should never have annotations. Even in the three random types I
picked in my example, there's at least one example: float.imag is
a data member and might theoretically be annotated. But we can
certainly kick this can down the road too. Maybe by the time we
get around to it, we'll have a read-only dictionary we can use for
the purpose.
Cheers,
/arry