[Python-ideas] Proposal: Use mypy syntax for function annotations


Guido van Rossum

Aug 13, 2014, 3:45:51 PM
to Python-Ideas, Jukka Lehtosalo, Bob Ippolito
[There is no TL;DR other than the subject line. Please read the whole thing before replying. I do have an appendix with some motivations for adding type annotations at the end.]

Yesterday afternoon I had an inspiring conversation with Bob Ippolito (man of many trades, author of simplejson) and Jukka Lehtosalo (author of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about what Python can learn from Haskell (and other languages); yesterday he gave the same talk at Dropbox. The talk is online (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad strokes comes down to three suggestions:

  (a) Python should adopt mypy's syntax for function annotations
  (b) Python's use of mutable containers by default is wrong
  (c) Python should adopt some kind of Abstract Data Types

Proposals (b) and (c) don't feel particularly actionable (if you disagree please start a new thread, I'd be happy to discuss these further if there's interest) but proposal (a) feels right to me.

So what is mypy?  It is a static type checker for Python written by Jukka for his Ph.D. thesis. The basic idea is that you add type annotations to your program using some custom syntax, and when running your program using the mypy interpreter, type errors will be found during compilation (i.e., before the program starts running).

The clever thing here is that the custom syntax is actually valid Python 3, using (mostly) function annotations: your annotated program will still run with the regular Python 3 interpreter. In the latter case there will be no type checking, and no runtime overhead, except to evaluate the function annotations (which are evaluated at function definition time but don't have any effect when the function is called).
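
A minimal sketch of that behavior, using a hypothetical function `double` that is not part of mypy or of this proposal: the annotations are stored on the function object, and the interpreter ignores them when the function is called.

```python
def double(x: int) -> int:
    # The annotations are recorded on the function object but never enforced.
    return x * 2

# The interpreter keeps the annotations around for tools to inspect.
assert double.__annotations__ == {"x": int, "return": int}

# Calling with a str "violates" the annotation, yet runs without complaint.
assert double("ab") == "abab"
assert double(3) == 6
```

A separate checker would be expected to flag the `double("ab")` call; the regular interpreter runs it as usual.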

In fact, it is probably more useful to think of mypy as a heavy-duty linter than as a compiler or interpreter; leave the type checking to mypy, and the execution to Python. It is easy to integrate mypy into a continuous integration setup, for example.

To read up on mypy's annotation syntax, please see the mypy-lang.org website. Here's just one complete example, to give a flavor:

  from typing import List, Dict

  def word_count(input: List[str]) -> Dict[str, int]:
      result = {}  #type: Dict[str, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result


Note that the #type: comment is part of the mypy syntax; mypy uses comments to declare types in situations where no syntax is available -- although this particular line could also be written as follows:

    result = Dict[str, int]()

Either way the entire function is syntactically valid Python 3, and a suitable implementation of typing.py (containing class definitions for List and Dict, for example) can be written to make the program run correctly. One is provided as part of the mypy project.
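
One way such a run-time shim could look. This is only a sketch and not mypy's actual typing.py: the metaclass name `_Subscriptable` is made up for illustration, and the type parameters are simply discarded at run time.

```python
class _Subscriptable(type):
    """Hypothetical metaclass: lets a class be written as Name[params]."""

    def __getitem__(cls, params):
        # Type parameters matter only to the static checker; drop them here.
        return cls


class List(list, metaclass=_Subscriptable):
    pass


class Dict(dict, metaclass=_Subscriptable):
    pass


def word_count(input: List[str]) -> Dict[str, int]:
    result = Dict[str, int]()  # an ordinary (empty) dict subclass instance
    for line in input:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


assert word_count(["a b a", "b"]) == {"a": 2, "b": 2}
```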

I should add that many of mypy's syntactic choices aren't actually new. The basis of many of its ideas goes back at least a decade: I blogged about this topic in 2004 (http://www.artima.com/weblogs/viewpost.jsp?thread=85551 -- see also the two followup posts linked from the top there).

I'll emphasize once more that mypy's type checking happens in a separate pass: no type checking happens at run time (other than what the interpreter already does, like raising TypeError on expressions like 1+"1").

There's a lot to this proposal, but I think it's possible to get a PEP written, accepted and implemented in time for Python 3.5, if people are supportive. I'll go briefly over some of the action items.

(1) A change of direction for function annotations

PEP 3107, which introduced function annotations, is intentionally non-committal about how function annotations should be used. It lists a number of use cases, including but not limited to type checking. It also mentions some rejected proposals that would have standardized either a syntax for indicating types and/or a way for multiple frameworks to attach different annotations to the same function. AFAIK in practice there is little use of function annotations in mainstream code, and I propose a conscious change of course here by stating that annotations should be used to indicate types and to propose a standard notation for them.

(We may have to have some backwards compatibility provision to avoid breaking code that currently uses annotations for some other purpose. Fortunately the only issue, at least initially, will be that when running mypy to type check such code it will produce complaints about the annotations; it will not affect how such code is executed by the Python interpreter. Nevertheless, it would be good to deprecate such alternative uses of annotations.)

(2) A specification for what to add to Python 3.5

There needs to be at least a rough consensus on the syntax for annotations, and the syntax must cover a large enough set of use cases to be useful. Mypy is still under development, and some of its features are still evolving (e.g. unions were only added a few weeks ago). It would be possible to argue endlessly about details of the notation, e.g. whether to use 'list' or 'List', what either of those means (is a duck-typed list-like type acceptable?) or how to declare and use type variables, and what to do with functions that have no annotations at all (mypy currently skips those completely).

I am proposing that we adopt whatever mypy uses here, keeping discussion of the details (mostly) out of the PEP. The goal is to make it possible to add type checking annotations to 3rd party modules (and even to the stdlib) while allowing unaltered execution of the program by the (unmodified) Python 3.5 interpreter. The actual type checker will not be integrated with the Python interpreter, and it will not be checked into the CPython repository. The only thing that needs to be added to the stdlib is a copy of mypy's typing.py module. This module defines several dozen new classes (and a few decorators and other helpers) that can be used in expressing argument types. If you want to type-check your code you have to download and install mypy and run it separately.

The curious thing here is that while standardizing a syntax for type annotations, we technically still won't be adopting standard rules for type checking. This is intentional. First of all, fully specifying all the type checking rules would make for a really long and boring PEP (a much better specification would probably be the mypy source code). Second, I think it's fine if the type checking algorithm evolves over time, or if variations emerge. The worst that can happen is that you consider your code correct but mypy disagrees; your code will still run.

That said, I don't want to completely leave out any specification. I want the contents of the typing.py module to be specified in the PEP, so that it can be used with confidence. But whether mypy will complain about your particular form of duck typing doesn't have to be specified by the PEP. Perhaps as mypy evolves it will take options to tell it how to handle certain edge cases. Forks of mypy (or entirely different implementations of type checking based on the same annotation syntax) are also a possibility. Maybe in the distant future a version of Python will take a different stance, once we have more experience with how this works out in practice, but for Python 3.5 I want to restrict the scope of the upheaval.

Appendix -- Why Add Type Annotations?

The argument between proponents of static typing and dynamic typing has been going on for many decades. Neither side is all wrong or all right. Python has traditionally fallen in the camp of extremely dynamic typing, and this has worked well for most users, but there are definitely some areas where adding type annotations would help.

- Editors (IDEs) can benefit from type annotations; they can call out obvious mistakes (like misspelled method names or inapplicable operations) and suggest possible method names. Anyone who has used IntelliJ or Xcode will recognize how powerful these features are, and type annotations will make such features more useful when editing Python source code.

- Linters are an important tool for teams developing software. A linter doesn't replace a unittest, but can find certain types of errors better or quicker. The kind of type checking offered by mypy works much like a linter, and has similar benefits; but it can find problems that are beyond the capabilities of most linters.

- Type annotations are useful for the human reader as well! Take the above word_count() example. How long would it have taken you to figure out the types of the argument and return value without annotations? Currently most people put the types in their docstrings; developing a standard notation for type annotations will reduce the amount of documentation that needs to be written, and running the type checker might find bugs in the documentation, too. Once a standard type annotation syntax is introduced, it should be simple to add support for this notation to documentation generators like Sphinx.

- Refactoring. Bob's talk has a convincing example of how type annotations help in (manually) refactoring code. I also expect that certain automatic refactorings will benefit from type annotations -- imagine a tool like 2to3 (but used for some other transformation) augmented by type annotations, so it will know whether e.g. x.keys() is referring to the keys of a dictionary or not.

- Optimizers. I believe this is actually the least important application, certainly initially. Optimizers like PyPy or Pyston wouldn't be able to fully trust the type annotations, and they are better off using their current strategy of optimizing code based on the types actually observed at run time. But it's certainly feasible to imagine a future optimizer also taking type annotations into account.

--
--Guido "I need a new hobby" van Rossum (python.org/~guido)

Ethan Furman

Aug 13, 2014, 4:00:33 PM
to python...@python.org
On 08/13/2014 12:44 PM, Guido van Rossum wrote:
>
> [There is no TL;DR other than the subject line. Please read the whole thing before replying. I do have an appendix with
> some motivations for adding type annotations at the end.]

+0 on the proposal as a whole. It is not something I'm likely to use, but I'm not opposed to it, so long as it stays
optional.


> Nevertheless, it would be good to deprecate such alternative uses of annotations.

-1 on deprecating alternative uses of annotations.

--
~Ethan~
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Guido van Rossum

Aug 13, 2014, 4:20:35 PM
to Ethan Furman, Python-Ideas
On Wed, Aug 13, 2014 at 12:59 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 08/13/2014 12:44 PM, Guido van Rossum wrote:
>> [There is no TL;DR other than the subject line. Please read the whole thing before replying. I do have an appendix with
>> some motivations for adding type annotations at the end.]
>
> +0 on the proposal as a whole.  It is not something I'm likely to use, but I'm not opposed to it, so long as it stays optional.
>
>> Nevertheless, it would be good to deprecate such alternative uses of annotations.
>
> -1 on deprecating alternative uses of annotations.

Do you have a favorite alternative annotation use that you actually use (or are likely to)?

--
--Guido van Rossum (python.org/~guido)

Alex Gaynor

Aug 13, 2014, 4:30:31 PM
to python...@python.org
I'm strongly opposed to this, for a few reasons.

First, I think that standardizing on a syntax, without a semantics is
incredibly confusing, and I can't imagine how having *multiple* competing
implementations would be a boon for anyone.

This proposal seems to be built around the idea that we should have a syntax,
and then people can write third party tools, but Python itself won't really do
anything with them.

Fundamentally, this seems like a very confusing approach. How we write a type,
and what we do with that information are fundamentally connected. Can I cast a
``List[str]`` to a ``List[object]`` in any way? If yes, what happens when I go
to put an ``int`` in it? There's no runtime checking, so the type system is
unsound, on the other hand, disallowing this prevents many types of successes.

Both solutions have merit, but the idea of some implementations of the type
checker having covariance and some contravariance is fairly disturbing.
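
The soundness problem can be made concrete with a short sketch. It assumes the standard-library `typing` module that later grew out of this proposal (only `List` is used), and the function name `append_int` is made up for illustration.

```python
from typing import List


def append_int(items: List[object]) -> None:
    # Perfectly legal for a List[object]: any object may be appended.
    items.append(42)


names = ["spam", "eggs"]  # type: List[str]

# A checker that treats List as covariant would accept this call, because
# every str is an object. Nothing is checked at run time, so it succeeds...
append_int(names)

# ...and the "list of str" now contains an int.
assert names == ["spam", "eggs", 42]
assert not all(isinstance(n, str) for n in names)
```

A checker that treats List as invariant rejects the call instead, at the cost of also rejecting read-only uses that would have been safe.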

Another concern I have is that analysis based on these types is making some
pretty strong assumptions about static-ness of Python programs that aren't
valid. While existing checkers like ``flake8`` also do this, their assumptions
are basically constrained to the symbol table, while this is far deeper. For
example, can I annotate something as ``six.text_type``? What about
``django.db.models.sql.Query`` (keep in mind that this class is redefined based
on what database you're using (not actually true, but it used to be))?

Python's type system isn't very good. It lacks many features of more powerful
systems such as algebraic data types, interfaces, and parametric polymorphism.
Despite this, it works pretty well because of Python's dynamic typing. I
strongly believe that attempting to enforce the existing type system would be a
real shame.

Alex

PS: You're right. None of this would provide *any* value for PyPy.

Christian Heimes

Aug 13, 2014, 4:31:14 PM
to python...@python.org
On 13.08.2014 21:44, Guido van Rossum wrote:
> Yesterday afternoon I had an inspiring conversation with Bob Ippolito
> (man of many trades, author of simplejson) and Jukka Lehtosalo (author
> of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about
> what Python can learn from Haskell (and other languages); yesterday he
> gave the same talk at Dropbox. The talk is online
> (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad
> strokes comes down to three suggestions:
>
> (a) Python should adopt mypy's syntax for function annotations
> (b) Python's use of mutable containers by default is wrong
> (c) Python should adopt some kind of Abstract Data Types

I was at Bob's talk during EP14 and really liked the idea. A couple of
colleagues and other attendees also said it's a good and useful
proposal. I also like your proposal to standardize the type annotations
first without a full integration of mypy.

In general I'm +1 but I'd like to discuss two aspects:

1) I'm not keen on the naming of mypy's typing classes. The visual
distinction between e.g. dict() and Dict() is too small and IMHO
confusing for newcomers. How about an additional 'T' prefix to make
clear that the objects are referring to typing objects?

    from typing import TList, TDict

    def word_count(input: TList[str]) -> TDict[str, int]:
        ...

2) PEP 3107 only specifies arguments and return values but not
exceptions that can be raised by a function. Java has the "throws"
syntax to list possible exceptions:

public void readFile() throws IOException {}

May I suggest that we also standardize a way to annotate the exceptions
that can be raised by a function? It's a very useful piece of
information and is commonly requested on the Python user mailing list.
It doesn't have to be a new syntax element; a decorator in the typing
module would suffice, too. For example:

    from typing import TList, TDict, raises

    @raises(RuntimeError, (ValueError, "is raised when input is empty"))
    def word_count(input: TList[str]) -> TDict[str, int]:
        ...
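
A rough sketch of how such a decorator might record its arguments. Nothing like this exists in mypy's typing module: the name `raises` and the `__raises__` attribute are hypothetical, and the decorator only stores metadata without checking anything.

```python
def raises(*specs):
    """Record the exceptions a function may raise (hypothetical helper)."""
    declared = []
    for spec in specs:
        # Accept either an exception class or an (exception class, description) pair.
        exc, description = spec if isinstance(spec, tuple) else (spec, None)
        if not (isinstance(exc, type) and issubclass(exc, BaseException)):
            raise TypeError("expected an exception class, got %r" % (exc,))
        declared.append((exc, description))

    def decorator(func):
        func.__raises__ = tuple(declared)  # metadata only; nothing is enforced
        return func

    return decorator


@raises(RuntimeError, (ValueError, "is raised when input is empty"))
def word_count(input):
    if not input:
        raise ValueError("input is empty")
    result = {}
    for line in input:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


assert word_count.__raises__ == (
    (RuntimeError, None),
    (ValueError, "is raised when input is empty"),
)
```

A documentation generator or checker could then read `word_count.__raises__` without importing anything beyond the decorated module.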

Regards,
Christian

Ethan Furman

Aug 13, 2014, 4:50:43 PM
to Python-Ideas
On 08/13/2014 01:19 PM, Guido van Rossum wrote:
> On Wed, Aug 13, 2014 at 12:59 PM, Ethan Furman wrote:
>>
>> -1 on deprecating alternative uses of annotations.
>
> Do you have a favorite alternative annotation use that you actually use (or are likely to)?

My script argument parser [1] uses annotations to figure out how to parse the cli parameters and cast them to
appropriate values (copied the idea from one of Michele Simionato's projects... plac [2], I believe).

I could store the info in some other structure besides 'annotations', but it's there and it fits the bill conceptually.
Amusingly, it's a form of type info, but instead of saying what it has to already be, says what it will become.
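
The general idea can be sketched as follows. This is not scription's or plac's actual API: `call_with_cli_args` and `repeat` are hypothetical names, and each annotation is treated as a converter applied to the raw command-line string.

```python
import inspect


def call_with_cli_args(func, argv):
    """Convert raw string arguments using func's annotations, then call it."""
    signature = inspect.signature(func)
    converted = []
    for param, raw in zip(signature.parameters.values(), argv):
        converter = param.annotation
        if converter is inspect.Parameter.empty:
            converter = str  # no annotation: leave the argument as a string
        converted.append(converter(raw))
    return func(*converted)


def repeat(word: str, times: int, sep: str = " "):
    # The annotations say what each argument will become, not what it already is.
    return sep.join([word] * times)


assert call_with_cli_args(repeat, ["hi", "3"]) == "hi hi hi"
```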

--
~Ethan~


[1] https://pypi.python.org/pypi/scription (due for an overhaul now I've used it for awhile ;)
[2] https://pypi.python.org/pypi/plac/0.9.1

Donald Stufft

Aug 13, 2014, 4:54:17 PM
to Alex Gaynor, python...@python.org
I agree with Alex that I think leaving the actual semantics of what these things
mean up to a third party, which can possibly be swapped out by individual end
users, is terribly confusing. I don’t think I agree though that this is a bad
idea in general, I think that we should just add it for real and skip the
indirection.

IOW I'm not sure I see the benefit of defining the syntax but not the semantics
when it seems this is already completely possible given the fact that mypy
exists.

The only real benefits I can see from doing it are that the stdlib can use it,
and the ``import typing`` aspect. I don't believe that the stdlib benefits are
great enough to justify the possible confusion of multiple different
implementations, and I think that the typing import could easily be provided
as a project on PyPI that people can depend on if they want to use this in
their code.

So my vote would be to add mypy semantics to the language itself.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Andrey Vlasovskikh

Aug 13, 2014, 5:09:19 PM
to gu...@python.org, Python-Ideas
2014-08-14, 0:19, Guido van Rossum <gu...@python.org> wrote:

> Yesterday afternoon I had an inspiring conversation with Bob Ippolito (man of many trades, author of simplejson) and Jukka Lehtosalo (author of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about what Python can learn from Haskell (and other languages); yesterday he gave the same talk at Dropbox. The talk is online (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad strokes comes down to three suggestions:
>
> (a) Python should adopt mypy's syntax for function annotations


+1. I'm a developer of the code analysis engine of PyCharm. I have discussed this idea with Jukka Lehtosalo and recently with Dave Halter, the author of Jedi code completion library. Standardized type annotations would be very useful for code analysis tools and IDEs such as PyCharm, Jedi and pylint. Type annotations would be especially great for third-party libraries. The idea is that most Python programmers don't have to write annotations in order to benefit from them. Annotated libraries are often enough for good code analysis.

We (PyCharm) and Jukka have made some initial steps in this direction, including thoughts on semantics of annotations (https://github.com/pytypes/pytypes). Feedback is welcome.

Here are slides from my talk about optional typing in Python, which show how Mypy types can be used in both static and dynamic type checking (http://blog.pirx.ru/media/files/2013/python-optional-typing/). The Mypy-related part starts at slide 14.

We are interested in getting type annotations standardized and we would like to help develop and test type annotation proposals.

--
Andrey Vlasovskikh
Web: http://pirx.ru/

Antoine Pitrou

Aug 13, 2014, 5:13:51 PM
to python...@python.org

Hello,

First, as a disclaimer, I am currently working on Numba for Continuum
Analytics. Numba has its own type inference system which it applies to
functions decorated with the @jit decorator. Due to Numba's objectives,
the type inference system is heavily geared towards numerical computing,
but it is conceptually (and a bit concretely) able to represent more
generic information, such as "an enumerate() over an iterator of a
complex128 numpy array".

There are two sides to type inference:

1) first the (optional) annotations
(I'm saying "optional" because in the most basic usage, a JIT compiler
is normally able to defer compilation until the first function or method
call, and to deduce input types from that)

2) second the inference engine properly, which walks the code (in
whatever form the tool's developer has chosen: bytecode, AST, IR) and
deduces types for any intermediate values

Now only #1 is implied by this PEP proposal, but it also sounds like we
should take into account the desired properties of #2 (for example,
being able to express "an iterator of three-tuples" can be important for
a JIT compiler - or not, perhaps, depending on the JIT compiler :-)).
What #2 wants to do will differ depending on the use case: e.g. a code
checker may need less type granularity than a JIT compiler.


Therefore, regardless of mypy's typesystem's completeness and
granularity, one requirement is for it to be easily extensible. By
extensible I mean not only being able to define new type descriptions,
but being able to do so for existing third-party libraries you don't
want to modify.

I'm saying that because I'm looking at
http://mypy-lang.org/tutorial.html#genericclasses , and it's not clear
from this example whether the typing code has to be interwoven with the
collection's implementation, or can be written as a separate code module
entirely (*). Ideally both should probably be possible (in the same vein
as being able to subclass an ABC, or register an existing class with
it). This also includes being able to type-declare functions and types from C
extension modules.
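
For comparison, the ABC mechanism mentioned above looks like this. The class names are hypothetical, and whether type descriptions could be attached to existing classes in the same way is exactly the open question.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """An abstract interface, defined separately from any implementation."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...


class ThirdPartyStack:
    """Stands in for a class from a library you do not want to modify."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


# Registration happens outside the class body, so the library is untouched.
Stack.register(ThirdPartyStack)

assert issubclass(ThirdPartyStack, Stack)
assert isinstance(ThirdPartyStack(), Stack)
```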

In Numba, this would be typically required to write typing descriptions
for Numpy arrays and functions; but also to derive descriptions for
fixed-width integers, single-precision floats, etc. (this also means
some form of subclassing for type descriptions themselves).

(*) (actually, I'm a bit worried when I see that "List[int]()"
instantiates an actual list; calling a type description class should
give you a parametered type description, not an object; the [] notation
is in general not powerful enough if you want several type parameters,
possibly keyword-only)


At some point, it will be even better if the typing system is powerful
enough to remember properties of the *values* (for example not only "a
string", but "a one-character string", or even "one of the 'Y', 'M', 'D'
strings"). Think about type-checking / type-inferring calls to the struct
module.
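
A sketch of what such value-level types can look like, assuming `Literal` and `get_args` from the standard-library `typing` module (both were added years after this thread, in Python 3.8). The names `DateUnit` and `check_unit` are hypothetical.

```python
from typing import Literal, get_args

# Not just "a string", but one of exactly three one-character strings.
DateUnit = Literal["Y", "M", "D"]


def check_unit(unit: DateUnit) -> str:
    # A static checker can verify callers; this run-time check is only
    # here to make the sketch self-contained.
    if unit not in get_args(DateUnit):
        raise ValueError("unit must be one of %r" % (get_args(DateUnit),))
    return unit


assert get_args(DateUnit) == ("Y", "M", "D")
assert check_unit("M") == "M"
```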


I may come back with more comments once I've read the mypy docs and/or
code in detail.

Regards

Antoine.

Guido van Rossum

Aug 13, 2014, 5:47:46 PM
to Alex Gaynor, Python-Ideas
On Wed, Aug 13, 2014 at 1:29 PM, Alex Gaynor <alex....@gmail.com> wrote:
> I'm strongly opposed to this, for a few reasons.
>
> First, I think that standardizing on a syntax, without a semantics is
> incredibly confusing, and I can't imagine how having *multiple* competing
> implementations would be a boon for anyone.

That part was probably overly vague in my original message. I actually do want to standardize on semantics, but I think the semantics will prove controversial (they already have :-) and I think it's better to standardize the syntax and *some* semantics first rather than having to wait another decade for the debate over the semantics to settle. I mostly want to leave the door open for mypy to become smarter. But it might make sense to have a "weaker" interpretation in some cases too (e.g. an IDE might use a weaker type system in order to avoid overwhelming users with warnings).
 
> This proposal seems to be built around the idea that we should have a syntax,
> and then people can write third party tools, but Python itself won't really do
> anything with them.

Right.
 
> Fundamentally, this seems like a very confusing approach. How we write a type,
> and what we do with that information are fundamentally connected. Can I cast a
> ``List[str]`` to a ``List[object]`` in any way? If yes, what happens when I go
> to put an ``int`` in it? There's no runtime checking, so the type system is
> unsound, on the other hand, disallowing this prevents many types of successes.

Mypy has a cast() operator that you can use to shut it up when you (think you) know the conversion is safe.
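
A sketch of what that looks like, assuming the `cast` function as it later appeared in the standard-library `typing` module: it returns its argument unchanged at run time and only affects what the checker believes.

```python
from typing import List, cast

names = ["spam", "eggs"]  # type: List[str]

# cast() does nothing at run time; it only tells the checker to treat the
# value as a List[object] from here on.
objects = cast(List[object], names)

assert objects is names  # same list object, no copy and no check

objects.append(42)  # accepted by the checker via the cast
assert names == ["spam", "eggs", 42]
```

The last two lines show why the cast is a promise made by the programmer rather than a guarantee: the original list is modified through the cast alias.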
 
> Both solutions have merit, but the idea of some implementations of the type
> checker having covariance and some contravariance is fairly disturbing.

Yeah, that wouldn't be good. ;-)
 
> Another concern I have is that analysis based on these types is making some
> pretty strong assumptions about static-ness of Python programs that aren't
> valid. While existing checkers like ``flake8`` also do this, their assumptions
> are basically constrained to the symbol table, while this is far deeper. For
> example, can I annotate something as ``six.text_type``? What about
> ``django.db.models.sql.Query`` (keep in mind that this class is redefined based
> on what database you're using (not actually true, but it used to be))?

Time will have to tell. Stubs can help. I encourage you to try annotating a medium-sized module. It's likely that you'll find a few things: maybe a bug in mypy, maybe a missing mypy feature, maybe a bug in your code, maybe a shady coding practice in your code or a poorly documented function (I know I found several of each during my own experiments so far).
 
> Python's type system isn't very good. It lacks many features of more powerful
> systems such as algebraic data types, interfaces, and parametric polymorphism.
> Despite this, it works pretty well because of Python's dynamic typing. I
> strongly believe that attempting to enforce the existing type system would be a
> real shame.

Mypy shines in those areas of Python programs that are mostly statically typed. There are many such areas in most large systems. There are usually also some areas where mypy's type system is inadequate. It's easy to shut it up for those cases (in fact, mypy is silent unless you use at least one annotation for a function). But that's the case with most type systems. Even Haskell sometimes calls out to C.

Guido van Rossum

Aug 13, 2014, 6:01:47 PM
to Ethan Furman, Python-Ideas
On Wed, Aug 13, 2014 at 1:50 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 08/13/2014 01:19 PM, Guido van Rossum wrote:
>> On Wed, Aug 13, 2014 at 12:59 PM, Ethan Furman wrote:
>>> -1 on deprecating alternative uses of annotations.
>>
>> Do you have a favorite alternative annotation use that you actually use (or are likely to)?
>
> My script argument parser [1] uses annotations to figure out how to parse the cli parameters and cast them to appropriate values (copied the idea from one of Michele Simionato's projects... plac [2], I believe).
>
> I could store the info in some other structure besides 'annotations', but it's there and it fits the bill conceptually.  Amusingly, it's a form of type info, but instead of saying what it has to already be, says what it will become.

I couldn't find any docs for scription (the tarball contains just the source code, not even an example), although I did find some for plac. I expect adding type annotations to the source of scription.py might actually make it easier to grok what it does. :-)

But really, I'm sure that in Python 3.5, scription and mypy can coexist. If the mypy idea takes off you might eventually be convinced to use a different convention. But you'd get plenty of warning.
 
> [1] https://pypi.python.org/pypi/scription  (due for an overhaul now I've used it for awhile ;)
> [2] https://pypi.python.org/pypi/plac/0.9.1

Guido van Rossum

Aug 13, 2014, 6:07:21 PM
to Donald Stufft, Python-Ideas, Alex Gaynor
On Wed, Aug 13, 2014 at 1:53 PM, Donald Stufft <don...@stufft.io> wrote:
> I agree with Alex that I think leaving the actual semantics of what these things
> mean up to a third party, which can possibly be swapped out by individual end
> users, is terribly confusing. I don’t think I agree though that this is a bad
> idea in general, I think that we should just add it for real and skip the
> indirection.

Yeah, I probably overstated the option of alternative interpretations. I just don't want to have to write a PEP that specifies every little detail of mypy's type checking algorithm, and I don't think anyone would want to have to read such a PEP either. But maybe we can compromise on something that sketches broad strokes and leaves the details up to the team that maintains mypy (after all that tactic has worked pretty well for Python itself :-).
 
> IOW I'm not sure I see the benefit of defining the syntax but not the semantics
> when it seems this is already completely possible given the fact that mypy
> exists.
>
> The only real benefits I can see from doing it are that the stdlib can use it,
> and the ``import typing`` aspect. I don't believe that the stdlib benefits are
> great enough to justify the possible confusion of multiple different
> implementations, and I think that the typing import could easily be provided
> as a project on PyPI that people can depend on if they want to use this in
> their code.
>
> So my vote would be to add mypy semantics to the language itself.

What exactly would that mean? I don't think the Python interpreter should reject programs that fail the type check -- in fact, separating the type check from run time is the most crucial point of my proposal.

I'm fine with having a discussion on things like covariance vs. contravariance, or what forms of duck typing are acceptable, etc.
 

Guido van Rossum

Aug 13, 2014, 6:08:48 PM
to Andrey Vlasovskikh, Python-Ideas
Wow. Awesome. I will make time to study what you have already done!


On Wed, Aug 13, 2014 at 2:08 PM, Andrey Vlasovskikh <andrey.vl...@gmail.com> wrote:
> 2014-08-14, 0:19, Guido van Rossum <gu...@python.org> wrote:
>
>> Yesterday afternoon I had an inspiring conversation with Bob Ippolito (man of many trades, author of simplejson) and Jukka Lehtosalo (author of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about what Python can learn from Haskell (and other languages); yesterday he gave the same talk at Dropbox. The talk is online (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad strokes comes down to three suggestions:
>>
>>  (a) Python should adopt mypy's syntax for function annotations
>
> +1. I'm a developer of the code analysis engine of PyCharm. I have discussed this idea with Jukka Lehtosalo and recently with Dave Halter, the author of Jedi code completion library. Standardized type annotations would be very useful for code analysis tools and IDEs such as PyCharm, Jedi and pylint. Type annotations would be especially great for third-party libraries. The idea is that most Python programmers don't have to write annotations in order to benefit from them. Annotated libraries are often enough for good code analysis.
>
> We (PyCharm) and Jukka have made some initial steps in this direction, including thoughts on semantics of annotations (https://github.com/pytypes/pytypes). Feedback is welcome.
>
> Here are slides from my talk about optional typing in Python, which show how Mypy types can be used in both static and dynamic type checking (http://blog.pirx.ru/media/files/2013/python-optional-typing/). The Mypy-related part starts at slide 14.
>
> We are interested in getting type annotations standardized and we would like to help develop and test type annotation proposals.
>
> --
> Andrey Vlasovskikh
> Web: http://pirx.ru/




Juancarlo Añez

Aug 13, 2014, 6:22:46 PM
to Guido van Rossum, Jukka Lehtosalo, Python-Ideas

On Wed, Aug 13, 2014 at 3:14 PM, Guido van Rossum <gu...@python.org> wrote:
> I am proposing that we adopt whatever mypy uses here, keeping discussion of the details (mostly) out of the PEP. The goal is to make it possible to add type checking annotations to 3rd party modules (and even to the stdlib) while allowing unaltered execution of the program by the (unmodified) Python 3.5 interpreter.

I'll comment later on the core subject.

For now, I think this deserves some thought:

Function annotations are not available in Python 2.7, so promoting widespread use of annotations in 3.5 would be promoting code that is compatible only with 3.x, when the current situation is that much effort is being spent on writing code that works on both 2.7 and 3.4 (most libraries?).

Independently of its core merits, this proposal should fail unless annotations are added to Python 2.8.

Cheers,

--
Juancarlo Añez

Todd

Aug 13, 2014, 6:29:00 PM8/13/14
to python-ideas


On Aug 13, 2014 9:45 PM, "Guido van Rossum" <gu...@python.org> wrote:
> (1) A change of direction for function annotations
>
> PEP 3107, which introduced function annotations, is intentionally non-committal about how function annotations should be used. It lists a number of use cases, including but not limited to type checking. It also mentions some rejected proposals that would have standardized either a syntax for indicating types and/or a way for multiple frameworks to attach different annotations to the same function. AFAIK in practice there is little use of function annotations in mainstream code, and I propose a conscious change of course here by stating that annotations should be used to indicate types and to propose a standard notation for them.
>
> (We may have to have some backwards compatibility provision to avoid breaking code that currently uses annotations for some other purpose. Fortunately the only issue, at least initially, will be that when running mypy to type check such code it will produce complaints about the annotations; it will not affect how such code is executed by the Python interpreter. Nevertheless, it would be good to deprecate such alternative uses of annotations.)

I watched the original talk and read your proposal. I think type annotations could be very useful in certain contexts.

However, I still don't get this bit. Why would allowing type annotations automatically imply that no other annotations would be possible? Couldn't we formalize what would be considered a type annotation while still allowing annotations that don't fit these criteria to be used for other things?

Manuel Cerón

Aug 13, 2014, 6:29:01 PM8/13/14
to python...@python.org
On Wed, Aug 13, 2014 at 9:44 PM, Guido van Rossum <gu...@python.org> wrote:
> [There is no TL;DR other than the subject line. Please read the whole thing before replying. I do have an appendix with some motivations for adding type annotations at the end.]

This is a very interesting idea. I played a bit with function annotations (https://github.com/ceronman/typeannotations) and I gave a talk about them at EuroPython 2013. Certainly static type analysis is probably the best use case. 

> The curious thing here is that while standardizing a syntax for type annotations, we technically still won't be adopting standard rules for type checking. This is intentional. First of all, fully specifying all the type checking rules would make for a really long and boring PEP (a much better specification would probably be the mypy source code). Second, I think it's fine if the type checking algorithm evolves over time, or if variations emerge. The worst that can happen is that you consider your code correct but mypy disagrees; your code will still run.
>
> That said, I don't want to completely leave out any specification. I want the contents of the typing.py module to be specified in the PEP, so that it can be used with confidence. But whether mypy will complain about your particular form of duck typing doesn't have to be specified by the PEP. Perhaps as mypy evolves it will take options to tell it how to handle certain edge cases. Forks of mypy (or entirely different implementations of type checking based on the same annotation syntax) are also a possibility. Maybe in the distant future a version of Python will take a different stance, once we have more experience with how this works out in practice, but for Python 3.5 I want to restrict the scope of the upheaval.

The type checking algorithm might evolve over time, but by including typing.py in the stdlib, the syntax for annotations would be almost frozen, and that will be a limitation. In other projects such as TypeScript (http://www.typescriptlang.org/), the syntax usually evolves alongside the algorithms.

Is the syntax specified in typing.py mature enough to put it in the stdlib and expect users to start annotating their projects without worrying too much about future changes?

Is there enough feedback from users using mypy in their projects?

I think that rushing typing.py into 3.5 is not a good idea. However, it'd be nice to add some notes in PEP 8, encourage its use as an external library, and let some projects and tools (e.g. PyCharm) use it. It's not that bad if mypy lives 100% outside the Python distribution for a while, just like TypeScript relative to JavaScript. After getting some user base, part of it (typing.py) could be moved to the stdlib.

Manuel.

Ryan Gonzalez

Aug 13, 2014, 6:35:11 PM8/13/14
to Christian Heimes, python-ideas
On Wed, Aug 13, 2014 at 3:29 PM, Christian Heimes <chri...@python.org> wrote:
On 13.08.2014 21:44, Guido van Rossum wrote:
> Yesterday afternoon I had an inspiring conversation with Bob Ippolito
> (man of many trades, author of simplejson) and Jukka Lehtosalo (author
> of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about
> what Python can learn from Haskell (and other languages); yesterday he
> gave the same talk at Dropbox. The talk is online
> (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad
> strokes comes down to three suggestions:
>
>   (a) Python should adopt mypy's syntax for function annotations
>   (b) Python's use of mutable containers by default is wrong
>   (c) Python should adopt some kind of Abstract Data Types

I was at Bob's talk during EP14 and really liked the idea. A couple of
colleagues and other attendees also said it's a good and useful
proposal. I also like your proposal to standardize the type annotations
first without a full integration of mypy.

In general I'm +1 but I'd like to discuss two aspects:

1) I'm not keen on the naming of mypy's typing classes. The visual
distinction between e.g. dict() and Dict() is too small and IMHO
confusing for newcomers. How about an additional 'T' prefix to make
clear that the objects are referring to typing objects?

  from typing import TList, TDict

  def word_count(input: TList[str]) -> TDict[str, int]:
      ...

Eeewwwww. That's way too Pascal-ish.


2) PEP 3107 only specifies arguments and return values but not
exceptions that can be raised by a function. Java has the "throws"
syntax to list possible exceptions:

 public void readFile() throws IOException {}

May I suggest that we also standardize a way to annotate the exceptions
that can be raised by a function? It's a very useful piece of
information and commonly requested information on the Python user
mailing list. It doesn't have to be a new syntax element, a decorator in
the typing module would suffice, too. For example:

  from typing import TList, TDict, raises

  @raises(RuntimeError, (ValueError, "is raised when input is empty"))
  def word_count(input: TList[str]) -> TDict[str, int]:
      ...

That was a disaster in C++. It's confusing, especially since Python uses exceptions more than most other languages do.
 

Regards,
Christian




--
Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated."

Donald Stufft

Aug 13, 2014, 6:44:59 PM8/13/14
to gu...@python.org, Python-Ideas, Alex Gaynor
I don’t know exactly :)

Some ideas:

1) Raise a warning when the type check fails, but allow it to happen. This would
   have the benefit of possibly catching bugs, but it's still opt in in the
   sense that you have to write the annotations for anything to happen. This
   would also enable people to turn on enforced type checking by raising the
   warning level to an exception.

   Even if this was off by default it would make it easy to enable it during
   test runs and also enable easier/better quickcheck like functionality.

2) Simply add a flag to the interpreter that turns on type checking.

3) Add a stdlib module that would run the program under type checking, like
   ``python -m typing myprog`` instead of ``python -m myprog``.

Really I think a lot of the benefit is likely to come in the form of linting
and during test runs. However if I have to run a separate Python interpreter
to actually do the run then I risk getting bad results through varying things
like interpreter differences, language level differences, etc.

Although I wouldn't complain if it meant that Python had actual type checking
at the run time if a function had type annotations :)


> I'm fine to have a discussion on things like covariance vs. contravariance, or what forms of duck typing are acceptable, etc.

I’m not particularly knowledgeable about the actual workings of a type system and
covariance vs contravariance and the like. My main concern there is having a
single reality. The meaning of something shouldn't change because I used a
different interpreter/linter/whatever. Beyond that I don't know enough to have
an opinion on the actual semantics.

Guido van Rossum

Aug 13, 2014, 7:13:17 PM8/13/14
to Juancarlo Añez, Jukka Lehtosalo, Python-Ideas
Actually, mypy already has a solution. There's a codec (https://github.com/JukkaL/mypy/tree/master/mypy/codec) that you can use which transforms Python-2-with-annotations into vanilla Python 2. It's not an ideal solution, but it can work in cases where you absolutely have to have state of the art Python 3.5 type checking *and* backwards compatibility with Python 2.

Guido van Rossum

Aug 13, 2014, 7:26:15 PM8/13/14
to Manuel Cerón, Python-Ideas
On Wed, Aug 13, 2014 at 3:26 PM, Manuel Cerón <cero...@gmail.com> wrote:
> The type checking algorithm might evolve over time, but by including typing.py in the stdlib, the syntax for annotations would be almost frozen, and that will be a limitation. In other projects such as TypeScript (http://www.typescriptlang.org/), the syntax usually evolves alongside the algorithms.

What kind of evolution did TypeScript experience?
 
> Is the syntax specified in typing.py mature enough to put it in the stdlib and expect users to start annotating their projects without worrying too much about future changes?

This is a good question. I do think it is good enough as a starting point for future evolution. Perhaps the biggest question is how fast will the annotation syntax need to evolve? If it needs to evolve significantly faster than Python 3 feature releases come out (every 18 months, approximately) then it may be better to hold off and aim for inclusion in the 3.6 standard library. That would allow more time to reach agreement (though I'm not sure that's a good thing :-), and in the mean time typing.py could be distributed as a 3rd party module on PyPI.
 
> Is there enough feedback from users using mypy in their projects?
>
> I think that rushing typing.py into 3.5 is not a good idea. However, it'd be nice to add some notes in PEP 8, encourage its use as an external library, and let some projects and tools (e.g. PyCharm) use it. It's not that bad if mypy lives 100% outside the Python distribution for a while, just like TypeScript relative to JavaScript.

Well, JavaScript's evolution is tied up forever in a standards body, so TypeScript realistically had no choice in the matter. But are there actually people writing TypeScript? I haven't heard from them yet (people at Dropbox seem to rather like CoffeeScript). Anyway, the situation isn't quite the same -- you wouldn't make any friends in the Python world if you wrote your code in an incompatible dialect that could only be executed after a translation step, but in the JavaScript world that's how all alternative languages work (and they even manage to interoperate).
 
> After getting some user base, part of it (typing.py) could be moved to the stdlib.

I'm still hopeful that we can get a sufficient user base and agreement on mypy's features for inclusion in 3.5 (extrapolating the 3.4 release schedule by 18 months, 3.5 alpha 1 would go out around February 2015; the feature freeze cut-off date, beta 1, would be around May thereafter).

Ben Finney

Aug 13, 2014, 7:28:16 PM8/13/14
to python...@python.org
Christian Heimes <chri...@python.org>
writes:

> 1) I'm not keen on the naming of mypy's typing classes. The visual
> distinction between e.g. dict() and Dict() is too small and IMHO
> confusing for newcomers. How about an additional 'T' prefix to make
> clear that the objects are referring to typing objects?

To this reader, ‘dict’ and ‘list’ *are* “typing objects” — they are
objects that are types. Seeing code that referred to something else as
“typing objects” would be an invitation to confusion, IMO.

You could argue “that's because you don't know the special meaning of
“typing object” being discussed here”. To which my response would be,
for a proposal to add something else as meaningful Python syntax, the
jargon is poorly chosen and needlessly confusing with established terms
in Python.

If there's going to be a distinction between the types (‘dict’, ‘list’,
etc.) and something else, I'd prefer it to be based on a clearer
terminology distinction.

--
\ “Simplicity and elegance are unpopular because they require |
`\ hard work and discipline to achieve and education to be |
_o__) appreciated.” —Edsger W. Dijkstra |
Ben Finney

Guido van Rossum

Aug 13, 2014, 7:31:43 PM8/13/14
to Todd, python-ideas
On Wed, Aug 13, 2014 at 3:28 PM, Todd <todd...@gmail.com> wrote:
> However, I still don't get this bit. Why would allowing type annotations automatically imply that no other annotations would be possible? Couldn't we formalize what would be considered a type annotation while still allowing annotations that don't fit these criteria to be used for other things?

We certainly *could* do that. However, I haven't seen sufficient other uses of annotations. If there is only one use for annotations (going forward), annotations would be unambiguous. If we allow different types of annotations, there would have to be a way to tell whether a particular annotation is intended as a type annotation or not. Currently mypy ignores all modules that don't import typing.py (using any form of import statement), and we could continue this convention. But it would mean that something like this would still require the typing import in order to be checked by mypy:

import typing

def gcd(a: int, b: int) -> int:
    <tralala>

The (necessary) import would be flagged as unused by every linter in the world... :-(

Guido van Rossum

Aug 13, 2014, 7:45:23 PM8/13/14
to Donald Stufft, Python-Ideas, Alex Gaynor
On Wed, Aug 13, 2014 at 3:44 PM, Donald Stufft <don...@stufft.io> wrote:
On Aug 13, 2014, at 6:05 PM, Guido van Rossum <gu...@python.org> wrote:

On Wed, Aug 13, 2014 at 1:53 PM, Donald Stufft <don...@stufft.io> wrote:

So my vote would be to add mypy semantics to the language itself.

What exactly would that mean? I don't think the Python interpreter should reject programs that fail the type check -- in fact, separating the type check from run time is the most crucial point of my proposal.

I don’t know exactly :)

Some ideas:

1) Raise a warning when the type check fails, but allow it to happen. This would
   have the benefit of possibly catching bugs, but it's still opt in in the
   sense that you have to write the annotations for anything to happen. This
   would also enable people to turn on enforced type checking by raising the
   warning level to an exception.

I don't think that's going to happen. It would require the entire mypy implementation to be checked into the stdlib. It would also require all sorts of hacks in that implementation to deal with dynamic (or just delayed) imports. Mypy currently doesn't handle any of that -- it must be able to find all imported modules before it starts executing even one line of code.
 
   Even if this was off by default it would make it easy to enable it during
   test runs and also enable easier/better quickcheck like functionality.

It would *have* to be off by default -- it's way too slow to be on by default (note that some people are already fretting today about a 25 msec process start-up time).
 
2) Simply add a flag to the interpreter that turns on type checking.

3) Add a stdlib module that would run the program under type checking, like
   ``python -m typing myprog`` instead of ``python -m myprog``.

Really I think a lot of the benefit is likely to come in the form of linting
and during test runs. However if I have to run a separate Python interpreter
to actually do the run then I risk getting bad results through varying things
like interpreter differences, language level differences, etc.

Yeah, but I just don't think it's realistic to do anything about that for 3.5 (or 3.6 for that matter). In a decade... Who knows! :-)
 
Although I wouldn't complain if it meant that Python had actual type checking
at the run time if a function had type annotations :)

It's probably possible to write a decorator that translates annotations into assertions that are invoked when a function is called. But in most cases it would be way too slow to turn on everywhere.
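
Such a decorator could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of mypy or of any proposed typing.py: the name check_annotations is hypothetical, only annotations that are plain classes are checked, and generic forms such as List[str] are skipped entirely.

```python
import functools
import inspect


def check_annotations(func):
    """Hypothetical helper: assert that arguments match plain-class annotations."""
    signature = inspect.signature(func)
    annotations = func.__annotations__
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            # *args and **kwargs are left unchecked in this sketch.
            if signature.parameters[name].kind in variadic:
                continue
            expected = annotations.get(name)
            # Only plain classes are checked; anything else is ignored.
            if isinstance(expected, type):
                assert isinstance(value, expected), (
                    "%s must be %s, got %s"
                    % (name, expected.__name__, type(value).__name__))
        result = func(*args, **kwargs)
        expected = annotations.get("return")
        if isinstance(expected, type):
            assert isinstance(result, expected), "unexpected return type"
        return result

    return wrapper


@check_annotations
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a
```

Every call pays for a signature bind plus one isinstance() test per argument, which gives a feel for why this would be too slow to enable everywhere.
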

I'm fine to have a discussion on things like covariance vs. contravariance, or what forms of duck typing are acceptable, etc.

I’m not particularly knowledgeable about the actual workings of a type system and
covariance vs contravariance and the like. My main concern there is having a
single reality. The meaning of something shouldn't change because I used a
different interpreter/linter/whatever. Beyond that I don't know enough to have
an opinion on the actual semantics.

Yeah, I regret writing it so vaguely already. Having Alex Gaynor open with "I'm strongly opposed [to] this" is a great joy killer. :-)

I just really don't want to have to redundantly write up a specification for all the details of mypy's type checking rules in PEP-worthy English. But I'm fine with discussing whether List[str] is a subclass or a superclass of List[object] and how to tell the difference.

Still, different linters exist and I don't hear people complain about that. I would also be okay if PyCharm's interpretation of the finer points of the type checking syntax was subtly different from mypy's. In fact I would be surprised if they weren't sometimes in disagreement. Heck, PyPy doesn't give *every* Python program the same meaning as CPython, and that's a feature. :-)

Donald Stufft

Aug 13, 2014, 7:59:40 PM8/13/14
to gu...@python.org, Python-Ideas, Alex Gaynor
Understood! And really the most important thing I'm worried about isn’t that
there is some sort of code in the stdlib or in the interpreter, just that there
is an authoritative source of what stuff means.


> Still, different linters exist and I don't hear people complain about that. I would also be okay if PyCharm's interpretation of the finer points of the type checking syntax was subtly different from mypy's. In fact I would be surprised if they weren't sometimes in disagreement. Heck, PyPy doesn't give *every* Python program the same meaning as CPython, and that's a feature. :-)


Depends on what is meant by "meaning" I suppose. Generally in those linters or
PyPy itself if there is a different *meaningful* result (for instance if
print was defaulting to sys.stderr) then CPython (incl docs) acts as the
authoritative source of what ``print()`` means (in this case writing to
sys.stdout).

I'm also generally OK with deferring possible code/interpreter changes to add
actual type checking until a later point in time. If there are defined semantics
for what those annotations mean, then third parties can experiment and do things
with them, and those different things can be considered for incorporation into
Python proper in 3.6 (or 3.7, or whatever).

Honestly I think that the things I was worried about are sufficiently
allayed, given that it appears I was reading more into the vagueness and the
optionally different interpretations than what was meant, and I don't want to
keep harping on it :) As long as there's some single source of what List[str]
or what have you means, then I'm pretty OK with it all.

Chris Angelico

Aug 13, 2014, 8:32:42 PM8/13/14
to Python-Ideas
On Thu, Aug 14, 2014 at 5:44 AM, Guido van Rossum <gu...@python.org> wrote:
> from typing import List, Dict
>
> def word_count(input: List[str]) -> Dict[str, int]:
>     result = {}  #type: Dict[str, int]
>     for line in input:
>         for word in line.split():
>             result[word] = result.get(word, 0) + 1
>     return result

I strongly support the concept of standardized typing information.
There'll be endless bikeshedding on names, though - personally, I
don't like the idea of "from typing import ..." as there's already a
"types" module and I think it'd be confusing. (Also, "mypy" sounds
like someone's toy reimplementation of Python, which it does seem to
be :) but that's not really well named for "type checker using stdlib
annotations".) But I think the idea is excellent, and it deserves
stdlib support.

The cast notation sounds to me like it's what Pike calls a "soft cast"
- it doesn't actually *change* anything (contrast a C or C++ type
cast, where (float)42 is 42.0), it just says to the compiler/type
checker "this thing is actually now this type". If the naming is clear
on this point, it leaves open the possibility of actual recursive
casting - where casting a List[str] to List[int] is equivalent to
[int(x) for x in lst]. Whether or not that's a feature worth adding
can be decided in the distant future :)
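
The difference between the two kinds of cast can be sketched like this, assuming the cast() helper in the form the standard library's typing module later shipped (the thread itself only refers to mypy's cast notation). The variable names are illustrative.

```python
from typing import List, cast

raw = ["1", "2", "3"]

# "Soft" cast: only informs the type checker. At run time cast() returns
# its second argument unchanged, so the elements are still strings.
as_ints = cast(List[int], raw)

# An actual conversion has to be spelled out explicitly.
converted = [int(x) for x in raw]
```

Here as_ints is the very same list object as raw, which is exactly why the notation leaves room for a separate, converting cast later.
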

+1 on the broad proposal. +0.5 on defining the notation while leaving
the actual type checking to an external program.

ChrisA

Haoyi Li

Aug 13, 2014, 8:54:25 PM8/13/14
to Chris Angelico, Python-Ideas
Both solutions have merit, but the idea of some implementations of the type checker having covariance and some contravariance is fairly disturbing.

Why can't we have both? That's the only way to properly type things, since immutable-get-style APIs are always going to be covariant, set-only style APIs (e.g. a function that takes 1 arg and returns None) are going to be contravariant and mutable get-set APIs (like most Python collections) should really be invariant.
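
A small sketch of why a mutable collection should be invariant, using the List name from the typing module and mypy's "# type:" comment syntax. The function and variable names are made up for illustration. The code runs without error, which is the point: the problem is one only a type checker could catch.

```python
from typing import List


def append_answer(items: List[object]) -> None:
    # Perfectly valid for a list of arbitrary objects.
    items.append(42)


names = ["spam", "eggs"]  # type: List[str]

# If List[str] were accepted wherever List[object] is expected (covariance),
# a checker would let this call through, yet it breaks the List[str] promise.
append_answer(names)
```

After the call, names contains an int. A read-only view of the same list would be safe to treat covariantly, because nothing could be appended through it.
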

Łukasz Langa

Aug 13, 2014, 9:01:40 PM8/13/14
to guido@python.org van Rossum, Python-Ideas
It’s great to see this finally happening!
I did some research on existing optional-typing approaches [1]. What I learned in the process was that linting is the most important use case for optional typing; runtime checks are too little, too late.

That being said, having optional runtime checks available *is* also important. Used in staging environments and during unit testing, they can cover cases obscured by meta-programming. Implementations like “obiwan” and “pytypedecl” show that providing a runtime type checker is absolutely feasible.

The function annotation syntax currently supported in Python 3.4 is not well-suited for typing. This is because users expect to be able to operate on the types they know. This is currently not feasible because:
1. forward references are impossible
2. generics are impossible without custom syntax (which is the reason Mypy’s Dict exists)
3. optional types are clumsy to express (Optional[int] is very verbose for a use case this common)
4. union types are clumsy to express

All those problems are elegantly solved by Google’s pytypedecl via moving type information to a separate file. Because for our use case that would not be an acceptable approach, my intuition would be to:

1. Provide support for generics (understood as an answer to the question: “what does this collection contain?”) in Abstract Base Classes. That would be a PEP in itself.
2. Change the function annotation syntax so that it’s not executed at import time but rather treated as strings. This solves forward references and enables us to…
3. Extend the function annotation syntax with first-class generics support (most languages like "list<str>")
4. Extend the function annotation syntax with first-class union type support. pytypedecl simply uses “int or None”, which I find very elegant.
5. Speaking of None, possibly further extend the function annotation syntax with first-class optionality support. In the Facebook codebase in Hack we have tens of thousands of optional ints (never mind other optional types!), this is a case that’s going to be used all the time. Hack uses ?int, that’s the most succinct style you can get. Yes, it’s special but None is a special type, too.
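
For comparison, here is roughly how optional types, unions and forward references look when spelled with a typing module and no new syntax. This is a sketch, assuming Optional and Union behave as in the typing module that later shipped with Python, and that a forward reference is written as a string for the checker to resolve. The function and class names are invented for the example.

```python
from typing import Optional, Union


def parse_port(text: str) -> Optional[int]:
    """Return the port number, or None when text is not a number."""
    return int(text) if text.isdigit() else None


def label(value: Union[int, str]) -> str:
    """Accept either an int or a str."""
    return "#%d" % value if isinstance(value, int) else value


class Node:
    # Node is not bound yet while the class body runs, so the forward
    # reference is written as a string and stored unevaluated.
    def link(self, other: "Node") -> "Node":
        self.next = other
        return other
```

Optional[int] here is just shorthand for Union[int, None], which is the verbosity that a ?int or "int or None" spelling would remove.
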

All in all, I believe Mypy has the highest chance of becoming our typing linter, which is great! I just hope we can improve on the syntax, which is currently lacking. Also, reusing our existing ABCs where applicable would be nice. With Mypy’s typing module I feel like we’re going to get a new, orthogonal set of ABCs, which will confuse users to no end. Finally, the runtime type checker would make the ecosystem complete.

This is just the beginning of the open issues I was juggling and the reason my own try at the PEP was coming up slower than I’d like.

[1] You can find a summary of examples I looked at here: http://lukasz.langa.pl/typehinting/

-- 
Best regards,
Łukasz Langa

WWW: http://lukasz.langa.pl/
Twitter: @llanga
IRC: ambv on #python-dev

On Aug 13, 2014, at 12:44 PM, Guido van Rossum <gu...@python.org> wrote:

[There is no TL;DR other than the subject line. Please read the whole thing before replying. I do have an appendix with some motivations for adding type annotations at the end.]
Yesterday afternoon I had an inspiring conversation with Bob Ippolito (man of many trades, author of simplejson) and Jukka Lehtosalo (author of mypy: http://mypy-lang.org/). Bob gave a talk at EuroPython about what Python can learn from Haskell (and other languages); yesterday he gave the same talk at Dropbox. The talk is online (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad strokes comes down to three suggestions:

  (a) Python should adopt mypy's syntax for function annotations
  (b) Python's use of mutable containers by default is wrong
  (c) Python should adopt some kind of Abstract Data Types

Proposals (b) and (c) don't feel particularly actionable (if you disagree please start a new thread, I'd be happy to discuss these further if there's interest) but proposal (a) feels right to me.

So what is mypy?  It is a static type checker for Python written by Jukka for his Ph.D. thesis. The basic idea is that you add type annotations to your program using some custom syntax, and when running your program using the mypy interpreter, type errors will be found during compilation (i.e., before the program starts running).

The clever thing here is that the custom syntax is actually valid Python 3, using (mostly) function annotations: your annotated program will still run with the regular Python 3 interpreter. In the latter case there will be no type checking, and no runtime overhead, except to evaluate the function annotations (which are evaluated at function definition time but don't have any effect when the function is called).

In fact, it is probably more useful to think of mypy as a heavy-duty linter than as a compiler or interpreter; leave the type checking to mypy, and the execution to Python. It is easy to integrate mypy into a continuous integration setup, for example.

To read up on mypy's annotation syntax, please see the mypy-lang.org website. Here's just one complete example, to give a flavor:


  from typing import List, Dict

  def word_count(input: List[str]) -> Dict[str, int]:
      result = {}  #type: Dict[str, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result


Note that the #type: comment is part of the mypy syntax; mypy uses comments to declare types in situations where no syntax is available -- although this particular line could also be written as follows:

    result = Dict[str, int]()

Either way the entire function is syntactically valid Python 3, and a suitable implementation of typing.py (containing class definitions for List and Dict, for example) can be written to make the program run correctly. One is provided as part of the mypy project.

I should add that many of mypy's syntactic choices aren't actually new. The basis of many of its ideas goes back at least a decade: I blogged about this topic in 2004 (http://www.artima.com/weblogs/viewpost.jsp?thread=85551 -- see also the two followup posts linked from the top there).

I'll emphasize once more that mypy's type checking happens in a separate pass: no type checking happens at run time (other than what the interpreter already does, like raising TypeError on expressions like 1+"1").

There's a lot to this proposal, but I think it's possible to get a PEP written, accepted and implemented in time for Python 3.5, if people are supportive. I'll go briefly over some of the action items.

(1) A change of direction for function annotations

PEP 3107, which introduced function annotations, is intentionally non-committal about how function annotations should be used. It lists a number of use cases, including but not limited to type checking. It also mentions some rejected proposals that would have standardized either a syntax for indicating types and/or a way for multiple frameworks to attach different annotations to the same function. AFAIK in practice there is little use of function annotations in mainstream code, and I propose a conscious change of course here by stating that annotations should be used to indicate types and to propose a standard notation for them.

(We may have to have some backwards compatibility provision to avoid breaking code that currently uses annotations for some other purpose. Fortunately the only issue, at least initially, will be that when running mypy to type check such code it will produce complaints about the annotations; it will not affect how such code is executed by the Python interpreter. Nevertheless, it would be good to deprecate such alternative uses of annotations.)

(2) A specification for what to add to Python 3.5

There needs to be at least a rough consensus on the syntax for annotations, and the syntax must cover a large enough set of use cases to be useful. Mypy is still under development, and some of its features are still evolving (e.g. unions were only added a few weeks ago). It would be possible to argue endlessly about details of the notation, e.g. whether to use 'list' or 'List', what either of those means (is a duck-typed list-like type acceptable?) or how to declare and use type variables, and what to do with functions that have no annotations at all (mypy currently skips those completely).

I am proposing that we adopt whatever mypy uses here, keeping discussion of the details (mostly) out of the PEP. The goal is to make it possible to add type checking annotations to 3rd party modules (and even to the stdlib) while allowing unaltered execution of the program by the (unmodified) Python 3.5 interpreter. The actual type checker will not be integrated with the Python interpreter, and it will not be checked into the CPython repository. The only thing that needs to be added to the stdlib is a copy of mypy's typing.py module. This module defines several dozen new classes (and a few decorators and other helpers) that can be used in expressing argument types. If you want to type-check your code you have to download and install mypy and run it separately.

The curious thing here is that while standardizing a syntax for type annotations, we technically still won't be adopting standard rules for type checking. This is intentional. First of all, fully specifying all the type checking rules would make for a really long and boring PEP (a much better specification would probably be the mypy source code). Second, I think it's fine if the type checking algorithm evolves over time, or if variations emerge. The worst that can happen is that you consider your code correct but mypy disagrees; your code will still run.

That said, I don't want to completely leave out any specification. I want the contents of the typing.py module to be specified in the PEP, so that it can be used with confidence. But whether mypy will complain about your particular form of duck typing doesn't have to be specified by the PEP. Perhaps as mypy evolves it will take options to tell it how to handle certain edge cases. Forks of mypy (or entirely different implementations of type checking based on the same annotation syntax) are also a possibility. Maybe in the distant future a version of Python will take a different stance, once we have more experience with how this works out in practice, but for Python 3.5 I want to restrict the scope of the upheaval.

Appendix -- Why Add Type Annotations?

The argument between proponents of static typing and dynamic typing has been going on for many decades. Neither side is all wrong or all right. Python has traditionally fallen in the camp of extremely dynamic typing, and this has worked well for most users, but there are definitely some areas where adding type annotations would help.

- Editors (IDEs) can benefit from type annotations; they can call out obvious mistakes (like misspelled method names or inapplicable operations) and suggest possible method names. Anyone who has used IntelliJ or Xcode will recognize how powerful these features are, and type annotations will make such features more useful when editing Python source code.

- Linters are an important tool for teams developing software. A linter doesn't replace a unittest, but can find certain types of errors better or quicker. The kind of type checking offered by mypy works much like a linter, and has similar benefits; but it can find problems that are beyond the capabilities of most linters.

- Type annotations are useful for the human reader as well! Take the above word_count() example. How long would it have taken you to figure out the types of the argument and return value without annotations? Currently most people put the types in their docstrings; developing a standard notation for type annotations will reduce the amount of documentation that needs to be written, and running the type checker might find bugs in the documentation, too. Once a standard type annotation syntax is introduced, it should be simple to add support for this notation to documentation generators like Sphinx.

- Refactoring. Bob's talk has a convincing example of how type annotations help in (manually) refactoring code. I also expect that certain automatic refactorings will benefit from type annotations -- imagine a tool like 2to3 (but used for some other transformation) augmented by type annotations, so it will know whether e.g. x.keys() is referring to the keys of a dictionary or not.

- Optimizers. I believe this is actually the least important application, certainly initially. Optimizers like PyPy or Pyston wouldn't be able to fully trust the type annotations, and they are better off using their current strategy of optimizing code based on the types actually observed at run time. But it's certainly feasible to imagine a future optimizer also taking type annotations into account.

--
--Guido "I need a new hobby" van Rossum (python.org/~guido)

Gregory P. Smith

Aug 13, 2014, 9:10:24 PM8/13/14
to Guido van Rossum, Jukka Lehtosalo, Python-Ideas

First, I am really happy that you are interested in this and that your point (2) of what you want to see done is very limited and acknowledges that it isn't going to specify everything!  Because that isn't possible. :)

Unfortunately I feel that adding syntax like this to the language itself is not useful without enforcement, because it leads to code being written with unintentionally incorrect annotations. That code winds up deployed in libraries, and it becomes a problem as soon as an actual analysis tool runs over something that uses the incorrectly annotated code in a place where it cannot be easily updated (like the standard library).

At the summit in Montreal earlier this year Łukasz Langa (cc'd) volunteered to lead writing the PEP on Python type hinting based on the many existing implementations of such things (including mypy, cython, numba and pytypedecl). I believe he has an initial draft he intends to send out soon. I'll let him speak to that.

Looks like Łukasz already responded, I'll stop writing now and go read that. :)

Personal opinion from experience trying: You can't express the depth of types for an interface within the Python language syntax itself (assuming hacks such as specially formatted comments, strings or docstrings do not count). Forward references to things that haven't even been defined yet are common. You often want an ability to specify a duck type interface rather than a specific type.  I think he has those points covered better than I do.

-gps

PS If anyone wants to see a run time type checker make code run at half speed, look at the one pytypedecl offers. I'm sure it could be sped up, but run-time checkers in an interpreter are always likely to be slow.

Greg Ewing

Aug 13, 2014, 9:28:56 PM8/13/14
to python...@python.org
On 08/14/2014 12:32 PM, Chris Angelico wrote:
> I don't like the idea of "from typing import ..." as there's already a
> "types" module and I think it'd be confusing.

Maybe

from __statictyping__ import ...

More explicit, and being a dunder name suggests that it's
something special that linters should ignore if they don't
understand it.

--
Greg

Andrew Barnert

Aug 13, 2014, 9:30:53 PM8/13/14
to Alex Gaynor, python...@python.org
On Wednesday, August 13, 2014 1:30 PM, Alex Gaynor <alex....@gmail.com> wrote:


>I'm strongly opposed to this, for a few reasons.


[...]

>Python's type system isn't very good. It lacks many features of more powerful
>systems such as algebraic data types, interfaces, and parametric polymorphism.
>Despite this, it works pretty well because of Python's dynamic typing. I
>strongly believe that attempting to enforce the existing type system would be a
>real shame.

This is my main concern, but I'd phrase it very differently.


First, Python's type system _is_ powerful, but only because it's dynamic. Duck typing simulates parametric polymorphism perfectly, disjunction types as long as they don't include themselves recursively, algebraic data types in some but not all cases, etc. Simple (Java-style) generics, of the kind that Guido seems to be proposing, are not nearly as flexible. That's the problem.

On the other hand, even though these types only cover a small portion of the space of Python's implicit type system, a lot of useful functions fall within that small portion. As long as you can just leave the rest of the program untyped, and there are no boundary problems, there's no real risk.

On the third hand, what worries me is this:

> Mypy has a cast() operator that you can use to shut it up when you (think you) know the conversion is safe.

Why do we need casts? You shouldn't be trying to enforce static typing in a part of the program whose static type isn't sound. Languages like Java and C++ have no choice; Python does, so why not take advantage of it?

The standard JSON example seems appropriate here. What's the return type of json.loads? In Haskell, you write a pretty trivial JSONThing ADT, and you return a JSONThing that's an Object (which means its value maps String to JSONThing). In Python today, you return a dict, and use it exactly the same as in Haskell, except that you can't verify its soundness at compile time. In Java or C++, it's… what? The sound option is a special JSONThing that has separate getObjectMemberString and getArrayMemberString and getObjectMemberInt, which is incredibly painful to use. A plain old Dict[String, Object] looks simple, but it means you have to downcast all over the place to do anything, making it completely unsound, and still unpleasant. The official Java json.org library gives you a hybrid between the two that manages to be neither sound nor user-friendly. And of course there are libraries for many poor static languages (especially C++) that try to fake duck typing as far as possible for their JSON objects, which is of course nowhere near as far as Python gets for free.

Andrew Barnert

Aug 13, 2014, 9:42:40 PM8/13/14
to gu...@python.org, Python-Ideas, Jukka Lehtosalo
On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <gu...@python.org> wrote:


>  def word_count(input: List[str]) -> Dict[str, int]:
>      result = {}  #type: Dict[str, int]
>      for line in input:
>          for word in line.split():
>              result[word] = result.get(word, 0) + 1
>      return result


I just realized why this bothers me.

This function really, really ought to be taking an Iterable[String] (except that we don't have a String ABC). If you hadn't statically typed it, it would work just fine with, say, a text file—or, for that matter, a binary file. By restricting it to List[str], you've made it a lot less usable, for no visible benefit.

And, while this is less serious, I don't think it should be guaranteeing that the result is a Dict rather than just some kind of Mapping. If you want to change the implementation tomorrow to return some kind of proxy or a tree-based sorted mapping, you can't do so without breaking all the code that uses your function.

And if even Guido, in the motivating example for this feature, is needlessly restricting the usability and future flexibility of a function, I suspect it may be a much bigger problem in practice.


This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information. 

And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.
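
One way to express "the same String type" is a type variable constrained to str or bytes. The sketch below assumes a typing module that offers such a variable under the name AnyStr, along with Iterable, Mapping and Dict; treat those names as assumptions rather than as part of the proposal.

```python
from typing import AnyStr, Dict, Iterable, Mapping  # names assumed, see above


def word_count(lines: Iterable[AnyStr]) -> Mapping[AnyStr, int]:
    # AnyStr ties the key type of the result to the element type of the input,
    # so a checker can tell str keys from bytes keys.
    result = {}  # type: Dict[AnyStr, int]
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


text_counts = word_count(["a b", "a"])      # keys are str
byte_counts = word_count([b"a b", b"a"])    # keys are bytes
print(text_counts, byte_counts)
```

With a signature like this, the caller's knowledge that f is a text file would carry through to the keys of word_count(f).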

Guido van Rossum

Aug 13, 2014, 9:44:10 PM8/13/14
to Haoyi Li, Python-Ideas
On Wed, Aug 13, 2014 at 5:53 PM, Haoyi Li <haoy...@gmail.com> wrote:
Both solutions have merit, but the idea of some implementations of the type checker having covariance and some contravariance is fairly disturbing.

Why can't we have both? That's the only way to properly type things, since immutable-get-style APIs are always going to be covariant, set-only style APIs (e.g. a function that takes 1 arg and returns None) are going to be contravariant, and mutable get-set APIs (like most Python collections) should really be invariant.
 
That makes sense. Can you put something in the mypy tracker about this? (Or send a pull request. :-)
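
For reference, the three cases Haoyi lists can be sketched as follows. This assumes a typing module whose TypeVar accepts covariant= and contravariant= flags; the class names are made up for illustration, and none of this is checked at run time.

```python
from typing import Generic, TypeVar  # assumes TypeVar supports variance flags

T = TypeVar("T")                                     # invariant
T_co = TypeVar("T_co", covariant=True)               # for get-only APIs
T_contra = TypeVar("T_contra", contravariant=True)   # for set-only APIs


class Source(Generic[T_co]):
    """Get-only: a Source[bool] may stand in for a Source[int]."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


class Sink(Generic[T_contra]):
    """Set-only: a Sink[int] may stand in for a Sink[bool]."""

    def __init__(self) -> None:
        self.seen = []

    def put(self, item: T_contra) -> None:
        self.seen.append(item)


class Box(Generic[T]):
    """Get and set: Box[bool] and Box[int] are not interchangeable."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        return self.item

    def set(self, item: T) -> None:
        self.item = item
```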

Juancarlo Añez

Aug 13, 2014, 9:44:54 PM8/13/14
to Guido van Rossum, Jukka Lehtosalo, Python-Ideas

On Wed, Aug 13, 2014 at 6:41 PM, Guido van Rossum <gu...@python.org> wrote:
Actually, mypy already has a solution. There's a codec (https://github.com/JukkaL/mypy/tree/master/mypy/codec) that you can use which transforms Python-2-with-annotations into vanilla Python 2. It's not an ideal solution, but it can work in cases where you absolutely have to have state of the art Python 3.5 type checking *and* backwards compatibility with Python 2.

It can't be a solution because it's a hack...

Cheers,

--
Juancarlo Añez

Greg Ewing

Aug 13, 2014, 9:45:04 PM8/13/14
to python...@python.org
On 08/14/2014 01:26 PM, Andrew Barnert wrote:

> In Java or C++, it's… what? The sound option is a special JSONThing that
> has separate getObjectMemberString and getArrayMemberString and
> getObjectMemberInt, which is incredibly painful to use.

That's mainly because Java doesn't let you define your own
types that use convenient syntax such as [] for indexing.

Python doesn't have that problem, so a decent static type
system for Python should let you define a JSONThing class
that's fully type-safe while having a standard mapping
interface.

--
Greg

Łukasz Langa

Aug 13, 2014, 9:58:09 PM8/13/14
to Andrew Barnert, Jukka Lehtosalo, Python-Ideas
On Aug 13, 2014, at 6:39 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:

On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <gu...@python.org> wrote:

  def word_count(input: List[str]) -> Dict[str, int]:
      result = {}  #type: Dict[str, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result

I just realized why this bothers me.

This function really, really ought to be taking an Iterable[String]

You do realize String also happens to be an Iterable[String], right? One of my big dreams about Python is that one day we'll drop support for strings being iterable. Nothing of value would be lost and that would enable us to use isinstance(x, Iterable) and more importantly isinstance(x, Sequence). Funny that this surfaces now, too.
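
A quick illustration of the hazard, as a sketch (the helper name is made up): because every str is itself an iterable of one-character strings, a bare string passes the same run-time checks as a list of lines.

```python
from collections.abc import Iterable


def looks_like_lines(x) -> bool:
    # Hypothetical helper: accept iterables, but reject a bare string.
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))


# A str is iterable, and iterating it yields more strs.
print(isinstance("spam eggs", Iterable))
print(all(isinstance(ch, str) for ch in "spam eggs"))

# Only the explicit exclusion tells a list of lines from a single line.
print(looks_like_lines(["spam eggs"]), looks_like_lines("spam eggs"))
```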

Terry Reedy

Aug 13, 2014, 10:28:39 PM8/13/14
to python...@python.org
Guido, as requested, I read your whole post before replying. Please do
the same. This response is both critical and supportive.

On 8/13/2014 3:44 PM, Guido van Rossum wrote:

> Yesterday afternoon I had an inspiring conversation with Bob Ippolito
> (man of many trades, author of simplejson) and Jukka Lehtosalo (author
> of mypy: http://mypy-lang.org/).

My main concern with static typing is that it tends to be
anti-duck-typing, while I consider duck-typing to be a major *feature*
of Python. The example in the page above is "def fib(n: int):". Fib
should get a count (non-negative integer) value, but it need not be an
int, and 'half' the ints do not qualify. Reading the tutorial, I could
not tell if it supports numbers.Number (which should approximate the
domain from above.)

Now consider an extended version (after Lucas).

def fib(n, a, b):
    i = 0
    while i <= n:
        print(i, a)
        i += 1
        a, b = b, a+b

The only requirement of a, b is that they be addable. Any numbers should
be allowed, as in fib(10, 1, 1+1j), but so should fib(5, '0', '1').
Addable would be approximated from below by Union(Number, str).
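
A structural "Addable" type might be sketched like this. It assumes a typing module that provides a Protocol base class for structural types; the Addable name and the generator form of fib (yielding instead of printing) are illustrative only.

```python
from typing import Iterator, Protocol, TypeVar  # Protocol is assumed to exist


class Addable(Protocol):
    # Anything with a suitable __add__ qualifies; no inheritance is required.
    def __add__(self, other): ...


A = TypeVar("A", bound=Addable)


def fib(n: int, a: A, b: A) -> Iterator[A]:
    # Same loop as above, but yielding each value of a instead of printing it.
    i = 0
    while i <= n:
        yield a
        i += 1
        a, b = b, a + b


print(list(fib(4, 1, 1)))
print(list(fib(2, "0", "1")))
print(list(fib(2, 1, 1 + 1j)))
```

Numbers and strings both satisfy the protocol, so no Union is needed to admit them.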

> Bob gave a talk at EuroPython about
> what Python can learn from Haskell (and other languages); yesterday he
> gave the same talk at Dropbox. The talk is online
> (https://ep2014.europython.eu/en/schedule/sessions/121/) and in broad
> strokes comes down to three suggestions:
>
> (a) Python should adopt mypy's syntax for function annotations

-+ Syntax with no meaning is a bit strange. On the other hand, syntax
not bound to semantics, or at least not bound to just one meaning is
quite pythonic. '+' has two standard meanings, plus custom meanings
embodied in .__add__ methods.

+ The current semantics of annotations is that they are added to
functions objects as .__annotations__ (for whatever use) *and* used as
part of inspect.signature and included in help(ob) responses. In other
words, annotations are already used in the stdlib.

>>> def f(i:int) -> float: pass

>>> from inspect import signature as sig
>>> str(sig(f))
'(i:int) -> float'
>>> help(f)
Help on function f in module __main__:

f(i:int) -> float

Idle calltips include them also. An appropriately flexible standardized
notation would enhance this usage and many others.

+-+ I see the point of "The goal is to make it possible to add type
checking annotations to 3rd party modules (and even to the stdlib) while
allowing unaltered execution of the program by the (unmodified) Python
3.5 interpreter." On the other hand, "pip install mypytyping" is not a
huge burden. On the third hand, being in the stdlib allows use in the stdlib.

> (b) Python's use of mutabe [mutable] containers by default is wrong

The premise of this is partly wrong and partly obsolete. As far as I can
remember, Python *syntax* only uses tuples, not lists: "except (ex1,
ex2):", "s % (val1, val2)", etc. The use of lists as the common format
for data interchange between functions has largely been replaced by
iterators. This fact makes Python code much more generic, and
anti-generic static typing more wrong.

In remaining cases, 'wrong' is as much a philosophical opinion as a fact.

> (c) Python should adopt some kind of Abstract Data Types

I would have to look at the talk to know what Jukka means.

> Proposals (b) and (c) don't feel particularly actionable (if you
> disagree please start a new thread, I'd be happy to discuss these
> further if there's interest) but proposal (a) feels right to me.

> So what is mypy? It is a static type checker for Python written by
> Jukka for his Ph.D. thesis. The basic idea is that you add type
> annotations to your program using some custom syntax, and when running
> your program using the mypy interpreter, type errors will be found
> during compilation (i.e., before the program starts running).
>
> The clever thing here is that the custom syntax is actually valid Python
> 3, using (mostly) function annotations: your annotated program will
> still run with the regular Python 3 interpreter. In the latter case
> there will be no type checking, and no runtime overhead, except to
> evaluate the function annotations (which are evaluated at function
> definition time but don't have any effect when the function is called).
>
> In fact, it is probably more useful to think of mypy as a heavy-duty
> linter than as a compiler or interpreter; leave the type checking to
> mypy, and the execution to Python. It is easy to integrate mypy into a
> continuous integration setup, for example.
>
> To read up on mypy's annotation syntax, please see the mypy-lang.org
> <http://mypy-lang.org> website.

I did not see a 'reference' page, but the tutorial comes pretty close.
http://mypy-lang.org/tutorial.html
Beyond that, typing.py would be definitive,
https://github.com/JukkaL/mypy/blob/master/lib-typing/3.2/typing.py

> Here's just one complete example, to give a flavor:

> from typing import List, Dict
>
> def word_count(input: List[str]) -> Dict[str, int]:

The input annotation should be Iterable[str], which mypy does have.

>     result = {}  #type: Dict[str, int]
>     for line in input:
>         for word in line.split():
>             result[word] = result.get(word, 0) + 1
>     return result

The information that input is an Iterable[str] can be used either within
the definition of word_count or at places where word_count is called. A
type aware checker, either in the editor or compiler, could check that
the only uses of 'input' within the function are as input to functions
declared to accept an Iterable or in for statements.

Checking that the input to word_count is specifically Iterable[str] as
opposed to any other Iterable may not be possible. But I think what can
be done, including enhancing help information, might be worth it.
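
For example, with the looser annotation the same body accepts a list, a generator or an open text file. This is a sketch assuming typing exposes Iterable and Dict as the tutorial describes; io.StringIO stands in for a real file.

```python
import io
from typing import Dict, Iterable  # names assumed from the mypy tutorial


def word_count(input: Iterable[str]) -> Dict[str, int]:
    result = {}  # type: Dict[str, int]
    for line in input:  # the only use of 'input' is in a for statement
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


from_list = word_count(["a b", "a"])
from_file = word_count(io.StringIO("a b\na\n"))      # iterating a file yields lines
from_gen = word_count(line for line in ["a b", "a"])
print(from_list == from_file == from_gen)
```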

For instance, the parameter to s.join is named 'iterable'. Something
more specific, either 'iterable_of_strings' or 'strings: Iterable[str]'
would be more helpful. Indeed, there have been people posting on python
list who thought that 'iterable' means iterable and that .join would
call str() on each object. I think there are other cases where a
parameter is given a bland under-informative type name instead of a
context-specific semantic name just because there was no type annotation
available. There are places where the opposite problem occurs, too
specific instead of too general, where iterable parameters are still
called 'list'.

> Note that the #type: comment is part of the mypy syntax; mypy uses
> comments to declare types in situations where no syntax is available --
> although this particular line could also be written as follows:
>
> result = Dict[str, int]()
>
> Either way the entire function is syntactically valid Python 3, and a
> suitable implementation of typing.py (containing class definitions for
> List and Dict, for example) can be written to make the program run
> correctly. One is provided as part of the mypy project.
>
> I should add that many of mypy's syntactic choices aren't actually new.
> The basis of many of its ideas go back at least a decade: I blogged
> about this topic in 2004
> (http://www.artima.com/weblogs/viewpost.jsp?thread=85551 -- see also the
> two followup posts linked from the top there).
>
> I'll emphasize once more that mypy's type checking happens in a separate
> pass: no type checking happens at run time (other than what the
> interpreter already does, like raising TypeError on expressions like 1+"1").
>
> There's a lot to this proposal, but I think it's possible to get a PEP
> written, accepted and implemented in time for Python 3.5, if people are
> supportive. I'll go briefly over some of the action items.
>
> *(1) A change of direction for function annotations*
>
> PEP 3107 <http://legacy.python.org/dev/peps/pep-3107/>, which introduced
> function annotations, is intentionally non-committal about how function
> annotations should be used. It lists a number of use cases, including
> but not limited to type checking. It also mentions some rejected
> proposals that would have standardized either a syntax for indicating
> types and/or a way for multiple frameworks to attach different
> annotations to the same function. AFAIK in practice there is little use
> of function annotations in mainstream code, and I propose a conscious
> change of course here by stating that annotations should be used to
> indicate types and to propose a standard notation for them.

There are many uses for type information and I think Python should
remain neutral among them.

> (We may have to have some backwards compatibility provision to avoid
> breaking code that currently uses annotations for some other purpose.
> Fortunately the only issue, at least initially, will be that when
> running mypy to type check such code it will produce complaints about
> the annotations; it will not affect how such code is executed by the
> Python interpreter. Nevertheless, it would be good to deprecate such
> alternative uses of annotations.)

I can imagine that people who have used annotations might feel a bit
betrayed by deprecation of a new-in-py3 feature. But I do not think it
necessary to do so. Tools that work with mypy annotations, including
mypy itself, should only assume mypy typing if typing is imported. No
'import typing', no "Warning: annotation does not follow typing rules."
If 'typing' were a package with a 'mypy' module, the door would be
left open to other 'blessed' typing modules.
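
A tool could apply that rule with a check along these lines. This is only a sketch: the function name is made up, and real tools might key off something other than the import.

```python
import ast


def imports_typing(source: str) -> bool:
    """Return True if the module source imports the typing module."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Matches "import typing" and "import typing.something".
            if any(alias.name.split(".")[0] == "typing" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 excludes relative imports such as "from .typing import X".
            if node.level == 0 and (node.module or "").split(".")[0] == "typing":
                return True
    return False


print(imports_typing("from typing import List\ndef f(x: List[int]): pass\n"))
print(imports_typing("def f(x: 'metres') -> 'seconds': pass\n"))
```

A checker would then skip, or at least not warn about, annotations in any module for which this returns False.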

> *(2) A specification for what to add to Python 3.5*
>
> There needs to be at least a rough consensus on the syntax for
> annotations, and the syntax must cover a large enough set of use cases
> to be useful. Mypy is still under development, and some of its features
> are still evolving (e.g. unions were only added a few weeks ago). It
> would be possible to argue endlessly about details of the notation, e.g.
> whether to use 'list' or 'List', what either of those means (is a
> duck-typed list-like type acceptable?) or how to declare and use type
> variables, and what to do with functions that have no annotations at all
> (mypy currently skips those completely).
>
> I am proposing that we adopt whatever mypy uses here, keeping discussion
> of the details (mostly) out of the PEP. The goal is to make it possible
> to add type checking annotations to 3rd party modules (and even to the
> stdlib) while allowing unaltered execution of the program by the
> (unmodified) Python 3.5 interpreter. The actual type checker will not be
> integrated with the Python interpreter, and it will not be checked into
> the CPython repository. The only thing that needs to be added to the
> stdlib is a copy of mypy's typing.py module. This module defines several
> dozen new classes (and a few decorators and other helpers) that can be
> used in expressing argument types. If you want to type-check your code
> you have to download and install mypy and run it separately.
>
> The curious thing here is that while standardizing a syntax for type
> annotations, we technically still won't be adopting standard rules for
> type checking.

Fine with me, as that is not the only use. And even for type checking,
there is the choice between accept unless clearly wrong, versus reject
unless clearly right.

> This is intentional. First of all, fully specifying all
> the type checking rules would make for a really long and boring PEP (a
> much better specification would probably be the mypy source code).
> Second, I think it's fine if the type checking algorithm evolves over
> time, or if variations emerge.

As in the choice between accept unless clearly wrong, versus reject
unless clearly right.

> The worst that can happen is that you
> consider your code correct but mypy disagrees; your code will still run.
>
> That said, I don't want to /completely/ leave out any specification. I
> want the contents of the typing.py module to be specified in the PEP, so
> that it can be used with confidence. But whether mypy will complain
> about your particular form of duck typing doesn't have to be specified
> by the PEP. Perhaps as mypy evolves it will take options to tell it how
> to handle certain edge cases. Forks of mypy (or entirely different
> implementations of type checking based on the same annotation syntax)
> are also a possibility. Maybe in the distant future a version of Python
> will take a different stance, once we have more experience with how this
> works out in practice, but for Python 3.5 I want to restrict the scope
> of the upheaval.

As usual, we should review the code before acceptance. It is not clear
to me how much of the tutorial is implemented, as it says "Some of these
features might never see the light of day. " ???

> *Appendix -- Why Add Type Annotations?*
>
> The argument between proponents of static typing and dynamic typing has
> been going on for many decades. Neither side is all wrong or all right.
> Python has traditionally fallen in the camp of extremely dynamic typing,
> and this has worked well for most users, but there are definitely some
> areas where adding type annotations would help.

The answer to why on the mypy page is 'easier to find bugs', 'easier
maintenance'. I find this under-convincing as sufficient justification
in itself. I don't think there are many bugs on the tracker due to
calling functions with the wrong type of object. Logic errors, ignored
corner cases, and system idiosyncrasies are much more of a problem.

Your broader list is more convincing.

> - Editors (IDEs) can benefit from type annotations; they can call out
> obvious mistakes (like misspelled method names or inapplicable
> operations) and suggest possible method names. Anyone who has used
> IntelliJ or Xcode will recognize how powerful these features are, and
> type annotations will make such features more useful when editing Python
> source code.
>
> - Linters are an important tool for teams developing software. A linter
> doesn't replace a unittest, but can find certain types of errors better
> or quicker. The kind of type checking offered by mypy works much like a
> linter, and has similar benefits; but it can find problems that are
> beyond the capabilities of most linters.

Currently, Python linters do not have standard type annotations to work
with. I suspect that programs other than mypy would use them if available.

> - Type annotations are useful for the human reader as well! Take the
> above word_count() example. How long would it have taken you to figure
> out the types of the argument and return value without annotations?

Under a minute, including the fact that the annotation was overly
restrictive. But then I already know that only a mutation method can
require a list.

> Currently most people put the types in their docstrings; developing a
> standard notation for type annotations will reduce the amount of
> documentation that needs to be written, and running the type checker
> might find bugs in the documentation, too. Once a standard type
> annotation syntax is introduced, it should be simple to add support for
> this notation to documentation generators like Sphinx.
>
> - Refactoring. Bob's talk has a convincing example of how type
> annotations help in (manually) refactoring code. I also expect that
> certain automatic refactorings will benefit from type annotations --
> imagine a tool like 2to3 (but used for some other transformation)
> augmented by type annotations, so it will know whether e.g. x.keys() is
> referring to the keys of a dictionary or not.
>
> - Optimizers. I believe this is actually the least important
> application, certainly initially. Optimizers like PyPy or Pyston
> <https://github.com/dropbox/pyston> wouldn't be able to fully trust the
> type annotations, and they are better off using their current strategy
> of optimizing code based on the types actually observed at run time. But
> it's certainly feasible to imagine a future optimizer also taking type
> annotations into account.

--
Terry Jan Reedy

Andrew Barnert

Aug 13, 2014, 11:02:58 PM8/13/14
to Greg Ewing, python...@python.org
On Aug 13, 2014, at 18:44, Greg Ewing <greg....@canterbury.ac.nz> wrote:

On 08/14/2014 01:26 PM, Andrew Barnert wrote:

In Java or C++, it's… what? The sound option is a special JSONThing that
has separate getObjectMemberString and getArrayMemberString and
getObjectMemberInt, which is incredibly painful to use.

That's mainly because Java doesn't let you define your own
types that use convenient syntax such as [] for indexing.

No it's not, or other languages like C++ (which has operator methods and overloading) wouldn't have the exact same problem, but they do. Look at JsonCpp, for example:


Python doesn't have that problem, so a decent static type
system for Python should let you define a JSONThing class
that's fully type-safe while having a standard mapping
interface.

How?

If you go with a single JSONThing type that represents an object, array, number, bool, string, or null, then it can't have a standard mapping interface, because it also needs to have a standard sequence interface, and they conflict. Likewise for number vs. string. The only fully type-safe interface it can have is as_string, as_number, etc. methods (which of course can only check at runtime, so it's no better than using isinstance from Python, and you're forced to do it for every single access.)

What if you go the other way and have separate JSONObject, JSONArray, etc. types? Then all of those problems go away; you can define an unambiguous __getitem__. But what is its return value? The only possibility is a union of all the various types mentioned above, and such a union type has no interface at all. It's only useful if people subvert the type safety by casting.  (I guess you could argue that returning a union type makes your JSON library type safe, it's only every program that ever uses it for anything that's unsafe. But where does that get you?) The only usable type safe interface is separate get_string, get_number, etc. methods in place of __getitem__.
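
A sketch of why the union return type is awkward. The JSONValue alias and the get_name function are hypothetical, and the alias assumes typing provides Union, List and Dict. Every access has to be narrowed by hand before anything useful can be done with the value.

```python
from typing import Dict, List, Union  # names assumed to be available

# Hypothetical alias for "any JSON value".
JSONValue = Union[None, bool, float, str, List["JSONValue"], Dict[str, "JSONValue"]]


def get_name(obj: Dict[str, JSONValue]) -> str:
    value = obj["name"]             # statically, only known to be a JSONValue
    if not isinstance(value, str):  # run-time narrowing, needed at every access
        raise TypeError("expected a string for 'name'")
    return value.upper()


print(get_name({"name": "guido"}))
```

The isinstance test is exactly the check a duck-typed program would skip, which is the trade-off being argued here.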

Or you can merge the two together and have a single JSONThing that has both as methods and, for convenience, combined as_object+get, or even as_object+get+as_str.

Also, look at the mutation interfaces for these libraries. They're only marginally tolerable because all variables have obligatory types that you can overload on, which wouldn't be the case in Python.

The alternative is, of course, to come up with a way to avoid type safety. In Swift, you parse a JSON object into a Cocoa NSDictionary, which is a dynamically-typed heterogeneous collection just like a Python dict. There are C++ libraries with a bunch of types that effectively act like Python dict, list, float, str, etc. and try to magically cast when they come into contact with native types. That's the best solution anyone has to dealing with even a dead-simple algebraic data type like JSON in a static language whose type system isn't powerful enough: to try to fake being a duck typed language.

Carlo Pires

unread,
Aug 13, 2014, 11:07:39 PM8/13/14
to Python-Ideas
I'm very happy to see this happening. "Optional" type checking for python would be a great addition to the language. For large codebases, use of type checking can really help.

I also like the idea of using annotations instead of decorators (or something like that). I'm already using this for python3 [1] in a non intrusive way, by using assert to disable it on production.

[1] https://pypi.python.org/pypi/optypecheck
--
  Carlo Pires

Andrew Barnert

unread,
Aug 13, 2014, 11:15:04 PM8/13/14
to Łukasz Langa, Python-Ideas
On Aug 13, 2014, at 18:56, Łukasz Langa <luk...@langa.pl> wrote:

On Aug 13, 2014, at 6:39 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:

On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <gu...@python.org> wrote:

  def word_count(input: List[str]) -> Dict[str, int]:
      result = {}  #type: Dict[str, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result

I just realized why this bothers me.

This function really, really ought to be taking an Iterable[String]

You do realize String also happens to be an Iterable[String], right?

Of course, but that's not a new problem, so I didn't want to bring it up. The fact that the static type checker couldn't reject word_count(f.read()) is annoying, but that's not the fault of the static type checking proposal.

One of my big dreams about Python is that one day we'll drop support for strings being iterable. Nothing of value would be lost and that would enable us to use isinstance(x, Iterable) and more importantly isinstance(x, Sequence). Funny that this surfaces now, too.

IIRC, str doesn't implement Container, and therefore doesn't implement Sequence, because its __contains__ method is substring match instead of containment. So if you really want to treat sequences of strings separately from strings, you can. If only that really _were_ more important than Iterable, but I think the opposite is true.

But anyway, this is probably off topic, so I'll stop here.

Łukasz Langa

unread,
Aug 13, 2014, 11:16:50 PM8/13/14
to Andrew Barnert, Python-Ideas
str and bytes objects respond True to both isinstance(x, Container) and isinstance(x, Sequence).
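This is easy to check at the interpreter. A small sketch using the ABCs from collections.abc; the helper name abc_membership is made up for illustration:

```python
from collections.abc import Container, Iterable, Sequence


def abc_membership(value):
    """Return (is Container, is Sequence, is Iterable) for a value."""
    return (
        isinstance(value, Container),
        isinstance(value, Sequence),
        isinstance(value, Iterable),
    )


str_result = abc_membership("spam")
bytes_result = abc_membership(b"spam")
int_result = abc_membership(42)

# Iterating a str yields str objects again, which is why a plain string
# also passes as an iterable of strings.
chars_are_str = all(isinstance(ch, str) for ch in "spam")
```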

But you’re right, off topic.

David Mertz

unread,
Aug 13, 2014, 11:29:04 PM8/13/14
to Łukasz Langa, Python-Ideas
A long while back I posted a recipe for using annotations for type checking.  I'm certainly not the first person to do this, and what I did was deliberately simple:


The approach I used was to use per-function decorators to say that a given function should be type checked.  The type system I enforce in that recipe is much less than what mypy allows, but I can't see a real reason that it couldn't be extended to cover exactly the same range of type specifiers.

The advantage I perceive in this approach is that it is purely optional, per module and per function.  As well, it doesn't actually require making ANY change to Python 3.5 to implement it.  Or as a minimal change, an extra decorator could simply be available in functools or elsewhere in the standard library, which implemented the full semantics of mypy.
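For readers who have not seen the decorator style, a minimal sketch of the idea follows. It is not the recipe referred to above; the name typechecked is hypothetical, and the sketch only handles annotations that are plain classes such as str or int, silently skipping everything else.

```python
import functools
import inspect


def typechecked(func):
    """Hypothetical decorator: enforce plain-class annotations on every call."""
    signature = inspect.signature(func)

    def matches(value, annotation):
        # Missing annotations and non-class annotations (e.g. strings) are skipped.
        if annotation is inspect.Parameter.empty or not isinstance(annotation, type):
            return True
        return isinstance(value, annotation)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            parameter = signature.parameters[name]
            if parameter.kind in (parameter.VAR_POSITIONAL, parameter.VAR_KEYWORD):
                continue  # *args and **kwargs are left unchecked in this sketch
            if not matches(value, parameter.annotation):
                raise TypeError(
                    "%s must be %s" % (name, parameter.annotation.__name__)
                )
        result = func(*args, **kwargs)
        if not matches(result, signature.return_annotation):
            raise TypeError(
                "return value must be %s" % signature.return_annotation.__name__
            )
        return result

    return wrapper


@typechecked
def repeat(text: str, times: int) -> str:
    return text * times
```

Functions without the decorator are untouched, which is what makes the approach opt-in per function; the cost is that a bad call such as repeat("ab", "2") is only caught when it actually executes.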

Now admittedly, this would be type checking, but not *static* type checking.  There may not be an easy way to make a pre-runtime "lint" tool do the checking there.  On the other hand, as a number of posters have noted, there's also no way to enforce, e.g. 'Iterable[String]' either statically.

I'm not the BDFL of course, but I do not really get what advantage there is to the pre-runtime check that can catch a fairly small subset of type constraints rather than check at runtime everything that is available then (as the decorator approach could get you).





--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Guido van Rossum

unread,
Aug 13, 2014, 11:43:05 PM8/13/14
to Andrew Barnert, Python-Ideas
On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:
On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <gu...@python.org> wrote:

>  def word_count(input: List[str]) -> Dict[str, int]:
>      result = {}  #type: Dict[str, int]
>      for line in input:
>          for word in line.split():
>              result[word] = result.get(word, 0) + 1
>      return result

I just realized why this bothers me.

This function really, really ought to be taking an Iterable[String] (except that we don't have a String ABC). If you hadn't statically typed it, it would work just fine with, say, a text file—or, for that matter, a binary file. By restricting it to List[str], you've made it a lot less usable, for no visible benefit.

Heh. :-) I had wanted to write an additional paragraph explaining that it's easy to change this to use typing.Iterable instead of typing.List, but I forgot to add that.
 
And, while this is less serious, I don't think it should be guaranteeing that the result is a Dict rather than just some kind of Mapping. If you want to change the implementation tomorrow to return some kind of proxy or a tree-based sorted mapping, you can't do so without breaking all the code that uses your function.

Yeah, there's a typing.Mapping for that.
 
And if even Guido, in the motivating example for this feature, is needlessly restricting the usability and future flexibility of a function, I suspect it may be a much bigger problem in practice.

Well, so it was actually semi-intentional. :-)
 
This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information.

In most cases it really doesn't matter though -- some types are better left concrete, especially strings and numbers. If you read the mypy docs you'll find that there are generic types, so that it's possible to define a function as taking an Iterable[T] and returning a Mapping[T, int]. What's not currently possible is expressing additional constraints on T such as that it must be a String. When I last talked to Jukka he explained that he was going to add something for that too (@Jukka: structured types?).
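A sketch of that Iterable[T] to Mapping[T, int] signature, written against the typing module as it later shipped in the standard library (an assumption relative to the mypy of this thread, whose spelling of type variables differed). The function name count_items is made up for illustration.

```python
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")  # unconstrained: stands for whatever element type is passed in


def count_items(items: Iterable[T]) -> Mapping[T, int]:
    """Count occurrences; the key type of the result follows the input."""
    counts = {}  # type: Dict[T, int]
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


word_counts = count_items(["a", "b", "a"])   # checker sees Mapping[str, int]
byte_counts = count_items([b"x", b"x"])      # checker sees Mapping[bytes, int]
```

The type variable carries the element type through the call, so no information is lost, but nothing here says that T has to be string-like.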
 
And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.

On this point one of us must be confused. Let's assume it's me. :-) Mypy has a few different IO types that can express the difference between text and binary files. I think there's some work that needs to be done (and of course the built-in open() function has a terribly ambiguous return type :-( ), but it should be possible to say that a text file is an Iterable[str] and a binary file is an Iterable[bytes]. So together with the structured (?) types it should be possible to specify the signature of word_count() just as you want it. However, in most cases it's overkill, and you wouldn't want to do that for most code.

Also, it probably wouldn't work for more realistic examples -- as soon as you replace the split() method call with something that takes punctuation into account, you're probably going to write it in a way that works only for text strings anyway, and very few people will want or need to write the polymorphic version. (But if they do, mypy has a handy @overload decorator that they can use. :-)
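For illustration, here is roughly what an overloaded word_count could look like using @overload as the typing module later standardized it. That form is an assumption here: it declares the signatures for the checker and keeps a single undecorated implementation, and it may not match how mypy's @overload behaved at the time of this thread.

```python
from typing import Dict, Iterable, overload


@overload
def word_count(lines: Iterable[str]) -> Dict[str, int]: ...
@overload
def word_count(lines: Iterable[bytes]) -> Dict[bytes, int]: ...


def word_count(lines):
    # Single runtime implementation; the overloads above exist only for the checker.
    result = {}
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


text_counts = word_count(["a b a"])
binary_counts = word_count([b"a b"])
```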

Anyway, I agree it would be good to make sure that some of these more advanced things can actually be spelled before we freeze our commitment to a specific syntax, but let's not assume that just because you can't spell every possible generic use case it's no good.

Jukka Lehtosalo

unread,
Aug 14, 2014, 12:07:25 AM8/14/14
to Andrew Barnert, Python-Ideas
On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert <abar...@yahoo.com> wrote:
On Wednesday, August 13, 2014 12:45 PM, Guido van Rossum <gu...@python.org> wrote:


>  def word_count(input: List[str]) -> Dict[str, int]:
>      result = {}  #type: Dict[str, int]
>      for line in input:
>          for word in line.split():
>              result[word] = result.get(word, 0) + 1
>      return result


I just realized why this bothers me.

This function really, really ought to be taking an Iterable[String] (except that we don't have a String ABC). If you hadn't statically typed it, it would work just fine with, say, a text file—or, for that matter, a binary file. By restricting it to List[str], you've made it a lot less usable, for no visible benefit.

And, while this is less serious, I don't think it should be guaranteeing that the result is a Dict rather than just some kind of Mapping. If you want to change the implementation tomorrow to return some kind of proxy or a tree-based sorted mapping, you can't do so without breaking all the code that uses your function.

I see this is a matter of programming style. In a library module, I'd usually use about as general types as feasible (without making them overly complex). However, if we have just a simple utility function that's only used within a single program, declaring everything using abstract types buys you little, IMHO, but may make things much more complicated. You can always refactor the code to use more general types if the need arises. Using simple, concrete types seems to decrease the cognitive load, but that's just my experience.

Also, programmers don't always read documentation/annotations and can abuse the knowledge of the concrete return type of any function (they can figure this out easily by using repr()/type()). In general, as long as dynamically typed programs may call your function, changing the concrete return type of a library function risks breaking code that makes too many assumptions. Thus I'd rather use concrete types for function return types -- but of course everybody is free to not follow this convention.


And if even Guido, in the motivating example for this feature, is needlessly restricting the usability and future flexibility of a function, I suspect it may be a much bigger problem in practice.


This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information. 

If I define a subclass X of str, split() still returns a List[str] rather than List[X], unless I override something, so this wouldn't work with the above example:

>>> class X(str): pass
...
>>> type(X('x y').split()[0])
<class 'str'>


And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.

Mypy considers bytes incompatible with str, and vice versa. The annotation Iterable[str] says that Iterable[bytes] (such as a binary file) would not be a valid argument. Text files and binary files have different types, though the return type of open(...) is not inferred correctly right now. It would be easy to fix this for the most common cases, though.

You could use AnyStr to make the example work with bytes as well:

  from typing import AnyStr, Dict, Iterable

  def word_count(input: Iterable[AnyStr]) -> Dict[AnyStr, int]:
      result = {}  # type: Dict[AnyStr, int]
      for line in input:
          for word in line.split():
              result[word] = result.get(word, 0) + 1
      return result

Again, if this is just a simple utility function that you use once or twice, I see no reason to spend a lot of effort in coming up with the most general signature. Types are an abstraction and they can't express everything precisely -- there will always be a lot of cases where you can't express the most general type. However, I think that relatively simple types work well enough most of the time, and give the most bang for the buck.

Jukka

Terry Reedy

unread,
Aug 14, 2014, 12:22:37 AM8/14/14
to python...@python.org
On 8/13/2014 5:08 PM, Andrey Vlasovskikh wrote:

> Here are slides from my talk about optional typing in Python, that
> show how Mypy types can be used in both static and dynamic type
> checking
> (http://blog.pirx.ru/media/files/2013/python-optional-typing/),
I tried this on Windows 7 in both Firefox and Internet Explorer and I
cannot find any way to advance other than changing the page number on
the url bar.


--
Terry Jan Reedy

Jukka Lehtosalo

unread,
Aug 14, 2014, 12:34:35 AM8/14/14
to Guido van Rossum, Python-Ideas
On Wed, Aug 13, 2014 at 8:41 PM, Guido van Rossum <gu...@python.org> wrote:
On Wed, Aug 13, 2014 at 6:39 PM, Andrew Barnert <abar...@yahoo.com.dmarc.invalid> wrote:
This example also shows exactly what's wrong with simple generics: if this function takes an Iterable[String], it doesn't just return a Mapping[String, int], it returns a Mapping of _the same String type_. If your annotations can't express that, any value that passes through this function loses type information.

In most cases it really doesn't matter though -- some types are better left concrete, especially strings and numbers. If you read the mypy docs you'll find that there are generic types, so that it's possible to define a function as taking an Iterable[T] and returning a Mapping[T, int]. What's not currently possible is expressing additional constraints on T such as that it must be a String. When I last talked to Jukka he explained that he was going to add something for that too (@Jukka: structured types?).

I wrote another message where I touched on this. Mypy is likely to support something like this in the future, but I doubt it's usually worth the complexity. If a type signature is very general, at some point it describes the implementation in sufficient detail that you can't modify the code without changing the type. For example, we could plausibly allow anything that just supports split(), but if we change the implementation to use something other than split(), the signature would have to change. If we use more specific types (such as str), we leave ourselves the freedom to modify the implementation within the bounds of the str interface. Standard library functions often only accept concrete str objects, so the moment you start using an abstract string type you lose access to much of the stdlib.
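To illustrate the "anything that supports split()" signature, here is a sketch using typing.Protocol. Protocol only arrived in the typing module much later (Python 3.8), so this is an assumption about how such a structural type could be spelled, not something mypy offered at the time of this thread. The names SupportsSplit and CsvLine are hypothetical.

```python
from typing import Dict, Iterable, List, Protocol


class SupportsSplit(Protocol):
    """Hypothetical structural type: anything with a no-argument split()."""

    def split(self) -> List[str]: ...


def word_count(lines: Iterable[SupportsSplit]) -> Dict[str, int]:
    result = {}  # type: Dict[str, int]
    for line in lines:
        for word in line.split():
            result[word] = result.get(word, 0) + 1
    return result


class CsvLine:
    """Hypothetical non-str type that happens to provide split()."""

    def __init__(self, text: str) -> None:
        self.text = text

    def split(self) -> List[str]:
        return self.text.split(",")


csv_counts = word_count([CsvLine("a,b,a")])
str_counts = word_count(["x y x"])
```

The signature now names split() explicitly, so replacing the body with, say, a regular expression would force a change to the public type, whereas a parameter typed as Iterable[str] would not.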

 
And not being able to tell whether the keys in word_count(f) are str or bytes *even if you know that f was a text file* seems like a pretty major loss.

On this point one of us must be confused. Let's assume it's me. :-) Mypy has a few different IO types that can express the difference between text and binary files. I think there's some work that needs to be done (and of course the built-in open() function has a terribly ambiguous return type :-( ), but it should be possible to say that a text file is an Iterable[str] and a binary file is an Iterable[bytes]. So together with the structured (?) types it should be possible to specify the signature of word_count() just as you want it. However, in most cases it's overkill, and you wouldn't want to do that for most code.

See my other message where I show that you can do this right now, except for the problem with open().
 

Also, it probably wouldn't work for more realistic examples -- as soon as you replace the split() method call with something that takes punctuation into account, you're probably going to write it in a way that works only for text strings anyway, and very few people will want or need to write the polymorphic version. (But if they do, mypy has a handy @overload decorator that they can use. :-)

Anyway, I agree it would be good to make sure that some of these more advanced things can actually be spelled before we freeze our commitment to a specific syntax, but let's not assume that just because you can't spell every possible generic use case it's no good.

It's always easy to come up with interesting corner cases where a type system would break down, but luckily, these are often almost non-existent in the wild :-) I've learned that examples should be motivated by patterns in existing, 'real' code, as otherwise you'll waste your time on things that happen maybe once a million lines (or maybe only in code that *you* write).
 
Jukka

Guido van Rossum

unread,
Aug 14, 2014, 12:56:47 AM8/14/14
to Łukasz Langa, Python-Ideas
On Wed, Aug 13, 2014 at 6:00 PM, Łukasz Langa <luk...@langa.pl> wrote:
It’s great to see this finally happening!

Yes. :-)
 
I did some research on existing optional-typing approaches [1]. What I learned in the process was that linting is the most important use case for optional typing; runtime checks are too little, too late.

That being said, having optional runtime checks available *is* also important. Used in staging environments and during unit testing, this case is able to cover cases obscured by meta-programming. Implementations like “obiwan” and “pytypedecl” show that providing a runtime type checker is absolutely feasible.

Yes. And the proposal here might well enable such applications (by providing a standard way to spell complex types). But I think it's going to be less important than good support for linting, so that's what I want to focus on first.
 
The function annotation syntax currently supported in Python 3.4 is not well-suited for typing. This is because users expect to be able to operate on the types they know. This is currently not feasible because:
1. forward references are impossible

(Mypy's hack for this is that a string literal can be used as a forward reference.)

2. generics are impossible without custom syntax (which is the reason Mypy’s Dict exists)
3. optional types are clumsy to express (Optional[int] is very verbose for a use case this common)

So define an alias 'oint'. :-)
 
4. union types are clumsy to express

Aliasing can help.
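A short sketch of how those three points (string forward references, an alias for an optional type, an alias for a union) look in practice, using names from the typing module as it later shipped. The class Node and the alias Number are hypothetical; oint is the alias name suggested above.

```python
from typing import Optional, Union

oint = Optional[int]        # alias for the common "int or None" case
Number = Union[int, float]  # hypothetical alias for a union type


class Node:
    """Hypothetical linked-list node that refers to its own type."""

    # 'Node' is not defined yet while its body executes, so the annotation
    # is written as a string literal and resolved later by the checker.
    def __init__(self, value: Number, next_node: "Optional[Node]" = None) -> None:
        self.value = value
        self.next_node = next_node

    def index_of(self, target: Number) -> oint:
        node = self  # type: Optional[Node]
        index = 0
        while node is not None:
            if node.value == target:
                return index
            node = node.next_node
            index += 1
        return None


chain = Node(1, Node(2.5))
found = chain.index_of(2.5)
missing = chain.index_of(9)
```

At runtime the forward reference stays a plain string in Node.__init__.__annotations__, so no new syntax is involved.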
 
All those problems are elegantly solved by Google’s pytypedecl via moving type information to a separate file.

Mypy supports this too using stub files, but I think it is actually a strength that it doesn't require new syntax (although if the idea becomes popular we could certainly add syntax to support those things where mypy currently requires magic comments).

Honestly I'm not sure what to do about mypy vs. pytypedecl. Should they compete, collaborate, converge? Do we need a bake-off or a joint hackathon? Food for thought.
 
Because for our use case that would not be an acceptable approach, my intuition would be to:

1. Provide support for generics (understood as an answer to the question: “what does this collection contain?”) in Abstract Base Classes. That would be a PEP in itself.
2. Change the function annotation syntax so that it’s not executed at import time but rather treated as strings. This solves forward references and enables us to…
3. Extend the function annotation syntax with first-class generics support (most languages use something like "list<str>")
4. Extend the function annotation syntax with first-class union type support. pytypedecl simply uses “int or None”, which I find very elegant.
5. Speaking of None, possibly further extend the function annotation syntax with first-class optionality support. In the Facebook codebase in Hack we have tens of thousands of optional ints (never mind other optional types!); this is a case that’s going to be used all the time. Hack uses ?int, that’s the most succinct style you can get. Yes, it’s special but None is a special type, too.

Hm. I think that selling such (IMO) substantial changes to Python's syntax is going to be much harder than just the idea of a standard typing syntax implemented as a new stdlib module. While mypy's syntax is perhaps not as concise or elegant as would be possible if we were to design the syntax from the ground up, it's actually pretty darn readable, and it is compatible with Python 3.2. It has decent ways to spell generics, forward references, unions and optional types already. And while I want to eventually phase out other uses of function annotations, your change #2 would break all existing packages that use them for other purposes (like Ethan Furman's scription).
 
All in all, I believe Mypy has the highest chance of becoming our typing linter, which is great! I just hope we can improve on the syntax, which is currently lacking. Also, reusing our existing ABCs where applicable would be nice. With Mypy’s typing module I feel like we’re going to get a new, orthogonal set of ABCs, which will confuse users to no end. Finally, the runtime type checker would make the ecosystem complete.

We can discuss these things separately. Language evolution is an exercise in compromise. We may be able to reuse the existing ABCs, and mypy could still support Python 3.2 (or, with the codec hack, 2.7) by having the typing module export aliases to those ABCs. I won't stop you from implementing a runtime type checker, but I think it should be a separate project.
 
This is just the beginning of the open issues I was juggling with and the reason my own try at the PEP was coming up slower than I’d like.

Hopefully I've motivated you to speed up!
 
[1] You can find a summary of examples I looked at here: http://lukasz.langa.pl/typehinting/
 
--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

unread,
Aug 14, 2014, 1:12:49 AM8/14/14
to Gregory P. Smith, Python-Ideas
On Wed, Aug 13, 2014 at 6:09 PM, Gregory P. Smith <gr...@krypto.org> wrote:
First, I am really happy that you are interested in this and that your point (2) of what you want to see done is very limited and acknowledges that it isn't going to specify everything!  Because that isn't possible. :)

What a shame. :-)
 
Unfortunately I feel that adding syntax like this to the language itself is not useful without enforcement, because that leads to code being written with unintentionally incorrect annotations. Such code winds up deployed in libraries and later becomes a problem as soon as an actual analysis tool attempts to run over something that uses that unknowingly incorrectly specified code in a place where it cannot be easily updated (like the standard library).

We could refrain from using type annotations in the stdlib (similar to how we refrain from using Unicode identifiers). Mypy's stubs mechanism makes it possible to ship the type declarations for stdlib modules with mypy instead of baking them into the stdlib.
 
At the summit in Montreal earlier this year Łukasz Langa (cc'd) volunteered to lead writing the PEP on Python type hinting based on the many existing implementations of such things (including mypy, cython, numba and pytypedecl). I believe he has an initial draft he intends to send out soon. I'll let him speak to that.

Mypy has a lot more than an initial draft. Don't be mistaken by its status as "one person's Ph.D. project" -- Jukka has been thinking about this topic for a decade, and mypy works remarkably well already. It also has some very active contributors already.
 
Looks like Łukasz already responded, I'll stop writing now and go read that. :)

Personal opinion from experience trying: You can't express the depth of types for an interface within the Python language syntax itself (assuming hacks such as specially formatted comments, strings or docstrings do not count). Forward references to things that haven't even been defined yet are common. You often want an ability to specify a duck type interface rather than a specific type.  I think he has those points covered better than I do.

I think mypy has solutions for the syntactic issues, and the rest can be addressed by introducing a few more magic helper functions. It's remarkably readable.

Guido van Rossum

unread,
Aug 14, 2014, 1:25:39 AM8/14/14