Shape checking welcome

mrahtz

Jun 12, 2020, 7:46:59 AM

to Python shape checkers

Hey everyone!

First off - welcome to the group! There's been scattered interest in shape checking for some time, so by bringing everyone together in one place, rather than in scattered email threads, GitHub issues and Slack channels, I'm hoping we can push this through to something suitable for widespread use.

To summarise the current state of shape annotation and checking, there are three categories of things to care about:

Defining the syntax for how code should be annotated with shapes
Runtime shape checkers
Static shape checkers

Syntax

Stephan made a great start with Ideas for array shape typing in Python. A group of us at DeepMind have been working on a follow-up that goes into more detail, which we should be able to share soon once it's been cleaned up.

There's also the syntax that tsalib uses, though it only allows annotation of shapes (without specifying e.g. whether something is a tf.Tensor or an np.ndarray, and without any info on data type).

Runtime shape checkers

Some existing options here include ShapeGuard (which is what most folks at DeepMind are using at the moment) and tsanley. We also have an internal prototype at DeepMind that can check annotations like x: tf.Tensor[Batch, Time, 64, 64, 3] which we're still working on.
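To make the runtime approach concrete, here is a minimal sketch of the kind of check such a tool performs. It is not the API of ShapeGuard, tsanley or the DeepMind prototype; `check_shape` and the use of strings for named dimensions are assumptions made purely for illustration.

```python
from typing import Dict, Sequence, Union

# A dimension spec is either a fixed size (int) or a name (str) that must
# resolve to the same size everywhere it is used.
Dim = Union[int, str]


def check_shape(actual: Sequence[int], spec: Sequence[Dim],
                bindings: Dict[str, int]) -> None:
    """Raise ValueError if `actual` does not match `spec` (hypothetical helper)."""
    if len(actual) != len(spec):
        raise ValueError(f"rank mismatch: got {len(actual)}, expected {len(spec)}")
    for axis, (size, dim) in enumerate(zip(actual, spec)):
        if isinstance(dim, int):
            expected = dim
        else:
            # The first use of a name binds it; later uses must agree.
            expected = bindings.setdefault(dim, size)
        if size != expected:
            raise ValueError(f"axis {axis}: got {size}, expected {dim}={expected}")


bindings: Dict[str, int] = {}
check_shape((32, 10, 64, 64, 3), ("Batch", "Time", 64, 64, 3), bindings)
```

After the call, `bindings` records the sizes of `Batch` and `Time`, so a later tensor annotated with `Batch` but carrying a different leading size would be rejected.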

Static shape checkers

This is where things get interesting. A static shape checker is probably what it's going to take for us to be able to say the problem of shape checking is 'solved'.

One bottleneck here is support for variadic generics in existing static type checkers. Pyre apparently has prototype support for this in its ListVariadic type (see also this PyTorch example) but as far as I know neither pytype nor mypy supports this yet.
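For anyone unfamiliar with why variadic generics matter here: with plain PEP 484 generics a class takes a fixed number of type parameters, so each tensor rank needs its own class. A rough sketch (the names `Tensor1`, `Tensor2`, `Batch` and `Time` are made up for illustration and come from no library):

```python
from typing import Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Batch: ...
class Time: ...


class Tensor1(Generic[A]):
    """Rank-1 tensor whose single axis is labelled A."""


class Tensor2(Generic[A, B]):
    """Rank-2 tensor whose axes are labelled A and B."""


def transpose(x: Tensor2[A, B]) -> Tensor2[B, A]:
    # A static checker can verify the axis swap, but only for rank 2.
    # The body is a placeholder; only the signature matters in this sketch.
    return Tensor2()


x: Tensor2[Batch, Time] = Tensor2()
y = transpose(x)  # statically: Tensor2[Time, Batch]
```

A rank-3 tensor would need a third class and a third `transpose`, which is the duplication a variadic type parameter is meant to remove.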

In the meantime, Teddy Liu has developed a toy static checker for his bachelor's thesis. It's written in OCaml, but he reckons it should be portable to Python if necessary.

Outside of the Python world, Adam Paszke has made a static checker for Swift for TensorFlow. He's also interested in developing something similar for Python.

Next steps

I think the general direction of next steps should be to continue developing a static checker while simultaneously trying to work out the details of what a full syntax would look like (based on what we find to work well in practice with the static checker).

In particular, I'm wondering whether it would be worth porting Teddy's checker to Python (which I'm assuming more people will be comfortable with than OCaml) or whether we should join Adam in developing something from scratch. Adam, how are things going on your end?

I'll be able to work on this full-time for two weeks from the 22nd as part of a DeepMind hackathon, during which my plan is to finish the draft syntax proposal doc and polish an internal prototype we have for a runtime checker based on that syntax - but I'm also open to other ideas, if there are other higher-leverage things to do.

Cheers!

Matthew

Adam Paszke

Jun 12, 2020, 12:19:04 PM

to mrahtz, Python shape checkers

Hi everyone,

Thanks a lot for starting the list Matthew! It'll be great for all of us to get together.

Since you've asked about the stage of my project, the Swift prototype is working pretty nicely, but I didn't get to porting it to Python (except for a very early prototype in Haskell). However, if there is sufficient interest in such an effort, then I'm happy to reprioritize and I'm pretty sure that I could spend even up to 50% of my time on that.

IMO the biggest bottleneck to porting my tool, or implementing any other one, is a good abstract interpreter for Python that can handle programs that are incomplete, have syntax errors, etc. That's an effort we could definitely share, and in the future we can even prototype multiple different checkers on top of this common infrastructure. In Swift this was easy, because I could write a very simple interpreter for SIL (Swift Intermediate Language), which was already quite low-level. Mirroring Python's semantics faithfully will be much more difficult as even "trivial" operations like attribute lookups can get extremely complicated.
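As a small illustration of the attribute-lookup point, the sketch below shows three different mechanisms answering what looks like the same operation, `obj.name`. All class names are invented for the example; an abstract interpreter would have to model each of these rules (and more, such as metaclasses and `__getattribute__`).

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it takes priority over the instance dict."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDescriptor:
    """Defines only __get__, so the instance dict takes priority over it."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Example:
    a = DataDescriptor()
    b = NonDataDescriptor()

    def __getattr__(self, name):
        # Only called when normal lookup fails.
        return f"from __getattr__({name!r})"


e = Example()
# Write straight into the instance dict, bypassing the descriptors.
e.__dict__["a"] = "from instance dict"
e.__dict__["b"] = "from instance dict"

results = {
    "a": e.a,  # data descriptor wins over the instance dict
    "b": e.b,  # instance dict wins over the non-data descriptor
    "c": e.c,  # nothing found, so __getattr__ is the fallback
}
```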

I also wanted to add that in my experience implementing an interpreter in Python will likely make it more difficult to write something good in the long term, because we'll miss out on all the nice things like ADTs with pattern matching, exhaustiveness checks, etc. Haskell/OCaml make those things a breeze. I do understand that they might make onboarding a little slower, but I wouldn't want to rule them out as an option just yet, as it makes a huge difference.
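For comparison, this is roughly how an ADT tends to be emulated in Python with dataclasses and an `isinstance` chain. The `Known`/`Symbolic` dimension type is a made-up example, not taken from any of the checkers discussed in this thread.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Known:
    """A dimension with a concrete size."""
    size: int


@dataclass(frozen=True)
class Symbolic:
    """A dimension known only by name."""
    name: str


# The "sum type" is just a Union of the variants.
Dim = Union[Known, Symbolic]


def show(dim: Dim) -> str:
    if isinstance(dim, Known):
        return str(dim.size)
    if isinstance(dim, Symbolic):
        return dim.name
    # Hand-written exhaustiveness guard: it only fires at runtime.
    raise TypeError(f"unhandled Dim variant: {dim!r}")


rendered = [show(Known(64)), show(Symbolic("Batch"))]
```

If a third variant is added to `Dim`, the language itself does not force `show` to be updated; the guard at the end only reports the omission when that code path is actually executed, whereas an OCaml or Haskell compiler would flag the missing case up front.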

Also, it's super cool to see that other people are getting to implementing more static checkers too. I'll have to read Teddy's thesis soon to better understand how it relates to what I did.

Best,

Adam

--
You received this message because you are subscribed to the Google Groups "Python shape checkers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-shape-che...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python-shape-checkers/a6282b34-ab5d-4d13-a0be-bd39e2759aa1n%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Guido van Rossum

Jun 12, 2020, 12:24:06 PM

to Adam Paszke, mrahtz, Python shape checkers

Here are some links to docs with well thought-out ideas for additions to the standard Python (static) type system as described by PEP 484 to support numpy(-ish) array types and shapes:

Here's a doc I've held on to (mostly written by Ivan Levkivskyi) with a list of proposals that made the rounds at PyCon 2019 (and even before).

https://paper.dropbox.com/doc/Type-system-improvements--A1zE~KliBDY3oh4Bf2bUz7C0Ag-HHOkniMG9WcCgS0LzXZAe

Also:

https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary--A1yRP8M22qukuVGksfuklZtVAg-6ZQzTkgN6e0oXko8fEWwN

And check out the typing summit schedule:

https://paper.dropbox.com/doc/Typing-Summit-Schedule--A1ylce3sozRREf~6CEi0CkA4Ag-7CZ2iT5PNszAq9RmY4WmN

--

--Guido van Rossum (python.org/~guido)

Pronouns: he/him (why is my pronoun here?)

Matthew Rahtz

Jun 15, 2020, 6:37:29 AM

to gu...@python.org, Adam Paszke, Sergei Lebedev, Python shape checkers

Thanks for the links, Guido! That first one in particular has some interesting ideas I hadn't seen before. I'll incorporate them as options into the doc we're preparing.

Adam -

> IMO the biggest bottleneck to porting my tool, or implementing any other one, is a good abstract interpreter for Python that can handle programs that are incomplete, have syntax errors, etc

Interesting. I'm guessing you're saying this with a view to having good support in e.g. IDEs? Sergei, am I right in thinking you have some experience with this? I guess PyCharm must also have some internal solution to this; I'm not sure how hard it would be to use it as a standalone tool...

> I also wanted to add that in my experience implementing an interpreter in Python will likely make it more difficult to write something good in the long term, because we'll miss out on all the nice things like ADTs with pattern matching, exhaustiveness checks, etc.

Also interesting. Yes, let's keep this in mind.

proye...@gmail.com

Jun 15, 2020, 6:39:14 AM

to Python shape checkers

Hi everyone,

Thanks once again for bringing attention to this important topic.

I think we all agree that the main step in this direction is the introduction of variadics, which has been mentioned several times (1, 2, 3).

Variadic support is more mature than it seems. Pyre has supported variadics for about a year already. The first official proposal that I recall was at last year's Python Typing Summit (here). The syntax is aligned with the proposal that Guido has shared. However, iirc the initial syntax relied too much on Concatenate/Expand, making it verbose and ambiguous when there are 2 variadics. To address that, the current syntax relies on capture groups "[]" for manually specifying the part of the type that corresponds to the variadic, and only requires Concatenate for concatenating types and variadics. More about the final syntax can be seen here (here). Although I don't want this to become a PEP, the way the syntax works with the proposed example is:

tf.Tensor[Batch, Time, [64, 64, 3]]

Special cases could be considered to make it more ergonomic when there is only one variadic at the end, but in general having an unambiguous syntax is a must.
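One property of the bracketed capture group is that it is already a valid Python expression, so the group is easy to tell apart from the other parameters at runtime. The sketch below shows only that; it says nothing about how Pyre handles the syntax statically, and the `Tensor` class here is a hypothetical stand-in for `tf.Tensor`.

```python
class Batch: ...
class Time: ...


class Tensor:
    """Hypothetical stand-in that just records how it was subscripted."""

    def __class_getitem__(cls, params):
        # A single subscript argument arrives bare; several arrive as a tuple.
        if not isinstance(params, tuple):
            params = (params,)
        # Ordinary parameters vs. bracketed capture groups (Python lists).
        named = tuple(p for p in params if not isinstance(p, list))
        groups = [tuple(p) for p in params if isinstance(p, list)]
        return named, groups


named, groups = Tensor[Batch, Time, [64, 64, 3]]
```

Here `named` holds `Batch` and `Time`, while `groups` holds the single captured group `(64, 64, 3)`, so there is no ambiguity about where the variadic part begins and ends.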

Regarding maturity, afaik Mark Mendoza had informal conversations with the community regarding the proposed syntax and got positive feedback, and he currently plans to submit a PEP, once the Parameter Specification PEP gets merged (here).

Hence, if we assume that we are not so far from agreeing on a final syntax, then the next question is about having actual support for it. As I mentioned, Pyre already supports it, so it could be used for testing ideas and giving developers the opportunity to write code stubs before other type checkers get support. In that sense, I believe there are many reasons to think that it is not a good idea to create yet another type checker for Python. In the particular case of DeepMind, my humble suggestion would be to contribute to Mypy or Pyre, at least for the part related to the type system. Regarding Teddy's work, afaik Teddy was trying to contribute to Mypy itself, so perhaps his checker is more of a proof of concept. I hope that he will tell us more about it.

About static shape checkers that rely on abstract interpretation: there is certainly room for them, but it will be better if they build on the more advanced type specifications that variadics will introduce. After all, variadics at this point will only be useful for some use cases, since many tensor operations need other functionality, for example arithmetic on types, which is something we are currently working on, in case anyone is interested.
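Concatenation is a simple example of an operation whose result shape cannot be described by variadics alone: the output size along the joined axis is the sum of the input sizes. The runtime sketch below spells out that arithmetic; `concat_shape` is a hypothetical helper, not a function from any library mentioned here.

```python
from typing import Sequence, Tuple


def concat_shape(a: Sequence[int], b: Sequence[int], axis: int) -> Tuple[int, ...]:
    """Shape of concatenating arrays of shapes `a` and `b` along `axis`."""
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    for i, (x, y) in enumerate(zip(a, b)):
        # Every axis except the concatenation axis must agree exactly.
        if i != axis and x != y:
            raise ValueError(f"axis {i}: {x} != {y}")
    out = list(a)
    out[axis] = a[axis] + b[axis]  # the arithmetic a type would need to express
    return tuple(out)


result = concat_shape((32, 10, 64), (32, 5, 64), axis=1)
```

A static signature for this would need to say something like "the result's second dimension is `Time1 + Time2`", which is the kind of type-level arithmetic referred to above.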

Finally, if there are many teams working in this direction, I would suggest organising some online meetings, inspired by what they do at MLIR (here).

Best,

Alfonso.

PS: I would propose keeping future discussion on the Python typing mailing list. Many people who would be interested in this sort of discussion follow that list.

Matthew Rahtz

Jun 16, 2020, 5:22:12 AM

to proye...@gmail.com, Python shape checkers

Agreed that it would be better to avoid the fragmentation caused by a separate mailing list. I'll move this conversation to typing-sig and we can continue talking there.
