ENB: understanding mypy

Edward K. Ream

Nov 5, 2022, 9:09:52 AM

to leo-editor

This Engineering Notebook post contains recent thoughts about how we programmers can understand other people's programs. This topic has been foremost on my mind lately.

I have been studying mypy to prepare for work on mypy issue #12352. I will ask the mypy devs for help only if I get truly stuck.

tl;dr:

1. There is no royal road to learning.

2. I can learn mypy's code in smallish pieces.

False starts

False start 1: From another post:

"I envisage a tool that collects data about programs at runtime, and associates that data with corresponding Leo nodes. Python's pdb debugger (or Leo's SherlockTracer class) is an obvious starting point."

This idea is misguided. The debugger would be very slow.

False start 2: scripts.leo now contains a prototype script that will insert code at the beginning and end of all functions/methods in a given tree.

Again, this idea seems misguided. General traces will be nearly useless.

Inserting tracing/analysis statements into a program is a reasonable thing to do, but there is no need to automate the process. Indeed, the real question is what those inserted statements should do.

Aha 1: Analysis statements should answer (shed light on) specific questions. It's unreasonable to gather data without a plan.
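A minimal sketch of a question-driven trace, in plain Python. The question here is "which calls return None?". The names trace_when, lookup, and calls are hypothetical, invented for illustration; nothing below comes from mypy or Leo.

```python
import functools


def trace_when(predicate, log):
    """Record a call in `log` only when predicate(args, kwargs, result) is true."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Keep only the calls that bear on the question being asked.
            if predicate(args, kwargs, result):
                log.append((func.__name__, args, kwargs, result))
            return result
        return wrapper
    return decorator


# Hypothetical stand-in for a function under study (not a mypy function).
calls = []


@trace_when(lambda args, kwargs, result: result is None, calls)
def lookup(name, table):
    return table.get(name)


lookup("x", {"x": "int"})  # Returns "int": not recorded.
lookup("y", {"x": "int"})  # Returns None: recorded.
print(calls)
```

The predicate is the plan: changing the question means changing one lambda, not re-instrumenting the whole tree.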

mypy's code should only be incrementally more complex than other programs!

Aha 2: mypy's functions should work regardless of the global mypy state. This is practically the definition of well-designed code!

True, mypy's code implements a complex algorithm, but that code is itself localized. In short, most (all?) of mypy's functions can be understood "locally": one needs to understand only the function's arguments, algorithms, and results.

Within mypy, function arguments often imply (or are) a complex environment consisting of symbol tables, type-related objects in various states of analysis, and who knows what else. Lots of study lies ahead!

Aha 3: mypy calculates the "best" types for objects based on annotations and usage.

The unification (Hindley-Milner) algorithm defines what "best" means. This definition might be all that one needs to take away from the recommended textbook, Types and Programming Languages.
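To make "unification" concrete, here is a minimal sketch of textbook first-order unification over a toy type representation. It is not taken from mypy's source, and mypy's own inference may work quite differently. The representation is an assumption made for this sketch: a type variable is a plain string, and a constructed type is a tuple whose first item is the constructor name.

```python
def is_var(t):
    """In this toy representation, type variables are plain strings."""
    return isinstance(t, str)


def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(var, t, subst):
    """Return True if `var` appears inside `t` (guards against infinite types)."""
    t = resolve(t, subst)
    if is_var(t):
        return t == var
    return any(occurs(var, arg, subst) for arg in t[1:])


def unify(a, b, subst=None):
    """Return a substitution that makes `a` and `b` equal, or raise TypeError."""
    subst = dict(subst or {})
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} occurs in {b}")
        subst[a] = b
        return subst
    if is_var(b):
        return unify(b, a, subst)
    # Both are constructed types: names and arities must match.
    if a[0] != b[0] or len(a) != len(b):
        raise TypeError(f"cannot unify {a} with {b}")
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
    return subst


def apply(t, subst):
    """Replace every bound variable in `t` with its binding."""
    t = resolve(t, subst)
    if is_var(t):
        return t
    return (t[0],) + tuple(apply(arg, subst) for arg in t[1:])


# Unify  a -> list[a]  with  int -> b.
s = unify(("->", "a", ("list", "a")), ("->", ("int",), "b"))
print(apply("b", s))
```

Here the "best" type for b is the most general one consistent with both sides: once a is bound to int, b can only be list[int].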

Unit tests

Happily, adding a new test (to a .test file) is straightforward. I'll add short (maybe even single-line) tests to see how mypy works. Even a small test will exercise a lot of code. Each unit test must:

- Calculate (from comments in the tests) the expected failures/messages.

- Simulate all of mypy's startup code.

- Completely type-check the code.
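The first step above can be sketched in a few lines of Python. This is a simplified illustration, not mypy's test harness: it assumes that expected errors are marked with trailing "# E: message" comments and that each one expands to a "file:line: error: message" string. The function name expected_messages, the default file name "main", and the sample case are assumptions made for this sketch.

```python
import re

# Assumption: an expected error is a trailing comment of the form "# E: message".
EXPECTED = re.compile(r"#\s*E:\s*(.*)$")


def expected_messages(source, filename="main"):
    """Collect the expected error messages from the comments in `source`."""
    messages = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = EXPECTED.search(line)
        if match:
            text = match.group(1).strip()
            messages.append(f"{filename}:{lineno}: error: {text}")
    return messages


# Illustrative test body; the message text is shortened for the example.
case = '''\
x: int = 1
y: str = x  # E: Incompatible types in assignment
'''

for message in expected_messages(case):
    print(message)
```

A real test would then compare this list against the messages that the type checker actually produces for the same source.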

Summary

Neither a custom debugger nor general trace statements are likely to help understand mypy.

The Hindley-Milner "algorithm" guides the Python code by defining what the expected (best) results (types) should be. That's what I'll study first.

Well-designed functions (including mypy's) will work regardless of their context. One only has to understand the arguments and algorithms.

I'll use exploratory tests to study mypy's code in detail. I'll use bespoke traces (and calls to pdb) as needed.