
Python Type-Inference based LINT.. (pylint.py)


David Jeske

Nov 21, 1999, 3:00:00 AM
to pytho...@python.org
Hello,

I'm interested to know who out there is interested in working on a
type-inference based static type checker (i.e. LINT) for Python. I've
read various snippets about static type extensions, and the intent to
put type inference into Viper. However, I've not seen much more talk
of either of these.

I've been thinking that even with syntax extensions like the
"Add-Later Static Types" proposal, type-inference would be required to
get any valuable static type-checking out of the majority of (untyped)
Python code. Furthermore, a type-inference based static lint should
allow some static type safety for those who want it, without impacting
the existing Python environment for those who don't want it.

Scott Hassan and I have started a simple type-inference graph
generator based on the parser module's Abstract Syntax Trees. It is
able to parse basic test files and produce some simple type-inference
graphs. We could use some help from someone more familiar with
both ASTs and the Python Interpreter Internals.
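For readers following along today: the parser module's AST interface described above was removed in Python 3.10, but the first stage of a type-inference graph builder can be sketched with the modern ast module. This is a toy illustration, not the actual pylint.py code; the function name is invented:

```python
import ast

def literal_types(source):
    """Toy first stage of type inference: map each name that is
    assigned a literal constant to the type of that constant."""
    inferred = {}
    for node in ast.walk(ast.parse(source)):
        # Only handle the simplest case: NAME = <constant literal>
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    inferred[target.id] = type(node.value.value).__name__
    return inferred

print(literal_types("x = 1\ny = 'hello'\nz = x"))  # {'x': 'int', 'y': 'str'}
```

A real inference pass would also propagate types through expressions (so `z = x` above would become `int`), which is where the graph in the posted code comes in.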

You can download what we have so far by going to:

http://www.chat.net/~jeske/Projects/PyLint/download/

The source file includes our theory of operation, implementation
documentation, and of course, the code. :)

--
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + je...@egroups.net
eGroups Core Server Engineering

Martijn Faassen

Nov 23, 1999, 3:00:00 AM
to
David Jeske <je...@egroups.net> wrote:
> I'm interested to know who out there is interested in working on a
> type-inference based static type checker (i.e. LINT) for Python. I've
> read various snippets about static type extensions, and the intent to
> put type inference into Viper. However, I've not seen much more talk
> of either of these.

I'm interested in watching the work and throwing in the occasional comment,
but my resources are already overstretched, so I don't promise anything. :)

See also the types-SIG. Currently dead, but it might be revivable.

> I've been thinking that even with syntax extensions like the
> "Add-Later Static Types" proposal,

Note that this doesn't need to be a syntax extension; check out some of
my Swallow posting with dejanews for examples of how it could work in
current Python. Not that I'm anywhere near any implementation. :)

> type-inference would be required to
> get any valuable static type-checking out of the majority of (untyped)
> Python code. Furthermore, a Type-Inference based static-lint should
> allow some static type safety for those who want it, without impacting
> the existing Python environment for those who don't want it.

It sounds interesting. I myself would take the Python subset approach
(i.e. Swallow) to make sure no runtime dynamics exist that can mess up
any type checking. Seems to me the easier approach, but John Skaller,
for instance, is going the whole-program-analysis route and he seems to be
optimistic about it, so I must be wrong. :)

> Scott Hassan and I have started a simple type-inference graph
> generator based on the parser module's Abstract Syntax Trees. It is
> able to parse basic test files and produce some simple type-inference
> graphs. We could use some help from someone more familiar with
> both ASTs and the Python Interpreter Internals.

That's not me. :)

> You can download what we have so far by going to:

> http://www.chat.net/~jeske/Projects/PyLint/download/

> The source file includes our theory of operation, implementation
> documentation, and of course, the code. :)

I'll take a look anyway.

Regards,

Martijn
--
History of the 20th Century: WW1, WW2, WW3?
No, WWW -- Could we be going in the right direction?

Greg Ewing

Nov 24, 1999, 3:00:00 AM
to
Martijn Faassen wrote:
>
> Note that this doesn't need to be a syntax extension; check out some of
> my Swallow posting with dejanews for examples of how it could work in
> current Python.

I think I'd prefer it if it *were* a syntax extension.
The suggestions I've seen so far for non-syntax-extending
type declarations strike me as rather ugly.

That's a minor point, though -- the main thing is to
get some sort of type checker to work! The syntax can
be thrashed out later.

> I myself would take the Python subset approach
> (i.e. Swallow) to make sure no runtime dynamics exist that can mess up
> any type checking. Seems to me the easier approach

Me, too. Also, I tend to think that explicit type
declarations are a valuable way for the programmer to
state the *intent* of the code in a way that both
humans and compilers can process. In my experience,
type errors are found sooner and reported in a more
easily understood way when explicit type declarations
are used.

I've tried it both ways when using Haskell-like
languages, and I find that the error messages I get
when I don't explicitly declare types tend to be of
the form

Couldn't unify <some-big-hairy-complicated-type-
expression> with <some-other-big-hairy-complicated-
type-expression-that-looks-vaguely-similar-although-
there-must-be-some-subtle-difference-if-only-I-can-
find-it>.

After getting a few of those I quickly learned to
provide type declarations for ALL my functions...

Greg

Martijn Faassen

Nov 26, 1999, 3:00:00 AM
to
Greg Ewing <greg....@compaq.com> wrote:
> Martijn Faassen wrote:
>>
>> Note that this doesn't need to be a syntax extension; check out some of
>> my Swallow posting with dejanews for examples of how it could work in
>> current Python.

> I think I'd prefer it if it *were* a syntax extension.
> The suggestions I've seen so far for non-syntax-extending
> type declarations strike me as rather ugly.

Oh, agreed -- it's just that it's potentially easier to whip up something
using basic Python, first. Not that much easier on the scale of things, of
course.

Imagine, for instance, a list of integers. A dictionary with string keys and
values of some class. How do you write that down in a proposed syntax
extension? It's not hard to whip up something with basic Python lists and
dictionaries.
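To make the "basic Python lists and dictionaries" idea concrete, here is one minimal sketch of a Swallow-style checker in which `[int]` means "list of int" and `{str: float}` means "dict from str to float". The spelling of the specs and the `conforms` helper are my own invention for illustration, not anything from the Swallow postings:

```python
def conforms(value, spec):
    """Check a value against a container-based type description.

    Specs are ordinary Python objects: a type means isinstance,
    [elem_spec] means a homogeneous list, and {key_spec: val_spec}
    means a homogeneous dictionary.
    """
    if isinstance(spec, type):
        return isinstance(value, spec)
    if isinstance(spec, list):          # [elem_spec] -> list of elem_spec
        return isinstance(value, list) and all(
            conforms(v, spec[0]) for v in value)
    if isinstance(spec, dict):          # {key_spec: val_spec}
        (kspec, vspec), = spec.items()
        return isinstance(value, dict) and all(
            conforms(k, kspec) and conforms(v, vspec)
            for k, v in value.items())
    raise TypeError("unsupported spec: %r" % (spec,))

print(conforms([1, 2, 3], [int]))          # True
print(conforms({"a": 1.0}, {str: float}))  # True
print(conforms([1, "x"], [int]))           # False
```

The point is that no syntax extension is needed: the type descriptions are themselves legal Python expressions a tool could read out of the source.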

> That's a minor point, though -- the main thing is to
> get some sort of type checker to work! The syntax can
> be thrashed out later.

Agreed completely.

>> I myself would take the Python subset approach
>> (i.e. Swallow) to make sure no runtime dynamics exist that can mess up
>> any type checking. Seems to me the easier approach

> Me, too. Also, I tend to think that explicit type
> declarations are a valuable way for the programmer to
> state the *intent* of the code in a way that both
> humans and compilers can process. In my experience,
> type errors are found sooner and reported in a more
> easily understood way when explicit type declarations
> are used.

> I've tried it both ways when using Haskell-like
> languages, and I find that the error messages I get
> when I don't explicitly declare types tend to be of
> the form

> Couldn't unify <some-big-hairy-complicated-type-
> expression> with <some-other-big-hairy-complicated-
> type-expression-that-looks-vaguely-similar-although-
> there-must-be-some-subtle-difference-if-only-I-can-
> find-it>.

> After getting a few of those I quickly learned to
> provide type declarations for ALL my functions...

Sounds like we'd need some way to avoid this type of problem with
any 'optional-type-annotated Python'. Of course the Swallow idea is to
*fully* type annotate everything, and then use this to build fast
extension libraries written in (a subset of) Python. This is just to make
things easier, though -- types would be useful in plain Python as well,
if they were optional. I think. It's hard to say, actually, because if it
*has* to generate error messages like the kind you mentioned, optional
type annotations may not even be worth it.

skaller

Dec 1, 1999, 3:00:00 AM
to
Martijn Faassen wrote:
> It's hard to say, actually, because if it
> *has* to generate error messages like the kind you mentioned, optional
> type annotations may not even be worth it.

I think they're worth it ... the main problem
is syntax. It isn't clear what syntax to use to specify
an annotation, and it isn't clear how to name the types.
The 'natural' notation:

x : int = 1

can be used in function parameter lists, but not so easily for
variables (since ':' clashes with its other use as a suite introducer).
It's also not clear that:

x : PyInstance = SomeClass()

is useful; the class of x is more interesting.
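As a historical footnote: this 'natural' notation is exactly what Python eventually adopted, first for parameters and returns (PEP 3107) and then for variables (PEP 526, Python 3.6); the colon turned out not to clash with its suite-introducer role after all. A minimal example of the notation as it exists today:

```python
# The very syntax proposed above is now legal Python.
x: int = 1

def double(n: int) -> int:
    # Annotations are stored but not enforced at runtime;
    # external checkers (the "lint" this thread imagines) use them.
    return n * 2

print(double(x))  # 2
```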

--
John Skaller, mailto:ska...@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850

Skip Montanaro

Dec 2, 1999, 3:00:00 AM
to Greg Ewing

>> It isn't clear what syntax to use to specify an annotation

def frobnicate(x as int,
               y as [string],
               z as {(int,int):SomeClass}) as SomeOtherClass:

Looks interesting. I suggest two small changes:

1. Use the names from the types module to save having to declare a bunch
of new keywords (which would probably not pass Guido's scrutiny):

from types import *
...
def frobnicate(x as IntType,
               y as [StringType],
               z as {(IntType,IntType):SomeClass}) as SomeOtherClass:

If that's not acceptable, then perhaps enclosing the type specifiers
in quotes would do the trick:

def frobnicate(x as "int",
               y as "[string]",
               z as "{(int,int):SomeClass}") as "SomeOtherClass":

2. I'd add a few meta types that correspond to the API behaviors (e.g.,
"number", "sequence", "mapping", ...). That would allow the type
inferencer to make assumptions about the behavior of an argument
without the programmer enumerating all the possible objects that
could fill the bill. These fake types could be added to the types
module and set equal to None. All you'd really be interested in is
the names anyway (I think).

def frobnicate(x as IntType,
               y as SequenceType,
               z as {(IntType,IntType):SomeClass}) as SomeOtherClass:

Skip Montanaro | http://www.mojam.com/
sk...@mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...

Fred L. Drake, Jr.

Dec 2, 1999, 3:00:00 AM
to Skip Montanaro

I presume you are aware of the Types-SIG? It appears to be
defunct. If anyone is interested in seriously pursuing this topic,
you may want to grab hold of the SIG before it's deactivated.


-Fred

--
Fred L. Drake, Jr. <fdr...@acm.org>
Corporation for National Research Initiatives

Jeremy Hylton

Dec 3, 1999, 3:00:00 AM
to Skip Montanaro
>>>>> "SM" == Skip Montanaro <sk...@mojam.com> writes:

SM> 2. I'd add a few meta types that correspond to the API
SM> behaviors (e.g., "number", "sequence", "mapping", ...). That
SM> would allow the type inferencer to make assumptions about the
SM> behavior of an argument without the programmer enumerating all
SM> the possible objects that could fill the bill. These fake types
SM> could be added to the types module and set equal to None. All
SM> you'd really be interested in is the names anyway (I think).

Here's a variation that I have been thinking about; not sure if it
makes sense. Instead of having types that correspond to sequence,
mapping, etc. use a set of micro-types that apply to specific API
calls. So a type might have __getitem__ and __len__. It's not
unusual to have a user-defined class that only implements a few of the
methods you'd expect a sequence to have.
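Jeremy's "micro-types" are close to what modern Python spells as typing.Protocol: a type defined by exactly the methods it must supply. A small sketch (the class names here are my own; note that runtime_checkable isinstance checks only verify that the methods exist, not their signatures):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Sized(Protocol):
    def __len__(self) -> int: ...

@runtime_checkable
class Indexable(Protocol):
    def __getitem__(self, i): ...

class Window:
    """A user-defined class with only two of the methods
    a full sequence would have -- exactly Jeremy's case."""
    def __len__(self):
        return 3
    def __getitem__(self, i):
        return i * 10

w = Window()
print(isinstance(w, Sized), isinstance(w, Indexable))  # True True
print(isinstance(42, Sized))                           # False
```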

Jeremy


skaller

Dec 4, 1999, 3:00:00 AM
to
Jeremy Hylton wrote:

> Here's a variation that I have been thinking about; not sure if it
> makes sense. Instead of having types that correspond to sequence,
> mapping, etc. use a set of micro-types that apply to specific API
> calls. So a type might have __getitem__ and __len__. It's not
> unusual to have a user-defined class that only implements a few of the
> methods you'd expect a sequence to have.

This suggestion makes sense at one level: what you describe
seems close to 'signature matching', which is the basis of
structural typing in some languages.

One problem I have with this is that the optimisations
that I perceive don't come from such high-level dynamic
safety checks on user-defined classes, but from knowing
that something is an 'int', so I can generate code that
doesn't do any dynamic lookup of an __add__ method,
but just generates the equivalent of, say

x = y + 42;

in C. So it is actually the basic types like int,
float, string, list, tuple, that need typing
most, from an optimisation perspective: at best,
requiring that an object have a '__len__' method
would allow it to be cached (looked up once),
but it still has to be called ... by interpreting
some Python code. This is so expensive anyhow
that the caching is probably irrelevant.
But maybe not -- if the method itself is compiled.
Hmmm.

David Jeske

Dec 5, 1999, 3:00:00 AM
to pytho...@python.org
> Jeremy Hylton wrote:
>
> > Here's a variation that I have been thinking about; not sure if it
> > makes sense. Instead of having types that correspond to sequence,
> > mapping, etc. use a set of micro-types that apply to specific API
> > calls. So a type might have __getitem__ and __len__. It's not
> > unusual to have a user-defined class that only implements a few of the
> > methods you'd expect a sequence to have.

This is a very interesting issue. On the one hand, it definitely makes
sense to let the code implicitly define the "micro-types", because that
is in fact what is actually required for the code to run correctly.

However, I'm not sure that it's incredibly useful to allow code based
on such fine-grained types to pass static checking. Here is a
(somewhat long) example:

There is another goal of pylint.py, and that is to output a
ctags-esque list of what functions and variables are of what types.

(see "Theory of Operation: Stage 2" in
http://www.chat.net/~jeske/Projects/PyLint/download/pylint-19991121.py)

In this case, it seems far more relevant to upgrade to the proper type
than to provide the micro-type. For example, for the function below,
"A" seems more human-readable than "B":

def sumListElements(a_list):
    total = 0
    for an_element in a_list:
        total = total + an_element
    return total

-- We then extract and output one of the following type signatures:

A) sumListElements( SequenceType of NumberType ) -> NumberType
B) sumListElements( MicroType{
       __getitem__(NumberType) -> MicroType{__add__},
       __len__() -> NumberType
   } ) -> MicroType{__add__}


NOTE: I've left out __coerce__ and the functional type signature
for __add__, just to keep the type signature smaller. However, I hope
you get the idea.

Now we have a picture in our heads of the "micro-type" signature
vs. the "proper-type" signature of the same function.

-- Here is the starter question:

How relevant is it to do the static checking based on B instead of A?

Could you pass an object to sumListElements which only implemented
"__getitem__" and "__len__"? Yes, you could. However, it seems
terribly fragile to allow this kind of behavior to go unchecked. When
someone is coding inside of sumListElements in the future, it seems
reasonable for them to make the assumption that they are working with
a whole-type such as a Sequence (i.e. list). They might take a look at
the function and decide that they want to fix a bug which exists:

The old version of the function assumes that the elements which are
added are going to be numbers. This version allows there to be
strings, or other addable elements:

def sumListElements(a_list):
    total = a_list[0]
    for an_element in a_list[1:]:
        total = total + an_element
    return total

This changes the "proper type" signature to:

sumListElements( SequenceType of AddableType ) -> AddableType

However, in doing so, it also requires that the "a_list" argument
include __getslice__, another part of SequenceType, making the
implicit type signature closer to:

sumListElements( MicroType{
    __getitem__(NumberType) -> MicroType{__add__},
    __len__() -> NumberType,
    __getslice__(NumberType,NumberType) ->
        MicroType{__getitem__(NumberType) -> MicroType{__add__}}
} ) -> MicroType{__add__}

If you were coding the above function, I would wager that you would
consider this a reasonable change to have made. In the "proper-type"
signature, it only became more general. No existing code would
break. However, if code were allowed to rely on the "micro-type"
signature and pass in objects which only implemented __getitem__ and
__len__, that code would now be broken.

The good news is that the static checker would catch the problem
anyhow. The bad news is that IMHO it seems fragile.
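The fragility described above can be demonstrated directly in modern Python 3 (where __getslice__ is gone and a slice arrives at __getitem__ as a slice object). The class and function names below are illustrative; an object implementing only integer __getitem__ satisfies the micro-type of the first sumListElements but breaks under the sliced version:

```python
class CountUp:
    """Yields 0, 10, 20 via __getitem__; no slice support."""
    def __getitem__(self, i):
        if not isinstance(i, int):
            raise TypeError("integer indices only")
        if i >= 3:
            raise IndexError
        return i * 10

def sum_v1(a_list):
    # Original version: plain iteration, which falls back to the
    # __getitem__ protocol for objects without __iter__.
    total = 0
    for an_element in a_list:
        total = total + an_element
    return total

def sum_v2(a_list):
    # "Generalized" version: now also requires slicing.
    total = a_list[0]
    for an_element in a_list[1:]:
        total = total + an_element
    return total

print(sum_v1(CountUp()))        # 30
try:
    sum_v2(CountUp())
except TypeError as e:
    print("broken:", e)         # broken: integer indices only
```

The change that "only became more general" in the proper-type signature silently broke every caller that relied on the narrower micro-type.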

-- Here is the final question:

Does it seem more proper to:

I. Do all static checking based on the micro-types, and require
the function writer to define the type constraints if they want
larger granularity?

    def sumListElements(a_list):
        'pylint a_list (SequenceType,)' ## <- this is the syntax pylint.py
                                        ##    supports currently for type
                                        ##    declarations
        total = a_list[0]
        for an_element in a_list[1:]:
            total = total + an_element
        return total

II. Always "upgrade" types to proper-types during static checking.
If someone wants to have a fine-grained type which just includes
__getitem__ and __len__, then force them to at least declare
this as a proper type, like "SimpleSequenceType" and then
show the signature as:

sumListElements( SimpleSequenceType of AddableType ) -> AddableType

skaller

Dec 12, 1999, 3:00:00 AM
to
David Jeske wrote:

> Does it seem more proper to:
>
> I. Do all static checking based on the micro-types, and require
> the function writer to define the type constraints if they want
> larger granularity?
>
> def sumListElements(a_list):
>     'pylint a_list (SequenceType,)' ## <- this is the syntax pylint.py
>                                     ##    supports currently for type
>                                     ##    declarations

Hmmm. A specialised docstring ...

>
> II. Always "upgrade" types to proper-types during static checking.
> If someone wants to have a fine-grained type which just includes
> __getitem__ and __len__, then force them to at least declare
> this as a proper type, like "SimpleSequenceType" and then
> show the signature as:
>
> sumListElements( SimpleSequenceType of AddableType ) -> AddableType

I think it depends on what you want to do. First, in the ISO C++
Standard, the same issue arose for algorithms that didn't need the
full functionality of some iterator kind, but the specification
requires the full kind all the same, because there is no
compact wording for anything else (and it allows implementors
more freedom).

In functional programming languages, signatures tend to be used
[i.e. micro-types].

In Viper, the issue will arise -- but when we're talking
about calling class methods, the overheads are so large
optimisation isn't relevant. I'm interested in knowing
whether something is an integer or not, or whether
it is precisely a Tuple or not: if a class emulating
a sequence is used, it's irrelevant because optimisation
is impossible [by type inference] in this case.

OTOH knowing _exactly_ which class is passed
to a function in a particular call is important,
since it allows the method calls to be inlined.
