Type-profiling to aid static type-inference

Togge

Aug 9, 2010, 2:09:44 AM
to shedskin-discuss
Hi!

A while ago I played with/wrote a little tool that would type-profile
python projects. Specifically, the tool found the run-time type of
arguments given to python functions by inspecting a running python
program.
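
A minimal sketch of what such a type profiler could look like, assuming CPython's `sys.setprofile` hook. This is not the actual tool mentioned above; the names `observed`, `profile_types`, `scale` and `main` are made up for illustration.

```python
import sys
from collections import defaultdict

# Maps (function name, argument name) -> set of type names seen at run time.
# Keying on the bare function name is a simplification: same-named functions
# in different modules or classes would be merged.
observed = defaultdict(set)

def _profiler(frame, event, arg):
    # A 'call' event fires when a Python function is entered; at that point
    # the frame's locals hold the arguments that were passed in.
    if event != "call":
        return
    code = frame.f_code
    # Positional and keyword-only parameters come first in co_varnames;
    # *args and **kwargs are skipped in this sketch.
    nargs = code.co_argcount + code.co_kwonlyargcount
    for name in code.co_varnames[:nargs]:
        if name in frame.f_locals:
            observed[(code.co_name, name)].add(type(frame.f_locals[name]).__name__)

def profile_types(func, *args, **kwargs):
    """Run func with the profiler installed and return its result."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)

def scale(x, factor):
    return x * factor

def main():
    scale(2, 3)
    scale(1.5, 2)
    scale("ab", 2)

profile_types(main)
for key in sorted(observed):
    print(key, sorted(observed[key]))
```

The result is only as complete as the run that produced it: types that never occur during the profiled execution are simply not recorded.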

It just hit me that such a tool could be useful for improving static
type-inference. Maybe I'm wrong (I'm not well-versed in the field),
what do you think?

Or is this not what you want to do with Shed Skin? I recall reading
that you prefer inferring types completely statically and without type
annotations.

And oh, happy to see the 0.5 release!

/ Torgny

Mark Dufour

Aug 14, 2010, 5:17:05 AM
to shedskin...@googlegroups.com
hello torgny,

> A while ago I played with/wrote a little tool that would type-profile
> python projects. Specifically, the tool found the run-time type of
> arguments given to python functions by inspecting a running python
> program.

I think we talked about this, or someone else wrote a similar program.
 
> It just hit me that such a tool could be useful for improving static
> type-inference. Maybe I'm wrong (I'm not well-versed in the field),
> what do you think?

yes, this would help dramatically, because it would provide the (iterative) type analysis with a much better starting point.

a related approach is to store analysis results between compilation sessions, so only the first session is very slow, and more importantly, the analysis can scale to increasingly larger versions of a program. i think we could also generate increasingly larger versions automatically, so users would only see a single session.

because my main interest in this is static type inference in itself, the second approach is much more interesting to me. I'm actually quite excited about it, and would have probably implemented it already, if I knew of several interesting Python programs that are just too large for the current type analysis (and suitable for compilation, of course).


thanks!
mark.
--
http://www.youtube.com/watch?v=E6LsfnBmdnk

Togge

Aug 20, 2010, 8:10:33 AM
to shedskin-discuss
> a related approach is to store analysis results between compilation sessions
I guess that such an approach would make it possible to load
analysis results from another tool too. Kind of like writing the
intermediate representation of the analysis pass to disk. Sounds very
interesting!

I wish I had more time and intelligence so that I could help in some
way. I tried to understand the source of shedskin a while ago, but
without much luck... :)

/ Torgny

Mark Dufour

Aug 24, 2010, 11:36:59 AM
to shedskin...@googlegroups.com
On Fri, Aug 20, 2010 at 2:10 PM, Togge <torgny.a...@gmail.com> wrote:
>> a related approach is to store analysis results between compilation sessions
> I guess that such an approach would make it possible to load
> analysis results from another tool too. Kind of like writing the
> intermediate representation of the analysis pass to disk. Sounds very
> interesting!

yes, and I guess that having some kind of standardized format would also mean other programs than shedskin could make use of the information.. :-)
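
To make the idea of a shared on-disk format concrete, here is a rough sketch that assumes JSON as the interchange format. The layout (a `version` field plus a `functions` table) and the helper names `dump_profile` and `load_profile` are purely hypothetical; no such format is defined by shedskin.

```python
import json

# Hypothetical layout: one entry per function, mapping each argument name
# to the sorted list of type names observed or inferred for it.
profile = {
    "scale": {"x": ["float", "int"], "factor": ["int"]},
}

def dump_profile(functions):
    """Serialize the table to a JSON string with a stable key order."""
    return json.dumps({"version": 1, "functions": functions},
                      sort_keys=True, indent=2)

def load_profile(text):
    """Parse a string produced by dump_profile back into the table."""
    data = json.loads(text)
    if data.get("version") != 1:
        raise ValueError("unsupported profile version")
    return data["functions"]

text = dump_profile(profile)
restored = load_profile(text)
print(text)
```

A version field is included so that a consumer can reject files written by an incompatible producer instead of misreading them.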

btw, for shedskin it would be more useful to have information about allocation sites, rather than function arguments: which kinds of objects end up as elements or attributes of containers or class instances. that is actually much harder to prove with type inference alone.

for example, for the following statements:

bla = []
mc = MyClass()

I'd like to know beforehand the types of objects that end up in list objects allocated at '[]', and in attributes of class instances allocated at 'MyClass()'.
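
A rough sketch of how a run-time profiler might collect this kind of per-allocation-site information for lists. It assumes CPython (for `sys._getframe`), and `TrackedList` is a hypothetical stand-in that would have to replace `[]` in the profiled program; only `append` and the initial contents are tracked here.

```python
import sys
from collections import defaultdict

# Maps allocation site (function name, line number) -> set of element
# type names that ended up in lists created at that site.
site_types = defaultdict(set)

class TrackedList(list):
    """List that remembers where it was allocated (illustrative only)."""

    def __init__(self, *args):
        super().__init__(*args)
        # Frame 1 is the code that executed TrackedList(...).
        caller = sys._getframe(1)
        self.site = (caller.f_code.co_name, caller.f_lineno)
        for item in self:
            site_types[self.site].add(type(item).__name__)

    def append(self, item):
        # Record the element type against the allocation site, then append.
        site_types[self.site].add(type(item).__name__)
        super().append(item)

def build():
    bla = TrackedList()   # the allocation site of interest
    bla.append(1)
    bla.append(2.0)
    return bla

bla = build()
print(bla.site, sorted(site_types[bla.site]))
```

Attributes of class instances could be handled along the same lines by recording types in `__setattr__`, keyed by the site where the instance was created.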

> I wish I had more time and intelligence so that I could help in some
> way. I tried to understand the source of shedskin a while ago, but
> without much luck... :)


time is probably the more critical of the two. it just takes many hours to get your head around an unknown codebase, and shedskin is not particularly complex I think.. type inference is a pain to debug, but the rest is relatively straightforward.

in any case, I'd be happy to suggest some low hanging fruit to start out with, and/or explain any part of the code..

thanks,
mark.

HartsAntler

Sep 9, 2010, 1:57:18 AM
to shedskin-discuss
Why not try to tap into PyPy's flow graph and annotator? The
annotation information from PyPy can be saved into a pickle and loaded
by shedskin.
-Hart

Mark Dufour

Sep 9, 2010, 7:29:41 AM
to shedskin...@googlegroups.com
On Thu, Sep 9, 2010 at 7:57 AM, HartsAntler <goatm...@gmail.com> wrote:
> Why not try to tap into PyPy's flow graph and annotator? The
> annotation information from PyPy can be saved into a pickle and loaded
> by shedskin.
> -Hart


I will keep that in mind - thanks! perhaps sarvi will be right after all.. :-)


mark.

HartsAntler

Sep 9, 2010, 10:10:17 PM
to shedskin-discuss
It would be a big step forward for RPython if shedskin could compile
RPython programs (RPython as PyPy defines it). Currently there are
some incompatibilities between RPython and ShedSkin-restricted
Python. The main differences are that shedskin does not support
longer mixed tuples, *args, or `assert isinstance`.

There is no comprehensive plain-English list of what RPython is, so
here is a draft.


RPython rule 1: All globals and class-level attributes are constants.
. Mutable globals are not recommended by ShedSkin because, when the
code is used as an extension module, changes to globals will not be
reflected in CPython.
. PyPy considers globals to always be constant, including
class-level attributes.

RPython rule 2: Tuples may contain mixed types, but it should be
avoided.
. ShedSkin limits mixed tuples to length two
. PyPy cannot use a loop to iterate over mixed tuples
* RPython 1.+ should try to solve these problems in both ShedSkin and
PyPy

RPython rule 3: Dicts and lists must contain compatible types
(homogeneous).
. ShedSkin and PyPy common rule

RPython rule 4: Class attributes are accessed by class name.
. ShedSkin requirement

RPython rule 5: No reflection (getattr, hasattr, etc.)
. PyPy allows for limited getattr, hasattr, etc., but only if the
string is concrete.
. ShedSkin has no support.

RPython rule 6: No runtime evaluation.
. Neither ShedSkin nor PyPy supports eval or exec

RPython rule 7: No **keywordargs
. Neither ShedSkin nor PyPy supports **kw

RPython rule 8: No passing of references to methods (functions are
ok).
. ShedSkin requirement

RPython rule 9: For instances to be of a compatible type, they must
inherit from a common base class.
. ShedSkin and PyPy requirement

RPython rule 10: For compatible instances stored in the same list, but
of different subclasses, to call methods with incompatible signatures
or access attributes unique to the subclass, the type must be asserted
first:
assert isinstance(a,MySubClass)
. PyPy requirement
. works with ShedSkin?

RPython rule 11: The *assert isinstance* statement acts like a cast in
C/C++, turning `SomeObject` into the real subclass instance. This
solves problems like the one in rule 10, and in other cases where an
instance of uncertain type must be passed to a function that expects a
certain type.
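
As an illustration of rules 9 to 11, here is a small sketch in plain Python. The class names are invented, and the comments describe the behaviour as stated in this draft rather than anything verified against PyPy's annotator; under CPython the assert is just an ordinary run-time check.

```python
class Shape:
    # Common base class (rule 9) so both subclasses can share one list.
    def area(self):
        return 0.0

class Circle(Shape):
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3.0 * self.r * self.r  # crude approximation, for brevity

class Rect(Shape):
    def __init__(self, w, h):
        self.w = w
        self.h = h

    def area(self):
        return self.w * self.h

def radius_of(shape):
    # 'r' exists only on Circle. Per rules 10 and 11, the assert is what
    # tells the annotator that 'shape' is a Circle from here on.
    assert isinstance(shape, Circle)
    return shape.r

shapes = [Circle(1.0), Rect(2.0, 3.0)]
total_area = sum(s.area() for s in shapes)   # shared method, no assert needed
first_radius = radius_of(shapes[0])
print(total_area, first_radius)
```

Passing `shapes[1]` to `radius_of` would raise an AssertionError under CPython, which is the run-time counterpart of a failed cast.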

RPython rule 12: No overloading, except for __init__ and __del__.
. ShedSkin allows for all overloading except for __iter__ and
__call__
. PyPy requirement
*RPython 1.+, would be nice to lift some of this limitation in PyPy,
how hard is this problem?

RPython rule 13: Base classes should define dummy functions.
. Prevents method `demotion` in PyPy

RPython rule 14: Attribute variables should be of consistent type in
all subclasses. Do not create a subclass that redefines the type of
an attribute.
. Required by PyPy

RPython rule 15: The calling and return signature of a function must
not change. The types of function arguments must be the same in
every call. Only `None` can be intermixed. Function returns must
always have the same type.
. Required by PyPy
. Required by ShedSkin?

RPython rule 16: It's not recommended to define subclasses with
like-named functions that have different signatures. Casting by
`assert isinstance` is the workaround that allows this rule to be bent.
. Required by PyPy
. Supported by ShedSkin?


Mark Dufour

Sep 11, 2010, 7:26:30 AM
to shedskin...@googlegroups.com
hi hart,

thanks for the overview of RPython restrictions. it's actually the first time I have a look at them, and I'm happy to see so many similarities with Shedskin!

longer mixed tuples are still on the roadmap, and I see no fundamental problem here. they're not supported yet because it's almost always easy to work around them; using classes is better style anyway and probably leads to faster C++. I guess I've also been waiting for C++0x to spread, because it has some useful new template shortcuts that might help here.

Shedskin actually supported *args in the early days, and I guess with some work it could be supported again.. but again, using a list is an easy workaround (the elements will probably have to be homogeneous anyway). there may be some speed penalty, since using C varargs might be faster, but on the whole I'm not sure it's worth the added complexity.
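
The list workaround mentioned above might look like the following sketch; the function names are invented for illustration.

```python
# Variadic version: relies on *args in user code, which is described
# above as currently unsupported.
def total_varargs(*values):
    result = 0
    for v in values:
        result += v
    return result

# Workaround: take one list argument with homogeneous elements.
def total(values):
    result = 0
    for v in values:
        result += v
    return result

print(total([1, 2, 3]))
```

The only change at the call site is wrapping the arguments in a list, so `total_varargs(1, 2, 3)` becomes `total([1, 2, 3])`.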

note that *args and **kwargs are supported when calling builtins and standard library modules, at least to the extent that most of them work as expected. it may be that some of this support could be transferred to 'user space', but I would have to look into that.

again, I have this irrational hatred against type declarations, and so also against assert statements used in this way, but I agree they may be inevitable in some cases. the question remains whether it's really worth it to support such cases.. :-)

thanks,
mark.

sarvi

Sep 13, 2010, 11:19:53 PM
to shedskin-discuss
Mark,
I don't mean to push for unification, but I had another thought and
wanted to run it by you.

PyPy is foremost a language development framework.
It is about implementing the Python interpreter in RPython, plus
additional hints to assist in JIT generation.

If the Python language implementation in RPython has enough
information to create a Python interpreter and do JIT compilation,
I am thinking it should have enough information to generate Shedskin
C/C++ code for the same subset of Python as Shedskin.

Basically, use the bulk of the Shedskin C++ code, but use the PyPy
language framework to implement the Python compiler that Shedskin
implements?

Sarvi

In other words, can PyPy be the language framework in which Shedskin
is implemented?

Mark Dufour

Sep 14, 2010, 4:31:45 AM
to shedskin...@googlegroups.com
hi sarvi,


> In other words, can PyPy be the language framework in which Shedskin
> is implemented?

 
I guess theoretically it would be possible.. :-)


sarvi

Sep 14, 2010, 3:21:17 PM
to shedskin-discuss
I am planning to fund some prize money for an undergraduate/graduate
school project back in India and am looking for ideas.

This means we would be able to motivate a team of 2-5 smart young
engineers for about 6 months to work on something interesting for
them but beneficial for the Python community.

One area I am obviously looking at is compiling Python code.

I was thinking the project could be to
1. take your C++ code under shedskin/lib as is
2. have them implement your type inference engine in the PyPy
framework and create a PyPy backend that generates C++ code using
your shedskin/lib

What would you think of such an idea?
Estimates? Feasibility?
Do you see any benefits to this work for Shedskin or PyPy or both?

Sarvi

Mark Dufour

Sep 15, 2010, 3:27:41 PM
to shedskin...@googlegroups.com
hi sarvi,

On Tue, Sep 14, 2010 at 9:21 PM, sarvi <sarv...@gmail.com> wrote:
> I am planning to fund some prize money for an undergraduate/graduate
> school project back in India and am looking for ideas.

that's great!!

> I was thinking the project could be to
>   1. take your C++ code under shedskin/lib as is
>   2. have them implement your type inference engine in the PyPy
> framework and create a PyPy backend that generates C++ code using
> your shedskin/lib
>
> What would you think of such an idea?
> Estimates? Feasibility?
>
> Do you see any benefits to this work for Shedskin or PyPy or both?

again, I'm sorry but no, I think this would be a huge amount of work for no clear benefit.. :P

I'm sure if you ask some (Python) mentor organisations for last year's google summer of code (GSOC), they can give you long lists of ideas that would help their projects move forward. there should be many ideas to improve PyPy, Unladen Swallow, Cython and such. nothing really comes to mind at the moment that would help Shedskin, but I'm sure if you'd specifically like to help improve it, I can probably think of some things..


thanks!
mark. 